DeutschLernen/docs/features/ai-services.md
Lasse Rune Hansen 76e8af4987 Add complete solution: documentation, frontend, and project files
- Add comprehensive documentation in docs/ (architecture, features, roadmap)
- Add german-app-frontend with Vite, TypeScript, ESLint configuration
- Add AGENTS.md and .gitignore

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-05-31 18:20:53 +02:00

Feature: AI Services Integration

Status: Planned
Priority: High
Complexity: High
Estimate: 10-16 hours
Assignee: -
Created: May 31, 2025
Target Completion: -
PR: -
Related Features: Story Integration, Vocabulary System, Quiz System, Lesson Management


📌 Overview

Purpose

Integrate three AI services into the application: Mistral-Medium for text generation (stories, feedback), Vosk for speech recognition (speaking exercises), and Coqui TTS for text-to-speech (vocabulary, stories, quizzes).

User Story

As a learner, I want AI-powered features like generated stories, speech recognition for speaking practice, and TTS for audio content so that I can have an immersive and interactive learning experience.

Acceptance Criteria

  • Mistral-Medium API is integrated for story generation
  • Mistral-Medium API is integrated for writing feedback
  • Vosk speech recognition is integrated for speaking exercises
  • Coqui TTS is integrated for audio generation
  • All AI services are configurable via appsettings.json
  • AI service failures are handled gracefully
  • AI API calls are rate-limited and cached

📋 Requirements

Functional Requirements

| ID | Requirement | Priority |
| --- | --- | --- |
| FR-001 | Generate stories using Mistral-Medium | High |
| FR-002 | Generate writing feedback using Mistral-Medium | High |
| FR-003 | Transcribe speech using Vosk | High |
| FR-004 | Generate audio using Coqui TTS | High |
| FR-005 | Configure all services via configuration | High |
| FR-006 | Handle AI service errors gracefully | High |
| FR-007 | Cache/rate limit AI API calls | Medium |
| FR-008 | Validate AI outputs before use | Medium |

Non-Functional Requirements

  • Performance: TTS generation < 2 seconds per sentence
  • Performance: Speech recognition < 3 seconds per audio clip
  • Performance: Mistral API calls < 5 seconds per request
  • Reliability: Services should degrade gracefully on failure
  • Cost: Minimize API call costs (caching, batching)

🏗️ Technical Design

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                        AI Services Layer                        │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │ Mistral-Medium  │  │      Vosk       │  │    Coqui TTS    │  │
│  │   (Text Gen)    │  │ (Speech Recog.) │  │   (Audio Gen)   │  │
│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘  │
│           │                    │                    │           │
│           ▼                    ▼                    ▼           │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                   Application Services                    │  │
│  │  - StoryGenerationService                                 │  │
│  │  - WritingFeedbackService                                 │  │
│  │  - VoskService (Speech Recognition)                       │  │
│  │  - TtsService (Text-to-Speech)                            │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Components Involved

  • Backend Services:
    • IMistralService / MistralService - Text generation
    • IVoskService / VoskService - Speech recognition
    • ITtsService / TtsService - Text-to-speech
  • Configuration: appsettings.json with AI settings
  • External Dependencies:
    • Mistral-Medium API
    • Vosk Python library + German model
    • Coqui TTS Python library + German model

Data Flow

Story Generation Flow

1. StoryGenerationService receives request with vocabulary list and level
2. Service constructs prompt for Mistral-Medium
3. MistralService sends prompt to Mistral API
4. Mistral API returns generated story text
5. StoryGenerationService validates and returns story
6. StoryService saves story and triggers audio generation
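
Steps 2 and 5 of this flow could be sketched as follows. This is a minimal illustration, not the planned implementation: the `StoryPrompt` class, its method names, and the prompt wording are all assumptions, and the substring check is only one possible form of output validation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for steps 2 and 5 of the story generation flow.
public static class StoryPrompt
{
    // Step 2: build the prompt from the vocabulary list and the learner level.
    public static string Build(IReadOnlyCollection<string> vocabulary, string level)
    {
        if (vocabulary.Count == 0)
            throw new ArgumentException("Vocabulary list must not be empty.", nameof(vocabulary));

        return $"Write a short German story for a {level} learner. " +
               $"Use each of these words at least once: {string.Join(", ", vocabulary)}. " +
               "Return only the story text.";
    }

    // Step 5: list the vocabulary words that the generated story does not contain.
    public static IReadOnlyList<string> MissingWords(string story, IEnumerable<string> vocabulary) =>
        vocabulary
            .Where(word => !story.Contains(word, StringComparison.OrdinalIgnoreCase))
            .ToList();
}
```

A plain substring check misses inflected forms (for example "Hunde" for "Hund" matches, but "Häuser" for "Haus" does not), so a real validator would likely need lemmatization or a more lenient rule.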

Speech Recognition Flow

1. User records speech in frontend
2. Frontend sends audio file to /api/speech/recognize
3. VoskService receives audio bytes
4. VoskService calls Vosk Python CLI with German model
5. Vosk returns transcribed text
6. Backend validates transcription and returns to frontend
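
Step 6 could be sketched like this, assuming the Python wrapper prints the JSON that the Vosk recognizer returns: a `text` field and, when word-level output is enabled, a `result` array with a `conf` value per word. The `VoskResultParser` and `Transcription` names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical result type; AverageConfidence is null when Vosk sent no word data.
public sealed record Transcription(string Text, double? AverageConfidence);

public static class VoskResultParser
{
    // Parses JSON such as {"result":[{"conf":0.9,"word":"hallo"}],"text":"hallo"}.
    // Returns null when no speech was recognized.
    public static Transcription? Parse(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;

        var text = root.TryGetProperty("text", out var t) ? t.GetString() ?? "" : "";
        if (string.IsNullOrWhiteSpace(text))
            return null;

        // Per-word confidences are only present when word output is enabled.
        double? confidence = null;
        if (root.TryGetProperty("result", out var words) &&
            words.ValueKind == JsonValueKind.Array && words.GetArrayLength() > 0)
        {
            confidence = words.EnumerateArray()
                .Average(w => w.TryGetProperty("conf", out var c) ? c.GetDouble() : 0.0);
        }

        return new Transcription(text.Trim(), confidence);
    }
}
```

The backend can then reject a transcription whose average confidence falls below a configured threshold before returning it to the frontend.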

TTS Flow

1. TtsService receives text to synthesize
2. Service calls Coqui TTS Python CLI
3. Coqui generates audio file
4. Audio file saved to filesystem
5. Audio URL returned to caller
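
Step 2 might build its command as in the sketch below. It assumes the Coqui `tts` command-line entry point is on the PATH and uses its `--text`, `--model_name`, and `--out_path` options; the `CoquiCommand` helper is hypothetical. Passing values through `ArgumentList` avoids manual quoting of text that contains spaces or quotes.

```csharp
using System.Diagnostics;

// Hypothetical helper that prepares (but does not start) the Coqui TTS process.
public static class CoquiCommand
{
    public static ProcessStartInfo Build(string text, string modelName, string outputPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "tts", // assumption: Coqui CLI is installed and on the PATH
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };

        // Each ArgumentList entry is passed as exactly one argument.
        startInfo.ArgumentList.Add("--text");
        startInfo.ArgumentList.Add(text);
        startInfo.ArgumentList.Add("--model_name");
        startInfo.ArgumentList.Add(modelName);
        startInfo.ArgumentList.Add("--out_path");
        startInfo.ArgumentList.Add(outputPath);

        return startInfo;
    }
}
```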

🚀 Implementation Plan

Phase 1: Configuration & Interfaces (2 hours)

  • Add AI configuration section to appsettings.json
  • Create configuration classes (MistralConfig, VoskConfig, CoquiConfig)
  • Define service interfaces (IMistralService, IVoskService, ITtsService)
  • Register services in Program.cs
  • Set up configuration validation
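
A configuration class with validation could look like the sketch below. The property names mirror the Mistral settings shown later in this document; the `MistralConfigValidator` helper and the numeric ranges are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Linq;

// Mirrors the "Mistral" section of appsettings.json.
public sealed class MistralConfig
{
    [Required(AllowEmptyStrings = false)]
    public string ApiKey { get; set; } = "";

    [Required, Url]
    public string BaseUrl { get; set; } = "https://api.mistral.ai/v1/";

    [Required(AllowEmptyStrings = false)]
    public string DefaultModel { get; set; } = "mistral-medium";

    // Ranges are illustrative assumptions, not values from the plan.
    [Range(1, 300)]
    public int TimeoutSeconds { get; set; } = 30;

    [Range(0, 10)]
    public int MaxRetries { get; set; } = 3;
}

// Hypothetical helper: returns validation error messages (empty if valid).
public static class MistralConfigValidator
{
    public static IReadOnlyList<string> Validate(MistralConfig config)
    {
        var results = new List<ValidationResult>();
        Validator.TryValidateObject(
            config, new ValidationContext(config), results, validateAllProperties: true);
        return results.Select(r => r.ErrorMessage ?? "Invalid value").ToList();
    }
}
```

In Program.cs the same attributes can be enforced at startup with `AddOptions<MistralConfig>().Bind(...).ValidateDataAnnotations().ValidateOnStart()`, so a missing API key fails fast instead of surfacing on the first request.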

Phase 2: Mistral-Medium Integration (2-3 hours)

  • Create MistralService implementation
  • Implement Mistral API client
  • Create request/response models
  • Implement retry logic for API calls
  • Add rate limiting (e.g., max 10 requests/minute)
  • Add response caching for similar prompts
  • Create prompt templates for different use cases

Phase 3: Vosk Speech Recognition (2-3 hours)

  • Create VoskService implementation
  • Set up Vosk Python environment
  • Download and configure German model (vosk-model-de-0.22)
  • Implement audio processing
  • Handle different audio formats
  • Add error handling for recognition failures
  • Create /api/speech/recognize endpoint

Phase 4: Coqui TTS Integration (2-3 hours)

  • Create TtsService implementation
  • Set up Coqui TTS Python environment
  • Download and configure German model
  • Implement audio generation
  • Add audio file management (storage, cleanup)
  • Create audio serving endpoints
  • Implement batch audio generation

Phase 5: Service Integration (2 hours)

  • Create StoryGenerationService (uses MistralService)
  • Create WritingFeedbackService (uses MistralService)
  • Create SpeechExerciseService (uses VoskService)
  • Create AudioGenerationService (uses TtsService)
  • Add health checks for all AI services
  • Implement fallback mechanisms for service failures

Milestones

| Milestone | Date | Status |
| --- | --- | --- |
| Configuration & Interfaces | - | |
| Mistral Integration | - | |
| Vosk Integration | - | |
| Coqui TTS Integration | - | |
| Service Integration | - | |

Tasks

Backend - Configuration

  • Add Mistral settings to appsettings.json
  • Add Vosk settings to appsettings.json
  • Add Coqui settings to appsettings.json
  • Create Configuration/MistralConfig.cs
  • Create Configuration/VoskConfig.cs
  • Create Configuration/CoquiConfig.cs
  • Register all AI services in Program.cs
  • Add health checks for AI services

Backend - Mistral Service

  • Create Domain/Interfaces/IMistralService.cs
  • Create Infrastructure/Services/MistralService.cs
  • Implement Mistral API client
  • Create Models/MistralRequest.cs
  • Create Models/MistralResponse.cs
  • Add retry logic
  • Add rate limiting
  • Add response caching
  • Write unit tests

Backend - Vosk Service

  • Create Domain/Interfaces/IVoskService.cs
  • Create Infrastructure/Services/VoskService.cs
  • Set up Python process execution
  • Download and configure vosk-model-de-0.22
  • Implement audio recognition
  • Create /api/speech/recognize endpoint
  • Create Presentation/Controllers/SpeechController.cs
  • Write unit tests

Backend - Coqui TTS Service

  • Create Domain/Interfaces/ITtsService.cs
  • Create Infrastructure/Services/TtsService.cs
  • Set up Python process execution
  • Download and configure Coqui German model
  • Implement audio generation
  • Create audio file storage mechanism
  • Create /api/tts/generate endpoint
  • Create Presentation/Controllers/TtsController.cs
  • Write unit tests

Backend - Higher-Level Services

  • Create Application/Services/StoryGenerationService.cs
  • Create Application/Services/WritingFeedbackService.cs
  • Integrate with MistralService
  • Add validation for AI outputs
  • Write integration tests

Infrastructure Setup

  • Install Python 3.8+
  • Install Vosk Python package
  • Download vosk-model-de-0.22
  • Install Coqui TTS package
  • Download Coqui German model
  • Set up file storage for audio
  • Configure permissions

Frontend Integration

  • Create services/speechService.ts
  • Create services/ttsService.ts
  • Create services/aiService.ts
  • Integrate with Recorder component
  • Integrate with AudioPlayer component
  • Add error handling for AI failures

Definition of Done

General Criteria (All Features)

  • All acceptance criteria met and verified
  • All tasks in this document completed
  • Code follows Clean Architecture principles
  • Code reviewed and approved by at least 1 team member
  • All tests passing (unit, integration)
  • Documentation updated (README, AGENTS.md if applicable)
  • Feature works in development environment
  • Feature deployed to staging environment
  • Performance meets defined targets
  • Security review completed
  • No critical bugs or blockers

AI-Specific Criteria

  • All AI services functional in development
  • Mistral API integration tested with valid API key
  • Vosk speech recognition tested with German model
  • Coqui TTS tested with German model
  • Error handling tested (invalid inputs, service failures)
  • Fallback mechanisms implemented and tested
  • Rate limiting configured and tested
  • Audio file generation and storage verified
  • Health checks for all AI services passing

🧪 Testing Strategy

Testing Approach

| Test Type | Coverage | Tools | Responsibility |
| --- | --- | --- | --- |
| Unit Tests | 80%+ code coverage | MSTest, Moq | Backend Dev |
| Integration Tests | All service interactions | MSTest, Testcontainers | Backend Dev |
| API Tests | All endpoints | MSTest, HttpClient | Backend Dev |
| Frontend Unit Tests | Component logic | Vitest | Frontend Dev |
| Frontend Integration | Service integration | Vitest | Frontend Dev |
| E2E Tests | Critical user journeys | Playwright | QA/Dev |
| Manual Testing | Exploratory, edge cases | BrowserStack | QA |
| Load Testing | AI service performance | k6/JMeter | DevOps |

AI-Specific Tests

Mistral Service Tests

  • Test successful text generation
  • Test API error handling (429, 500, 503)
  • Test rate limiting (max requests per minute)
  • Test response caching
  • Test retry logic on failures
  • Test timeout handling
  • Test invalid API key handling

Vosk Service Tests

  • Test successful speech recognition (clear audio)
  • Test speech recognition with background noise
  • Test speech recognition with different accents
  • Test empty audio handling
  • Test invalid audio format handling
  • Test Python process failure handling
  • Test model not found error handling
  • Test confidence threshold validation

Coqui TTS Service Tests

  • Test successful audio generation
  • Test audio generation with long text
  • Test audio generation with special characters
  • Test invalid text handling
  • Test Python process failure handling
  • Test model not found error handling
  • Test audio file format validation
  • Test audio quality validation

Test Data

  • Sample audio files for Vosk testing (clear German speech, noisy audio, non-German speech)
  • Sample texts for TTS testing (short, long, with special characters, with German umlauts)
  • Sample prompts for Mistral testing (A1, A2, B1 levels)

🚨 Risks & Mitigations

Technical Risks

| Risk | Likelihood | Impact | Mitigation | Owner |
| --- | --- | --- | --- | --- |
| Python-.NET integration failures | High | High | Use Process class with proper error handling, implement process pooling, add timeouts | Backend Dev |
| Vosk model compatibility issues | Medium | High | Test with vosk-model-de-0.22 before implementation, have fallback to vosk-model-small-de-0.15 | Backend Dev |
| Coqui model quality issues | Medium | Medium | Test with sample German text, have alternative TTS service as fallback | Backend Dev |
| Mistral API rate limits | High | Medium | Implement caching (1h TTL), request queue, exponential backoff | Backend Dev |
| Mistral API costs exceed budget | Medium | High | Set budget alerts, implement cost tracking, cache aggressively | Backend Dev |
| AI services slow performance | High | Medium | Implement async processing, use background jobs for batch operations | Backend Dev |
| Audio files too large | Medium | Medium | Compress audio (16kHz, mono), implement streaming for large files | Backend Dev |
| Model files too large for deployment | Medium | Medium | Use Docker volumes, separate storage for models, consider cloud storage | DevOps |
| Memory leaks in Python processes | Medium | High | Implement process lifecycle management, add memory monitoring, use process pooling | Backend Dev |
| Different Python versions cause issues | Medium | Medium | Use Docker to pin Python version, document exact version in README | DevOps |

Operational Risks

| Risk | Likelihood | Impact | Mitigation | Owner |
| --- | --- | --- | --- | --- |
| AI service downtime | Medium | High | Implement health checks, circuit breakers, fallback responses | DevOps |
| Model files corrupted | Low | High | Implement checksum validation, store backups, automated recovery | DevOps |
| API key exposure | Medium | High | Use GitHub secrets, Azure Key Vault, never commit to repo | Security |
| Audio storage fills up | Medium | Medium | Implement cleanup job, set size quotas, use cloud storage | DevOps |

Business Risks

| Risk | Likelihood | Impact | Mitigation | Owner |
| --- | --- | --- | --- | --- |
| User data privacy concerns | Medium | High | Anonymize audio before processing, document data handling policy, comply with GDPR | Legal |
| AI generates inappropriate content | Low | High | Implement content moderation, add user reporting, use system prompts to prevent | Backend Dev |
| AI services become too expensive | Medium | Medium | Monitor costs, set budget caps, evaluate open-source alternatives | Product |

🔗 Dependencies

Feature Dependencies

Technical Dependencies

  • Python 3.8+
  • Vosk Python library
  • vosk-model-de-0.22 (German model)
  • Coqui TTS Python library
  • Coqui German TTS model
  • Mistral-Medium API key

External Services

| Service | Purpose | Configuration |
| --- | --- | --- |
| Mistral-Medium API | Text generation (stories, feedback) | API key, endpoint URL |
| Vosk | Speech recognition | Python path, model path |
| Coqui TTS | Text-to-speech | Python path, model name |

Blockers

  • Infrastructure Setup must be complete
  • Python environment must be configured
  • AI models must be downloaded
  • Mistral API key must be obtained

🔧 Technical Deep Dive: Python-.NET Integration

Integration Patterns

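Option 1: Process per Request (Simple)

// Simple approach: spawn a Python process for each request.
// Requires: using System.Diagnostics;
public async Task<string> RecognizeSpeechAsync(byte[] audioData, CancellationToken ct = default)
{
    // Path.GetTempFileName() + ".wav" would leave an orphaned .tmp file behind.
    var tempFile = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid():N}.wav");
    await File.WriteAllBytesAsync(tempFile, audioData, ct);

    using var process = new Process
    {
        StartInfo = new ProcessStartInfo
        {
            FileName = "python",
            // Module name and flags are taken from the plan; verify them
            // against the installed Vosk version before relying on them.
            Arguments = $"-m vosk.transcribe --model \"{_modelPath}\" --input \"{tempFile}\"",
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true,
            // Environment is a read-only property, so entries are added in place.
            Environment = { ["PYTHONPATH"] = "/path/to/vosk" }
        }
    };

    // Abort if Vosk does not finish in time (30 s is an example value).
    using var timeout = CancellationTokenSource.CreateLinkedTokenSource(ct);
    timeout.CancelAfter(TimeSpan.FromSeconds(30));

    try
    {
        process.Start();

        // Read both streams concurrently: reading them one after the other
        // can deadlock once the unread pipe's buffer fills up.
        var outputTask = process.StandardOutput.ReadToEndAsync();
        var errorTask = process.StandardError.ReadToEndAsync();

        await process.WaitForExitAsync(timeout.Token);
        var output = await outputTask;
        var error = await errorTask;

        if (process.ExitCode != 0)
        {
            throw new AiServiceException($"Vosk failed: {error}");
        }

        return output.Trim();
    }
    catch (OperationCanceledException)
    {
        // Do not leave an orphaned Python process behind.
        if (!process.HasExited) process.Kill(entireProcessTree: true);
        throw;
    }
    finally
    {
        File.Delete(tempFile);
    }
}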

Pros: Simple, easy to implement, no additional dependencies
Cons: Process startup overhead (~100-500ms per call) plus a full model reload on every call, resource-intensive

Option 2: Process Pool (Faster)

// Maintain a pool of persistent Python processes.
// Requires: using System.Collections.Concurrent; using System.Diagnostics;
public class PythonProcessPool : IDisposable
{
    private readonly ConcurrentQueue<Process> _pool = new();
    private readonly SemaphoreSlim _semaphore;
    private readonly string _pythonPath;
    private readonly string _scriptPath;

    public PythonProcessPool(int size, string pythonPath, string scriptPath)
    {
        _semaphore = new SemaphoreSlim(size);
        _pythonPath = pythonPath;
        _scriptPath = scriptPath;

        // Pre-warm the pool
        for (int i = 0; i < size; i++)
        {
            _pool.Enqueue(StartProcess());
        }
    }

    // Protocol: one line of input on stdin, one line of output on stdout.
    public async Task<string> ExecuteAsync(string input)
    {
        await _semaphore.WaitAsync();

        // Replace workers that crashed or exited since their last use
        if (!_pool.TryDequeue(out var process) || process.HasExited)
        {
            process?.Dispose();
            process = StartProcess();
        }

        try
        {
            // Send input to stdin
            await process.StandardInput.WriteLineAsync(input);
            await process.StandardInput.FlushAsync();

            // Read response from stdout; null means the worker closed its stdout
            var response = await process.StandardOutput.ReadLineAsync();

            return response
                ?? throw new AiServiceException("Python worker exited without a response");
        }
        finally
        {
            _pool.Enqueue(process);
            _semaphore.Release();
        }
    }

    private Process StartProcess()
    {
        var process = new Process
        {
            StartInfo = new ProcessStartInfo
            {
                FileName = _pythonPath,
                Arguments = _scriptPath,
                RedirectStandardInput = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                UseShellExecute = false,
                CreateNoWindow = true
            }
        };

        // Process.Start() returns bool, so it cannot be chained onto the initializer
        process.Start();
        // Drain stderr so a chatty worker cannot block on a full pipe
        process.BeginErrorReadLine();
        return process;
    }

    public void Dispose()
    {
        while (_pool.TryDequeue(out var process))
        {
            try { process.Kill(); } catch { /* already exited */ }
            process.Dispose();
        }
        _semaphore.Dispose();
    }
}

Pros: Eliminates process startup overhead, much faster for repeated calls
Cons: More complex, need to handle process lifecycle, stdin/stdout parsing

Option 3: gRPC (Best for Production)

  • Create Python gRPC server for AI services
  • .NET client calls gRPC methods
  • Single persistent Python process
  • Type-safe, high-performance

Pros: Best performance, type-safe, production-ready
Cons: Most complex to set up, requires gRPC knowledge

Error Handling Strategy

// Retry with exponential backoff and a per-attempt timeout for AI services
public async Task<T> ExecuteWithRetryAsync<T>(
    Func<CancellationToken, Task<T>> action,
    string operationName,
    int maxRetries = 3,
    TimeSpan? timeout = null)
{
    var retryCount = 0;
    timeout ??= TimeSpan.FromSeconds(30);
    
    while (true)
    {
        try
        {
            // The token must be passed to the action, otherwise the timeout never fires
            using var cts = new CancellationTokenSource(timeout.Value);
            return await action(cts.Token);
        }
        catch (OperationCanceledException) when (retryCount < maxRetries)
        {
            retryCount++;
            // Exponential backoff: 2s, 4s, 8s, ...
            var delay = TimeSpan.FromSeconds(Math.Pow(2, retryCount));
            _logger.LogWarning(
                "{Operation} timed out (attempt {Attempt}), retrying in {Delay}s...",
                operationName, retryCount, delay.TotalSeconds);
            await Task.Delay(delay);
        }
        catch (AiServiceException ex) when (IsRetryable(ex) && retryCount < maxRetries)
        {
            retryCount++;
            var delay = TimeSpan.FromSeconds(Math.Pow(2, retryCount));
            _logger.LogWarning(ex, 
                "{Operation} failed (attempt {Attempt}), retrying in {Delay}s...",
                operationName, retryCount, delay.TotalSeconds);
            await Task.Delay(delay);
        }
        catch (Exception ex)
        {
            // Non-retryable error, or retries exhausted
            _logger.LogError(ex, "{Operation} failed permanently after {Attempts} attempts",
                operationName, retryCount + 1);
            throw new AiServiceException($"{operationName} failed: {ex.Message}", ex);
        }
    }
    
    // Only transient error categories are worth retrying
    static bool IsRetryable(AiServiceException ex) => 
        ex.ErrorCode switch
        {
            AiErrorCode.RateLimited => true,
            AiErrorCode.Temporary => true,
            AiErrorCode.Timeout => true,
            _ => false
        };
}

Health Check Implementation

// Health check for AI services
// Requires: using Microsoft.Extensions.Diagnostics.HealthChecks;
public class AiServicesHealthCheck : IHealthCheck
{
    private readonly IMistralService _mistral;
    private readonly IVoskService _vosk;
    private readonly ITtsService _tts;
    
    public AiServicesHealthCheck(IMistralService mistral, IVoskService vosk, ITtsService tts)
    {
        _mistral = mistral;
        _vosk = vosk;
        _tts = tts;
    }
    
    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, 
        CancellationToken cancellationToken = default)
    {
        // HealthCheckResult expects IReadOnlyDictionary<string, object> as data
        var checks = new Dictionary<string, object>
        {
            ["Mistral"] = await ProbeAsync(() => _mistral.TestConnectionAsync(cancellationToken)),
            ["Vosk"] = await ProbeAsync(() => _vosk.TestModelAsync(cancellationToken)),
            ["Coqui TTS"] = await ProbeAsync(() => _tts.TestModelAsync(cancellationToken))
        };
        
        var allHealthy = checks.Values.All(v => Equals(v, "Healthy"));
        var status = allHealthy ? HealthStatus.Healthy : HealthStatus.Unhealthy;
        
        return new HealthCheckResult(
            status,
            "AI Services health check",
            data: checks);
    }
    
    // Returns "Healthy", or "Unhealthy: <reason>" so the failure cause is not lost
    private static async Task<string> ProbeAsync(Func<Task> probe)
    {
        try
        {
            await probe();
            return "Healthy";
        }
        catch (Exception ex)
        {
            return $"Unhealthy: {ex.Message}";
        }
    }
}

Audio File Management

// Audio file storage service
public class AudioFileService
{
    private readonly string _basePath;
    private readonly ILogger<AudioFileService> _logger;
    
    public AudioFileService(IConfiguration config, ILogger<AudioFileService> logger)
    {
        _basePath = config["Audio:StoragePath"] ?? "/var/audio";
        _logger = logger;
        
        Directory.CreateDirectory(_basePath);
    }
    
    public async Task<string> SaveAudioAsync(byte[] audioData, string category, int entityId)
    {
        // Validate audio data
        if (audioData == null || audioData.Length == 0)
            throw new ArgumentException("Audio data cannot be empty", nameof(audioData));
        
        if (audioData.Length > 10 * 1024 * 1024) // 10MB limit
            throw new ArgumentException("Audio file too large", nameof(audioData));
        
        // Reject categories such as "../x" that would escape the storage root
        if (string.IsNullOrWhiteSpace(category) || category is "." or ".."
            || category != Path.GetFileName(category))
            throw new ArgumentException("Invalid category", nameof(category));
        
        // Create category directory
        var categoryPath = Path.Combine(_basePath, category);
        Directory.CreateDirectory(categoryPath);
        
        // One file per entity: saving again replaces the previous audio
        var extension = ".wav"; // or detect from data
        var filename = $"{entityId}{extension}";
        var fullPath = Path.Combine(categoryPath, filename);
        
        // WriteAllBytesAsync overwrites an existing file, so no delete is needed
        await File.WriteAllBytesAsync(fullPath, audioData);
        
        // Return relative URL path
        return $"/audio/{category}/{filename}";
    }
    
    // File-system calls here are synchronous; the Task return type only keeps
    // the signature consistent for callers such as a scheduled cleanup job.
    public Task CleanupOldFilesAsync(TimeSpan olderThan)
    {
        var cutoff = DateTime.UtcNow - olderThan;
        
        foreach (var categoryDir in Directory.GetDirectories(_basePath))
        {
            foreach (var file in Directory.GetFiles(categoryDir))
            {
                var fileInfo = new FileInfo(file);
                if (fileInfo.LastWriteTimeUtc < cutoff)
                {
                    try
                    {
                        File.Delete(file);
                        _logger.LogInformation("Deleted old audio file: {File}", file);
                    }
                    catch (Exception ex)
                    {
                        _logger.LogError(ex, "Failed to delete audio file: {File}", file);
                    }
                }
            }
        }
        
        return Task.CompletedTask;
    }
}

Rate Limiting Implementation

// Rate limiter for AI services
public class AiRateLimiter
{
    private readonly ConcurrentDictionary<string, RateLimitEntry> _limits = new();
    private readonly int _maxRequests;
    private readonly TimeSpan _window;
    
    public AiRateLimiter(int maxRequestsPerWindow, TimeSpan window)
    {
        _maxRequests = maxRequestsPerWindow;
        _window = window;
    }
    
    public bool TryAcquire(string serviceName)
    {
        var now = DateTime.UtcNow;
        
        var entry = _limits.GetOrAdd(serviceName, _ => new RateLimitEntry());
        
        lock (entry)
        {
            // Remove old requests
            entry.Requests.RemoveAll(r => now - r > _window);
            
            // Check if limit exceeded
            if (entry.Requests.Count >= _maxRequests)
                return false;
            
            // Add new request
            entry.Requests.Add(now);
            return true;
        }
    }
    
    private class RateLimitEntry
    {
        public List<DateTime> Requests { get; } = new();
    }
}

// Usage in controller
[HttpPost("recognize")]
public async Task<IActionResult> RecognizeSpeech([FromBody] AudioRequest request)
{
    if (!_rateLimiter.TryAcquire("Vosk"))
    {
        return StatusCode(429, "Too many requests");
    }
    
    // ... process request
}

📝 Notes & Decisions

| Date | Decision | Rationale |
| --- | --- | --- |
| May 31, 2025 | Use Mistral-Medium | Best balance of quality and cost for this use case |
| May 31, 2025 | Use Vosk for speech recognition | Open-source, supports German, self-hostable |
| May 31, 2025 | Use Coqui TTS | Open-source, good quality, supports German |
| May 31, 2025 | Self-host AI services | More control, no external API dependencies (except Mistral) |
| May 31, 2025 | Use Python CLI wrappers | Easier integration with .NET, well-supported libraries |

Technical Notes

Vosk Configuration

{
  "Vosk": {
    "PythonPath": "/usr/bin/python3",
    "ModelPath": "/models/vosk-model-de-0.22",
    "SampleRate": 16000
  }
}

Coqui TTS Configuration

{
  "Coqui": {
    "PythonPath": "/usr/bin/python3",
    "ModelName": "tts_models/deu/fairseq/vits",
    "AudioOutputFormat": "wav",
    "SampleRate": 22050
  }
}

Mistral Configuration

{
  "Mistral": {
    "ApiKey": "your-api-key",
    "BaseUrl": "https://api.mistral.ai/v1/",
    "DefaultModel": "mistral-medium",
    "TimeoutSeconds": 30,
    "MaxRetries": 3
  }
}

Error Handling Strategy

  1. Transient errors: Retry with exponential backoff
  2. Rate limits: Return 429 to client, suggest retry
  3. Service unavailable: Return 503, log error
  4. Invalid response: Validate output, return meaningful error
  5. Timeout: Return 504, suggest retry
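
The status codes in this list could be centralized in one mapping, as in the sketch below. The document references `AiErrorCode` without defining it, so the enum members here are an assumed shape; the 502 for invalid responses and the 500 default are also assumptions, since the list does not name a status for those cases.

```csharp
using System;

// Assumed shape: the document uses AiErrorCode but does not define its members.
public enum AiErrorCode
{
    Unknown,
    RateLimited,
    Temporary,
    Timeout,
    ServiceUnavailable,
    InvalidResponse
}

public static class AiErrorMapping
{
    // Maps an AI error category to the HTTP status returned to the client.
    public static int ToHttpStatus(AiErrorCode code) => code switch
    {
        AiErrorCode.RateLimited => 429,
        AiErrorCode.ServiceUnavailable => 503,
        AiErrorCode.Temporary => 503,       // transient error that survived all retries
        AiErrorCode.Timeout => 504,
        AiErrorCode.InvalidResponse => 502, // assumption: upstream returned unusable output
        _ => 500
    };

    // True when the client should be told that retrying later may succeed.
    public static bool ShouldSuggestRetry(AiErrorCode code) =>
        code is AiErrorCode.RateLimited or AiErrorCode.Temporary or AiErrorCode.Timeout;
}
```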

Caching Strategy

  • Mistral responses: Cache for 1 hour (stories unlikely to change)
  • TTS audio: Cache files permanently (regenerate only if text changes)
  • Vosk: No caching (each audio is unique)
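
One way to derive cache keys for this strategy is to hash the normalized input, as sketched below. The `AiCacheKeys` class is hypothetical, and hashing only matches prompts that are identical after whitespace normalization; it does not detect prompts that are merely similar.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper for deriving stable cache keys.
public static class AiCacheKeys
{
    // Key for a Mistral response: identical model + prompt pairs share one entry.
    public static string ForPrompt(string model, string prompt) =>
        "mistral:" + Hash($"{model}\n{Normalize(prompt)}");

    // Content-based name for TTS audio: it changes whenever the text changes.
    public static string TtsFileName(string text) => Hash(Normalize(text)) + ".wav";

    // Collapse runs of whitespace so trivially different inputs share a key.
    private static string Normalize(string value) =>
        string.Join(' ', value.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries));

    // Lowercase hex SHA-256 digest of the UTF-8 encoded value.
    private static string Hash(string value) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(value))).ToLowerInvariant();
}
```

Storing the TTS hash next to an entity's audio file gives a cheap check for "regenerate only if text changes": if the hash of the current text differs from the stored one, the audio is stale.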

Gotchas

  • ⚠️ Vosk model is ~500MB - ensure enough disk space
  • ⚠️ Coqui model is ~1.5GB - ensure enough disk space
  • ⚠️ Python processes may have memory leaks - monitor and restart
  • ⚠️ AI services may fail silently - implement health checks
  • ⚠️ Mistral API has costs - implement budget tracking
  • ⚠️ Audio generation can be CPU-intensive - consider separate service
  • ⚠️ Different Python versions may have compatibility issues

File Storage Structure

/public/
├── audio/
│   ├── vocabulary/       # Vocabulary word audio
│   │   └── {id}.wav
│   ├── story/           # Story segment audio
│   │   └── {levelId}-{order}.wav
│   └── quiz/            # Quiz question audio
│       └── {questionId}.wav
└── models/              # AI models
    ├── vosk/
    │   └── vosk-model-de-0.22/
    └── coqui/
        └── tts_models/

Performance Considerations

  • TTS generation: ~1-2 seconds per sentence
  • Speech recognition: ~1-3 seconds per audio clip
  • Mistral API: ~2-5 seconds per request
  • Consider async/background processing for batch operations
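
For batch operations, a bounded level of parallelism keeps CPU-heavy TTS calls from starving the rest of the application. The sketch below is one possible shape: `BatchAudio`, the `generate` delegate (standing in for a call such as the TTS service), and the default of two parallel calls are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for batch audio generation.
public static class BatchAudio
{
    // Runs generate() for every text with at most maxParallel calls in flight.
    public static async Task<IReadOnlyList<string>> GenerateAllAsync(
        IReadOnlyList<string> texts,
        Func<string, CancellationToken, Task<string>> generate,
        int maxParallel = 2,
        CancellationToken ct = default)
    {
        using var gate = new SemaphoreSlim(maxParallel);

        var tasks = texts.Select(async text =>
        {
            await gate.WaitAsync(ct);
            try
            {
                return await generate(text, ct);
            }
            finally
            {
                gate.Release();
            }
        }).ToList();

        // Task.WhenAll returns results in the same order as the input tasks.
        return await Task.WhenAll(tasks);
    }
}
```

If one call fails, `Task.WhenAll` still waits for the remaining calls before rethrowing, so no work is left running against a disposed semaphore.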

📊 Progress History

| Date | Status Change | Notes |
| --- | --- | --- |
| May 31, 2025 | Created | Initial plan based on application-plan.md |


Feature created from application-plan.md