# Feature: AI Services Integration

> **Status**: ⏳ Planned
> **Priority**: High
> **Complexity**: High
> **Estimate**: 10-16 hours
> **Assignee**: -
> **Created**: May 31, 2025
> **Target Completion**: -
> **PR**: -
> **Related Features**: Story Integration, Vocabulary System, Quiz System, Lesson Management

---

## 📌 Overview

### Purpose

Integrate three AI services into the application: Mistral-Medium for text generation (stories, feedback), Vosk for speech recognition (speaking exercises), and Coqui TTS for text-to-speech (vocabulary, stories, quizzes).

### User Story

As a learner, I want AI-powered features like generated stories, speech recognition for speaking practice, and TTS for audio content so that I can have an immersive and interactive learning experience.

### Acceptance Criteria

- [ ] Mistral-Medium API is integrated for story generation
- [ ] Mistral-Medium API is integrated for writing feedback
- [ ] Vosk speech recognition is integrated for speaking exercises
- [ ] Coqui TTS is integrated for audio generation
- [ ] All AI services are configurable via appsettings.json
- [ ] Error handling for AI service failures
- [ ] Rate limiting/caching for AI API calls

---

## 📋 Requirements

### Functional Requirements

| ID | Requirement | Priority |
|----|-------------|----------|
| FR-001 | Generate stories using Mistral-Medium | High |
| FR-002 | Generate writing feedback using Mistral-Medium | High |
| FR-003 | Transcribe speech using Vosk | High |
| FR-004 | Generate audio using Coqui TTS | High |
| FR-005 | Configure all services via configuration | High |
| FR-006 | Handle AI service errors gracefully | High |
| FR-007 | Cache/rate limit AI API calls | Medium |
| FR-008 | Validate AI outputs before use | Medium |

### Non-Functional Requirements

- Performance: TTS generation < 2 seconds per sentence
- Performance: Speech recognition < 3 seconds
- Performance: AI API calls < 5 seconds
- Reliability: Services should degrade gracefully on failure
- Cost:
Minimize API call costs (caching, batching)

---

## 🏗️ Technical Design

### Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                        AI Services Layer                        │
├─────────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │ Mistral-Medium  │  │      Vosk       │  │    Coqui TTS    │  │
│  │   (Text Gen)    │  │ (Speech Recog.) │  │   (Audio Gen)   │  │
│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘  │
│           │                    │                    │           │
│           ▼                    ▼                    ▼           │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                   Application Services                    │  │
│  │  - StoryGenerationService                                 │  │
│  │  - WritingFeedbackService                                 │  │
│  │  - VoskService (Speech Recognition)                       │  │
│  │  - TtsService (Text-to-Speech)                            │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```

### Components Involved

- **Backend Services**:
  - `IMistralService` / `MistralService` - Text generation
  - `IVoskService` / `VoskService` - Speech recognition
  - `ITtsService` / `TtsService` - Text-to-speech
- **Configuration**: appsettings.json with AI settings
- **External Dependencies**:
  - Mistral-Medium API
  - Vosk Python library + German model
  - Coqui TTS Python library + German model

### Data Flow

#### Story Generation Flow

```
1. StoryGenerationService receives request with vocabulary list and level
2. Service constructs prompt for Mistral-Medium
3. MistralService sends prompt to Mistral API
4. Mistral API returns generated story text
5. StoryGenerationService validates and returns story
6. StoryService saves story and triggers audio generation
```

#### Speech Recognition Flow

```
1. User records speech in frontend
2. Frontend sends audio file to /api/speech/recognize
3. VoskService receives audio bytes
4. VoskService calls Vosk Python CLI with German model
5. Vosk returns transcribed text
6. Backend validates transcription and returns to frontend
```

#### TTS Flow

```
1. TtsService receives text to synthesize
2. Service calls Coqui TTS Python CLI
3. Coqui generates audio file
4. Audio file saved to filesystem
5. Audio URL returned to caller
```

---

## 🚀 Implementation Plan

### Phase 1: Configuration & Interfaces (2 hours)

- [ ] Add AI configuration section to appsettings.json
- [ ] Create configuration classes (MistralConfig, VoskConfig, CoquiConfig)
- [ ] Define service interfaces (IMistralService, IVoskService, ITtsService)
- [ ] Register services in Program.cs
- [ ] Set up configuration validation

### Phase 2: Mistral-Medium Integration (2-3 hours)

- [ ] Create MistralService implementation
- [ ] Implement Mistral API client
- [ ] Create request/response models
- [ ] Implement retry logic for API calls
- [ ] Add rate limiting (e.g., max 10 requests/minute)
- [ ] Add response caching for similar prompts
- [ ] Create prompt templates for different use cases

### Phase 3: Vosk Speech Recognition (2-3 hours)

- [ ] Create VoskService implementation
- [ ] Set up Vosk Python environment
- [ ] Download and configure German model (vosk-model-de-0.22)
- [ ] Implement audio processing
- [ ] Handle different audio formats
- [ ] Add error handling for recognition failures
- [ ] Create /api/speech/recognize endpoint

### Phase 4: Coqui TTS Integration (2-3 hours)

- [ ] Create TtsService implementation
- [ ] Set up Coqui TTS Python environment
- [ ] Download and configure German model
- [ ] Implement audio generation
- [ ] Add audio file management (storage, cleanup)
- [ ] Create audio serving endpoints
- [ ] Implement batch audio generation

### Phase 5: Service Integration (2 hours)

- [ ] Create StoryGenerationService (uses MistralService)
- [ ] Create WritingFeedbackService (uses MistralService)
- [ ] Create SpeechExerciseService (uses VoskService)
- [ ] Create AudioGenerationService (uses TtsService)
- [ ] Add health checks for all AI services
- [ ] Implement fallback mechanisms for service failures

### Milestones

| Milestone | Date | Status |
|-----------|------|--------|
| Configuration & Interfaces | - | ⏳ |
| Mistral Integration | - | ⏳ |
| Vosk Integration | - | ⏳ |
| Coqui TTS Integration | - | ⏳ |
| Service Integration | - | ⏳ |

---

## ✅ Tasks

### Backend - Configuration

- [ ] Add Mistral settings to appsettings.json
- [ ] Add Vosk settings to appsettings.json
- [ ] Add Coqui settings to appsettings.json
- [ ] Create Configuration/MistralConfig.cs
- [ ] Create Configuration/VoskConfig.cs
- [ ] Create Configuration/CoquiConfig.cs
- [ ] Register all AI services in Program.cs
- [ ] Add health checks for AI services

### Backend - Mistral Service

- [ ] Create Domain/Interfaces/IMistralService.cs
- [ ] Create Infrastructure/Services/MistralService.cs
- [ ] Implement Mistral API client
- [ ] Create Models/MistralRequest.cs
- [ ] Create Models/MistralResponse.cs
- [ ] Add retry logic
- [ ] Add rate limiting
- [ ] Add response caching
- [ ] Write unit tests

### Backend - Vosk Service

- [ ] Create Domain/Interfaces/IVoskService.cs
- [ ] Create Infrastructure/Services/VoskService.cs
- [ ] Set up Python process execution
- [ ] Download and configure vosk-model-de-0.22
- [ ] Implement audio recognition
- [ ] Create /api/speech/recognize endpoint
- [ ] Create Presentation/Controllers/SpeechController.cs
- [ ] Write unit tests

### Backend - Coqui TTS Service

- [ ] Create Domain/Interfaces/ITtsService.cs
- [ ] Create Infrastructure/Services/TtsService.cs
- [ ] Set up Python process execution
- [ ] Download and configure Coqui German model
- [ ] Implement audio generation
- [ ] Create audio file storage mechanism
- [ ] Create /api/tts/generate endpoint
- [ ] Create Presentation/Controllers/TtsController.cs
- [ ] Write unit tests

### Backend - Higher-Level Services

- [ ] Create Application/Services/StoryGenerationService.cs
- [ ] Create Application/Services/WritingFeedbackService.cs
- [ ] Integrate with MistralService
- [ ] Add validation for AI outputs
- [ ] Write integration tests

### Infrastructure Setup

- [ ] Install Python 3.8+
- [ ] Install Vosk Python package
- [ ] Download vosk-model-de-0.22
- [ ] Install Coqui TTS package
- [ ]
Download Coqui German model
- [ ] Set up file storage for audio
- [ ] Configure permissions

### Frontend Integration

- [ ] Create services/speechService.ts
- [ ] Create services/ttsService.ts
- [ ] Create services/aiService.ts
- [ ] Integrate with Recorder component
- [ ] Integrate with AudioPlayer component
- [ ] Add error handling for AI failures

---

## ✅ Definition of Done

### General Criteria (All Features)

- [ ] All acceptance criteria met and verified
- [ ] All tasks in this document completed
- [ ] Code follows Clean Architecture principles
- [ ] Code reviewed and approved by at least 1 team member
- [ ] All tests passing (unit, integration)
- [ ] Documentation updated (README, AGENTS.md if applicable)
- [ ] Feature works in development environment
- [ ] Feature deployed to staging environment
- [ ] Performance meets defined targets
- [ ] Security review completed
- [ ] No critical bugs or blockers

### AI-Specific Criteria

- [ ] All AI services functional in development
- [ ] Mistral API integration tested with valid API key
- [ ] Vosk speech recognition tested with German model
- [ ] Coqui TTS tested with German model
- [ ] Error handling tested (invalid inputs, service failures)
- [ ] Fallback mechanisms implemented and tested
- [ ] Rate limiting configured and tested
- [ ] Audio file generation and storage verified
- [ ] Health checks for all AI services passing

---

## 🧪 Testing Strategy

### Testing Approach

| Test Type | Coverage | Tools | Responsibility |
|-----------|----------|-------|----------------|
| Unit Tests | 80%+ code coverage | MSTest, Moq | Backend Dev |
| Integration Tests | All service interactions | MSTest, Testcontainers | Backend Dev |
| API Tests | All endpoints | MSTest, HttpClient | Backend Dev |
| Frontend Unit Tests | Component logic | Vitest | Frontend Dev |
| Frontend Integration | Service integration | Vitest | Frontend Dev |
| E2E Tests | Critical user journeys | Playwright | QA/Dev |
| Manual Testing | Exploratory, edge cases | BrowserStack | QA |
| Load Testing | AI service performance | k6/JMeter | DevOps |

### AI-Specific Tests

#### Mistral Service Tests

- [ ] Test successful text generation
- [ ] Test API error handling (429, 500, 503)
- [ ] Test rate limiting (max requests per minute)
- [ ] Test response caching
- [ ] Test retry logic on failures
- [ ] Test timeout handling
- [ ] Test invalid API key handling

#### Vosk Service Tests

- [ ] Test successful speech recognition (clear audio)
- [ ] Test speech recognition with background noise
- [ ] Test speech recognition with different accents
- [ ] Test empty audio handling
- [ ] Test invalid audio format handling
- [ ] Test Python process failure handling
- [ ] Test model not found error handling
- [ ] Test confidence threshold validation

#### Coqui TTS Service Tests

- [ ] Test successful audio generation
- [ ] Test audio generation with long text
- [ ] Test audio generation with special characters
- [ ] Test invalid text handling
- [ ] Test Python process failure handling
- [ ] Test model not found error handling
- [ ] Test audio file format validation
- [ ] Test audio quality validation

### Test Data

- Sample audio files for Vosk testing (clear German speech, noisy audio, non-German speech)
- Sample texts for TTS testing (short, long, with special characters, with German umlauts)
- Sample prompts for Mistral testing (A1, A2, B1 levels)

---

## 🚨 Risks & Mitigations

### Technical Risks

| Risk | Likelihood | Impact | Mitigation | Owner |
|------|------------|--------|------------|-------|
| Python-.NET integration failures | High | High | Use Process class with proper error handling, implement process pooling, add timeouts | Backend Dev |
| Vosk model compatibility issues | Medium | High | Test with vosk-model-de-0.22 before implementation, fall back to vosk-model-small-de-0.15 | Backend Dev |
| Coqui model quality issues | Medium | Medium | Test with sample German text, keep an alternative TTS service as fallback | Backend Dev |
| Mistral API rate limits | High | Medium | Implement caching (1h TTL), request queue, exponential backoff | Backend Dev |
| Mistral API costs exceed budget | Medium | High | Set budget alerts, implement cost tracking, cache aggressively | Backend Dev |
| Slow AI service performance | High | Medium | Implement async processing, use background jobs for batch operations | Backend Dev |
| Audio files too large | Medium | Medium | Compress audio (16kHz, mono), implement streaming for large files | Backend Dev |
| Model files too large for deployment | Medium | Medium | Use Docker volumes, separate storage for models, consider cloud storage | DevOps |
| Memory leaks in Python processes | Medium | High | Implement process lifecycle management, add memory monitoring, use process pooling | Backend Dev |
| Different Python versions cause issues | Medium | Medium | Use Docker to pin Python version, document exact version in README | DevOps |

### Operational Risks

| Risk | Likelihood | Impact | Mitigation | Owner |
|------|------------|--------|------------|-------|
| AI service downtime | Medium | High | Implement health checks, circuit breakers, fallback responses | DevOps |
| Model files corrupted | Low | High | Implement checksum validation, store backups, automated recovery | DevOps |
| API key exposure | Medium | High | Use GitHub secrets, Azure Key Vault, never commit to repo | Security |
| Audio storage fills up | Medium | Medium | Implement cleanup job, set size quotas, use cloud storage | DevOps |

### Business Risks

| Risk | Likelihood | Impact | Mitigation | Owner |
|------|------------|--------|------------|-------|
| User data privacy concerns | Medium | High | Anonymize audio before processing, document data handling policy, comply with GDPR | Legal |
| AI generates inappropriate content | Low | High | Implement content moderation, add user reporting, use system prompts to discourage it | Backend Dev |
| AI services become too expensive | Medium | Medium | Monitor costs, set budget caps, evaluate open-source alternatives | Product |

---

## 🔗 Dependencies

### Feature Dependencies

- [Infrastructure Setup](infrastructure-setup.md) - Required (backend project)

### Technical Dependencies

- Python 3.8+
- Vosk Python library
- vosk-model-de-0.22 (German model)
- Coqui TTS Python library
- Coqui German TTS model
- Mistral-Medium API key

### External Services

| Service | Purpose | Configuration |
|---------|---------|---------------|
| Mistral-Medium API | Text generation (stories, feedback) | API key, endpoint URL |
| Vosk | Speech recognition | Python path, model path |
| Coqui TTS | Text-to-speech | Python path, model name |

### Blockers

- [ ] Infrastructure Setup must be complete
- [ ] Python environment must be configured
- [ ] AI models must be downloaded
- [ ] Mistral API key must be obtained

---

## 🔧 Technical Deep Dive: Python-.NET Integration

### Integration Patterns

#### Option 1: Process.Start (Recommended for MVP)

```csharp
// Simple approach - spawn a Python process for each request
// Requires: using System.Diagnostics;
public async Task<string> RecognizeSpeechAsync(byte[] audioData)
{
    // GetTempFileName() creates its own file, so build a fresh .wav path instead
    var tempFile = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid():N}.wav");
    await File.WriteAllBytesAsync(tempFile, audioData);

    try
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "python",
            // Module name and flags as given in this plan; verify them against
            // the installed Vosk package. Paths are quoted to survive spaces.
            Arguments = $"-m vosk.transcribe --model \"{_modelPath}\" --input \"{tempFile}\"",
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        startInfo.Environment["PYTHONPATH"] = "/path/to/vosk";

        using var process = new Process { StartInfo = startInfo };
        process.Start();

        // Drain both streams concurrently to avoid a pipe-buffer deadlock
        var outputTask = process.StandardOutput.ReadToEndAsync();
        var errorTask = process.StandardError.ReadToEndAsync();

        // Enforce a timeout so a hung process cannot block the request
        // (30 s is an assumed default; tune it per deployment)
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
        try
        {
            await process.WaitForExitAsync(cts.Token);
        }
        catch (OperationCanceledException)
        {
            process.Kill(entireProcessTree: true);
            throw new AiServiceException("Vosk timed out");
        }

        var output = await outputTask;
        var error = await errorTask;

        if (process.ExitCode != 0)
        {
            throw new AiServiceException($"Vosk failed: {error}");
        }

        return output.Trim();
    }
    finally
    {
        // Always remove the temporary audio file
        File.Delete(tempFile);
    }
}
```

**Pros:** Simple, easy to implement, no additional dependencies

**Cons:** Process startup
overhead (~100-500ms per call), resource-intensive

#### Option 2: Process Pooling (Recommended for Production)

```csharp
// Maintain a pool of persistent Python processes
// Requires: using System.Collections.Concurrent; using System.Diagnostics;
public class PythonProcessPool : IDisposable
{
    private readonly ConcurrentQueue<Process> _pool = new();
    private readonly SemaphoreSlim _semaphore;
    private readonly string _pythonPath;
    private readonly string _scriptPath;

    public PythonProcessPool(int size, string pythonPath, string scriptPath)
    {
        _semaphore = new SemaphoreSlim(size);
        _pythonPath = pythonPath;
        _scriptPath = scriptPath;

        // Pre-warm the pool
        for (int i = 0; i < size; i++)
        {
            _pool.Enqueue(StartProcess());
        }
    }

    public async Task<string?> ExecuteAsync(string input)
    {
        await _semaphore.WaitAsync();
        Process? process = null;

        try
        {
            // Replace a missing or crashed worker with a fresh one
            if (!_pool.TryDequeue(out process) || process.HasExited)
            {
                process?.Dispose();
                process = StartProcess();
            }

            // Send one request line to stdin
            await process.StandardInput.WriteLineAsync(input);
            await process.StandardInput.FlushAsync();

            // Read one response line from stdout (null if the worker exited)
            return await process.StandardOutput.ReadLineAsync();
        }
        finally
        {
            if (process != null) _pool.Enqueue(process);
            _semaphore.Release();
        }
    }

    private Process StartProcess()
    {
        var process = new Process
        {
            StartInfo = new ProcessStartInfo
            {
                FileName = _pythonPath,
                Arguments = _scriptPath,
                RedirectStandardInput = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                UseShellExecute = false,
                CreateNoWindow = true
            }
        };

        // Start() returns bool, so it cannot be chained onto the initializer
        process.Start();

        // Drain stderr so a full pipe buffer cannot stall the worker
        process.ErrorDataReceived += (_, _) => { };
        process.BeginErrorReadLine();
        return process;
    }

    public void Dispose()
    {
        foreach (var process in _pool)
        {
            try { process.Kill(); } catch { }
            process.Dispose();
        }
        _semaphore.Dispose();
    }
}
```

**Pros:** Eliminates process startup overhead, much faster for repeated calls

**Cons:** More complex, need to handle process lifecycle, stdin/stdout parsing

#### Option 3: gRPC (Best for Production)

- Create Python gRPC server for AI services
- .NET client calls gRPC methods
- Single persistent Python process
- Type-safe, high-performance

**Pros:** Best performance, type-safe, production-ready

**Cons:** Most complex to set up, requires gRPC knowledge

### Error Handling Strategy

```csharp
// Comprehensive
// error handling for AI services
public async Task<T> ExecuteWithRetryAsync<T>(
    Func<CancellationToken, Task<T>> action,
    string operationName,
    int maxRetries = 3,
    TimeSpan? timeout = null)
{
    var retryCount = 0;
    timeout ??= TimeSpan.FromSeconds(30);

    while (true)
    {
        try
        {
            // Pass the token to the action so the timeout actually cancels it
            using var cts = new CancellationTokenSource(timeout.Value);
            return await action(cts.Token);
        }
        catch (OperationCanceledException) when (retryCount < maxRetries)
        {
            retryCount++;
            var delay = TimeSpan.FromSeconds(Math.Pow(2, retryCount));
            _logger.LogWarning(
                "{Operation} timed out (attempt {Attempt}), retrying in {Delay}s...",
                operationName, retryCount, delay.TotalSeconds);
            await Task.Delay(delay);
        }
        catch (AiServiceException ex) when (IsRetryable(ex) && retryCount < maxRetries)
        {
            retryCount++;
            var delay = TimeSpan.FromSeconds(Math.Pow(2, retryCount));
            _logger.LogWarning(ex,
                "{Operation} failed (attempt {Attempt}), retrying in {Delay}s...",
                operationName, retryCount, delay.TotalSeconds);
            await Task.Delay(delay);
        }
        catch (Exception ex)
        {
            _logger.LogError(ex,
                "{Operation} failed permanently after {Attempts} attempts",
                operationName, retryCount + 1);
            throw new AiServiceException($"{operationName} failed: {ex.Message}", ex);
        }
    }

    // Only transient error codes are worth retrying
    static bool IsRetryable(AiServiceException ex) =>
        ex.ErrorCode switch
        {
            AiErrorCode.RateLimited => true,
            AiErrorCode.Temporary => true,
            AiErrorCode.Timeout => true,
            _ => false
        };
}
```

### Health Check Implementation

```csharp
// Health check for AI services
// Requires: using Microsoft.Extensions.Diagnostics.HealthChecks;
public class AiServicesHealthCheck : IHealthCheck
{
    private readonly IMistralService _mistral;
    private readonly IVoskService _vosk;
    private readonly ITtsService _tts;

    public AiServicesHealthCheck(IMistralService mistral, IVoskService vosk, ITtsService tts)
    {
        _mistral = mistral;
        _vosk = vosk;
        _tts = tts;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // HealthCheckResult data is an IReadOnlyDictionary<string, object>
        var checks = new Dictionary<string, object>();

        // Check Mistral
        try
        {
            await _mistral.TestConnectionAsync(cancellationToken);
            checks["Mistral"] = HealthStatus.Healthy;
        }
        catch (Exception)
        {
            checks["Mistral"] = HealthStatus.Unhealthy;
        }

        // Check Vosk
        try
        {
            await _vosk.TestModelAsync(cancellationToken);
            checks["Vosk"] = HealthStatus.Healthy;
        }
        catch (Exception)
        {
            checks["Vosk"] = HealthStatus.Unhealthy;
        }

        // Check Coqui TTS
        try
        {
            await _tts.TestModelAsync(cancellationToken);
            checks["Coqui TTS"] = HealthStatus.Healthy;
        }
        catch (Exception)
        {
            checks["Coqui TTS"] = HealthStatus.Unhealthy;
        }

        var allHealthy = checks.Values.All(s => s is HealthStatus.Healthy);
        var status = allHealthy ? HealthStatus.Healthy : HealthStatus.Unhealthy;

        return new HealthCheckResult(
            status,
            "AI Services health check",
            data: checks);
    }
}
```

### Audio File Management

```csharp
// Audio file storage service
public class AudioFileService
{
    private readonly string _basePath;
    private readonly ILogger<AudioFileService> _logger;

    public AudioFileService(IConfiguration config, ILogger<AudioFileService> logger)
    {
        _basePath = config["Audio:StoragePath"] ?? "/var/audio";
        _logger = logger;
        Directory.CreateDirectory(_basePath);
    }

    public async Task<string> SaveAudioAsync(byte[] audioData, string category, int entityId)
    {
        // Validate audio data
        if (audioData == null || audioData.Length == 0)
            throw new ArgumentException("Audio data cannot be empty");

        if (audioData.Length > 10 * 1024 * 1024) // 10MB limit
            throw new ArgumentException("Audio file too large");

        // Reject categories that could escape the storage root (e.g. "../x")
        if (string.IsNullOrEmpty(category) || category != Path.GetFileName(category))
            throw new ArgumentException("Invalid audio category");

        // Create category directory
        var categoryPath = Path.Combine(_basePath, category);
        Directory.CreateDirectory(categoryPath);

        // Deterministic filename: one file per entity within a category
        var extension = ".wav"; // or detect from data
        var filename = $"{entityId}{extension}";
        var fullPath = Path.Combine(categoryPath, filename);

        // Save file (overwrites any existing audio for this entity)
        await File.WriteAllBytesAsync(fullPath, audioData);

        // Return the public URL path
        return $"/audio/{category}/{filename}";
    }

    public async Task CleanupOldFilesAsync(TimeSpan olderThan)
    {
        var cutoff = DateTime.UtcNow - olderThan;

        foreach (var categoryDir in Directory.GetDirectories(_basePath))
        {
            foreach (var file in Directory.GetFiles(categoryDir))
            {
                var fileInfo = new FileInfo(file);
                if (fileInfo.LastWriteTimeUtc < cutoff)
                {
                    try
                    {
                        File.Delete(file);
                        _logger.LogInformation("Deleted old audio file: {File}", file);
                    }
                    catch (Exception ex)
                    {
                        _logger.LogError(ex, "Failed to delete audio file: {File}", file);
                    }
                }
            }
        }
    }
}
```

### Rate Limiting Implementation

```csharp
// Rate limiter for AI services (sliding window per service name)
// Requires: using System.Collections.Concurrent;
public class AiRateLimiter
{
    private readonly ConcurrentDictionary<string, RateLimitEntry> _limits = new();
    private readonly int _maxRequests;
    private readonly TimeSpan _window;

    public AiRateLimiter(int maxRequestsPerWindow, TimeSpan window)
    {
        _maxRequests = maxRequestsPerWindow;
        _window = window;
    }

    public bool TryAcquire(string serviceName)
    {
        var now = DateTime.UtcNow;
        var entry = _limits.GetOrAdd(serviceName, _ => new RateLimitEntry());

        lock (entry)
        {
            // Remove requests that have left the window
            entry.Requests.RemoveAll(r => now - r > _window);

            // Check if limit exceeded
            if (entry.Requests.Count >= _maxRequests)
                return false;

            // Record the new request
            entry.Requests.Add(now);
            return true;
        }
    }

    private class RateLimitEntry
    {
        public List<DateTime> Requests { get; } = new();
    }
}

// Usage in controller
[HttpPost("recognize")]
public async Task<IActionResult> RecognizeSpeech([FromBody] AudioRequest request)
{
    if (!_rateLimiter.TryAcquire("Vosk"))
    {
        return StatusCode(429, "Too many requests");
    }

    // ...
    // process request
}
```

## 📝 Notes & Decisions

| Date | Decision | Rationale |
|------|----------|-----------|
| May 31, 2025 | Use Mistral-Medium | Best balance of quality and cost for this use case |
| May 31, 2025 | Use Vosk for speech recognition | Open-source, supports German, self-hostable |
| May 31, 2025 | Use Coqui TTS | Open-source, good quality, supports German |
| May 31, 2025 | Self-host AI services | More control, no external API dependencies (except Mistral) |
| May 31, 2025 | Use Python CLI wrappers | Easier integration with .NET, well-supported libraries |

### Technical Notes

#### Vosk Configuration

```json
{
  "Vosk": {
    "PythonPath": "/usr/bin/python3",
    "ModelPath": "/models/vosk-model-de-0.22",
    "SampleRate": 16000
  }
}
```

#### Coqui TTS Configuration

```json
{
  "Coqui": {
    "PythonPath": "/usr/bin/python3",
    "ModelName": "tts_models/deu/fairseq/vits",
    "AudioOutputFormat": "wav",
    "SampleRate": 22050
  }
}
```

#### Mistral Configuration

```json
{
  "Mistral": {
    "ApiKey": "your-api-key",
    "BaseUrl": "https://api.mistral.ai/v1/",
    "DefaultModel": "mistral-medium",
    "TimeoutSeconds": 30,
    "MaxRetries": 3
  }
}
```

### Error Handling Strategy

1. **Transient errors**: Retry with exponential backoff
2. **Rate limits**: Return 429 to client, suggest retry
3. **Service unavailable**: Return 503, log error
4. **Invalid response**: Validate output, return meaningful error
5. **Timeout**: Return 504, suggest retry

### Caching Strategy

- **Mistral responses**: Cache for 1 hour (stories unlikely to change)
- **TTS audio**: Cache files permanently (regenerate only if text changes)
- **Vosk**: No caching (each audio is unique)

### Gotchas

- ⚠️ Vosk model is ~500MB - ensure enough disk space
- ⚠️ Coqui model is ~1.5GB - ensure enough disk space
- ⚠️ Python processes may have memory leaks - monitor and restart
- ⚠️ AI services may fail silently - implement health checks
- ⚠️ Mistral API has costs - implement budget tracking
- ⚠️ Audio generation can be CPU-intensive - consider separate service
- ⚠️ Different Python versions may have compatibility issues

### File Storage Structure

```
/public/
├── audio/
│   ├── vocabulary/          # Vocabulary word audio
│   │   └── {id}.wav
│   ├── story/               # Story segment audio
│   │   └── {levelId}-{order}.wav
│   └── quiz/                # Quiz question audio
│       └── {questionId}.wav
└── models/                  # AI models
    ├── vosk/
    │   └── vosk-model-de-0.22/
    └── coqui/
        └── tts_models/
```

### Performance Considerations

- TTS generation: ~1-2 seconds per sentence
- Speech recognition: ~1-3 seconds per audio clip
- Mistral API: ~2-5 seconds per request
- Consider async/background processing for batch operations

---

## 📊 Progress History

| Date | Status Change | Notes |
|------|---------------|-------|
| May 31, 2025 | Created | Initial plan based on application-plan.md |

---

## 📎 Related Files & Links

- Architecture: [Backend Structure](../architecture/backend-structure.md)
- Architecture: [Application Plan](../architecture/application-plan.md)
- Feature: [Story Integration](story-integration.md)
- Feature: [Vocabulary System](vocabulary-system.md)
- Feature: [Quiz System](quiz-system.md)
- Reference: [Mistral AI API Docs](https://docs.mistral.ai/)
- Reference: [Vosk Documentation](https://alphacephei.com/vosk/)
- Reference: [Coqui TTS
GitHub](https://github.com/coqui-ai/TTS)
- Reference: [vosk-model-de-0.22](https://alphacephei.com/vosk/models)
- Reference: [Coqui German Model](https://github.com/coqui-ai/TTS/wiki/Multilingual-support)

---

*Feature created from application-plan.md*
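
### Appendix: Python Worker Loop Sketch

The process-pooling option in the Technical Deep Dive relies on a long-lived Python script that answers one request per line over stdin/stdout, but this plan does not define that script. The following is a minimal sketch of such a loop, assuming a JSON-lines protocol. The `serve` function, the handler signature, and the `ok`/`result`/`error` field names are hypothetical and not part of this plan; the handler is where a real worker would call Vosk or Coqui TTS.

```python
import json
import sys
from typing import Any, Callable, TextIO


def serve(
    handler: Callable[[dict], Any],
    reader: TextIO = sys.stdin,
    writer: TextIO = sys.stdout,
) -> None:
    """Answer each JSON request line with exactly one JSON response line.

    Hypothetical protocol (an assumption, not defined by this plan):
      request:  {"text": "..."}            one JSON object per line
      response: {"ok": true, "result": ...} or {"ok": false, "error": "..."}
    """
    for line in reader:
        line = line.strip()
        if not line:
            continue  # ignore blank lines instead of answering them

        try:
            response = {"ok": True, "result": handler(json.loads(line))}
        except Exception as exc:  # report failures instead of crashing the worker
            response = {"ok": False, "error": str(exc)}

        # json.dumps escapes non-ASCII by default, so umlauts cannot break
        # the one-line-per-response framing or the stdout encoding.
        writer.write(json.dumps(response) + "\n")
        # Flush after every response so the caller's line read does not block.
        writer.flush()


# A real worker script would load its model once at startup and then call
# serve(handler) as its last statement.
```

Returning errors as a response line, rather than letting the script exit, keeps the one-request-one-response contract intact, so a caller that reads a single line per request is never left waiting on a dead pipe.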