# Feature: AI Services Integration

> **Status**: ⏳ Planned  
> **Priority**: High  
> **Complexity**: High  
> **Estimate**: 10-16 hours  
> **Assignee**: -  
> **Created**: May 31, 2025  
> **Target Completion**: -  
> **PR**: -  
> **Related Features**: Story Integration, Vocabulary System, Quiz System, Lesson Management

---

## 📌 Overview

### Purpose
Integrate three AI services into the application: Mistral-Medium for text generation (stories, feedback), Vosk for speech recognition (speaking exercises), and Coqui TTS for text-to-speech (vocabulary, stories, quizzes).

### User Story
As a learner, I want AI-powered features like generated stories, speech recognition for speaking practice, and TTS for audio content so that I can have an immersive and interactive learning experience.

### Acceptance Criteria
- [ ] Mistral-Medium API is integrated for story generation
- [ ] Mistral-Medium API is integrated for writing feedback
- [ ] Vosk speech recognition is integrated for speaking exercises
- [ ] Coqui TTS is integrated for audio generation
- [ ] All AI services are configurable via appsettings.json
- [ ] AI service failures are handled gracefully (logged, returned as a meaningful error, no unhandled exceptions)
- [ ] AI API calls are rate limited and cached

---

## 📋 Requirements

### Functional Requirements
| ID | Requirement | Priority |
|----|-------------|----------|
| FR-001 | Generate stories using Mistral-Medium | High |
| FR-002 | Generate writing feedback using Mistral-Medium | High |
| FR-003 | Transcribe speech using Vosk | High |
| FR-004 | Generate audio using Coqui TTS | High |
| FR-005 | Configure all services via configuration | High |
| FR-006 | Handle AI service errors gracefully | High |
| FR-007 | Cache/rate limit AI API calls | Medium |
| FR-008 | Validate AI outputs before use | Medium |

### Non-Functional Requirements
- Performance: TTS generation < 2 seconds per sentence
- Performance: Speech recognition < 3 seconds
- Performance: AI API calls < 5 seconds
- Reliability: Services should degrade gracefully on failure
- Cost: Minimize API call costs (caching, batching)

---

## 🏗️ Technical Design

### Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│                    AI Services Layer                            │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │  Mistral-Medium  │  │      Vosk       │  │    Coqui TTS    │  │
│  │   (Text Gen)    │  │ (Speech Recog.) │  │   (Audio Gen)   │  │
│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘  │
│           │                     │                    │         │
│           ▼                     ▼                    ▼         │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │              Application Services                         │  │
│  │  - StoryGenerationService                                │  │
│  │  - WritingFeedbackService                                 │  │
│  │  - VoskService (Speech Recognition)                      │  │
│  │  - TtsService (Text-to-Speech)                           │  │
│  └─────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

### Components Involved
- **Backend Services**:
  - `IMistralService` / `MistralService` - Text generation
  - `IVoskService` / `VoskService` - Speech recognition
  - `ITtsService` / `TtsService` - Text-to-speech
- **Configuration**: appsettings.json with AI settings
- **External Dependencies**:
  - Mistral-Medium API
  - Vosk Python library + German model
  - Coqui TTS Python library + German model

### Data Flow

#### Story Generation Flow
```
1. StoryGenerationService receives request with vocabulary list and level
2. Service constructs prompt for Mistral-Medium
3. MistralService sends prompt to Mistral API
4. Mistral API returns generated story text
5. StoryGenerationService validates and returns story
6. StoryService saves story and triggers audio generation
```
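
Steps 2 and 5 of the flow above could look roughly like the following. This is a minimal sketch: the `StoryPrompt` helper, the prompt wording, the length cap, and the rule that every vocabulary word must appear are illustrative assumptions, not an agreed design.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: builds the prompt (step 2) and validates the result (step 5).
public static class StoryPrompt
{
    public static string Build(IReadOnlyList<string> vocabulary, string level)
    {
        if (vocabulary.Count == 0)
            throw new ArgumentException("At least one vocabulary word is required.", nameof(vocabulary));

        return $"Write a short German story for CEFR level {level}. " +
               $"Use each of these words at least once: {string.Join(", ", vocabulary)}. " +
               "Return only the story text.";
    }

    // Returns the vocabulary words that do not occur in the story
    // (case-insensitive substring match, so inflected forms may be missed).
    public static IReadOnlyList<string> MissingWords(string story, IReadOnlyList<string> vocabulary) =>
        vocabulary
            .Where(word => !story.Contains(word, StringComparison.OrdinalIgnoreCase))
            .ToList();

    // A story passes when it is non-empty, within the length cap, and uses every word.
    public static bool IsValid(string story, IReadOnlyList<string> vocabulary, int maxLength = 4000) =>
        !string.IsNullOrWhiteSpace(story)
        && story.Length <= maxLength
        && MissingWords(story, vocabulary).Count == 0;
}
```

A failed validation could trigger one regeneration attempt before the error is returned to the caller. Substring matching is deliberately simple; German inflection would likely need lemmatization for a reliable check.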

#### Speech Recognition Flow
```
1. User records speech in frontend
2. Frontend sends audio file to /api/speech/recognize
3. VoskService receives audio bytes
4. VoskService calls Vosk Python CLI with German model
5. Vosk returns transcribed text
6. Backend validates transcription and returns to frontend
```
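
Step 6 (validating the transcription) might be sketched as below. It assumes the Python wrapper prints the recognizer's JSON result with word-level output enabled, so that the payload has a `text` field and a `result` array whose entries carry a `conf` value. The `Transcription` record, the `VoskResultParser` name, and the 0.6 threshold are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public sealed record Transcription(string Text, double AverageConfidence);

public static class VoskResultParser
{
    // Parses the JSON that the Python wrapper is assumed to print on stdout.
    public static Transcription Parse(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;

        var text = root.TryGetProperty("text", out var textProp)
            ? textProp.GetString() ?? string.Empty
            : string.Empty;

        // "result" is only present when word-level output is enabled.
        var confidences = root.TryGetProperty("result", out var words) && words.ValueKind == JsonValueKind.Array
            ? words.EnumerateArray()
                   .Where(w => w.TryGetProperty("conf", out _))
                   .Select(w => w.GetProperty("conf").GetDouble())
                   .ToList()
            : new List<double>();

        var average = confidences.Count > 0 ? confidences.Average() : 0.0;
        return new Transcription(text, average);
    }

    // Rejects empty transcriptions and those below the confidence threshold.
    public static bool IsAcceptable(Transcription t, double minConfidence = 0.6) =>
        !string.IsNullOrWhiteSpace(t.Text) && t.AverageConfidence >= minConfidence;
}
```

When `IsAcceptable` returns false, the endpoint could ask the learner to repeat the recording instead of grading a doubtful transcription.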

#### TTS Flow
```
1. TtsService receives text to synthesize
2. Service calls Coqui TTS Python CLI
3. Coqui generates audio file
4. Audio file saved to filesystem
5. Audio URL returned to caller
```
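
For step 2, the process invocation could be assembled as in this sketch. It assumes the Coqui `tts` command-line tool is installed and on `PATH` and uses its `--text`, `--model_name`, and `--out_path` options; the `CoquiCommand` helper is a hypothetical name.

```csharp
using System.Diagnostics;

public static class CoquiCommand
{
    // Builds the start info for one synthesis call (steps 2-4 of the flow).
    public static ProcessStartInfo Build(string text, string modelName, string outputPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "tts", // assumption: the Coqui CLI is on PATH
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };

        // ArgumentList passes each value as its own argument, so quotes or
        // spaces in the text cannot break or extend the command line.
        startInfo.ArgumentList.Add("--text");
        startInfo.ArgumentList.Add(text);
        startInfo.ArgumentList.Add("--model_name");
        startInfo.ArgumentList.Add(modelName);
        startInfo.ArgumentList.Add("--out_path");
        startInfo.ArgumentList.Add(outputPath);

        return startInfo;
    }
}
```

Building the arguments through `ArgumentList` rather than one interpolated string matters here because the text to synthesize comes from stories and quizzes and may contain quotes.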

---

## 🚀 Implementation Plan

### Phase 1: Configuration & Interfaces (2 hours)
- [ ] Add AI configuration section to appsettings.json
- [ ] Create configuration classes (MistralConfig, VoskConfig, CoquiConfig)
- [ ] Define service interfaces (IMistralService, IVoskService, ITtsService)
- [ ] Register services in Program.cs
- [ ] Set up configuration validation
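
A configuration class with validation might look like the sketch below. The property names mirror the Mistral settings shown later in this document; the validation ranges and the `ConfigValidator` helper are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

public sealed class MistralConfig
{
    [Required(AllowEmptyStrings = false)]
    public string ApiKey { get; set; } = string.Empty;

    [Required, Url]
    public string BaseUrl { get; set; } = "https://api.mistral.ai/v1/";

    [Required]
    public string DefaultModel { get; set; } = "mistral-medium";

    [Range(1, 300)] // assumed bounds
    public int TimeoutSeconds { get; set; } = 30;

    [Range(0, 10)] // assumed bounds
    public int MaxRetries { get; set; } = 3;
}

public static class ConfigValidator
{
    // Returns human-readable validation errors; an empty list means the config is valid.
    public static IReadOnlyList<string> Validate(object config)
    {
        var results = new List<ValidationResult>();
        Validator.TryValidateObject(config, new ValidationContext(config), results, validateAllProperties: true);
        return results.ConvertAll(r => r.ErrorMessage ?? "Invalid value");
    }
}
```

In an ASP.NET Core host, the same attributes can be enforced at startup by binding the section with `AddOptions<MistralConfig>()` and chaining `ValidateDataAnnotations()` and `ValidateOnStart()`, so a missing API key fails fast instead of surfacing on the first request.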

### Phase 2: Mistral-Medium Integration (2-3 hours)
- [ ] Create MistralService implementation
- [ ] Implement Mistral API client
- [ ] Create request/response models
- [ ] Implement retry logic for API calls
- [ ] Add rate limiting (e.g., max 10 requests/minute)
- [ ] Add response caching for similar prompts
- [ ] Create prompt templates for different use cases
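
The request/response models could start as small as this sketch. It assumes the chat completions payload shape (a `model` plus a `messages` array in the request, and `choices[].message.content` in the response); only the fields needed here are modeled, and `MistralChoice` and `MistralJson` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed record ChatMessage(
    [property: JsonPropertyName("role")] string Role,
    [property: JsonPropertyName("content")] string Content);

public sealed record MistralRequest(
    [property: JsonPropertyName("model")] string Model,
    [property: JsonPropertyName("messages")] IReadOnlyList<ChatMessage> Messages);

public sealed record MistralChoice(
    [property: JsonPropertyName("message")] ChatMessage Message);

public sealed record MistralResponse(
    [property: JsonPropertyName("choices")] IReadOnlyList<MistralChoice> Choices);

public static class MistralJson
{
    public static string Serialize(MistralRequest request) => JsonSerializer.Serialize(request);

    // Returns the first completion text, or null when the payload contains no choices.
    public static string FirstCompletion(string responseJson)
    {
        var response = JsonSerializer.Deserialize<MistralResponse>(responseJson);
        return response?.Choices is { Count: > 0 } choices ? choices[0].Message.Content : null;
    }
}
```

Unknown response fields (such as usage statistics) are ignored by default during deserialization, so the models can grow only when a field is actually needed.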

### Phase 3: Vosk Speech Recognition (2-3 hours)
- [ ] Create VoskService implementation
- [ ] Set up Vosk Python environment
- [ ] Download and configure German model (vosk-model-de-0.22)
- [ ] Implement audio processing
- [ ] Handle different audio formats
- [ ] Add error handling for recognition failures
- [ ] Create /api/speech/recognize endpoint

### Phase 4: Coqui TTS Integration (2-3 hours)
- [ ] Create TtsService implementation
- [ ] Set up Coqui TTS Python environment
- [ ] Download and configure German model
- [ ] Implement audio generation
- [ ] Add audio file management (storage, cleanup)
- [ ] Create audio serving endpoints
- [ ] Implement batch audio generation

### Phase 5: Service Integration (2 hours)
- [ ] Create StoryGenerationService (uses MistralService)
- [ ] Create WritingFeedbackService (uses MistralService)
- [ ] Create SpeechExerciseService (uses VoskService)
- [ ] Create AudioGenerationService (uses TtsService)
- [ ] Add health checks for all AI services
- [ ] Implement fallback mechanisms for service failures

### Milestones
| Milestone | Date | Status |
|-----------|------|--------|
| Configuration & Interfaces | - | ⏳ |
| Mistral Integration | - | ⏳ |
| Vosk Integration | - | ⏳ |
| Coqui TTS Integration | - | ⏳ |
| Service Integration | - | ⏳ |

---

## ✅ Tasks

### Backend - Configuration
- [ ] Add Mistral settings to appsettings.json
- [ ] Add Vosk settings to appsettings.json
- [ ] Add Coqui settings to appsettings.json
- [ ] Create Configuration/MistralConfig.cs
- [ ] Create Configuration/VoskConfig.cs
- [ ] Create Configuration/CoquiConfig.cs
- [ ] Register all AI services in Program.cs
- [ ] Add health checks for AI services

### Backend - Mistral Service
- [ ] Create Domain/Interfaces/IMistralService.cs
- [ ] Create Infrastructure/Services/MistralService.cs
- [ ] Implement Mistral API client
- [ ] Create Models/MistralRequest.cs
- [ ] Create Models/MistralResponse.cs
- [ ] Add retry logic
- [ ] Add rate limiting
- [ ] Add response caching
- [ ] Write unit tests

### Backend - Vosk Service
- [ ] Create Domain/Interfaces/IVoskService.cs
- [ ] Create Infrastructure/Services/VoskService.cs
- [ ] Set up Python process execution
- [ ] Download and configure vosk-model-de-0.22
- [ ] Implement audio recognition
- [ ] Create /api/speech/recognize endpoint
- [ ] Create Presentation/Controllers/SpeechController.cs
- [ ] Write unit tests

### Backend - Coqui TTS Service
- [ ] Create Domain/Interfaces/ITtsService.cs
- [ ] Create Infrastructure/Services/TtsService.cs
- [ ] Set up Python process execution
- [ ] Download and configure Coqui German model
- [ ] Implement audio generation
- [ ] Create audio file storage mechanism
- [ ] Create /api/tts/generate endpoint
- [ ] Create Presentation/Controllers/TtsController.cs
- [ ] Write unit tests

### Backend - Higher-Level Services
- [ ] Create Application/Services/StoryGenerationService.cs
- [ ] Create Application/Services/WritingFeedbackService.cs
- [ ] Integrate with MistralService
- [ ] Add validation for AI outputs
- [ ] Write integration tests

### Infrastructure Setup
- [ ] Install Python 3.8+
- [ ] Install Vosk Python package
- [ ] Download vosk-model-de-0.22
- [ ] Install Coqui TTS package
- [ ] Download Coqui German model
- [ ] Set up file storage for audio
- [ ] Configure permissions

### Frontend Integration
- [ ] Create services/speechService.ts
- [ ] Create services/ttsService.ts
- [ ] Create services/aiService.ts
- [ ] Integrate with Recorder component
- [ ] Integrate with AudioPlayer component
- [ ] Add error handling for AI failures

---

## ✅ Definition of Done

### General Criteria (All Features)
- [ ] All acceptance criteria met and verified
- [ ] All tasks in this document completed
- [ ] Code follows Clean Architecture principles
- [ ] Code reviewed and approved by at least 1 team member
- [ ] All tests passing (unit, integration)
- [ ] Documentation updated (README, AGENTS.md if applicable)
- [ ] Feature works in development environment
- [ ] Feature deployed to staging environment
- [ ] Performance meets defined targets
- [ ] Security review completed
- [ ] No critical bugs or blockers

### AI-Specific Criteria
- [ ] All AI services functional in development
- [ ] Mistral API integration tested with valid API key
- [ ] Vosk speech recognition tested with German model
- [ ] Coqui TTS tested with German model
- [ ] Error handling tested (invalid inputs, service failures)
- [ ] Fallback mechanisms implemented and tested
- [ ] Rate limiting configured and tested
- [ ] Audio file generation and storage verified
- [ ] Health checks for all AI services passing

---

## 🧪 Testing Strategy

### Testing Approach

| Test Type | Coverage | Tools | Responsibility |
|-----------|----------|-------|----------------|
| Unit Tests | 80%+ code coverage | MSTest, Moq | Backend Dev |
| Integration Tests | All service interactions | MSTest, Testcontainers | Backend Dev |
| API Tests | All endpoints | MSTest, HttpClient | Backend Dev |
| Frontend Unit Tests | Component logic | Vitest | Frontend Dev |
| Frontend Integration | Service integration | Vitest | Frontend Dev |
| E2E Tests | Critical user journeys | Playwright | QA/Dev |
| Manual Testing | Exploratory, edge cases | BrowserStack | QA |
| Load Testing | AI service performance | k6/JMeter | DevOps |

### AI-Specific Tests

#### Mistral Service Tests
- [ ] Test successful text generation
- [ ] Test API error handling (429, 500, 503)
- [ ] Test rate limiting (max requests per minute)
- [ ] Test response caching
- [ ] Test retry logic on failures
- [ ] Test timeout handling
- [ ] Test invalid API key handling

#### Vosk Service Tests
- [ ] Test successful speech recognition (clear audio)
- [ ] Test speech recognition with background noise
- [ ] Test speech recognition with different accents
- [ ] Test empty audio handling
- [ ] Test invalid audio format handling
- [ ] Test Python process failure handling
- [ ] Test model not found error handling
- [ ] Test confidence threshold validation

#### Coqui TTS Service Tests
- [ ] Test successful audio generation
- [ ] Test audio generation with long text
- [ ] Test audio generation with special characters
- [ ] Test invalid text handling
- [ ] Test Python process failure handling
- [ ] Test model not found error handling
- [ ] Test audio file format validation
- [ ] Test audio quality validation

### Test Data
- Sample audio files for Vosk testing (clear German speech, noisy audio, non-German speech)
- Sample texts for TTS testing (short, long, with special characters, with German umlauts)
- Sample prompts for Mistral testing (A1, A2, B1 levels)

---

## 🚨 Risks & Mitigations

### Technical Risks

| Risk | Likelihood | Impact | Mitigation | Owner |
|------|------------|--------|------------|-------|
| Python-.NET integration failures | High | High | Use Process class with proper error handling, implement process pooling, add timeouts | Backend Dev |
| Vosk model compatibility issues | Medium | High | Test with vosk-model-de-0.22 before implementation, have fallback to vosk-model-small-de-0.15 | Backend Dev |
| Coqui model quality issues | Medium | Medium | Test with sample German text, have alternative TTS service as fallback | Backend Dev |
| Mistral API rate limits | High | Medium | Implement caching (1h TTL), request queue, exponential backoff | Backend Dev |
| Mistral API costs exceed budget | Medium | High | Set budget alerts, implement cost tracking, cache aggressively | Backend Dev |
| AI services slow performance | High | Medium | Implement async processing, use background jobs for batch operations | Backend Dev |
| Audio files too large | Medium | Medium | Compress audio (16kHz, mono), implement streaming for large files | Backend Dev |
| Model files too large for deployment | Medium | Medium | Use Docker volumes, separate storage for models, consider cloud storage | DevOps |
| Memory leaks in Python processes | Medium | High | Implement process lifecycle management, add memory monitoring, use process pooling | Backend Dev |
| Different Python versions cause issues | Medium | Medium | Use Docker to pin Python version, document exact version in README | DevOps |

### Operational Risks

| Risk | Likelihood | Impact | Mitigation | Owner |
|------|------------|--------|------------|-------|
| AI service downtime | Medium | High | Implement health checks, circuit breakers, fallback responses | DevOps |
| Model files corrupted | Low | High | Implement checksum validation, store backups, automated recovery | DevOps |
| API key exposure | Medium | High | Use GitHub secrets, Azure Key Vault, never commit to repo | Security |
| Audio storage fills up | Medium | Medium | Implement cleanup job, set size quotas, use cloud storage | DevOps |
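
The checksum validation listed for corrupted model files could be as simple as the sketch below. The `ModelIntegrity` helper is a hypothetical name, and the expected hash is assumed to be recorded when the model is first downloaded.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ModelIntegrity
{
    // Streams the file through SHA-256 so large model files are not loaded into memory.
    public static string ComputeSha256(string filePath)
    {
        using var sha = SHA256.Create();
        using var stream = File.OpenRead(filePath);
        return Convert.ToHexString(sha.ComputeHash(stream));
    }

    // Compares against the recorded hash, ignoring hex casing.
    public static bool Matches(string filePath, string expectedHex) =>
        string.Equals(ComputeSha256(filePath), expectedHex, StringComparison.OrdinalIgnoreCase);
}
```

Running this check at startup or inside the health check would turn a silently corrupted model into an explicit, actionable failure.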

### Business Risks

| Risk | Likelihood | Impact | Mitigation | Owner |
|------|------------|--------|------------|-------|
| User data privacy concerns | Medium | High | Anonymize audio before processing, document data handling policy, comply with GDPR | Legal |
| AI generates inappropriate content | Low | High | Implement content moderation, add user reporting, use system prompts to prevent | Backend Dev |
| AI services become too expensive | Medium | Medium | Monitor costs, set budget caps, evaluate open-source alternatives | Product |

---

## 🔗 Dependencies

### Feature Dependencies
- [Infrastructure Setup](infrastructure-setup.md) - Required (backend project)

### Technical Dependencies
- Python 3.8+
- Vosk Python library
- vosk-model-de-0.22 (German model)
- Coqui TTS Python library
- Coqui German TTS model
- Mistral-Medium API key

### External Services
| Service | Purpose | Configuration |
|---------|---------|---------------|
| Mistral-Medium API | Text generation (stories, feedback) | API key, endpoint URL |
| Vosk | Speech recognition | Python path, model path |
| Coqui TTS | Text-to-speech | Python path, model name |

### Blockers
- [ ] Infrastructure Setup must be complete
- [ ] Python environment must be configured
- [ ] AI models must be downloaded
- [ ] Mistral API key must be obtained

---

## 🔧 Technical Deep Dive: Python-.NET Integration

### Integration Patterns

#### Option 1: Process.Start (Recommended for MVP)
```csharp
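// Simple approach: spawn one Python process per request
public async Task<string> RecognizeSpeechAsync(byte[] audioData, CancellationToken ct = default)
{
    // Path.GetTempFileName() would create (and leak) an extra empty file
    var tempFile = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid():N}.wav");
    await File.WriteAllBytesAsync(tempFile, audioData, ct);

    try
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "python",
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };

        // ArgumentList quotes each argument, so paths with spaces are safe.
        // NOTE: module name and flags are carried over from the draft;
        // confirm them against the CLI of the installed vosk package.
        foreach (var arg in new[] { "-m", "vosk.transcribe", "--model", _modelPath, "--input", tempFile })
            startInfo.ArgumentList.Add(arg);

        // Environment is a pre-populated collection: add to it instead of assigning a new one
        startInfo.Environment["PYTHONPATH"] = "/path/to/vosk";

        using var process = new Process { StartInfo = startInfo };
        process.Start();

        // Hard ceiling so a hung Python process cannot block the request forever
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        timeoutCts.CancelAfter(TimeSpan.FromSeconds(30));

        // Read both streams concurrently; reading them one after the other can deadlock
        var outputTask = process.StandardOutput.ReadToEndAsync();
        var errorTask = process.StandardError.ReadToEndAsync();

        try
        {
            await process.WaitForExitAsync(timeoutCts.Token);
        }
        catch (OperationCanceledException)
        {
            try { process.Kill(entireProcessTree: true); } catch { }
            throw;
        }

        var output = await outputTask;
        var error = await errorTask;

        if (process.ExitCode != 0)
        {
            throw new AiServiceException($"Vosk failed: {error}");
        }

        return output.Trim();
    }
    finally
    {
        // Always remove the temporary audio file
        File.Delete(tempFile);
    }
}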
```

**Pros:** Simple, easy to implement, no additional dependencies  
**Cons:** Process startup overhead (~100-500ms per call) plus reloading the model on every call, resource-intensive

#### Option 2: Process Pooling (Recommended for Production)
```csharp
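// Maintain a pool of persistent Python processes
public class PythonProcessPool : IDisposable
{
    private readonly ConcurrentQueue<Process> _pool = new();
    private readonly SemaphoreSlim _semaphore;
    private readonly string _pythonPath;
    private readonly string _scriptPath;

    public PythonProcessPool(int size, string pythonPath, string scriptPath)
    {
        _semaphore = new SemaphoreSlim(size);
        _pythonPath = pythonPath;
        _scriptPath = scriptPath;

        // Pre-warm the pool
        for (int i = 0; i < size; i++)
        {
            _pool.Enqueue(StartProcess());
        }
    }

    public async Task<string> ExecuteAsync(string input)
    {
        await _semaphore.WaitAsync();

        // Replace workers that are missing or have died since their last use
        if (!_pool.TryDequeue(out var process) || process.HasExited)
        {
            process?.Dispose();
            process = StartProcess();
        }

        var healthy = false;
        try
        {
            // Protocol: one request line on stdin, one response line on stdout.
            // The Python worker must flush stdout after every response.
            await process.StandardInput.WriteLineAsync(input);
            await process.StandardInput.FlushAsync();

            var response = await process.StandardOutput.ReadLineAsync();

            // null means the worker closed stdout, i.e. it crashed
            if (response is null)
                throw new AiServiceException("Python worker exited unexpectedly");

            healthy = true;
            return response;
        }
        finally
        {
            if (healthy)
            {
                _pool.Enqueue(process);
            }
            else
            {
                // Never return a broken worker to the pool
                try { process.Kill(); } catch { }
                process.Dispose();
            }
            _semaphore.Release();
        }
    }

    private Process StartProcess()
    {
        var process = new Process
        {
            StartInfo = new ProcessStartInfo
            {
                FileName = _pythonPath,
                Arguments = _scriptPath,
                RedirectStandardInput = true,
                RedirectStandardOutput = true,
                // stderr is left unredirected: an unread stderr pipe can fill up and block the worker
                UseShellExecute = false,
                CreateNoWindow = true
            }
        };

        // Start() returns bool, so it cannot be chained onto the object initializer
        process.Start();
        return process;
    }

    public void Dispose()
    {
        while (_pool.TryDequeue(out var process))
        {
            try { process.Kill(); } catch { }
            process.Dispose();
        }
        _semaphore.Dispose();
    }
}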
```

**Pros:** Eliminates process startup overhead, much faster for repeated calls  
**Cons:** More complex, need to handle process lifecycle, stdin/stdout parsing

#### Option 3: gRPC (Best for Production)
- Create Python gRPC server for AI services
- .NET client calls gRPC methods
- Single persistent Python process
- Type-safe, high-performance

**Pros:** Best performance, type-safe, production-ready  
**Cons:** Most complex to set up, requires gRPC knowledge

### Error Handling Strategy

```csharp
// Retry wrapper with a per-attempt timeout and exponential backoff
public async Task<T> ExecuteWithRetryAsync<T>(
    Func<CancellationToken, Task<T>> action,
    string operationName,
    int maxRetries = 3,
    TimeSpan? timeout = null)
{
    var retryCount = 0;
    timeout ??= TimeSpan.FromSeconds(30);

    while (true)
    {
        try
        {
            // The token must be passed to the action; otherwise the timeout is never enforced
            using var cts = new CancellationTokenSource(timeout.Value);
            return await action(cts.Token);
        }
        catch (OperationCanceledException) when (retryCount < maxRetries)
        {
            retryCount++;
            var delay = TimeSpan.FromSeconds(Math.Pow(2, retryCount));
            _logger.LogWarning(
                "{Operation} timed out (attempt {Attempt}), retrying in {Delay}s...",
                operationName, retryCount, delay.TotalSeconds);
            await Task.Delay(delay);
        }
        catch (AiServiceException ex) when (IsRetryable(ex) && retryCount < maxRetries)
        {
            retryCount++;
            var delay = TimeSpan.FromSeconds(Math.Pow(2, retryCount));
            _logger.LogWarning(ex,
                "{Operation} failed (attempt {Attempt}), retrying in {Delay}s...",
                operationName, retryCount, delay.TotalSeconds);
            await Task.Delay(delay);
        }
        catch (Exception ex)
        {
            // Non-retryable error, or retries exhausted
            _logger.LogError(ex, "{Operation} failed permanently after {Attempts} attempts",
                operationName, retryCount + 1);
            throw new AiServiceException($"{operationName} failed: {ex.Message}", ex);
        }
    }

    // Only transient failures are worth retrying
    bool IsRetryable(AiServiceException ex) =>
        ex.ErrorCode switch
        {
            AiErrorCode.RateLimited => true,
            AiErrorCode.Temporary => true,
            AiErrorCode.Timeout => true,
            _ => false
        };
}
```

### Health Check Implementation

```csharp
// Health check for AI services
public class AiServicesHealthCheck : IHealthCheck
{
    private readonly IMistralService _mistral;
    private readonly IVoskService _vosk;
    private readonly ITtsService _tts;

    // Dependencies are supplied by the DI container
    public AiServicesHealthCheck(IMistralService mistral, IVoskService vosk, ITtsService tts)
    {
        _mistral = mistral;
        _vosk = vosk;
        _tts = tts;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // HealthCheckResult expects IReadOnlyDictionary<string, object> for its data
        var checks = new Dictionary<string, object>
        {
            ["Mistral"] = await ProbeAsync(() => _mistral.TestConnectionAsync(cancellationToken)),
            ["Vosk"] = await ProbeAsync(() => _vosk.TestModelAsync(cancellationToken)),
            ["Coqui TTS"] = await ProbeAsync(() => _tts.TestModelAsync(cancellationToken))
        };

        var allHealthy = checks.Values.All(s => s.Equals(HealthStatus.Healthy));
        var status = allHealthy ? HealthStatus.Healthy : HealthStatus.Unhealthy;

        return new HealthCheckResult(
            status,
            "AI Services health check",
            data: checks);
    }

    // Runs one probe and converts any exception into an Unhealthy status
    private static async Task<HealthStatus> ProbeAsync(Func<Task> probe)
    {
        try
        {
            await probe();
            return HealthStatus.Healthy;
        }
        catch (Exception)
        {
            return HealthStatus.Unhealthy;
        }
    }
}
```

### Audio File Management

```csharp
// Audio file storage service
public class AudioFileService
{
    private readonly string _basePath;
    private readonly ILogger<AudioFileService> _logger;
    
    public AudioFileService(IConfiguration config, ILogger<AudioFileService> logger)
    {
        _basePath = config["Audio:StoragePath"] ?? "/var/audio";
        _logger = logger;
        
        Directory.CreateDirectory(_basePath);
    }
    
    public async Task<string> SaveAudioAsync(byte[] audioData, string category, int entityId)
    {
        // Validate audio data
        if (audioData == null || audioData.Length == 0)
            throw new ArgumentException("Audio data cannot be empty");
        
        if (audioData.Length > 10 * 1024 * 1024) // 10MB limit
            throw new ArgumentException("Audio file too large");
        
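        // Reject categories that could escape the storage root (e.g. "../secrets")
        if (string.IsNullOrWhiteSpace(category) || category != Path.GetFileName(category))
            throw new ArgumentException("Invalid category", nameof(category));

        // Create category directory
        var categoryPath = Path.Combine(_basePath, category);
        Directory.CreateDirectory(categoryPath);

        // Deterministic filename: one file per entity, so regenerating replaces the old audio
        var extension = ".wav"; // or detect from data
        var filename = $"{entityId}{extension}";
        var fullPath = Path.Combine(categoryPath, filename);

        // WriteAllBytesAsync overwrites an existing file, so no explicit delete is needed
        await File.WriteAllBytesAsync(fullPath, audioData);

        // Return the URL path under which the file is served
        return $"/audio/{category}/{filename}";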
    }
    
    public Task CleanupOldFilesAsync(TimeSpan olderThan)
    {
        var cutoff = DateTime.UtcNow - olderThan;

        foreach (var categoryDir in Directory.GetDirectories(_basePath))
        {
            foreach (var file in Directory.GetFiles(categoryDir))
            {
                var fileInfo = new FileInfo(file);
                if (fileInfo.LastWriteTimeUtc < cutoff)
                {
                    try
                    {
                        File.Delete(file);
                        _logger.LogInformation("Deleted old audio file: {File}", file);
                    }
                    catch (Exception ex)
                    {
                        _logger.LogError(ex, "Failed to delete audio file: {File}", file);
                    }
                }
            }
        }

        // The work above is synchronous file I/O, so return a completed task
        // rather than marking the method async without any await
        return Task.CompletedTask;
    }
}
```

### Rate Limiting Implementation

```csharp
// Rate limiter for AI services
public class AiRateLimiter
{
    private readonly ConcurrentDictionary<string, RateLimitEntry> _limits = new();
    private readonly int _maxRequests;
    private readonly TimeSpan _window;
    
    public AiRateLimiter(int maxRequestsPerWindow, TimeSpan window)
    {
        _maxRequests = maxRequestsPerWindow;
        _window = window;
    }
    
    public bool TryAcquire(string serviceName)
    {
        var now = DateTime.UtcNow;
        
        var entry = _limits.GetOrAdd(serviceName, _ => new RateLimitEntry());
        
        lock (entry)
        {
            // Remove old requests
            entry.Requests.RemoveAll(r => now - r > _window);
            
            // Check if limit exceeded
            if (entry.Requests.Count >= _maxRequests)
                return false;
            
            // Add new request
            entry.Requests.Add(now);
            return true;
        }
    }
    
    private class RateLimitEntry
    {
        public List<DateTime> Requests { get; } = new();
    }
}

// Usage in controller
[HttpPost("recognize")]
public async Task<IActionResult> RecognizeSpeech([FromBody] AudioRequest request)
{
    if (!_rateLimiter.TryAcquire("Vosk"))
    {
        return StatusCode(429, "Too many requests");
    }
    
    // ... process request
}
```

---

## 📝 Notes & Decisions
| Date | Decision | Rationale |
|------|----------|-----------|
| May 31, 2025 | Use Mistral-Medium | Best balance of quality and cost for this use case |
| May 31, 2025 | Use Vosk for speech recognition | Open-source, supports German, self-hostable |
| May 31, 2025 | Use Coqui TTS | Open-source, good quality, supports German |
| May 31, 2025 | Self-host AI services | More control, no external API dependencies (except Mistral) |
| May 31, 2025 | Use Python CLI wrappers | Easier integration with .NET, well-supported libraries |

### Technical Notes

#### Vosk Configuration
```json
{
  "Vosk": {
    "PythonPath": "/usr/bin/python3",
    "ModelPath": "/models/vosk-model-de-0.22",
    "SampleRate": 16000
  }
}
```
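
Because the configured `SampleRate` is 16000, uploads could be checked before they reach Vosk. The sketch below assumes a canonical 44-byte PCM WAV header in which the `fmt ` chunk directly follows the RIFF/WAVE identifiers; files with extra chunks before `fmt ` would need a real chunk parser. `WavHeader` is a hypothetical helper.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

public static class WavHeader
{
    // Returns true for mono PCM audio at the expected sample rate.
    // Assumes the canonical header layout (RIFF, WAVE, then the "fmt " chunk).
    public static bool IsMonoPcmAt(ReadOnlySpan<byte> wav, int expectedSampleRate)
    {
        if (wav.Length < 44) return false;
        if (Encoding.ASCII.GetString(wav.Slice(0, 4)) != "RIFF") return false;
        if (Encoding.ASCII.GetString(wav.Slice(8, 4)) != "WAVE") return false;
        if (Encoding.ASCII.GetString(wav.Slice(12, 4)) != "fmt ") return false;

        var audioFormat = BinaryPrimitives.ReadUInt16LittleEndian(wav.Slice(20, 2)); // 1 = PCM
        var channels = BinaryPrimitives.ReadUInt16LittleEndian(wav.Slice(22, 2));
        var sampleRate = BinaryPrimitives.ReadUInt32LittleEndian(wav.Slice(24, 4));

        return audioFormat == 1 && channels == 1 && sampleRate == (uint)expectedSampleRate;
    }
}
```

Audio that fails this check could be rejected with a 400 response or handed to a conversion step before recognition.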

#### Coqui TTS Configuration
```json
{
  "Coqui": {
    "PythonPath": "/usr/bin/python3",
    "ModelName": "tts_models/deu/fairseq/vits",
    "AudioOutputFormat": "wav",
    "SampleRate": 22050
  }
}
```

#### Mistral Configuration
```json
{
  "Mistral": {
    "ApiKey": "your-api-key",
    "BaseUrl": "https://api.mistral.ai/v1/",
    "DefaultModel": "mistral-medium",
    "TimeoutSeconds": 30,
    "MaxRetries": 3
  }
}
```

Treat `ApiKey` above as a placeholder: keep the real key out of committed files and supply it through user secrets or the `Mistral__ApiKey` environment variable.

### Error Handling Strategy
1. **Transient errors**: Retry with exponential backoff
2. **Rate limits**: Return 429 to client, suggest retry
3. **Service unavailable**: Return 503, log error
4. **Invalid response**: Validate output, return meaningful error
5. **Timeout**: Return 504, suggest retry
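
The mapping above could be centralized in one place, as in this sketch. `RateLimited`, `Temporary`, and `Timeout` mirror the error codes used in the retry helper; `Unknown`, `InvalidResponse`, and the choice of 502 for invalid AI output are assumptions.

```csharp
using System;

// Mirrors the error codes used by the retry helper; Unknown and InvalidResponse are assumed additions.
public enum AiErrorCode { Unknown, RateLimited, Temporary, Timeout, InvalidResponse }

public static class AiErrorMapping
{
    // Translates an AI error code into the HTTP status returned to the client.
    public static int ToHttpStatus(AiErrorCode code) => code switch
    {
        AiErrorCode.RateLimited => 429,
        AiErrorCode.Temporary => 503,
        AiErrorCode.Timeout => 504,
        AiErrorCode.InvalidResponse => 502, // assumption: upstream returned unusable output
        _ => 500
    };

    // Transient failures are the ones where the client should be told to retry.
    public static bool ShouldSuggestRetry(AiErrorCode code) =>
        code is AiErrorCode.RateLimited or AiErrorCode.Temporary or AiErrorCode.Timeout;
}
```

Keeping the mapping in a single helper lets every controller return consistent status codes for the same failure.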

### Caching Strategy
- **Mistral responses**: Cache for 1 hour (stories unlikely to change)
- **TTS audio**: Cache files permanently (regenerate only if text changes)
- **Vosk**: No caching (each audio is unique)
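
The one-hour Mistral cache could be prototyped as below. This is a sketch under assumptions: `TtlCache` is a hypothetical in-memory class, "similar prompts" is interpreted narrowly as prompts that are equal after whitespace and case normalization, and a production version would more likely build on `IMemoryCache` or a distributed cache.

```csharp
using System;
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

public sealed class TtlCache
{
    private readonly ConcurrentDictionary<string, (string Value, DateTimeOffset ExpiresAt)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _now;

    // The clock is injectable so expiry can be tested without waiting.
    public TtlCache(TimeSpan ttl, Func<DateTimeOffset> now = null)
    {
        _ttl = ttl;
        _now = now ?? (() => DateTimeOffset.UtcNow);
    }

    // Stable key: SHA-256 of the whitespace-normalized, lower-cased prompt.
    public static string KeyFor(string prompt)
    {
        var normalized = Regex.Replace(prompt.Trim(), @"\s+", " ").ToLowerInvariant();
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(normalized)));
    }

    public bool TryGet(string key, out string value)
    {
        if (_entries.TryGetValue(key, out var entry))
        {
            if (entry.ExpiresAt > _now())
            {
                value = entry.Value;
                return true;
            }
            _entries.TryRemove(key, out _); // drop the expired entry
        }
        value = null;
        return false;
    }

    public void Set(string key, string value) => _entries[key] = (value, _now() + _ttl);
}
```

The same hashing idea fits the TTS rule: naming an audio file after the hash of its text means the file is regenerated only when the text changes.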

### Gotchas
- ⚠️ Vosk model is ~500MB (estimate; large German models can be several GB, so confirm the size on the Vosk models page) - ensure enough disk space
- ⚠️ Coqui model is ~1.5GB - ensure enough disk space
- ⚠️ Python processes may have memory leaks - monitor and restart
- ⚠️ AI services may fail silently - implement health checks
- ⚠️ Mistral API has costs - implement budget tracking
- ⚠️ Audio generation can be CPU-intensive - consider separate service
- ⚠️ Different Python versions may have compatibility issues

### File Storage Structure
```
/public/
└── audio/
    ├── vocabulary/       # Vocabulary word audio
    │   └── {id}.wav
    ├── story/           # Story segment audio
    │   └── {levelId}-{order}.wav
    └── quiz/            # Quiz question audio
        └── {questionId}.wav
/models/                 # AI models (kept outside the public web root)
├── vosk/
│   └── vosk-model-de-0.22/
└── coqui/
    └── tts_models/
```

### Performance Considerations
- TTS generation: ~1-2 seconds per sentence
- Speech recognition: ~1-3 seconds per audio clip
- Mistral API: ~2-5 seconds per request
- Consider async/background processing for batch operations
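
Background processing for batch audio generation could be built on a bounded channel, as in this sketch. `AudioJobQueue` is a hypothetical class and the `synthesize` delegate stands in for the real TTS call.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public sealed class AudioJobQueue
{
    private readonly Channel<string> _channel;

    public AudioJobQueue(int capacity)
    {
        // Bounded: producers wait when the queue is full instead of exhausting memory
        _channel = Channel.CreateBounded<string>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.Wait
        });
    }

    public ValueTask EnqueueAsync(string text, CancellationToken ct = default) =>
        _channel.Writer.WriteAsync(text, ct);

    // Signals that no more jobs will be added.
    public void Complete() => _channel.Writer.Complete();

    // Processes jobs one at a time until the queue is completed and drained.
    public async Task RunAsync(Func<string, CancellationToken, Task> synthesize, CancellationToken ct = default)
    {
        await foreach (var text in _channel.Reader.ReadAllAsync(ct))
        {
            await synthesize(text, ct);
        }
    }
}
```

In an ASP.NET Core host, `RunAsync` would typically be driven from a `BackgroundService`, so request handlers only enqueue work and return immediately. Processing one job at a time also keeps CPU-heavy synthesis from starving the web server.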

---

## 📊 Progress History

| Date | Status Change | Notes |
|------|---------------|-------|
| May 31, 2025 | Created | Initial plan based on application-plan.md |

---

## 📎 Related Files & Links

- Architecture: [Backend Structure](../architecture/backend-structure.md)
- Architecture: [Application Plan](../architecture/application-plan.md)
- Feature: [Story Integration](story-integration.md)
- Feature: [Vocabulary System](vocabulary-system.md)
- Feature: [Quiz System](quiz-system.md)
- Reference: [Mistral AI API Docs](https://docs.mistral.ai/)
- Reference: [Vosk Documentation](https://alphacephei.com/vosk/)
- Reference: [Coqui TTS GitHub](https://github.com/coqui-ai/TTS)
- Reference: [vosk-model-de-0.22](https://alphacephei.com/vosk/models)
- Reference: [Coqui German Model](https://github.com/coqui-ai/TTS/wiki/Multilingual-support)

---

*Feature created from application-plan.md*