Lasse Rune Hansen 76e8af4987 Add complete solution: documentation, frontend, and project files
- Add comprehensive documentation in docs/ (architecture, features, roadmap)
- Add german-app-frontend with Vite, TypeScript, ESLint configuration
- Add AGENTS.md and .gitignore

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-05-31 18:20:53 +02:00

# Feature: AI Services Integration
> **Status**: ⏳ Planned\
> **Priority**: High\
> **Complexity**: High\
> **Estimate**: 10-16 hours\
> **Assignee**: -\
> **Created**: May 31, 2025\
> **Target Completion**: -\
> **PR**: -\
> **Related Features**: Story Integration, Vocabulary System, Quiz System, Lesson Management
---
## 📌 Overview
### Purpose
Integrate three AI services into the application: Mistral-Medium for text generation (stories, feedback), Vosk for speech recognition (speaking exercises), and Coqui TTS for text-to-speech (vocabulary, stories, quizzes).
### User Story
As a learner, I want AI-powered features like generated stories, speech recognition for speaking practice, and TTS for audio content so that I can have an immersive and interactive learning experience.
### Acceptance Criteria
- [ ] Mistral-Medium API is integrated for story generation
- [ ] Mistral-Medium API is integrated for writing feedback
- [ ] Vosk speech recognition is integrated for speaking exercises
- [ ] Coqui TTS is integrated for audio generation
- [ ] All AI services are configurable via appsettings.json
- [ ] Error handling for AI service failures
- [ ] Rate limiting/caching for AI API calls
---
## 📋 Requirements
### Functional Requirements
| ID | Requirement | Priority |
|----|-------------|----------|
| FR-001 | Generate stories using Mistral-Medium | High |
| FR-002 | Generate writing feedback using Mistral-Medium | High |
| FR-003 | Transcribe speech using Vosk | High |
| FR-004 | Generate audio using Coqui TTS | High |
| FR-005 | Configure all services via configuration | High |
| FR-006 | Handle AI service errors gracefully | High |
| FR-007 | Cache/rate limit AI API calls | Medium |
| FR-008 | Validate AI outputs before use | Medium |
### Non-Functional Requirements
- Performance: TTS generation < 2 seconds per sentence
- Performance: Speech recognition < 3 seconds
- Performance: AI API calls < 5 seconds
- Reliability: Services should degrade gracefully on failure
- Cost: Minimize API call costs (caching, batching)
---
## 🏗️ Technical Design
### Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│                      AI Services Layer                      │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Mistral-Medium  │ │      Vosk       │ │    Coqui TTS    │ │
│ │   (Text Gen)    │ │ (Speech Recog.) │ │   (Audio Gen)   │ │
│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │
│          │                   │                   │          │
│          ▼                   ▼                   ▼          │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Application Services                                    │ │
│ │ - StoryGenerationService                                │ │
│ │ - WritingFeedbackService                                │ │
│ │ - VoskService (Speech Recognition)                      │ │
│ │ - TtsService (Text-to-Speech)                           │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
### Components Involved
- **Backend Services**:
  - `IMistralService` / `MistralService` - Text generation
  - `IVoskService` / `VoskService` - Speech recognition
  - `ITtsService` / `TtsService` - Text-to-speech
- **Configuration**: appsettings.json with AI settings
- **External Dependencies**:
  - Mistral-Medium API
  - Vosk Python library + German model
  - Coqui TTS Python library + German model
### Data Flow
#### Story Generation Flow
```
1. StoryGenerationService receives request with vocabulary list and level
2. Service constructs prompt for Mistral-Medium
3. MistralService sends prompt to Mistral API
4. Mistral API returns generated story text
5. StoryGenerationService validates and returns story
6. StoryService saves story and triggers audio generation
```
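Steps 2 and 5 could be sketched as below. The sketch assumes a chat-style request made of one system message and one user message. `StoryPromptBuilder`, its two methods, and the prompt wording are hypothetical. The validation only checks that every requested word occurs in the generated story.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for StoryGenerationService: builds the prompt (step 2) and validates the story (step 5).
public static class StoryPromptBuilder
{
    // Returns (role, content) pairs in the shape of a chat-completion request.
    public static IReadOnlyList<(string Role, string Content)> BuildMessages(
        IReadOnlyCollection<string> vocabulary, string cefrLevel, int maxWords = 200)
    {
        if (vocabulary.Count == 0)
            throw new ArgumentException("At least one vocabulary word is required.", nameof(vocabulary));

        var system = "You write short German stories for language learners. " +
                     $"Use only grammar and vocabulary appropriate for CEFR level {cefrLevel}. " +
                     "Reply with the story text only.";
        var user = $"Write a story of at most {maxWords} words that uses every one of these words: " +
                   string.Join(", ", vocabulary) + ".";

        return new[] { ("system", system), ("user", user) };
    }

    // Returns the requested words that do not appear in the story.
    // Whole-word, case-insensitive match; inflected forms (e.g. "Hunde" for "Hund") are NOT recognized.
    public static IReadOnlyList<string> FindMissingVocabulary(string story, IEnumerable<string> vocabulary) =>
        vocabulary
            .Where(word => !Regex.IsMatch(story, $@"\b{Regex.Escape(word)}\b",
                                          RegexOptions.IgnoreCase | RegexOptions.CultureInvariant))
            .ToList();
}
```

Because only exact word forms match, a story that fails the check could be regenerated once before the request is rejected.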
#### Speech Recognition Flow
```
1. User records speech in frontend
2. Frontend sends audio file to /api/speech/recognize
3. VoskService receives audio bytes
4. VoskService calls Vosk Python CLI with German model
5. Vosk returns transcribed text
6. Backend validates transcription and returns to frontend
```
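Step 6 could parse the recognizer output as sketched below. The sketch assumes the Python wrapper prints the JSON that Vosk's `KaldiRecognizer` returns when word output is enabled with `SetWords(True)`. That JSON has a `text` field plus a `result` array whose entries carry `word`, `start`, `end`, and `conf`. The type names and the 0.6 mean-confidence threshold are hypothetical, and the threshold would need tuning against real learner recordings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Serialization;

// Shape of a Vosk recognizer result when word timings are enabled.
public sealed record VoskWord(
    [property: JsonPropertyName("word")] string Word,
    [property: JsonPropertyName("start")] double Start,
    [property: JsonPropertyName("end")] double End,
    [property: JsonPropertyName("conf")] double Conf);

public sealed record VoskResult(
    [property: JsonPropertyName("text")] string Text,
    [property: JsonPropertyName("result")] List<VoskWord>? Words);

public static class VoskResultParser
{
    // Parses the wrapper's stdout and rejects empty or low-confidence transcriptions.
    public static (bool Accepted, string Text, double MeanConfidence) Evaluate(
        string json, double minMeanConfidence = 0.6)
    {
        var result = JsonSerializer.Deserialize<VoskResult>(json)
                     ?? throw new FormatException("Vosk output deserialized to null.");

        var text = result.Text?.Trim() ?? string.Empty;
        if (text.Length == 0 || result.Words is null || result.Words.Count == 0)
            return (false, string.Empty, 0.0);   // silence or nothing recognized

        var mean = result.Words.Average(w => w.Conf);
        return (mean >= minMeanConfidence, text, mean);
    }
}
```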
#### TTS Flow
```
1. TtsService receives text to synthesize
2. Service calls Coqui TTS Python CLI
3. Coqui generates audio file
4. Audio file saved to filesystem
5. Audio URL returned to caller
```
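Step 2 could build the Coqui command as sketched below. The sketch assumes that the `tts` console script installed with the Coqui TTS package is invoked directly, with its `--text`, `--model_name`, and `--out_path` options, rather than going through `PythonPath`. `CoquiCommand.Build` is a hypothetical helper. Passing each value through `ProcessStartInfo.ArgumentList` avoids hand-escaping sentences that contain quotes, commas, or umlauts.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class CoquiCommand
{
    // Builds the Coqui `tts` CLI invocation for one piece of text.
    // ttsExecutable: path to the `tts` script installed with the TTS package (assumed, e.g. in a venv's bin/).
    public static ProcessStartInfo Build(string ttsExecutable, string modelName, string text, string outputPath)
    {
        if (string.IsNullOrWhiteSpace(text))
            throw new ArgumentException("Text to synthesize cannot be empty.", nameof(text));

        var psi = new ProcessStartInfo(ttsExecutable)
        {
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true,
        };
        // ArgumentList quotes each value, so the text needs no manual escaping.
        psi.ArgumentList.Add("--text");
        psi.ArgumentList.Add(text);
        psi.ArgumentList.Add("--model_name");
        psi.ArgumentList.Add(modelName);
        psi.ArgumentList.Add("--out_path");
        psi.ArgumentList.Add(Path.GetFullPath(outputPath));
        return psi;
    }
}
```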
---
## 🚀 Implementation Plan
### Phase 1: Configuration & Interfaces (2 hours)
- [ ] Add AI configuration section to appsettings.json
- [ ] Create configuration classes (MistralConfig, VoskConfig, CoquiConfig)
- [ ] Define service interfaces (IMistralService, IVoskService, ITtsService)
- [ ] Register services in Program.cs
- [ ] Set up configuration validation
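The following is a minimal sketch of the Phase 1 configuration classes. It assumes the section names and keys from the JSON samples under Technical Notes. The validation attributes and ranges are illustrative choices, not requirements of this plan. In `Program.cs`, each class could be bound and validated at startup with `builder.Services.AddOptions<MistralConfig>().Bind(builder.Configuration.GetSection("Mistral")).ValidateDataAnnotations().ValidateOnStart()`. The hypothetical `ConfigValidation.Validate` helper below only makes the same checks visible outside the host.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Linq;

public sealed class MistralConfig
{
    [Required] public string ApiKey { get; set; } = "";
    [Required, Url] public string BaseUrl { get; set; } = "https://api.mistral.ai/v1/";
    [Required] public string DefaultModel { get; set; } = "mistral-medium";
    [Range(1, 300)] public int TimeoutSeconds { get; set; } = 30;
    [Range(0, 10)] public int MaxRetries { get; set; } = 3;
}

public sealed class VoskConfig
{
    [Required] public string PythonPath { get; set; } = "";
    [Required] public string ModelPath { get; set; } = "";
    [Range(8000, 48000)] public int SampleRate { get; set; } = 16000;
}

public sealed class CoquiConfig
{
    [Required] public string PythonPath { get; set; } = "";
    [Required] public string ModelName { get; set; } = "";
    [Required] public string AudioOutputFormat { get; set; } = "wav";
    [Range(8000, 48000)] public int SampleRate { get; set; } = 22050;
}

public static class ConfigValidation
{
    // Runs the DataAnnotations checks and returns the error messages (empty list = valid).
    public static IReadOnlyList<string> Validate(object config)
    {
        var results = new List<ValidationResult>();
        Validator.TryValidateObject(config, new ValidationContext(config), results, validateAllProperties: true);
        return results.Select(r => r.ErrorMessage ?? "Invalid value").ToList();
    }
}
```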
### Phase 2: Mistral-Medium Integration (2-3 hours)
- [ ] Create MistralService implementation
- [ ] Implement Mistral API client
- [ ] Create request/response models
- [ ] Implement retry logic for API calls
- [ ] Add rate limiting (e.g., max 10 requests/minute)
- [ ] Add response caching for similar prompts
- [ ] Create prompt templates for different use cases
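The request/response models and the API call could look like the sketch below. It assumes Mistral's chat-completions endpoint (`POST {BaseUrl}chat/completions` with a Bearer token). It maps only the fields used here, so the full schema should be checked against the Mistral API reference. `MistralClient`, the record names, and the default `temperature`/`max_tokens` values are hypothetical. Retry, rate limiting, and caching are meant to wrap `CompleteAsync` rather than live inside it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;
using System.Text.Json.Serialization;
using System.Threading;
using System.Threading.Tasks;

public sealed record ChatMessage(
    [property: JsonPropertyName("role")] string Role,
    [property: JsonPropertyName("content")] string Content);

public sealed record MistralRequest(
    [property: JsonPropertyName("model")] string Model,
    [property: JsonPropertyName("messages")] IReadOnlyList<ChatMessage> Messages,
    [property: JsonPropertyName("temperature")] double Temperature = 0.7,
    [property: JsonPropertyName("max_tokens")] int MaxTokens = 800);

public sealed record MistralChoice([property: JsonPropertyName("message")] ChatMessage Message);

public sealed record MistralResponse([property: JsonPropertyName("choices")] List<MistralChoice> Choices);

public sealed class MistralClient
{
    private readonly HttpClient _http;

    // Expects an HttpClient whose BaseAddress is the configured BaseUrl (with trailing slash).
    public MistralClient(HttpClient http, string apiKey)
    {
        _http = http;
        _http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
    }

    public async Task<string> CompleteAsync(MistralRequest request, CancellationToken ct = default)
    {
        using var response = await _http.PostAsJsonAsync("chat/completions", request, ct);
        // Throws on 429/5xx so the retry policy can decide what to do.
        response.EnsureSuccessStatusCode();

        var body = await response.Content.ReadFromJsonAsync<MistralResponse>(cancellationToken: ct);
        return body?.Choices?.FirstOrDefault()?.Message?.Content
               ?? throw new InvalidOperationException("Mistral returned no choices.");
    }
}
```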
### Phase 3: Vosk Speech Recognition (2-3 hours)
- [ ] Create VoskService implementation
- [ ] Set up Vosk Python environment
- [ ] Download and configure German model (vosk-model-de-0.22)
- [ ] Implement audio processing
- [ ] Handle different audio formats
- [ ] Add error handling for recognition failures
- [ ] Create /api/speech/recognize endpoint
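Vosk's own Python examples reject input that is not mono, 16-bit, uncompressed PCM WAV. A cheap header check before starting Python therefore lets the endpoint return a clear 400 instead of a recognition failure. The sketch below walks the RIFF chunks until it finds `fmt `. `WavInfo` and `WavHeader` are hypothetical names. Other formats, for example WebM/Opus produced by a browser's `MediaRecorder`, are assumed to be converted beforehand (for instance with ffmpeg).

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

public readonly record struct WavInfo(int Channels, int SampleRate, int BitsPerSample, bool IsPcm);

public static class WavHeader
{
    // Reads the "fmt " chunk of a RIFF/WAVE file. Returns false if the bytes are not a parsable WAV.
    public static bool TryRead(ReadOnlySpan<byte> data, out WavInfo info)
    {
        info = default;
        if (data.Length < 12 ||
            Encoding.ASCII.GetString(data[..4]) != "RIFF" ||
            Encoding.ASCII.GetString(data.Slice(8, 4)) != "WAVE")
            return false;

        var offset = 12;
        while (offset + 8 <= data.Length)
        {
            var chunkId = Encoding.ASCII.GetString(data.Slice(offset, 4));
            var chunkSize = BinaryPrimitives.ReadInt32LittleEndian(data.Slice(offset + 4, 4));
            if (chunkSize < 0) return false;
            var body = offset + 8;

            if (chunkId == "fmt " && chunkSize >= 16 && body + 16 <= data.Length)
            {
                var format = BinaryPrimitives.ReadUInt16LittleEndian(data.Slice(body, 2));
                info = new WavInfo(
                    Channels: BinaryPrimitives.ReadUInt16LittleEndian(data.Slice(body + 2, 2)),
                    SampleRate: BinaryPrimitives.ReadInt32LittleEndian(data.Slice(body + 4, 4)),
                    BitsPerSample: BinaryPrimitives.ReadUInt16LittleEndian(data.Slice(body + 14, 2)),
                    IsPcm: format == 1);   // 1 = WAVE_FORMAT_PCM
                return true;
            }

            // Stop on a chunk that claims to extend past the end of the data.
            if (chunkSize > data.Length - body) return false;
            // Chunks are word-aligned: an odd size is followed by one padding byte.
            offset = body + chunkSize + (chunkSize & 1);
        }
        return false;
    }

    // Mono, 16-bit PCM (as in Vosk's examples) plus the configured sample rate.
    public static bool IsVoskCompatible(WavInfo info, int expectedSampleRate = 16000) =>
        info.IsPcm && info.Channels == 1 && info.BitsPerSample == 16 && info.SampleRate == expectedSampleRate;
}
```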
### Phase 4: Coqui TTS Integration (2-3 hours)
- [ ] Create TtsService implementation
- [ ] Set up Coqui TTS Python environment
- [ ] Download and configure German model
- [ ] Implement audio generation
- [ ] Add audio file management (storage, cleanup)
- [ ] Create audio serving endpoints
- [ ] Implement batch audio generation
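Batch generation (for example, all vocabulary of a new lesson) needs a concurrency cap so that one large batch does not start dozens of TTS processes at once. Below is a minimal sketch using `Parallel.ForEachAsync` (.NET 6 and later). The `synthesize` delegate stands in for `TtsService`; both it and `BatchTts.GenerateAsync` are hypothetical. Failures are collected per item instead of aborting the whole batch.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BatchTts
{
    // Generates audio for each (id, text) pair with at most maxParallel concurrent calls.
    // Returns the ids that failed so they can be retried later.
    public static async Task<IReadOnlyCollection<int>> GenerateAsync(
        IEnumerable<(int Id, string Text)> items,
        Func<int, string, CancellationToken, Task> synthesize,
        int maxParallel = 2,
        CancellationToken cancellationToken = default)
    {
        var failed = new ConcurrentBag<int>();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxParallel,
            CancellationToken = cancellationToken,
        };

        await Parallel.ForEachAsync(items, options, async (item, ct) =>
        {
            try
            {
                await synthesize(item.Id, item.Text, ct);
            }
            catch (Exception) when (!ct.IsCancellationRequested)
            {
                failed.Add(item.Id);   // record the failure and continue with the rest of the batch
            }
        });

        return failed.ToArray();
    }
}
```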
### Phase 5: Service Integration (2 hours)
- [ ] Create StoryGenerationService (uses MistralService)
- [ ] Create WritingFeedbackService (uses MistralService)
- [ ] Create SpeechExerciseService (uses VoskService)
- [ ] Create AudioGenerationService (uses TtsService)
- [ ] Add health checks for all AI services
- [ ] Implement fallback mechanisms for service failures
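One way to implement the fallback mechanism is a circuit breaker. After a number of consecutive failures, calls to a service are skipped for a cool-down period and the caller's fallback is used instead (for example, text without audio). The class below is a minimal, thread-safe sketch. Its name, thresholds, and injectable clock are assumptions, and a library such as Polly offers a more complete circuit breaker.

```csharp
using System;
using System.Threading.Tasks;

public sealed class CircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTime> _utcNow;   // injectable clock, useful for tests
    private readonly object _gate = new();
    private int _consecutiveFailures;
    private DateTime _openUntil = DateTime.MinValue;

    public CircuitBreaker(int failureThreshold, TimeSpan openDuration, Func<DateTime>? utcNow = null)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    public bool IsOpen { get { lock (_gate) return _utcNow() < _openUntil; } }

    // Runs the action unless the circuit is open; uses the fallback when open or when the action fails.
    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, Func<T> fallback)
    {
        if (IsOpen) return fallback();
        try
        {
            var result = await action();
            lock (_gate) _consecutiveFailures = 0;   // success closes the circuit
            return result;
        }
        catch (Exception)
        {
            // Real code would log the exception here.
            lock (_gate)
            {
                // The count is not reset when the circuit opens, so after the cool-down
                // one more failure re-opens it immediately (half-open behavior).
                if (++_consecutiveFailures >= _failureThreshold)
                    _openUntil = _utcNow() + _openDuration;
            }
            return fallback();
        }
    }
}
```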
### Milestones
| Milestone | Date | Status |
|-----------|------|--------|
| Configuration & Interfaces | - | |
| Mistral Integration | - | |
| Vosk Integration | - | |
| Coqui TTS Integration | - | |
| Service Integration | - | |
---
## ✅ Tasks
### Backend - Configuration
- [ ] Add Mistral settings to appsettings.json
- [ ] Add Vosk settings to appsettings.json
- [ ] Add Coqui settings to appsettings.json
- [ ] Create Configuration/MistralConfig.cs
- [ ] Create Configuration/VoskConfig.cs
- [ ] Create Configuration/CoquiConfig.cs
- [ ] Register all AI services in Program.cs
- [ ] Add health checks for AI services
### Backend - Mistral Service
- [ ] Create Domain/Interfaces/IMistralService.cs
- [ ] Create Infrastructure/Services/MistralService.cs
- [ ] Implement Mistral API client
- [ ] Create Models/MistralRequest.cs
- [ ] Create Models/MistralResponse.cs
- [ ] Add retry logic
- [ ] Add rate limiting
- [ ] Add response caching
- [ ] Write unit tests
### Backend - Vosk Service
- [ ] Create Domain/Interfaces/IVoskService.cs
- [ ] Create Infrastructure/Services/VoskService.cs
- [ ] Set up Python process execution
- [ ] Download and configure vosk-model-de-0.22
- [ ] Implement audio recognition
- [ ] Create /api/speech/recognize endpoint
- [ ] Create Presentation/Controllers/SpeechController.cs
- [ ] Write unit tests
### Backend - Coqui TTS Service
- [ ] Create Domain/Interfaces/ITtsService.cs
- [ ] Create Infrastructure/Services/TtsService.cs
- [ ] Set up Python process execution
- [ ] Download and configure Coqui German model
- [ ] Implement audio generation
- [ ] Create audio file storage mechanism
- [ ] Create /api/tts/generate endpoint
- [ ] Create Presentation/Controllers/TtsController.cs
- [ ] Write unit tests
### Backend - Higher-Level Services
- [ ] Create Application/Services/StoryGenerationService.cs
- [ ] Create Application/Services/WritingFeedbackService.cs
- [ ] Integrate with MistralService
- [ ] Add validation for AI outputs
- [ ] Write integration tests
### Infrastructure Setup
- [ ] Install Python 3.8+
- [ ] Install Vosk Python package
- [ ] Download vosk-model-de-0.22
- [ ] Install Coqui TTS package
- [ ] Download Coqui German model
- [ ] Set up file storage for audio
- [ ] Configure permissions
### Frontend Integration
- [ ] Create services/speechService.ts
- [ ] Create services/ttsService.ts
- [ ] Create services/aiService.ts
- [ ] Integrate with Recorder component
- [ ] Integrate with AudioPlayer component
- [ ] Add error handling for AI failures
---
## ✅ Definition of Done
### General Criteria (All Features)
- [ ] All acceptance criteria met and verified
- [ ] All tasks in this document completed
- [ ] Code follows Clean Architecture principles
- [ ] Code reviewed and approved by at least 1 team member
- [ ] All tests passing (unit, integration)
- [ ] Documentation updated (README, AGENTS.md if applicable)
- [ ] Feature works in development environment
- [ ] Feature deployed to staging environment
- [ ] Performance meets defined targets
- [ ] Security review completed
- [ ] No critical bugs or blockers
### AI-Specific Criteria
- [ ] All AI services functional in development
- [ ] Mistral API integration tested with valid API key
- [ ] Vosk speech recognition tested with German model
- [ ] Coqui TTS tested with German model
- [ ] Error handling tested (invalid inputs, service failures)
- [ ] Fallback mechanisms implemented and tested
- [ ] Rate limiting configured and tested
- [ ] Audio file generation and storage verified
- [ ] Health checks for all AI services passing
---
## 🧪 Testing Strategy
### Testing Approach
| Test Type | Coverage | Tools | Responsibility |
|-----------|----------|-------|----------------|
| Unit Tests | 80%+ code coverage | MSTest, Moq | Backend Dev |
| Integration Tests | All service interactions | MSTest, Testcontainers | Backend Dev |
| API Tests | All endpoints | MSTest, HttpClient | Backend Dev |
| Frontend Unit Tests | Component logic | Vitest | Frontend Dev |
| Frontend Integration | Service integration | Vitest | Frontend Dev |
| E2E Tests | Critical user journeys | Playwright | QA/Dev |
| Manual Testing | Exploratory, edge cases | BrowserStack | QA |
| Load Testing | AI service performance | k6/JMeter | DevOps |
### AI-Specific Tests
#### Mistral Service Tests
- [ ] Test successful text generation
- [ ] Test API error handling (429, 500, 503)
- [ ] Test rate limiting (max requests per minute)
- [ ] Test response caching
- [ ] Test retry logic on failures
- [ ] Test timeout handling
- [ ] Test invalid API key handling
#### Vosk Service Tests
- [ ] Test successful speech recognition (clear audio)
- [ ] Test speech recognition with background noise
- [ ] Test speech recognition with different accents
- [ ] Test empty audio handling
- [ ] Test invalid audio format handling
- [ ] Test Python process failure handling
- [ ] Test model not found error handling
- [ ] Test confidence threshold validation
#### Coqui TTS Service Tests
- [ ] Test successful audio generation
- [ ] Test audio generation with long text
- [ ] Test audio generation with special characters
- [ ] Test invalid text handling
- [ ] Test Python process failure handling
- [ ] Test model not found error handling
- [ ] Test audio file format validation
- [ ] Test audio quality validation
### Test Data
- Sample audio files for Vosk testing (clear German speech, noisy audio, non-German speech)
- Sample texts for TTS testing (short, long, with special characters, with German umlauts)
- Sample prompts for Mistral testing (A1, A2, B1 levels)
---
## 🚨 Risks & Mitigations
### Technical Risks
| Risk | Likelihood | Impact | Mitigation | Owner |
|------|------------|--------|------------|-------|
| Python-.NET integration failures | High | High | Use Process class with proper error handling, implement process pooling, add timeouts | Backend Dev |
| Vosk model compatibility issues | Medium | High | Test with vosk-model-de-0.22 before implementation, have fallback to vosk-model-small-de-0.15 | Backend Dev |
| Coqui model quality issues | Medium | Medium | Test with sample German text, have alternative TTS service as fallback | Backend Dev |
| Mistral API rate limits | High | Medium | Implement caching (1h TTL), request queue, exponential backoff | Backend Dev |
| Mistral API costs exceed budget | Medium | High | Set budget alerts, implement cost tracking, cache aggressively | Backend Dev |
| AI services slow performance | High | Medium | Implement async processing, use background jobs for batch operations | Backend Dev |
| Audio files too large | Medium | Medium | Compress audio (16kHz, mono), implement streaming for large files | Backend Dev |
| Model files too large for deployment | Medium | Medium | Use Docker volumes, separate storage for models, consider cloud storage | DevOps |
| Memory leaks in Python processes | Medium | High | Implement process lifecycle management, add memory monitoring, use process pooling | Backend Dev |
| Different Python versions cause issues | Medium | Medium | Use Docker to pin Python version, document exact version in README | DevOps |
### Operational Risks
| Risk | Likelihood | Impact | Mitigation | Owner |
|------|------------|--------|------------|-------|
| AI service downtime | Medium | High | Implement health checks, circuit breakers, fallback responses | DevOps |
| Model files corrupted | Low | High | Implement checksum validation, store backups, automated recovery | DevOps |
| API key exposure | Medium | High | Use GitHub secrets, Azure Key Vault, never commit to repo | Security |
| Audio storage fills up | Medium | Medium | Implement cleanup job, set size quotas, use cloud storage | DevOps |
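The checksum mitigation for corrupted model files could work as follows: hash the downloaded model archive with SHA-256 and compare the result with a value recorded when the model was first verified. `ModelChecksum` is a hypothetical helper. Where the expected hash is stored (configuration, or a file next to the model) is left open.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading;
using System.Threading.Tasks;

public static class ModelChecksum
{
    // Streams the file through SHA-256 so large model archives are never loaded into memory at once.
    public static async Task<string> ComputeSha256Async(string path, CancellationToken ct = default)
    {
        await using var stream = File.OpenRead(path);
        using var sha = SHA256.Create();
        var hash = await sha.ComputeHashAsync(stream, ct);
        return Convert.ToHexString(hash).ToLowerInvariant();
    }

    // Compares against the expected hex digest, ignoring case and surrounding whitespace.
    public static async Task<bool> VerifyAsync(string path, string expectedSha256, CancellationToken ct = default) =>
        string.Equals(await ComputeSha256Async(path, ct), expectedSha256.Trim(), StringComparison.OrdinalIgnoreCase);
}
```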
### Business Risks
| Risk | Likelihood | Impact | Mitigation | Owner |
|------|------------|--------|------------|-------|
| User data privacy concerns | Medium | High | Anonymize audio before processing, document data handling policy, comply with GDPR | Legal |
| AI generates inappropriate content | Low | High | Implement content moderation, add user reporting, use system prompts to prevent | Backend Dev |
| AI services become too expensive | Medium | Medium | Monitor costs, set budget caps, evaluate open-source alternatives | Product |
---
## 🔗 Dependencies
### Feature Dependencies
- [Infrastructure Setup](infrastructure-setup.md) - Required (backend project)
### Technical Dependencies
- Python 3.8+
- Vosk Python library
- vosk-model-de-0.22 (German model)
- Coqui TTS Python library
- Coqui German TTS model
- Mistral-Medium API key
### External Services
| Service | Purpose | Configuration |
|---------|---------|---------------|
| Mistral-Medium API | Text generation (stories, feedback) | API key, endpoint URL |
| Vosk | Speech recognition | Python path, model path |
| Coqui TTS | Text-to-speech | Python path, model name |
### Blockers
- [ ] Infrastructure Setup must be complete
- [ ] Python environment must be configured
- [ ] AI models must be downloaded
- [ ] Mistral API key must be obtained
---
## 🔧 Technical Deep Dive: Python-.NET Integration
### Integration Patterns
#### Option 1: Process.Start (Recommended for MVP)
```csharp
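// Simple approach - spawn a Python process for each request.
// _pythonPath and _modelPath come from VoskConfig; _scriptPath is a project-owned
// wrapper around the Vosk library (hypothetical; --model/--input are its own flags).
public async Task<string> RecognizeSpeechAsync(byte[] audioData, CancellationToken cancellationToken = default)
{
    // Unique temp path (Path.GetTempFileName() would leave an extra empty file behind)
    var tempFile = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid():N}.wav");
    await File.WriteAllBytesAsync(tempFile, audioData, cancellationToken);
    try
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = _pythonPath,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        // ArgumentList quotes each argument, so paths with spaces are safe
        foreach (var arg in new[] { _scriptPath, "--model", _modelPath, "--input", tempFile })
            startInfo.ArgumentList.Add(arg);
        // EnvironmentVariables cannot be assigned; set entries on Environment instead
        startInfo.Environment["PYTHONPATH"] = "/path/to/vosk";

        using var process = Process.Start(startInfo)
            ?? throw new AiServiceException("Failed to start the Vosk process");

        // Read stdout and stderr concurrently; sequential reads can deadlock
        // when the child fills one pipe while we wait on the other
        var outputTask = process.StandardOutput.ReadToEndAsync();
        var errorTask = process.StandardError.ReadToEndAsync();

        // Prevent a hung process from blocking the request forever
        using var timeout = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        timeout.CancelAfter(TimeSpan.FromSeconds(30));
        try
        {
            await process.WaitForExitAsync(timeout.Token);
        }
        catch (OperationCanceledException)
        {
            try { process.Kill(entireProcessTree: true); } catch (InvalidOperationException) { }
            if (cancellationToken.IsCancellationRequested) throw;   // caller cancelled
            throw new AiServiceException("Vosk timed out after 30 seconds");
        }

        var output = await outputTask;
        var error = await errorTask;
        if (process.ExitCode != 0)
        {
            throw new AiServiceException($"Vosk failed: {error}");
        }
        return output.Trim();
    }
    finally
    {
        File.Delete(tempFile);   // no-op if the file is already gone
    }
}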
```
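**Pros:** Simple, easy to implement, no additional dependencies

**Cons:** Process startup overhead (~100-500ms per call), and the model is loaded from disk again on every call, which for large models can cost more than the process start itself; resource-intensive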
#### Option 2: Process Pooling (Recommended for Production)
```csharp
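using System.Collections.Concurrent;
using System.Diagnostics;

// Maintain a pool of persistent Python processes.
// Assumes a line-based protocol: the script reads one request per line on stdin
// and writes exactly one response line to stdout (inputs must not contain newlines).
public class PythonProcessPool : IDisposable
{
    private readonly ConcurrentQueue<Process> _pool = new();
    private readonly SemaphoreSlim _semaphore;
    private readonly string _pythonPath;
    private readonly string _scriptPath;

    public PythonProcessPool(int size, string pythonPath, string scriptPath)
    {
        _semaphore = new SemaphoreSlim(size, size);
        _pythonPath = pythonPath;
        _scriptPath = scriptPath;
        // Pre-warm the pool so models are loaded before the first request
        for (int i = 0; i < size; i++)
        {
            _pool.Enqueue(StartProcess());
        }
    }

    public async Task<string> ExecuteAsync(string input, CancellationToken cancellationToken = default)
    {
        await _semaphore.WaitAsync(cancellationToken);
        Process? process = null;
        var healthy = false;
        try
        {
            // Replace workers that died while idle in the pool
            if (!_pool.TryDequeue(out process) || process.HasExited)
            {
                process?.Dispose();
                process = StartProcess();
            }
            // Send input to stdin
            await process.StandardInput.WriteLineAsync(input);
            await process.StandardInput.FlushAsync();
            // Read response from stdout; null means the worker exited mid-request
            var response = await process.StandardOutput.ReadLineAsync()
                ?? throw new AiServiceException("Python worker exited unexpectedly");
            healthy = true;
            return response;
        }
        finally
        {
            if (process is not null)
            {
                if (healthy)
                {
                    _pool.Enqueue(process);
                }
                else
                {
                    // Never reuse a worker whose stdin/stdout may be out of sync
                    try { process.Kill(entireProcessTree: true); } catch { /* already exited */ }
                    process.Dispose();
                }
            }
            _semaphore.Release();
        }
    }

    private Process StartProcess()
    {
        var process = new Process
        {
            StartInfo = new ProcessStartInfo
            {
                FileName = _pythonPath,
                Arguments = _scriptPath,
                RedirectStandardInput = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                UseShellExecute = false,
                CreateNoWindow = true
            }
        };
        // Process.Start() returns bool, so the instance is returned separately
        process.Start();
        // Drain stderr continuously; an unread pipe eventually blocks the worker
        process.ErrorDataReceived += (_, e) => { /* log e.Data */ };
        process.BeginErrorReadLine();
        return process;
    }

    public void Dispose()
    {
        while (_pool.TryDequeue(out var process))
        {
            try { process.Kill(entireProcessTree: true); } catch { /* already exited */ }
            process.Dispose();
        }
        _semaphore.Dispose();
    }
}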
```
**Pros:** Eliminates process startup overhead, much faster for repeated calls

**Cons:** More complex, need to handle process lifecycle, stdin/stdout parsing
#### Option 3: gRPC (Best for Production)
- Create Python gRPC server for AI services
- .NET client calls gRPC methods
- Single persistent Python process
- Type-safe, high-performance

**Pros:** Best performance, type-safe, production-ready

**Cons:** Most complex to set up, requires gRPC knowledge
### Error Handling Strategy
```csharp
// Comprehensive error handling for AI services.
// The action receives a CancellationToken; without it the per-attempt timeout has no effect.
public async Task<T> ExecuteWithRetryAsync<T>(
    Func<CancellationToken, Task<T>> action,
    string operationName,
    int maxRetries = 3,
    TimeSpan? timeout = null,
    CancellationToken cancellationToken = default)
{
    var attemptTimeout = timeout ?? TimeSpan.FromSeconds(30);
    var retryCount = 0;
    while (true)
    {
        // Per-attempt timeout, linked to the caller's token
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        cts.CancelAfter(attemptTimeout);
        try
        {
            return await action(cts.Token);
        }
        catch (OperationCanceledException)
            when (!cancellationToken.IsCancellationRequested && retryCount < maxRetries)
        {
            // Our timeout fired (not the caller's cancellation)
            retryCount++;
            var delay = TimeSpan.FromSeconds(Math.Pow(2, retryCount));
            _logger.LogWarning(
                "{Operation} timed out (attempt {Attempt}), retrying in {Delay}s...",
                operationName, retryCount, delay.TotalSeconds);
            await Task.Delay(delay, cancellationToken);
        }
        catch (AiServiceException ex) when (IsRetryable(ex) && retryCount < maxRetries)
        {
            retryCount++;
            var delay = TimeSpan.FromSeconds(Math.Pow(2, retryCount));
            _logger.LogWarning(ex,
                "{Operation} failed (attempt {Attempt}), retrying in {Delay}s...",
                operationName, retryCount, delay.TotalSeconds);
            await Task.Delay(delay, cancellationToken);
        }
        catch (Exception ex) when (!cancellationToken.IsCancellationRequested)
        {
            // Caller cancellation is not matched here and propagates unchanged
            _logger.LogError(ex, "{Operation} failed permanently after {Attempts} attempts",
                operationName, retryCount + 1);
            if (ex is AiServiceException) throw;   // keep the original error code
            throw new AiServiceException($"{operationName} failed: {ex.Message}", ex);
        }
    }
}

private static bool IsRetryable(AiServiceException ex) =>
    ex.ErrorCode switch
    {
        AiErrorCode.RateLimited => true,
        AiErrorCode.Temporary => true,
        AiErrorCode.Timeout => true,
        _ => false
    };
```
### Health Check Implementation
```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Health check for AI services
// (register with builder.Services.AddHealthChecks().AddCheck<AiServicesHealthCheck>("ai"))
public class AiServicesHealthCheck : IHealthCheck
{
    private readonly IMistralService _mistral;
    private readonly IVoskService _vosk;
    private readonly ITtsService _tts;

    public AiServicesHealthCheck(IMistralService mistral, IVoskService vosk, ITtsService tts)
    {
        _mistral = mistral;
        _vosk = vosk;
        _tts = tts;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // HealthCheckResult expects IReadOnlyDictionary<string, object>, so statuses are boxed
        var checks = new Dictionary<string, object>
        {
            ["Mistral"] = await ProbeAsync(() => _mistral.TestConnectionAsync(cancellationToken)),
            ["Vosk"] = await ProbeAsync(() => _vosk.TestModelAsync(cancellationToken)),
            ["Coqui TTS"] = await ProbeAsync(() => _tts.TestModelAsync(cancellationToken))
        };
        var allHealthy = checks.Values.All(s => s is HealthStatus.Healthy);
        var status = allHealthy ? HealthStatus.Healthy : HealthStatus.Unhealthy;
        return new HealthCheckResult(
            status,
            "AI Services health check",
            data: checks);
    }

    // Runs one probe; any exception marks that service as unhealthy
    private static async Task<HealthStatus> ProbeAsync(Func<Task> probe)
    {
        try
        {
            await probe();
            return HealthStatus.Healthy;
        }
        catch (Exception)
        {
            return HealthStatus.Unhealthy;
        }
    }
}
```
### Audio File Management
```csharp
// Audio file storage service
public class AudioFileService
{
    // Categories from the file storage structure; anything else is rejected
    private static readonly HashSet<string> AllowedCategories = new(StringComparer.Ordinal)
    {
        "vocabulary", "story", "quiz"
    };
    private const int MaxAudioBytes = 10 * 1024 * 1024; // 10MB limit
    private readonly string _basePath;
    private readonly ILogger<AudioFileService> _logger;

    public AudioFileService(IConfiguration config, ILogger<AudioFileService> logger)
    {
        _basePath = config["Audio:StoragePath"] ?? "/var/audio";
        _logger = logger;
        Directory.CreateDirectory(_basePath);
    }

    public async Task<string> SaveAudioAsync(byte[] audioData, string category, int entityId)
    {
        // Validate audio data
        if (audioData == null || audioData.Length == 0)
            throw new ArgumentException("Audio data cannot be empty", nameof(audioData));
        if (audioData.Length > MaxAudioBytes)
            throw new ArgumentException("Audio file too large", nameof(audioData));
        // Whitelisting prevents a value like "../../etc" from escaping _basePath
        if (!AllowedCategories.Contains(category))
            throw new ArgumentException($"Unknown audio category '{category}'", nameof(category));

        // Create category directory
        var categoryPath = Path.Combine(_basePath, category);
        Directory.CreateDirectory(categoryPath);

        // Deterministic filename per entity; regenerating overwrites the previous file
        var extension = ".wav"; // or detect from data
        var filename = $"{entityId}{extension}";
        var fullPath = Path.Combine(categoryPath, filename);

        // WriteAllBytesAsync creates or truncates the file, so no separate delete is needed
        await File.WriteAllBytesAsync(fullPath, audioData);

        // Return the public URL path
        return $"/audio/{category}/{filename}";
    }

    // Synchronous: the work is blocking file-system calls, so an async signature
    // without any await would only produce a compiler warning
    public void CleanupOldFiles(TimeSpan olderThan)
    {
        var cutoff = DateTime.UtcNow - olderThan;
        foreach (var categoryDir in Directory.GetDirectories(_basePath))
        {
            foreach (var file in Directory.GetFiles(categoryDir))
            {
                var fileInfo = new FileInfo(file);
                if (fileInfo.LastWriteTimeUtc < cutoff)
                {
                    try
                    {
                        File.Delete(file);
                        _logger.LogInformation("Deleted old audio file: {File}", file);
                    }
                    catch (Exception ex)
                    {
                        _logger.LogError(ex, "Failed to delete audio file: {File}", file);
                    }
                }
            }
        }
    }
}
```
### Rate Limiting Implementation
```csharp
using System.Collections.Concurrent;

// Rate limiter for AI services
public class AiRateLimiter
{
    private readonly ConcurrentDictionary<string, RateLimitEntry> _limits = new();
    private readonly int _maxRequests;
    private readonly TimeSpan _window;

    public AiRateLimiter(int maxRequestsPerWindow, TimeSpan window)
    {
        _maxRequests = maxRequestsPerWindow;
        _window = window;
    }

    public bool TryAcquire(string serviceName)
    {
        var now = DateTime.UtcNow;
        var entry = _limits.GetOrAdd(serviceName, _ => new RateLimitEntry());
        lock (entry)
        {
            // Remove old requests
            entry.Requests.RemoveAll(r => now - r > _window);
            // Check if limit exceeded
            if (entry.Requests.Count >= _maxRequests)
                return false;
            // Add new request
            entry.Requests.Add(now);
            return true;
        }
    }

    private class RateLimitEntry
    {
        public List<DateTime> Requests { get; } = new();
    }
}

// Usage in controller
[HttpPost("recognize")]
public async Task<IActionResult> RecognizeSpeech([FromBody] AudioRequest request)
{
    if (!_rateLimiter.TryAcquire("Vosk"))
    {
        return StatusCode(429, "Too many requests");
    }
    // ... process request
}
```
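On .NET 7 and later, the hand-written limiter could be replaced by the built-in `System.Threading.RateLimiting` types. The sketch assumes the Phase 2 budget of 10 Mistral requests per minute and uses a sliding window split into six 10-second segments. The `AiRateLimits` wrapper is hypothetical. For inbound endpoints such as `/api/speech/recognize`, ASP.NET Core's rate-limiting middleware (`AddRateLimiter` plus `[EnableRateLimiting]`) is built on the same types.

```csharp
using System;
using System.Threading.RateLimiting;

public static class AiRateLimits
{
    // Sliding window: at most permitLimit requests per minute, tracked in 10-second segments.
    public static RateLimiter CreateMistralLimiter(int permitLimit = 10) =>
        new SlidingWindowRateLimiter(new SlidingWindowRateLimiterOptions
        {
            PermitLimit = permitLimit,
            Window = TimeSpan.FromMinutes(1),
            SegmentsPerWindow = 6,
            QueueLimit = 0,                       // reject immediately instead of queueing
            QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
            AutoReplenishment = true,
        });

    // Non-blocking check, equivalent to AiRateLimiter.TryAcquire.
    // Window-based permits are not returned when the lease is disposed; they replenish as the window slides.
    public static bool TryAcquire(RateLimiter limiter)
    {
        using RateLimitLease lease = limiter.AttemptAcquire(1);
        return lease.IsAcquired;
    }
}
```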
---
## 📝 Notes & Decisions
| Date | Decision | Rationale |
|------|----------|-----------|
| May 31, 2025 | Use Mistral-Medium | Best balance of quality and cost for this use case |
| May 31, 2025 | Use Vosk for speech recognition | Open-source, supports German, self-hostable |
| May 31, 2025 | Use Coqui TTS | Open-source, good quality, supports German |
| May 31, 2025 | Self-host AI services | More control, no external API dependencies (except Mistral) |
| May 31, 2025 | Use Python CLI wrappers | Easier integration with .NET, well-supported libraries |
### Technical Notes
#### Vosk Configuration
```json
{
  "Vosk": {
    "PythonPath": "/usr/bin/python3",
    "ModelPath": "/models/vosk-model-de-0.22",
    "SampleRate": 16000
  }
}
```
#### Coqui TTS Configuration
```json
{
  "Coqui": {
    "PythonPath": "/usr/bin/python3",
    "ModelName": "tts_models/deu/fairseq/vits",
    "AudioOutputFormat": "wav",
    "SampleRate": 22050
  }
}
```
#### Mistral Configuration
```json
{
  "Mistral": {
    "ApiKey": "your-api-key",
    "BaseUrl": "https://api.mistral.ai/v1/",
    "DefaultModel": "mistral-medium",
    "TimeoutSeconds": 30,
    "MaxRetries": 3
  }
}
```
### Error Handling Strategy
1. **Transient errors**: Retry with exponential backoff
2. **Rate limits**: Return 429 to client, suggest retry
3. **Service unavailable**: Return 503, log error
4. **Invalid response**: Validate output, return meaningful error
5. **Timeout**: Return 504, suggest retry
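The strategy above could be centralized in a single mapping, for example inside exception-handling middleware. The sketch assumes an `AiErrorCode` enum like the one checked by `IsRetryable`, extended with two hypothetical members (`ServiceUnavailable`, `InvalidResponse`). Returning 502 for output that fails validation is a judgment call, not something this plan specifies.

```csharp
// Hypothetical error codes; the first three mirror those checked by IsRetryable.
public enum AiErrorCode { RateLimited, Temporary, Timeout, ServiceUnavailable, InvalidResponse, Unknown }

public static class AiErrorMapping
{
    // Maps an AI failure to the HTTP status returned to the frontend and whether the client may retry.
    public static (int StatusCode, bool ClientMayRetry) ToHttp(AiErrorCode code) => code switch
    {
        AiErrorCode.RateLimited        => (429, true),   // Too Many Requests; pair with a Retry-After header
        AiErrorCode.Timeout            => (504, true),   // Gateway Timeout
        AiErrorCode.Temporary          => (503, true),   // transient error that survived the retries
        AiErrorCode.ServiceUnavailable => (503, true),   // Service Unavailable
        AiErrorCode.InvalidResponse    => (502, false),  // Bad Gateway: upstream answered, output failed validation
        _                              => (500, false),
    };
}
```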
### Caching Strategy
- **Mistral responses**: Cache for 1 hour (stories unlikely to change)
- **TTS audio**: Cache files permanently (regenerate only if text changes)
- **Vosk**: No caching (each audio is unique)
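The following is a minimal sketch of the Mistral response cache. It assumes the 1-hour TTL above and a key derived from the model name plus the full prompt, so only identical requests are served from the cache. `PromptCache` is a hypothetical in-process cache; in the ASP.NET Core app, `IMemoryCache` or `IDistributedCache` would normally take its place. The same `Hash` helper could support the TTS rule: store the hash of the source text next to each audio file and regenerate only when it differs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public sealed class PromptCache
{
    private readonly ConcurrentDictionary<string, (string Value, DateTime ExpiresUtc)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTime> _utcNow;   // injectable clock, useful for tests

    public PromptCache(TimeSpan ttl, Func<DateTime>? utcNow = null)
    {
        _ttl = ttl;
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    // SHA-256 of the text as lowercase hex.
    public static string Hash(string text) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text))).ToLowerInvariant();

    public static string Key(string model, string prompt) => Hash(model + "\n" + prompt);

    public bool TryGet(string key, out string value)
    {
        if (_entries.TryGetValue(key, out var entry))
        {
            if (entry.ExpiresUtc > _utcNow())
            {
                value = entry.Value;
                return true;
            }
            // Expired: remove only this exact entry so a concurrent Set is not lost.
            _entries.TryRemove(new KeyValuePair<string, (string Value, DateTime ExpiresUtc)>(key, entry));
        }
        value = "";
        return false;
    }

    public void Set(string key, string value) => _entries[key] = (value, _utcNow() + _ttl);
}
```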
### Gotchas
- Vosk model is ~500MB - ensure enough disk space
- Coqui model is ~1.5GB - ensure enough disk space
- Python processes may have memory leaks - monitor and restart
- AI services may fail silently - implement health checks
- Mistral API has costs - implement budget tracking
- Audio generation can be CPU-intensive - consider separate service
- Different Python versions may have compatibility issues
### File Storage Structure
```
/public/
├── audio/
│   ├── vocabulary/     # Vocabulary word audio
│   │   └── {id}.wav
│   ├── story/          # Story segment audio
│   │   └── {levelId}-{order}.wav
│   └── quiz/           # Quiz question audio
│       └── {questionId}.wav
└── models/             # AI models
    ├── vosk/
    │   └── vosk-model-de-0.22/
    └── coqui/
        └── tts_models/
```
### Performance Considerations
- TTS generation: ~1-2 seconds per sentence
- Speech recognition: ~1-3 seconds per audio clip
- Mistral API: ~2-5 seconds per request
- Consider async/background processing for batch operations
---
## 📊 Progress History
| Date | Status Change | Notes |
|------|---------------|-------|
| May 31, 2025 | Created | Initial plan based on application-plan.md |
---
## 📎 Related Files & Links
- Architecture: [Backend Structure](../architecture/backend-structure.md)
- Architecture: [Application Plan](../architecture/application-plan.md)
- Feature: [Story Integration](story-integration.md)
- Feature: [Vocabulary System](vocabulary-system.md)
- Feature: [Quiz System](quiz-system.md)
- Reference: [Mistral AI API Docs](https://docs.mistral.ai/)
- Reference: [Vosk Documentation](https://alphacephei.com/vosk/)
- Reference: [Coqui TTS GitHub](https://github.com/coqui-ai/TTS)
- Reference: [vosk-model-de-0.22](https://alphacephei.com/vosk/models)
- Reference: [Coqui German Model](https://github.com/coqui-ai/TTS/wiki/Multilingual-support)
---
*Feature created from application-plan.md*