# Feature: AI Services Integration

> **Status**: ⏳ Planned
> **Priority**: High
> **Complexity**: High
> **Estimate**: 10-16 hours
> **Assignee**: -
> **Created**: May 31, 2025
> **Target Completion**: -
> **PR**: -
> **Related Features**: Story Integration, Vocabulary System, Quiz System, Lesson Management

---

## 📌 Overview

### Purpose

Integrate three AI services into the application: Mistral-Medium for text generation (stories, feedback), Vosk for speech recognition (speaking exercises), and Coqui TTS for text-to-speech (vocabulary, stories, quizzes).

### User Story

As a learner, I want AI-powered features such as generated stories, speech recognition for speaking practice, and TTS audio so that I have an immersive, interactive learning experience.

### Acceptance Criteria

- [ ] Mistral-Medium API is integrated for story generation
- [ ] Mistral-Medium API is integrated for writing feedback
- [ ] Vosk speech recognition is integrated for speaking exercises
- [ ] Coqui TTS is integrated for audio generation
- [ ] All AI services are configurable via appsettings.json
- [ ] AI service failures are handled gracefully
- [ ] AI API calls are rate limited and cached

---

## 📋 Requirements

### Functional Requirements

| ID | Requirement | Priority |
|----|-------------|----------|
| FR-001 | Generate stories using Mistral-Medium | High |
| FR-002 | Generate writing feedback using Mistral-Medium | High |
| FR-003 | Transcribe speech using Vosk | High |
| FR-004 | Generate audio using Coqui TTS | High |
| FR-005 | Configure all services via configuration | High |
| FR-006 | Handle AI service errors gracefully | High |
| FR-007 | Cache/rate limit AI API calls | Medium |
| FR-008 | Validate AI outputs before use | Medium |

### Non-Functional Requirements

- Performance: TTS generation < 2 seconds per sentence
- Performance: Speech recognition < 3 seconds per audio clip
- Performance: Mistral API calls < 5 seconds per request
- Reliability: Services should degrade gracefully on failure
- Cost: Minimize API call costs (caching, batching)

---

## 🏗️ Technical Design

### Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│                      AI Services Layer                      │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Mistral-Medium  │ │      Vosk       │ │    Coqui TTS    │ │
│ │   (Text Gen)    │ │ (Speech Recog.) │ │   (Audio Gen)   │ │
│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │
│          │                   │                   │          │
│          ▼                   ▼                   ▼          │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │                  Application Services                   │ │
│ │  - StoryGenerationService                               │ │
│ │  - WritingFeedbackService                               │ │
│ │  - VoskService (Speech Recognition)                     │ │
│ │  - TtsService (Text-to-Speech)                          │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```

### Components Involved

- **Backend Services**:
  - `IMistralService` / `MistralService` - Text generation
  - `IVoskService` / `VoskService` - Speech recognition
  - `ITtsService` / `TtsService` - Text-to-speech
- **Configuration**: appsettings.json with AI settings
- **External Dependencies**:
  - Mistral-Medium API
  - Vosk Python library + German model
  - Coqui TTS Python library + German model

### Data Flow

#### Story Generation Flow

```
1. StoryGenerationService receives request with vocabulary list and level
2. Service constructs prompt for Mistral-Medium
3. MistralService sends prompt to Mistral API
4. Mistral API returns generated story text
5. StoryGenerationService validates and returns story
6. StoryService saves story and triggers audio generation
```

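Steps 2 and 5 of this flow could be backed by a small template helper such as the sketch below. It is illustrative only: `StoryPromptBuilder`, the wording of both prompts, and the 5-8 sentence length are assumptions. The A1/A2/B1 levels come from the test data list. The system prompt is also a natural place for the content-safety instruction listed under Business Risks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical prompt template for story generation (step 2) plus a
// rough output check (step 5). Prompt wording is an assumption.
public static class StoryPromptBuilder
{
    private static readonly string[] SupportedLevels = { "A1", "A2", "B1" };

    public static (string SystemPrompt, string UserPrompt) Build(
        string level, IReadOnlyCollection<string> vocabulary)
    {
        if (!SupportedLevels.Contains(level))
            throw new ArgumentException($"Unsupported CEFR level: {level}", nameof(level));
        if (vocabulary.Count == 0)
            throw new ArgumentException("At least one vocabulary word is required", nameof(vocabulary));

        // The system prompt fixes tone and safety rules for every request.
        const string systemPrompt =
            "You write short German stories for language learners. " +
            "Keep the content suitable for all ages. Reply with the story text only.";

        // The user prompt carries the per-request parameters.
        var words = string.Join(", ", vocabulary.Select(w => w.Trim()));
        var userPrompt =
            $"Write a story of 5 to 8 sentences at CEFR level {level}. " +
            $"Use each of these words at least once: {words}.";

        return (systemPrompt, userPrompt);
    }

    // Step 5 helper: vocabulary words that do not appear in the story (case-insensitive substring match).
    public static IReadOnlyList<string> MissingWords(string story, IEnumerable<string> vocabulary) =>
        vocabulary.Where(w => !story.Contains(w.Trim(), StringComparison.OrdinalIgnoreCase)).ToList();
}
```

`MissingWords` is only a rough check. A plain substring test matches *Hunde* for *Hund* but misses *Häuser* for *Haus*, so a non-empty result is better treated as a signal to regenerate or review than as a hard failure.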
#### Speech Recognition Flow

```
1. User records speech in frontend
2. Frontend sends audio file to /api/speech/recognize
3. VoskService receives audio bytes
4. VoskService calls Vosk Python CLI with German model
5. Vosk returns transcribed text
6. Backend validates transcription and returns it to frontend
```

#### TTS Flow

```
1. TtsService receives text to synthesize
2. Service calls Coqui TTS Python CLI
3. Coqui generates audio file
4. Audio file saved to filesystem
5. Audio URL returned to caller
```

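Steps 2-3 can be driven through the `tts` command that the Coqui TTS package installs, using its `--text`, `--model_name` and `--out_path` options. The sketch below only builds the process invocation. `CoquiCommand` is a hypothetical name, and actually running the process (reading stdout/stderr concurrently, enforcing a timeout, checking the exit code) would follow the same pattern as Option 1 in the deep dive.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Builds the invocation of Coqui's `tts` CLI for one piece of text.
// Assumes ttsExecutable is "tts" on PATH or an absolute path to it.
public static class CoquiCommand
{
    public static ProcessStartInfo Build(string ttsExecutable, string modelName, string text, string outPath)
    {
        if (string.IsNullOrWhiteSpace(text))
            throw new ArgumentException("Text to synthesize must not be empty", nameof(text));

        var startInfo = new ProcessStartInfo
        {
            FileName = ttsExecutable,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };

        // ArgumentList passes each value as a separate argument, so quotes and
        // spaces in the text need no manual escaping.
        startInfo.ArgumentList.Add("--text");
        startInfo.ArgumentList.Add(text);
        startInfo.ArgumentList.Add("--model_name");
        startInfo.ArgumentList.Add(modelName);
        startInfo.ArgumentList.Add("--out_path");
        startInfo.ArgumentList.Add(Path.GetFullPath(outPath));
        return startInfo;
    }
}
```

Each CLI call loads the model again, so this is the simple per-request variant. Batch generation benefits from a long-lived worker instead (see the process-pooling and gRPC options).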
---

## 🚀 Implementation Plan

### Phase 1: Configuration & Interfaces (2 hours)

- [ ] Add AI configuration section to appsettings.json
- [ ] Create configuration classes (MistralConfig, VoskConfig, CoquiConfig)
- [ ] Define service interfaces (IMistralService, IVoskService, ITtsService)
- [ ] Register services in Program.cs
- [ ] Set up configuration validation

### Phase 2: Mistral-Medium Integration (2-3 hours)

- [ ] Create MistralService implementation
- [ ] Implement Mistral API client
- [ ] Create request/response models
- [ ] Implement retry logic for API calls
- [ ] Add rate limiting (e.g., max 10 requests/minute)
- [ ] Add response caching for similar prompts
- [ ] Create prompt templates for different use cases

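The request/response models and API client from this phase might look roughly like the sketch below. The JSON field names (`model`, `messages`, `temperature`, `max_tokens`, `choices[].message.content`) follow Mistral's public chat completions endpoint (`POST /v1/chat/completions` with a Bearer token). The `MistralClient` name, the default `temperature`/`max_tokens` values, and the error handling are assumptions. Retry, rate limiting and caching are deliberately left out; they would wrap `CompleteAsync`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;
using System.Text.Json.Serialization;
using System.Threading;
using System.Threading.Tasks;

// Request/response models for the chat completions endpoint (only the fields used here).
public sealed record MistralMessage(
    [property: JsonPropertyName("role")] string Role,
    [property: JsonPropertyName("content")] string Content);

public sealed record MistralRequest(
    [property: JsonPropertyName("model")] string Model,
    [property: JsonPropertyName("messages")] IReadOnlyList<MistralMessage> Messages,
    [property: JsonPropertyName("temperature")] double Temperature = 0.7, // assumed default
    [property: JsonPropertyName("max_tokens")] int MaxTokens = 800);     // assumed default

public sealed record MistralChoice(
    [property: JsonPropertyName("message")] MistralMessage Message);

public sealed record MistralResponse(
    [property: JsonPropertyName("choices")] IReadOnlyList<MistralChoice> Choices);

// Hypothetical thin client. The HttpClient is expected to have BaseAddress set to the
// configured BaseUrl ("https://api.mistral.ai/v1/", trailing slash included) and its
// Timeout set from TimeoutSeconds.
public sealed class MistralClient
{
    private readonly HttpClient _http;
    private readonly string _model;

    public MistralClient(HttpClient http, string apiKey, string model)
    {
        _http = http;
        _model = model;
        _http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
    }

    public async Task<string> CompleteAsync(string systemPrompt, string userPrompt, CancellationToken ct = default)
    {
        var request = new MistralRequest(_model, new[]
        {
            new MistralMessage("system", systemPrompt),
            new MistralMessage("user", userPrompt)
        });

        // Relative URI resolves against BaseAddress to .../v1/chat/completions
        using var response = await _http.PostAsJsonAsync("chat/completions", request, ct);

        // Throws HttpRequestException on 429/5xx; the real service would map this to
        // AiServiceException so the retry policy can classify it.
        response.EnsureSuccessStatusCode();

        var body = await response.Content.ReadFromJsonAsync<MistralResponse>(cancellationToken: ct);
        return ExtractText(body);
    }

    // Kept separate so response parsing can be unit-tested without HTTP.
    public static string ExtractText(MistralResponse? body) =>
        body?.Choices?.FirstOrDefault()?.Message?.Content
        ?? throw new InvalidOperationException("Mistral response contained no message content");
}
```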
### Phase 3: Vosk Speech Recognition (2-3 hours)

- [ ] Create VoskService implementation
- [ ] Set up Vosk Python environment
- [ ] Download and configure German model (vosk-model-de-0.22)
- [ ] Implement audio processing
- [ ] Handle different audio formats
- [ ] Add error handling for recognition failures
- [ ] Create /api/speech/recognize endpoint

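Before audio reaches Vosk, the backend can reject files it cannot process. Vosk's own Python examples refuse anything that is not mono, 16-bit, uncompressed PCM WAV, so the sketch below checks exactly that by reading the RIFF `fmt ` chunk. Comparing the sample rate with the configured `SampleRate` (16000) is a project choice rather than a Vosk requirement, since the recognizer takes the rate as a parameter. `WavValidator` and `WavFormat` are hypothetical names.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

public readonly record struct WavFormat(int Channels, int SampleRate, int BitsPerSample);

// Minimal WAV header check before handing audio to Vosk.
public static class WavValidator
{
    public static WavFormat ReadFormat(ReadOnlySpan<byte> data)
    {
        if (data.Length < 12
            || Encoding.ASCII.GetString(data[..4]) != "RIFF"
            || Encoding.ASCII.GetString(data.Slice(8, 4)) != "WAVE")
            throw new FormatException("Not a RIFF/WAVE file");

        // Walk the chunk list until the "fmt " chunk is found.
        var offset = 12;
        while (offset + 8 <= data.Length)
        {
            var chunkId = Encoding.ASCII.GetString(data.Slice(offset, 4));
            var chunkSize = BinaryPrimitives.ReadUInt32LittleEndian(data.Slice(offset + 4, 4));
            var body = offset + 8;

            if (chunkId == "fmt ")
            {
                if (chunkSize < 16 || body + 16 > data.Length)
                    throw new FormatException("Truncated fmt chunk");

                var audioFormat = BinaryPrimitives.ReadUInt16LittleEndian(data.Slice(body, 2));
                if (audioFormat != 1) // 1 = uncompressed PCM
                    throw new FormatException($"Unsupported WAV encoding {audioFormat}; expected PCM");

                return new WavFormat(
                    Channels: BinaryPrimitives.ReadUInt16LittleEndian(data.Slice(body + 2, 2)),
                    SampleRate: BinaryPrimitives.ReadInt32LittleEndian(data.Slice(body + 4, 4)),
                    BitsPerSample: BinaryPrimitives.ReadUInt16LittleEndian(data.Slice(body + 14, 2)));
            }

            // Stop if the declared size runs past the end of the buffer.
            if (chunkSize > (uint)(data.Length - body)) break;
            // Chunks are padded to an even number of bytes.
            offset = body + (int)chunkSize + (int)(chunkSize & 1);
        }

        throw new FormatException("No fmt chunk found");
    }

    // Returns null when the audio is usable, otherwise a reason suitable for a 400 response.
    public static string? Validate(ReadOnlySpan<byte> data, int expectedSampleRate)
    {
        WavFormat format;
        try { format = ReadFormat(data); }
        catch (FormatException ex) { return ex.Message; }

        if (format.Channels != 1) return "Audio must be mono";
        if (format.BitsPerSample != 16) return "Audio must be 16-bit PCM";
        if (format.SampleRate != expectedSampleRate)
            return $"Expected {expectedSampleRate} Hz, got {format.SampleRate} Hz";
        return null;
    }
}
```

Browser recorders usually produce compressed formats (for example WebM/Opus from `MediaRecorder`), so "handle different audio formats" in practice means converting to PCM WAV (e.g. with ffmpeg) before this check.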
### Phase 4: Coqui TTS Integration (2-3 hours)

- [ ] Create TtsService implementation
- [ ] Set up Coqui TTS Python environment
- [ ] Download and configure German model
- [ ] Implement audio generation
- [ ] Add audio file management (storage, cleanup)
- [ ] Create audio serving endpoints
- [ ] Implement batch audio generation

### Phase 5: Service Integration (2 hours)

- [ ] Create StoryGenerationService (uses MistralService)
- [ ] Create WritingFeedbackService (uses MistralService)
- [ ] Create SpeechExerciseService (uses VoskService)
- [ ] Create AudioGenerationService (uses TtsService)
- [ ] Add health checks for all AI services
- [ ] Implement fallback mechanisms for service failures

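For the fallback task, a common building block is a circuit breaker: after repeated failures, calls skip the failing service for a while and go straight to a fallback (for example, hiding the play button while TTS is down). The class below is a minimal, hypothetical sketch with an injectable clock for testing. After the open period it simply lets calls through again, which is simpler than a full half-open state; libraries such as Polly provide a production-grade circuit breaker.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical circuit breaker: after `failureThreshold` consecutive failures the
// circuit opens for `openDuration`, and calls go straight to the fallback meanwhile.
public sealed class CircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTimeOffset> _now;
    private readonly object _gate = new();
    private int _consecutiveFailures;
    private DateTimeOffset _openUntil = DateTimeOffset.MinValue;

    public CircuitBreaker(int failureThreshold, TimeSpan openDuration, Func<DateTimeOffset>? now = null)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
        _now = now ?? (() => DateTimeOffset.UtcNow); // injectable clock for tests
    }

    public bool IsOpen
    {
        get { lock (_gate) return _now() < _openUntil; }
    }

    // fallback receives the exception, or null when the circuit was already open.
    public async Task<T> ExecuteAsync<T>(Func<Task<T>> action, Func<Exception?, T> fallback)
    {
        if (IsOpen)
            return fallback(null); // fail fast while the service is considered down

        try
        {
            var result = await action();
            lock (_gate) _consecutiveFailures = 0;
            return result;
        }
        catch (Exception ex) when (ex is not OperationCanceledException)
        {
            lock (_gate)
            {
                if (++_consecutiveFailures >= _failureThreshold)
                {
                    _openUntil = _now() + _openDuration;
                    _consecutiveFailures = 0; // count afresh once the circuit closes
                }
            }
            return fallback(ex);
        }
    }
}
```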
### Milestones

| Milestone | Date | Status |
|-----------|------|--------|
| Configuration & Interfaces | - | ⏳ |
| Mistral Integration | - | ⏳ |
| Vosk Integration | - | ⏳ |
| Coqui TTS Integration | - | ⏳ |
| Service Integration | - | ⏳ |

---

## ✅ Tasks

### Backend - Configuration

- [ ] Add Mistral settings to appsettings.json
- [ ] Add Vosk settings to appsettings.json
- [ ] Add Coqui settings to appsettings.json
- [ ] Create Configuration/MistralConfig.cs
- [ ] Create Configuration/VoskConfig.cs
- [ ] Create Configuration/CoquiConfig.cs
- [ ] Register all AI services in Program.cs
- [ ] Add health checks for AI services

### Backend - Mistral Service

- [ ] Create Domain/Interfaces/IMistralService.cs
- [ ] Create Infrastructure/Services/MistralService.cs
- [ ] Implement Mistral API client
- [ ] Create Models/MistralRequest.cs
- [ ] Create Models/MistralResponse.cs
- [ ] Add retry logic
- [ ] Add rate limiting
- [ ] Add response caching
- [ ] Write unit tests

### Backend - Vosk Service

- [ ] Create Domain/Interfaces/IVoskService.cs
- [ ] Create Infrastructure/Services/VoskService.cs
- [ ] Set up Python process execution
- [ ] Download and configure vosk-model-de-0.22
- [ ] Implement audio recognition
- [ ] Create /api/speech/recognize endpoint
- [ ] Create Presentation/Controllers/SpeechController.cs
- [ ] Write unit tests

### Backend - Coqui TTS Service

- [ ] Create Domain/Interfaces/ITtsService.cs
- [ ] Create Infrastructure/Services/TtsService.cs
- [ ] Set up Python process execution
- [ ] Download and configure Coqui German model
- [ ] Implement audio generation
- [ ] Create audio file storage mechanism
- [ ] Create /api/tts/generate endpoint
- [ ] Create Presentation/Controllers/TtsController.cs
- [ ] Write unit tests

### Backend - Higher-Level Services

- [ ] Create Application/Services/StoryGenerationService.cs
- [ ] Create Application/Services/WritingFeedbackService.cs
- [ ] Integrate with MistralService
- [ ] Add validation for AI outputs
- [ ] Write integration tests

### Infrastructure Setup

- [ ] Install Python 3.9-3.11 (the range Coqui TTS supports)
- [ ] Install Vosk Python package
- [ ] Download vosk-model-de-0.22
- [ ] Install Coqui TTS package
- [ ] Download Coqui German model
- [ ] Set up file storage for audio
- [ ] Configure permissions

### Frontend Integration

- [ ] Create services/speechService.ts
- [ ] Create services/ttsService.ts
- [ ] Create services/aiService.ts
- [ ] Integrate with Recorder component
- [ ] Integrate with AudioPlayer component
- [ ] Add error handling for AI failures

---

## ✅ Definition of Done

### General Criteria (All Features)

- [ ] All acceptance criteria met and verified
- [ ] All tasks in this document completed
- [ ] Code follows Clean Architecture principles
- [ ] Code reviewed and approved by at least 1 team member
- [ ] All tests passing (unit, integration)
- [ ] Documentation updated (README, AGENTS.md if applicable)
- [ ] Feature works in development environment
- [ ] Feature deployed to staging environment
- [ ] Performance meets defined targets
- [ ] Security review completed
- [ ] No critical bugs or blockers

### AI-Specific Criteria

- [ ] All AI services functional in development
- [ ] Mistral API integration tested with valid API key
- [ ] Vosk speech recognition tested with German model
- [ ] Coqui TTS tested with German model
- [ ] Error handling tested (invalid inputs, service failures)
- [ ] Fallback mechanisms implemented and tested
- [ ] Rate limiting configured and tested
- [ ] Audio file generation and storage verified
- [ ] Health checks for all AI services passing

---

## 🧪 Testing Strategy

### Testing Approach

| Test Type | Coverage | Tools | Responsibility |
|-----------|----------|-------|----------------|
| Unit Tests | 80%+ code coverage | MSTest, Moq | Backend Dev |
| Integration Tests | All service interactions | MSTest, Testcontainers | Backend Dev |
| API Tests | All endpoints | MSTest, HttpClient | Backend Dev |
| Frontend Unit Tests | Component logic | Vitest | Frontend Dev |
| Frontend Integration | Service integration | Vitest | Frontend Dev |
| E2E Tests | Critical user journeys | Playwright | QA/Dev |
| Manual Testing | Exploratory, edge cases | BrowserStack | QA |
| Load Testing | AI service performance | k6/JMeter | DevOps |

### AI-Specific Tests

#### Mistral Service Tests

- [ ] Test successful text generation
- [ ] Test API error handling (429, 500, 503)
- [ ] Test rate limiting (max requests per minute)
- [ ] Test response caching
- [ ] Test retry logic on failures
- [ ] Test timeout handling
- [ ] Test invalid API key handling

#### Vosk Service Tests

- [ ] Test successful speech recognition (clear audio)
- [ ] Test speech recognition with background noise
- [ ] Test speech recognition with different accents
- [ ] Test empty audio handling
- [ ] Test invalid audio format handling
- [ ] Test Python process failure handling
- [ ] Test model not found error handling
- [ ] Test confidence threshold validation

#### Coqui TTS Service Tests

- [ ] Test successful audio generation
- [ ] Test audio generation with long text
- [ ] Test audio generation with special characters
- [ ] Test invalid text handling
- [ ] Test Python process failure handling
- [ ] Test model not found error handling
- [ ] Test audio file format validation
- [ ] Test audio quality validation

### Test Data

- Sample audio files for Vosk testing (clear German speech, noisy audio, non-German speech)
- Sample texts for TTS testing (short, long, with special characters, with German umlauts)
- Sample prompts for Mistral testing (A1, A2, B1 levels)

---

## 🚨 Risks & Mitigations

### Technical Risks

| Risk | Likelihood | Impact | Mitigation | Owner |
|------|------------|--------|------------|-------|
| Python-.NET integration failures | High | High | Use Process class with proper error handling, implement process pooling, add timeouts | Backend Dev |
| Vosk model compatibility issues | Medium | High | Test with vosk-model-de-0.22 before implementation, have fallback to vosk-model-small-de-0.15 | Backend Dev |
| Coqui model quality issues | Medium | Medium | Test with sample German text, have alternative TTS service as fallback | Backend Dev |
| Mistral API rate limits | High | Medium | Implement caching (1h TTL), request queue, exponential backoff | Backend Dev |
| Mistral API costs exceed budget | Medium | High | Set budget alerts, implement cost tracking, cache aggressively | Backend Dev |
| Slow AI service responses | High | Medium | Implement async processing, use background jobs for batch operations | Backend Dev |
| Audio files too large | Medium | Medium | Compress audio (16kHz, mono), implement streaming for large files | Backend Dev |
| Model files too large for deployment | Medium | Medium | Use Docker volumes, separate storage for models, consider cloud storage | DevOps |
| Memory leaks in Python processes | Medium | High | Implement process lifecycle management, add memory monitoring, use process pooling | Backend Dev |
| Different Python versions cause issues | Medium | Medium | Use Docker to pin Python version, document exact version in README | DevOps |

### Operational Risks

| Risk | Likelihood | Impact | Mitigation | Owner |
|------|------------|--------|------------|-------|
| AI service downtime | Medium | High | Implement health checks, circuit breakers, fallback responses | DevOps |
| Model files corrupted | Low | High | Implement checksum validation, store backups, automated recovery | DevOps |
| API key exposure | Medium | High | Use GitHub secrets, Azure Key Vault, never commit to repo | Security |
| Audio storage fills up | Medium | Medium | Implement cleanup job, set size quotas, use cloud storage | DevOps |

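The checksum mitigation for corrupted model files needs nothing beyond the base class library. The hypothetical `ModelChecksum` helper below computes a SHA-256 digest and compares it with an expected value. Where that expected value is stored (for example, recorded when the model archive is first downloaded) is an assumption. A Vosk model is a directory, so in practice you would hash the downloaded archive or each file in it.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading;
using System.Threading.Tasks;

// Verifies a model file against a known SHA-256 checksum before it is loaded.
public static class ModelChecksum
{
    public static async Task<string> ComputeSha256Async(string path, CancellationToken ct = default)
    {
        await using var stream = File.OpenRead(path);
        using var sha = SHA256.Create();
        var hash = await sha.ComputeHashAsync(stream, ct); // streams the file; no full read into memory
        return Convert.ToHexString(hash).ToLowerInvariant();
    }

    public static async Task<bool> VerifyAsync(string path, string expectedSha256, CancellationToken ct = default)
    {
        var actual = await ComputeSha256Async(path, ct);
        // Hex digests are compared case-insensitively.
        return string.Equals(actual, expectedSha256.Trim(), StringComparison.OrdinalIgnoreCase);
    }
}
```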
### Business Risks

| Risk | Likelihood | Impact | Mitigation | Owner |
|------|------------|--------|------------|-------|
| User data privacy concerns | Medium | High | Anonymize audio before processing, document data handling policy, comply with GDPR | Legal |
| AI generates inappropriate content | Low | High | Implement content moderation, add user reporting, use system prompts to prevent it | Backend Dev |
| AI services become too expensive | Medium | Medium | Monitor costs, set budget caps, evaluate open-source alternatives | Product |

---

## 🔗 Dependencies

### Feature Dependencies

- [Infrastructure Setup](infrastructure-setup.md) - Required (backend project)

### Technical Dependencies

- Python 3.9-3.11 (the range Coqui TTS supports)
- Vosk Python library
- vosk-model-de-0.22 (German model)
- Coqui TTS Python library
- Coqui German TTS model
- Mistral-Medium API key

### External Services

| Service | Purpose | Configuration |
|---------|---------|---------------|
| Mistral-Medium API | Text generation (stories, feedback) | API key, endpoint URL |
| Vosk | Speech recognition | Python path, model path |
| Coqui TTS | Text-to-speech | Python path, model name |

### Blockers

- [ ] Infrastructure Setup must be complete
- [ ] Python environment must be configured
- [ ] AI models must be downloaded
- [ ] Mistral API key must be obtained

---

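## 🔧 Technical Deep Dive: Python-.NET Integration

### Integration Patterns

#### Option 1: Process.Start (Recommended for MVP)

```csharp
using System.Diagnostics;

// Simple approach: spawn a Python process for each request.
// _pythonPath comes from the "Vosk:PythonPath" setting; _scriptPath points to our own
// (hypothetical) wrapper script that loads the model and prints the transcript to stdout.
public async Task<string> RecognizeSpeechAsync(byte[] audioData, CancellationToken cancellationToken = default)
{
    // Unique temp file name (Path.GetTempFileName() would leave an extra empty file behind)
    var tempFile = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid():N}.wav");
    await File.WriteAllBytesAsync(tempFile, audioData, cancellationToken);

    try
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = _pythonPath,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        // ArgumentList passes each value as a separate argument, so paths with spaces are safe
        startInfo.ArgumentList.Add(_scriptPath);
        startInfo.ArgumentList.Add("--model");
        startInfo.ArgumentList.Add(_modelPath);
        startInfo.ArgumentList.Add("--input");
        startInfo.ArgumentList.Add(tempFile);
        // Environment is a pre-populated, get-only dictionary: set entries on it
        startInfo.Environment["PYTHONPATH"] = "/path/to/vosk";

        using var process = Process.Start(startInfo)
            ?? throw new AiServiceException("Failed to start the Vosk process");

        // Read stdout and stderr concurrently so neither pipe can fill up and block the process
        var outputTask = process.StandardOutput.ReadToEndAsync();
        var errorTask = process.StandardError.ReadToEndAsync();

        // Enforce a timeout so a hung Python process cannot block the request forever
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        timeoutCts.CancelAfter(TimeSpan.FromSeconds(30));
        try
        {
            await process.WaitForExitAsync(timeoutCts.Token);
        }
        catch (OperationCanceledException)
        {
            process.Kill(entireProcessTree: true);
            throw;
        }

        var output = await outputTask;
        var error = await errorTask;

        if (process.ExitCode != 0)
        {
            throw new AiServiceException($"Vosk failed: {error}");
        }

        return output.Trim();
    }
    finally
    {
        File.Delete(tempFile); // no-op if the file is already gone
    }
}
```

**Pros:** Simple, easy to implement, no additional dependencies

**Cons:** Process startup overhead (~100-500ms per call) plus reloading the model on every call; resource-intensive
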
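#### Option 2: Process Pooling (Recommended for Production)

```csharp
using System.Collections.Concurrent;
using System.Diagnostics;

// Maintain a pool of persistent Python worker processes.
// Assumed protocol (defined by our own worker script): one request line on stdin,
// exactly one response line on stdout.
public sealed class PythonProcessPool : IDisposable
{
    private readonly ConcurrentQueue<Process> _pool = new();
    private readonly SemaphoreSlim _semaphore;
    private readonly string _pythonPath;
    private readonly string _scriptPath;

    public PythonProcessPool(int size, string pythonPath, string scriptPath)
    {
        _semaphore = new SemaphoreSlim(size, size);
        _pythonPath = pythonPath;
        _scriptPath = scriptPath;

        // Pre-warm the pool so early requests don't pay startup and model-load time
        for (int i = 0; i < size; i++)
        {
            _pool.Enqueue(StartProcess());
        }
    }

    public async Task<string> ExecuteAsync(string input, CancellationToken cancellationToken = default)
    {
        // At most `size` requests use workers concurrently
        await _semaphore.WaitAsync(cancellationToken);

        Process? process = null;
        var healthy = false;
        try
        {
            // Reuse an idle worker; replace it if it has exited (e.g. crashed)
            if (!_pool.TryDequeue(out process) || process.HasExited)
            {
                process?.Dispose();
                process = StartProcess();
            }

            // Send input to stdin (one line; the input must not contain newlines)
            await process.StandardInput.WriteLineAsync(input);
            await process.StandardInput.FlushAsync();

            // Read response from stdout; null means the worker closed its output
            var response = await process.StandardOutput.ReadLineAsync()
                ?? throw new AiServiceException("Python worker exited unexpectedly");

            healthy = true;
            return response;
        }
        finally
        {
            if (process != null)
            {
                if (healthy)
                {
                    _pool.Enqueue(process); // return the worker for reuse
                }
                else
                {
                    // stdin/stdout may now be out of sync, so never reuse this worker
                    try { process.Kill(entireProcessTree: true); } catch { /* already exited */ }
                    process.Dispose();
                }
            }
            _semaphore.Release();
        }
    }

    private Process StartProcess()
    {
        var process = new Process
        {
            StartInfo = new ProcessStartInfo
            {
                FileName = _pythonPath,
                RedirectStandardInput = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                UseShellExecute = false,
                CreateNoWindow = true
            }
        };
        process.StartInfo.ArgumentList.Add(_scriptPath);

        // Drain stderr asynchronously so a chatty worker cannot block on a full pipe
        process.ErrorDataReceived += (_, e) => { /* forward e.Data to the logger */ };

        // The instance Process.Start() returns bool, not Process, so start and return separately
        process.Start();
        process.BeginErrorReadLine();
        return process;
    }

    public void Dispose()
    {
        // Stops idle workers; workers checked out by in-flight requests are not tracked here
        while (_pool.TryDequeue(out var process))
        {
            try { process.Kill(entireProcessTree: true); } catch { /* already exited */ }
            process.Dispose();
        }
        _semaphore.Dispose();
    }
}
```

**Pros:** Eliminates process startup overhead, much faster for repeated calls

**Cons:** More complex, need to handle process lifecycle, stdin/stdout parsing
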
#### Option 3: gRPC (Best for Production)

- Create Python gRPC server for AI services
- .NET client calls gRPC methods
- Single persistent Python process
- Type-safe, high-performance

**Pros:** Best performance, type-safe, production-ready

**Cons:** Most complex to set up, requires gRPC knowledge

### Error Handling Strategy

```csharp
using Microsoft.Extensions.Logging;

// Comprehensive error handling for AI services: a per-attempt timeout plus
// exponential backoff (2s, 4s, 8s, ...) for timeouts and retryable errors.
public async Task<T> ExecuteWithRetryAsync<T>(
    Func<CancellationToken, Task<T>> action,
    string operationName,
    int maxRetries = 3,
    TimeSpan? timeout = null,
    CancellationToken cancellationToken = default)
{
    var retryCount = 0;
    timeout ??= TimeSpan.FromSeconds(30);

    while (true)
    {
        try
        {
            // The timeout only takes effect because the token is passed to the action
            using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
            cts.CancelAfter(timeout.Value);
            return await action(cts.Token);
        }
        catch (OperationCanceledException)
            when (!cancellationToken.IsCancellationRequested && retryCount < maxRetries)
        {
            // Our timeout fired (the caller did not cancel): back off and retry
            retryCount++;
            var delay = TimeSpan.FromSeconds(Math.Pow(2, retryCount));
            _logger.LogWarning(
                "{Operation} timed out (attempt {Attempt}), retrying in {Delay}s...",
                operationName, retryCount, delay.TotalSeconds);
            await Task.Delay(delay, cancellationToken);
        }
        catch (AiServiceException ex) when (IsRetryable(ex) && retryCount < maxRetries)
        {
            retryCount++;
            var delay = TimeSpan.FromSeconds(Math.Pow(2, retryCount));
            _logger.LogWarning(ex,
                "{Operation} failed (attempt {Attempt}), retrying in {Delay}s...",
                operationName, retryCount, delay.TotalSeconds);
            await Task.Delay(delay, cancellationToken);
        }
        catch (Exception ex) when (!cancellationToken.IsCancellationRequested)
        {
            _logger.LogError(ex, "{Operation} failed permanently after {Attempts} attempts",
                operationName, retryCount + 1);

            if (ex is AiServiceException)
                throw; // already an AI error; keep the original stack trace

            throw new AiServiceException($"{operationName} failed: {ex.Message}", ex);
        }
    }
}

private static bool IsRetryable(AiServiceException ex) =>
    ex.ErrorCode switch
    {
        AiErrorCode.RateLimited => true,
        AiErrorCode.Temporary => true,
        AiErrorCode.Timeout => true,
        _ => false
    };
```

### Health Check Implementation

```csharp
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Health check for AI services.
// Register with: builder.Services.AddHealthChecks().AddCheck<AiServicesHealthCheck>("ai-services");
public class AiServicesHealthCheck : IHealthCheck
{
    private readonly IMistralService _mistral;
    private readonly IVoskService _vosk;
    private readonly ITtsService _tts;

    public AiServicesHealthCheck(IMistralService mistral, IVoskService vosk, ITtsService tts)
    {
        _mistral = mistral;
        _vosk = vosk;
        _tts = tts;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // HealthCheckResult's data parameter is IReadOnlyDictionary<string, object>,
        // so values are stored as object (a Dictionary<string, HealthStatus> would not compile).
        // Keep TestConnectionAsync cheap: it runs on every health poll and Mistral calls cost money.
        var checks = new Dictionary<string, object>
        {
            ["Mistral"] = await ProbeAsync(() => _mistral.TestConnectionAsync(cancellationToken)),
            ["Vosk"] = await ProbeAsync(() => _vosk.TestModelAsync(cancellationToken)),
            ["Coqui TTS"] = await ProbeAsync(() => _tts.TestModelAsync(cancellationToken))
        };

        var healthyCount = checks.Values.Count(s => (HealthStatus)s == HealthStatus.Healthy);

        // Report Degraded (not Unhealthy) when only some services fail,
        // in line with the "degrade gracefully" requirement
        var status = healthyCount == checks.Count ? HealthStatus.Healthy
            : healthyCount == 0 ? HealthStatus.Unhealthy
            : HealthStatus.Degraded;

        return new HealthCheckResult(
            status,
            "AI Services health check",
            data: checks);
    }

    private static async Task<HealthStatus> ProbeAsync(Func<Task> probe)
    {
        try
        {
            await probe();
            return HealthStatus.Healthy;
        }
        catch (Exception)
        {
            return HealthStatus.Unhealthy;
        }
    }
}
```

### Audio File Management

```csharp
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Logging;

// Audio file storage service
public class AudioFileService
{
    private const int MaxAudioBytes = 10 * 1024 * 1024; // 10MB limit
    private readonly string _basePath;
    private readonly ILogger<AudioFileService> _logger;

    public AudioFileService(IConfiguration config, ILogger<AudioFileService> logger)
    {
        _basePath = config["Audio:StoragePath"] ?? "/var/audio";
        _logger = logger;

        Directory.CreateDirectory(_basePath);
    }

    public async Task<string> SaveAudioAsync(byte[] audioData, string category, int entityId)
    {
        // Validate audio data
        if (audioData == null || audioData.Length == 0)
            throw new ArgumentException("Audio data cannot be empty", nameof(audioData));

        if (audioData.Length > MaxAudioBytes)
            throw new ArgumentException("Audio file too large", nameof(audioData));

        // Reject category names that could escape the storage root (e.g. "..", "a/b")
        if (string.IsNullOrWhiteSpace(category)
            || category is "." or ".."
            || category.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0)
            throw new ArgumentException("Invalid audio category", nameof(category));

        // Create category directory
        var categoryPath = Path.Combine(_basePath, category);
        Directory.CreateDirectory(categoryPath);

        // One file per entity and category; saving again replaces the previous audio
        var extension = ".wav"; // or detect from data
        var filename = $"{entityId}{extension}";
        var fullPath = Path.Combine(categoryPath, filename);

        // WriteAllBytesAsync creates or overwrites the file, so no separate delete is needed
        await File.WriteAllBytesAsync(fullPath, audioData);

        // Return the public URL path (files are served under /audio)
        return $"/audio/{category}/{filename}";
    }

    // Synchronous: the System.IO enumeration and delete calls used here have no async versions.
    // Note: this deletes old files in every category, including TTS audio that the
    // caching strategy keeps permanently; exclude those categories before scheduling it.
    public void CleanupOldFiles(TimeSpan olderThan)
    {
        var cutoff = DateTime.UtcNow - olderThan;

        foreach (var categoryDir in Directory.GetDirectories(_basePath))
        {
            foreach (var file in Directory.GetFiles(categoryDir))
            {
                if (File.GetLastWriteTimeUtc(file) < cutoff)
                {
                    try
                    {
                        File.Delete(file);
                        _logger.LogInformation("Deleted old audio file: {File}", file);
                    }
                    catch (Exception ex)
                    {
                        _logger.LogError(ex, "Failed to delete audio file: {File}", file);
                    }
                }
            }
        }
    }
}
```

### Rate Limiting Implementation

```csharp
using System.Collections.Concurrent;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

// Rate limiter for AI services: a sliding-window log per service name.
// State is kept in memory, so each app instance enforces its own limit.
public class AiRateLimiter
{
    private readonly ConcurrentDictionary<string, RateLimitEntry> _limits = new();
    private readonly int _maxRequests;
    private readonly TimeSpan _window;

    public AiRateLimiter(int maxRequestsPerWindow, TimeSpan window)
    {
        _maxRequests = maxRequestsPerWindow;
        _window = window;
    }

    public bool TryAcquire(string serviceName)
    {
        var now = DateTime.UtcNow;
        var entry = _limits.GetOrAdd(serviceName, _ => new RateLimitEntry());

        lock (entry)
        {
            // Remove requests that have left the window
            entry.Requests.RemoveAll(r => now - r > _window);

            // Check if limit exceeded
            if (entry.Requests.Count >= _maxRequests)
                return false;

            // Record this request
            entry.Requests.Add(now);
            return true;
        }
    }

    private class RateLimitEntry
    {
        public List<DateTime> Requests { get; } = new();
    }
}

// Usage in controller (one shared limit for all callers of this endpoint)
[HttpPost("recognize")]
public async Task<IActionResult> RecognizeSpeech([FromBody] AudioRequest request)
{
    if (!_rateLimiter.TryAcquire("Vosk"))
    {
        return StatusCode(StatusCodes.Status429TooManyRequests, "Too many requests");
    }

    // ... process request
}
```

---

## 📝 Notes & Decisions

| Date | Decision | Rationale |
|------|----------|-----------|
| May 31, 2025 | Use Mistral-Medium | Best balance of quality and cost for this use case |
| May 31, 2025 | Use Vosk for speech recognition | Open-source, supports German, self-hostable |
| May 31, 2025 | Use Coqui TTS | Open-source, good quality, supports German |
| May 31, 2025 | Self-host AI services | More control, no external API dependencies (except Mistral) |
| May 31, 2025 | Use Python CLI wrappers | Easier integration with .NET, well-supported libraries |

### Technical Notes

#### Vosk Configuration

```json
{
  "Vosk": {
    "PythonPath": "/usr/bin/python3",
    "ModelPath": "/models/vosk-model-de-0.22",
    "SampleRate": 16000
  }
}
```

#### Coqui TTS Configuration

```json
{
  "Coqui": {
    "PythonPath": "/usr/bin/python3",
    "ModelName": "tts_models/deu/fairseq/vits",
    "AudioOutputFormat": "wav",
    "SampleRate": 22050
  }
}
```

Coqui model names follow the pattern `tts_models/<language>/<dataset>/<model>`; run `tts --list_models` to confirm the exact name available in the installed version.

#### Mistral Configuration

```json
{
  "Mistral": {
    "ApiKey": "your-api-key",
    "BaseUrl": "https://api.mistral.ai/v1/",
    "DefaultModel": "mistral-medium",
    "TimeoutSeconds": 30,
    "MaxRetries": 3
  }
}
```

Do not commit a real key: supply `ApiKey` through user secrets locally and the `Mistral__ApiKey` environment variable (or Key Vault) in deployed environments.

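The three sections above could be bound to strongly typed classes (the `MistralConfig`, `VoskConfig` and `CoquiConfig` from Phase 1) and checked with DataAnnotations. In this sketch the property names mirror the JSON keys; the validation ranges and the `ConfigValidation` helper are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Strongly typed views of the "Mistral", "Vosk" and "Coqui" sections (ranges are assumptions).
public sealed class MistralConfig
{
    [Required] public string ApiKey { get; set; } = "";   // empty string fails [Required]
    [Required, Url] public string BaseUrl { get; set; } = "https://api.mistral.ai/v1/";
    [Required] public string DefaultModel { get; set; } = "mistral-medium";
    [Range(1, 300)] public int TimeoutSeconds { get; set; } = 30;
    [Range(0, 10)] public int MaxRetries { get; set; } = 3;
}

public sealed class VoskConfig
{
    [Required] public string PythonPath { get; set; } = "";
    [Required] public string ModelPath { get; set; } = "";
    [Range(8000, 48000)] public int SampleRate { get; set; } = 16000;
}

public sealed class CoquiConfig
{
    [Required] public string PythonPath { get; set; } = "";
    [Required] public string ModelName { get; set; } = "";
    [Required] public string AudioOutputFormat { get; set; } = "wav";
    [Range(8000, 48000)] public int SampleRate { get; set; } = 22050;
}

public static class ConfigValidation
{
    // Runs the DataAnnotations rules and returns the error messages (empty when valid).
    public static IReadOnlyList<string> Validate(object config)
    {
        var results = new List<ValidationResult>();
        Validator.TryValidateObject(config, new ValidationContext(config), results, validateAllProperties: true);
        return results.ConvertAll(r => r.ErrorMessage ?? "Invalid value");
    }
}
```

In Program.cs the same attributes can be enforced at startup with `builder.Services.AddOptions<MistralConfig>().Bind(builder.Configuration.GetSection("Mistral")).ValidateDataAnnotations().ValidateOnStart();`, so a missing API key fails fast instead of on the first request.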
### Error Response Strategy

1. **Transient errors**: Retry with exponential backoff
2. **Rate limits**: Return 429 to the client with a `Retry-After` header
3. **Service unavailable**: Return 503, log error
4. **Invalid response**: Validate output, return meaningful error
5. **Timeout**: Return 504, suggest retry

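Once server-side retries (rule 1) are exhausted, the remaining rules amount to a mapping from failure type to HTTP status. In this hypothetical sketch, the `AiErrorCode` members beyond `RateLimited`, `Temporary` and `Timeout` (the ones `IsRetryable` uses) are assumptions, and 502 for an invalid model response is a choice the strategy leaves open.

```csharp
using System;

// Assumed failure categories; only the first three appear elsewhere in this document.
public enum AiErrorCode
{
    RateLimited,
    Temporary,
    Timeout,
    Unavailable,
    InvalidResponse
}

public static class AiErrorResponses
{
    // Maps a failure (after server-side retries) to the status, message and retry hint for the client.
    public static (int StatusCode, string Message, bool ClientMayRetry) Map(AiErrorCode code) => code switch
    {
        AiErrorCode.RateLimited => (429, "Too many requests. Please try again shortly.", true),
        AiErrorCode.Timeout => (504, "The AI service took too long to respond. Please try again.", true),
        AiErrorCode.Unavailable or AiErrorCode.Temporary =>
            (503, "The AI service is temporarily unavailable.", true),
        // 502 is an assumption; the strategy only asks for a meaningful error here.
        AiErrorCode.InvalidResponse => (502, "The AI service returned an unusable result.", false),
        _ => throw new ArgumentOutOfRangeException(nameof(code), code, null)
    };
}
```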
### Caching Strategy

- **Mistral responses**: Cache for 1 hour (stories unlikely to change)
- **TTS audio**: Cache files permanently (regenerate only if text changes)
- **Vosk**: No caching (each audio is unique)

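The Mistral rule (1-hour cache) needs a cache keyed on the exact inputs. Below is a minimal in-memory sketch with a hypothetical `TtlCache` type. It only catches exact repeats of model + prompt, not merely similar prompts, and `IMemoryCache` from Microsoft.Extensions.Caching.Memory would be the usual production choice. The same `KeyFor(model, text)` hash could also serve as a TTS file name, which gives "regenerate only if text changes" directly: new text produces a new key.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Small in-memory cache with a fixed TTL, keyed by a SHA-256 hash of the inputs.
public sealed class TtlCache<TValue>
{
    private readonly ConcurrentDictionary<string, (TValue Value, DateTimeOffset ExpiresAt)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _now;

    public TtlCache(TimeSpan ttl, Func<DateTimeOffset>? now = null)
    {
        _ttl = ttl;
        _now = now ?? (() => DateTimeOffset.UtcNow); // injectable clock for tests
    }

    // Stable key from several parts; the unit separator keeps ("ab","c") distinct from ("a","bc").
    public static string KeyFor(params string[] parts) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(string.Join('\u001F', parts))));

    public bool TryGet(string key, out TValue? value)
    {
        if (_entries.TryGetValue(key, out var entry))
        {
            if (_now() < entry.ExpiresAt)
            {
                value = entry.Value;
                return true;
            }
            // Expired: remove only this exact entry so a concurrent Set is not lost.
            _entries.TryRemove(KeyValuePair.Create(key, entry));
        }
        value = default;
        return false;
    }

    public void Set(string key, TValue value) => _entries[key] = (value, _now() + _ttl);
}
```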
### Gotchas

- ⚠️ Vosk model is ~500MB - ensure enough disk space
- ⚠️ Coqui model is ~1.5GB - ensure enough disk space
- ⚠️ Python processes may have memory leaks - monitor and restart
- ⚠️ AI services may fail silently - implement health checks
- ⚠️ Mistral API has costs - implement budget tracking
- ⚠️ Audio generation can be CPU-intensive - consider separate service
- ⚠️ Different Python versions may have compatibility issues

### File Storage Structure

```
/public/
├── audio/
│   ├── vocabulary/          # Vocabulary word audio
│   │   └── {id}.wav
│   ├── story/               # Story segment audio
│   │   └── {levelId}-{order}.wav
│   └── quiz/                # Quiz question audio
│       └── {questionId}.wav
└── models/                  # AI models
    ├── vosk/
    │   └── vosk-model-de-0.22/
    └── coqui/
        └── tts_models/
```

### Performance Considerations

- TTS generation: ~1-2 seconds per sentence
- Speech recognition: ~1-3 seconds per audio clip
- Mistral API: ~2-5 seconds per request
- Consider async/background processing for batch operations

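Batch jobs (for example, generating audio for every word in a vocabulary list) should cap how many AI calls run at once, given the CPU cost of TTS and the Mistral rate limit. A sketch using `Parallel.ForEachAsync` (.NET 6+); `BatchRunner` and the per-item `generate` delegate are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Runs a per-item generator with at most `maxParallel` calls in flight,
// so a large batch cannot start one Python process per item at once.
public static class BatchRunner
{
    public static async Task<IReadOnlyDictionary<TKey, TResult>> RunAsync<TKey, TResult>(
        IEnumerable<TKey> items,
        Func<TKey, CancellationToken, Task<TResult>> generate,
        int maxParallel,
        CancellationToken ct = default)
        where TKey : notnull
    {
        var results = new ConcurrentDictionary<TKey, TResult>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel, CancellationToken = ct };

        await Parallel.ForEachAsync(items, options, async (item, token) =>
        {
            results[item] = await generate(item, token);
        });

        return results;
    }
}
```

If any item throws, `Parallel.ForEachAsync` stops starting new items, signals cancellation to the running ones and surfaces the exception; a caller that prefers partial results should catch failures per item inside `generate`.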
---

## 📊 Progress History

| Date | Status Change | Notes |
|------|---------------|-------|
| May 31, 2025 | Created | Initial plan based on application-plan.md |

---

## 📎 Related Files & Links

- Architecture: [Backend Structure](../architecture/backend-structure.md)
- Architecture: [Application Plan](../architecture/application-plan.md)
- Feature: [Story Integration](story-integration.md)
- Feature: [Vocabulary System](vocabulary-system.md)
- Feature: [Quiz System](quiz-system.md)
- Reference: [Mistral AI API Docs](https://docs.mistral.ai/)
- Reference: [Vosk Documentation](https://alphacephei.com/vosk/)
- Reference: [Coqui TTS GitHub](https://github.com/coqui-ai/TTS)
- Reference: [vosk-model-de-0.22](https://alphacephei.com/vosk/models)
- Reference: [Coqui German Model](https://github.com/coqui-ai/TTS/wiki/Multilingual-support)

---

*Feature created from application-plan.md*