Lasse Rune Hansen 76e8af4987 Add complete solution: documentation, frontend, and project files

- Add comprehensive documentation in docs/ (architecture, features, roadmap)
- Add german-app-frontend with Vite, TypeScript, ESLint configuration
- Add AGENTS.md and .gitignore

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>

2026-05-31 18:20:53 +02:00

Feature: AI Services Integration

Status: ⏳ Planned
Priority: High
Complexity: High
Estimate: 10-16 hours
Assignee: -
Created: May 31, 2025
Target Completion: -
PR: -
Related Features: Story Integration, Vocabulary System, Quiz System, Lesson Management

📌 Overview

Purpose

Integrate three AI services into the application: Mistral-Medium for text generation (stories, feedback), Vosk for speech recognition (speaking exercises), and Coqui TTS for text-to-speech (vocabulary, stories, quizzes).

User Story

As a learner, I want AI-powered features like generated stories, speech recognition for speaking practice, and TTS for audio content so that I can have an immersive and interactive learning experience.

Acceptance Criteria

Mistral-Medium API is integrated for story generation
Mistral-Medium API is integrated for writing feedback
Vosk speech recognition is integrated for speaking exercises
Coqui TTS is integrated for audio generation
All AI services are configurable via appsettings.json
AI service failures are handled gracefully
AI API calls are rate-limited and cached

📋 Requirements

Functional Requirements

ID	Requirement	Priority
FR-001	Generate stories using Mistral-Medium	High
FR-002	Generate writing feedback using Mistral-Medium	High
FR-003	Transcribe speech using Vosk	High
FR-004	Generate audio using Coqui TTS	High
FR-005	Configure all services via configuration	High
FR-006	Handle AI service errors gracefully	High
FR-007	Cache/rate limit AI API calls	Medium
FR-008	Validate AI outputs before use	Medium

Non-Functional Requirements

Performance: TTS generation < 2 seconds per sentence
Performance: Speech recognition < 3 seconds
Performance: AI API calls < 5 seconds
Reliability: Services should degrade gracefully on failure
Cost: Minimize API call costs (caching, batching)

🏗️ Technical Design

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                    AI Services Layer                            │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │  Mistral-Medium  │  │      Vosk       │  │    Coqui TTS    │  │
│  │   (Text Gen)    │  │ (Speech Recog.) │  │   (Audio Gen)   │  │
│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘  │
│           │                     │                    │         │
│           ▼                     ▼                    ▼         │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │              Application Services                         │  │
│  │  - StoryGenerationService                                │  │
│  │  - WritingFeedbackService                                 │  │
│  │  - VoskService (Speech Recognition)                      │  │
│  │  - TtsService (Text-to-Speech)                           │  │
│  └─────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Components Involved

Backend Services:
- IMistralService / MistralService - Text generation
- IVoskService / VoskService - Speech recognition
- ITtsService / TtsService - Text-to-speech
Configuration: appsettings.json with AI settings
External Dependencies:
- Mistral-Medium API
- Vosk Python library + German model
- Coqui TTS Python library + German model

Data Flow

Story Generation Flow

1. StoryGenerationService receives request with vocabulary list and level
2. Service constructs prompt for Mistral-Medium
3. MistralService sends prompt to Mistral API
4. Mistral API returns generated story text
5. StoryGenerationService validates and returns story
6. StoryService saves story and triggers audio generation
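
Steps 2 and 5 of this flow could look roughly like the sketch below. The `StoryPrompt` class, the prompt wording, and the rule that every vocabulary word must appear in the story are assumptions made for illustration; the plan itself does not define them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the prompt wording and the validation rule are assumptions.
public static class StoryPrompt
{
    // Step 2: build the prompt from the vocabulary list and the learner level.
    public static string Build(IReadOnlyList<string> vocabulary, string level)
    {
        if (vocabulary.Count == 0)
            throw new ArgumentException("Vocabulary list must not be empty.", nameof(vocabulary));

        return $"Write a short German story for level {level} learners. " +
               $"Use each of these words at least once: {string.Join(", ", vocabulary)}.";
    }

    // Step 5: list the vocabulary words that the generated story does not contain.
    public static IReadOnlyList<string> MissingWords(string story, IReadOnlyList<string> vocabulary) =>
        vocabulary
            .Where(word => !story.Contains(word, StringComparison.OrdinalIgnoreCase))
            .ToList();
}
```

A non-empty result from `MissingWords` could trigger a regeneration or a validation error. Plain substring matching does not recognize inflected forms (for example plurals), so a real validator would probably need something more forgiving.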

Speech Recognition Flow

1. User records speech in frontend
2. Frontend sends audio file to /api/speech/recognize
3. VoskService receives audio bytes
4. VoskService calls Vosk Python CLI with German model
5. Vosk returns transcribed text
6. Backend validates transcription and returns to frontend
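
Step 6 (validating the transcription) might be sketched as follows, assuming the Python wrapper prints the recognizer's JSON result to stdout. The `text` field, and the per-word `result` array with `conf` values when word output is enabled, follow Vosk's result format; the `Transcription` and `VoskResultParser` names and the 0.6 threshold are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.Json;

// Hypothetical types for this sketch; not part of the plan.
public sealed record Transcription(string Text, double? AverageConfidence);

public static class VoskResultParser
{
    // Parses a Vosk result such as:
    // {"result":[{"conf":0.9,"word":"hallo"}],"text":"hallo"}
    public static Transcription Parse(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;

        var text = root.TryGetProperty("text", out var t) ? t.GetString() ?? "" : "";

        // Per-word confidences are only present when word output is enabled.
        double? confidence = null;
        if (root.TryGetProperty("result", out var words) && words.ValueKind == JsonValueKind.Array)
        {
            var values = words.EnumerateArray()
                .Where(w => w.TryGetProperty("conf", out _))
                .Select(w => w.GetProperty("conf").GetDouble())
                .ToList();
            if (values.Count > 0) confidence = values.Average();
        }

        return new Transcription(text.Trim(), confidence);
    }

    // Assumed rule: reject empty text and low average confidence.
    public static bool IsAcceptable(Transcription t, double minConfidence = 0.6) =>
        t.Text.Length > 0 && (t.AverageConfidence is null || t.AverageConfidence >= minConfidence);
}
```

The controller could return the text only when `IsAcceptable` is true and otherwise ask the learner to repeat the recording.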

TTS Flow

1. TtsService receives text to synthesize
2. Service calls Coqui TTS Python CLI
3. Coqui generates audio file
4. Audio file saved to filesystem
5. Audio URL returned to caller
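
Step 2 could be prepared as in the sketch below. It assumes the `tts` console script installed by the Coqui TTS package is on the PATH and uses its `--text`, `--model_name`, and `--out_path` options; the `CoquiCommand` helper itself is hypothetical.

```csharp
using System.Diagnostics;
using System.IO;

// Hypothetical helper that prepares (but does not start) the Coqui CLI call.
public static class CoquiCommand
{
    public static ProcessStartInfo Build(string text, string modelName, string outputDirectory, string fileName)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "tts", // assumption: Coqui's console script is on the PATH
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };

        // ArgumentList escapes each argument, so quotes or spaces in the
        // text cannot break the command line.
        startInfo.ArgumentList.Add("--text");
        startInfo.ArgumentList.Add(text);
        startInfo.ArgumentList.Add("--model_name");
        startInfo.ArgumentList.Add(modelName);
        startInfo.ArgumentList.Add("--out_path");
        startInfo.ArgumentList.Add(Path.Combine(outputDirectory, fileName));

        return startInfo;
    }
}
```

Keeping the command construction separate from process execution makes it testable without Python installed.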

🚀 Implementation Plan

Phase 1: Configuration & Interfaces (2 hours)

Add AI configuration section to appsettings.json
Create configuration classes (MistralConfig, VoskConfig, CoquiConfig)
Define service interfaces (IMistralService, IVoskService, ITtsService)
Register services in Program.cs
Set up configuration validation
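
A configuration class with validation might look like the following sketch. The property names mirror the Mistral settings listed under Technical Notes; the attribute ranges and the `ConfigValidator` helper are assumptions, not decisions made in this plan.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

public sealed class MistralConfig
{
    [Required] public string ApiKey { get; set; } = "";
    [Required, Url] public string BaseUrl { get; set; } = "https://api.mistral.ai/v1/";
    [Required] public string DefaultModel { get; set; } = "mistral-medium";
    [Range(1, 300)] public int TimeoutSeconds { get; set; } = 30; // range is an assumption
    [Range(0, 10)] public int MaxRetries { get; set; } = 3;       // range is an assumption
}

// Hypothetical helper: returns one message per violated rule.
public static class ConfigValidator
{
    public static IReadOnlyList<string> Validate(object config)
    {
        var results = new List<ValidationResult>();
        Validator.TryValidateObject(
            config, new ValidationContext(config), results, validateAllProperties: true);
        return results.ConvertAll(r => r.ErrorMessage ?? "Invalid value");
    }
}
```

In an ASP.NET Core host, the same attributes can typically be enforced at startup through options binding with `ValidateDataAnnotations()`, so a missing API key fails fast instead of surfacing on the first request.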

Phase 2: Mistral-Medium Integration (2-3 hours)

Create MistralService implementation
Implement Mistral API client
Create request/response models
Implement retry logic for API calls
Add rate limiting (e.g., max 10 requests/minute)
Add response caching for similar prompts
Create prompt templates for different use cases

Phase 3: Vosk Speech Recognition (2-3 hours)

Create VoskService implementation
Set up Vosk Python environment
Download and configure German model (vosk-model-de-0.22)
Implement audio processing
Handle different audio formats
Add error handling for recognition failures
Create /api/speech/recognize endpoint

Phase 4: Coqui TTS Integration (2-3 hours)

Create TtsService implementation
Set up Coqui TTS Python environment
Download and configure German model
Implement audio generation
Add audio file management (storage, cleanup)
Create audio serving endpoints
Implement batch audio generation

Phase 5: Service Integration (2 hours)

Create StoryGenerationService (uses MistralService)
Create WritingFeedbackService (uses MistralService)
Create SpeechExerciseService (uses VoskService)
Create AudioGenerationService (uses TtsService)
Add health checks for all AI services
Implement fallback mechanisms for service failures

Milestones

Milestone	Date	Status
Configuration & Interfaces	-	⏳
Mistral Integration	-	⏳
Vosk Integration	-	⏳
Coqui TTS Integration	-	⏳
Service Integration	-	⏳

✅ Tasks

Backend - Configuration

Add Mistral settings to appsettings.json
Add Vosk settings to appsettings.json
Add Coqui settings to appsettings.json
Create Configuration/MistralConfig.cs
Create Configuration/VoskConfig.cs
Create Configuration/CoquiConfig.cs
Register all AI services in Program.cs
Add health checks for AI services

Backend - Mistral Service

Create Domain/Interfaces/IMistralService.cs
Create Infrastructure/Services/MistralService.cs
Implement Mistral API client
Create Models/MistralRequest.cs
Create Models/MistralResponse.cs
Add retry logic
Add rate limiting
Add response caching
Write unit tests

Backend - Vosk Service

Create Domain/Interfaces/IVoskService.cs
Create Infrastructure/Services/VoskService.cs
Set up Python process execution
Download and configure vosk-model-de-0.22
Implement audio recognition
Create /api/speech/recognize endpoint
Create Presentation/Controllers/SpeechController.cs
Write unit tests

Backend - Coqui TTS Service

Create Domain/Interfaces/ITtsService.cs
Create Infrastructure/Services/TtsService.cs
Set up Python process execution
Download and configure Coqui German model
Implement audio generation
Create audio file storage mechanism
Create /api/tts/generate endpoint
Create Presentation/Controllers/TtsController.cs
Write unit tests

Backend - Higher-Level Services

Create Application/Services/StoryGenerationService.cs
Create Application/Services/WritingFeedbackService.cs
Integrate with MistralService
Add validation for AI outputs
Write integration tests

Infrastructure Setup

Install Python 3.8+
Install Vosk Python package
Download vosk-model-de-0.22
Install Coqui TTS package
Download Coqui German model
Set up file storage for audio
Configure permissions

Frontend Integration

Create services/speechService.ts
Create services/ttsService.ts
Create services/aiService.ts
Integrate with Recorder component
Integrate with AudioPlayer component
Add error handling for AI failures

✅ Definition of Done

General Criteria (All Features)

All acceptance criteria met and verified
All tasks in this document completed
Code follows Clean Architecture principles
Code reviewed and approved by at least 1 team member
All tests passing (unit, integration)
Documentation updated (README, AGENTS.md if applicable)
Feature works in development environment
Feature deployed to staging environment
Performance meets defined targets
Security review completed
No critical bugs or blockers

AI-Specific Criteria

All AI services functional in development
Mistral API integration tested with valid API key
Vosk speech recognition tested with German model
Coqui TTS tested with German model
Error handling tested (invalid inputs, service failures)
Fallback mechanisms implemented and tested
Rate limiting configured and tested
Audio file generation and storage verified
Health checks for all AI services passing

🧪 Testing Strategy

Testing Approach

Test Type	Coverage	Tools	Responsibility
Unit Tests	80%+ code coverage	MSTest, Moq	Backend Dev
Integration Tests	All service interactions	MSTest, Testcontainers	Backend Dev
API Tests	All endpoints	MSTest, HttpClient	Backend Dev
Frontend Unit Tests	Component logic	Vitest	Frontend Dev
Frontend Integration	Service integration	Vitest	Frontend Dev
E2E Tests	Critical user journeys	Playwright	QA/Dev
Manual Testing	Exploratory, edge cases	BrowserStack	QA
Load Testing	AI service performance	k6/JMeter	DevOps

AI-Specific Tests

Mistral Service Tests

Test successful text generation
Test API error handling (429, 500, 503)
Test rate limiting (max requests per minute)
Test response caching
Test retry logic on failures
Test timeout handling
Test invalid API key handling

Vosk Service Tests

Test successful speech recognition (clear audio)
Test speech recognition with background noise
Test speech recognition with different accents
Test empty audio handling
Test invalid audio format handling
Test Python process failure handling
Test model not found error handling
Test confidence threshold validation

Coqui TTS Service Tests

Test successful audio generation
Test audio generation with long text
Test audio generation with special characters
Test invalid text handling
Test Python process failure handling
Test model not found error handling
Test audio file format validation
Test audio quality validation

Test Data

Sample audio files for Vosk testing (clear German speech, noisy audio, non-German speech)
Sample texts for TTS testing (short, long, with special characters, with German umlauts)
Sample prompts for Mistral testing (A1, A2, B1 levels)

🚨 Risks & Mitigations

Technical Risks

Risk	Likelihood	Impact	Mitigation	Owner
Python-.NET integration failures	High	High	Use Process class with proper error handling, implement process pooling, add timeouts	Backend Dev
Vosk model compatibility issues	Medium	High	Test with vosk-model-de-0.22 before implementation, have fallback to vosk-model-small-de-0.15	Backend Dev
Coqui model quality issues	Medium	Medium	Test with sample German text, have alternative TTS service as fallback	Backend Dev
Mistral API rate limits	High	Medium	Implement caching (1h TTL), request queue, exponential backoff	Backend Dev
Mistral API costs exceed budget	Medium	High	Set budget alerts, implement cost tracking, cache aggressively	Backend Dev
AI services slow performance	High	Medium	Implement async processing, use background jobs for batch operations	Backend Dev
Audio files too large	Medium	Medium	Compress audio (16kHz, mono), implement streaming for large files	Backend Dev
Model files too large for deployment	Medium	Medium	Use Docker volumes, separate storage for models, consider cloud storage	DevOps
Memory leaks in Python processes	Medium	High	Implement process lifecycle management, add memory monitoring, use process pooling	Backend Dev
Different Python versions cause issues	Medium	Medium	Use Docker to pin Python version, document exact version in README	DevOps

Operational Risks

Risk	Likelihood	Impact	Mitigation	Owner
AI service downtime	Medium	High	Implement health checks, circuit breakers, fallback responses	DevOps
Model files corrupted	Low	High	Implement checksum validation, store backups, automated recovery	DevOps
API key exposure	Medium	High	Use GitHub secrets, Azure Key Vault, never commit to repo	Security
Audio storage fills up	Medium	Medium	Implement cleanup job, set size quotas, use cloud storage	DevOps

Business Risks

Risk	Likelihood	Impact	Mitigation	Owner
User data privacy concerns	Medium	High	Anonymize audio before processing, document data handling policy, comply with GDPR	Legal
AI generates inappropriate content	Low	High	Implement content moderation, add user reporting, use system prompts to prevent	Backend Dev
AI services become too expensive	Medium	Medium	Monitor costs, set budget caps, evaluate open-source alternatives	Product

🔗 Dependencies

Feature Dependencies

Infrastructure Setup - Required (backend project)

Technical Dependencies

Python 3.8+
Vosk Python library
vosk-model-de-0.22 (German model)
Coqui TTS Python library
Coqui German TTS model
Mistral-Medium API key

External Services

Service	Purpose	Configuration
Mistral-Medium API	Text generation (stories, feedback)	API key, endpoint URL
Vosk	Speech recognition	Python path, model path
Coqui TTS	Text-to-speech	Python path, model name

Blockers

Infrastructure Setup must be complete
Python environment must be configured
AI models must be downloaded
Mistral API key must be obtained

🔧 Technical Deep Dive: Python-.NET Integration

Integration Patterns

Option 1: Process.Start (Recommended for MVP)

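// Simple approach: spawn one Python process per request
// Requires: using System.Diagnostics;
public async Task<string> RecognizeSpeechAsync(byte[] audioData, CancellationToken cancellationToken = default)
{
    // Path.GetTempFileName() would create (and leak) a second, extension-less file,
    // so build a unique .wav path instead
    var tempFile = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid():N}.wav");
    await File.WriteAllBytesAsync(tempFile, audioData, cancellationToken);

    try
    {
        using var process = new Process
        {
            StartInfo = new ProcessStartInfo
            {
                FileName = "python",
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                UseShellExecute = false,
                CreateNoWindow = true,
                // ArgumentList escapes each argument, so paths with spaces are safe.
                // NOTE: confirm the module name against the installed vosk package.
                ArgumentList = { "-m", "vosk.transcribe", "--model", _modelPath, "--input", tempFile },
                // Make the Vosk package importable for the child process
                Environment = { ["PYTHONPATH"] = "/path/to/vosk" }
            }
        };

        process.Start();

        // Enforce a timeout (30 s is an example value) so a hung process cannot block the request
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        timeoutCts.CancelAfter(TimeSpan.FromSeconds(30));

        // Read both streams concurrently; reading them one after the other can
        // deadlock once the unread pipe's buffer fills up
        var outputTask = process.StandardOutput.ReadToEndAsync();
        var errorTask = process.StandardError.ReadToEndAsync();

        try
        {
            await process.WaitForExitAsync(timeoutCts.Token);
        }
        catch (OperationCanceledException)
        {
            if (!process.HasExited) process.Kill(entireProcessTree: true);
            throw;
        }

        var output = await outputTask;
        var error = await errorTask;

        if (process.ExitCode != 0)
        {
            throw new AiServiceException($"Vosk failed: {error}");
        }

        return output.Trim();
    }
    finally
    {
        // Always remove the temporary audio file
        File.Delete(tempFile);
    }
}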

Pros: Simple, easy to implement, no additional dependencies
Cons: Process startup overhead (~100-500ms per call), resource-intensive

Option 2: Process Pooling (Recommended for Production)

// Maintain a pool of persistent Python processes
// Requires: using System.Collections.Concurrent; using System.Diagnostics;
public class PythonProcessPool : IDisposable
{
    private readonly ConcurrentQueue<Process> _pool = new();
    private readonly SemaphoreSlim _semaphore;
    private readonly string _pythonPath;
    private readonly string _scriptPath;

    public PythonProcessPool(int size, string pythonPath, string scriptPath)
    {
        _semaphore = new SemaphoreSlim(size);
        _pythonPath = pythonPath;
        _scriptPath = scriptPath;

        // Pre-warm the pool
        for (int i = 0; i < size; i++)
        {
            _pool.Enqueue(StartProcess());
        }
    }

    public async Task<string> ExecuteAsync(string input)
    {
        await _semaphore.WaitAsync();

        // Replace workers that are missing or have died since their last use
        if (!_pool.TryDequeue(out var process) || process.HasExited)
        {
            process?.Dispose();
            process = StartProcess();
        }

        try
        {
            // Send one request per line on stdin
            await process.StandardInput.WriteLineAsync(input);
            await process.StandardInput.FlushAsync();

            // Read the one-line response from stdout; null means the worker closed its output
            var response = await process.StandardOutput.ReadLineAsync();

            return response ?? throw new AiServiceException("Python worker exited before responding");
        }
        finally
        {
            // A dead worker is detected and replaced on its next dequeue
            _pool.Enqueue(process);
            _semaphore.Release();
        }
    }

    private Process StartProcess()
    {
        var process = new Process
        {
            StartInfo = new ProcessStartInfo
            {
                FileName = _pythonPath,
                Arguments = _scriptPath,
                RedirectStandardInput = true,
                RedirectStandardOutput = true,
                RedirectStandardError = true,
                UseShellExecute = false,
                CreateNoWindow = true
            }
        };

        // Process.Start() returns bool, so it cannot be chained onto the initializer
        process.Start();

        // Drain stderr so a full pipe cannot block the worker
        process.BeginErrorReadLine();

        return process;
    }

    public void Dispose()
    {
        while (_pool.TryDequeue(out var process))
        {
            try { if (!process.HasExited) process.Kill(); } catch { /* already gone */ }
            process.Dispose();
        }

        _semaphore.Dispose();
    }
}

Pros: Eliminates process startup overhead, much faster for repeated calls
Cons: More complex, need to handle process lifecycle, stdin/stdout parsing

Option 3: gRPC (Best for Production)

Create Python gRPC server for AI services
.NET client calls gRPC methods
Single persistent Python process
Type-safe, high-performance

Pros: Best performance, type-safe, production-ready
Cons: Most complex to set up, requires gRPC knowledge

Error Handling Strategy

// Comprehensive error handling for AI services
public async Task<T> ExecuteWithRetryAsync<T>(
    Func<CancellationToken, Task<T>> action,
    string operationName,
    int maxRetries = 3,
    TimeSpan? timeout = null)
{
    var retryCount = 0;
    timeout ??= TimeSpan.FromSeconds(30);

    while (true)
    {
        try
        {
            // The token must reach the action; otherwise the timeout is never enforced
            using var cts = new CancellationTokenSource(timeout.Value);
            return await action(cts.Token);
        }
        catch (OperationCanceledException) when (retryCount < maxRetries)
        {
            retryCount++;
            // Exponential backoff: 2s, 4s, 8s, ...
            var delay = TimeSpan.FromSeconds(Math.Pow(2, retryCount));
            _logger.LogWarning(
                "{Operation} timed out (attempt {Attempt}), retrying in {Delay}s...",
                operationName, retryCount, delay.TotalSeconds);
            await Task.Delay(delay);
        }
        catch (AiServiceException ex) when (IsRetryable(ex) && retryCount < maxRetries)
        {
            retryCount++;
            var delay = TimeSpan.FromSeconds(Math.Pow(2, retryCount));
            _logger.LogWarning(ex,
                "{Operation} failed (attempt {Attempt}), retrying in {Delay}s...",
                operationName, retryCount, delay.TotalSeconds);
            await Task.Delay(delay);
        }
        catch (Exception ex)
        {
            // Non-retryable error, or retries exhausted
            _logger.LogError(ex, "{Operation} failed permanently after {Attempts} attempts",
                operationName, retryCount + 1);
            throw new AiServiceException($"{operationName} failed: {ex.Message}", ex);
        }
    }

    // Only transient failures are worth retrying
    static bool IsRetryable(AiServiceException ex) =>
        ex.ErrorCode switch
        {
            AiErrorCode.RateLimited => true,
            AiErrorCode.Temporary => true,
            AiErrorCode.Timeout => true,
            _ => false
        };
}

Health Check Implementation

// Health check for AI services
// Requires: using Microsoft.Extensions.Diagnostics.HealthChecks;
public class AiServicesHealthCheck : IHealthCheck
{
    private readonly IMistralService _mistral;
    private readonly IVoskService _vosk;
    private readonly ITtsService _tts;

    // Dependencies are supplied by the DI container
    public AiServicesHealthCheck(IMistralService mistral, IVoskService vosk, ITtsService tts)
    {
        _mistral = mistral;
        _vosk = vosk;
        _tts = tts;
    }

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context,
        CancellationToken cancellationToken = default)
    {
        // HealthCheckResult expects IReadOnlyDictionary<string, object> for its data
        var checks = new Dictionary<string, object>
        {
            ["Mistral"] = await ProbeAsync(() => _mistral.TestConnectionAsync(cancellationToken)),
            ["Vosk"] = await ProbeAsync(() => _vosk.TestModelAsync(cancellationToken)),
            ["Coqui TTS"] = await ProbeAsync(() => _tts.TestModelAsync(cancellationToken))
        };

        var allHealthy = checks.Values.All(s => s is HealthStatus.Healthy);
        var status = allHealthy ? HealthStatus.Healthy : HealthStatus.Unhealthy;

        return new HealthCheckResult(
            status,
            "AI Services health check",
            data: checks);
    }

    // Runs one probe and maps any exception to Unhealthy
    private static async Task<HealthStatus> ProbeAsync(Func<Task> probe)
    {
        try
        {
            await probe();
            return HealthStatus.Healthy;
        }
        catch (Exception)
        {
            return HealthStatus.Unhealthy;
        }
    }
}

Audio File Management

// Audio file storage service
public class AudioFileService
{
    private readonly string _basePath;
    private readonly ILogger<AudioFileService> _logger;

    public AudioFileService(IConfiguration config, ILogger<AudioFileService> logger)
    {
        _basePath = config["Audio:StoragePath"] ?? "/var/audio";
        _logger = logger;

        Directory.CreateDirectory(_basePath);
    }

    public async Task<string> SaveAudioAsync(byte[] audioData, string category, int entityId)
    {
        // Validate audio data
        if (audioData == null || audioData.Length == 0)
            throw new ArgumentException("Audio data cannot be empty", nameof(audioData));

        if (audioData.Length > 10 * 1024 * 1024) // 10MB limit
            throw new ArgumentException("Audio file too large", nameof(audioData));

        // Reject categories that could escape the storage root (e.g. "..", "a/b")
        if (string.IsNullOrWhiteSpace(category) || category is "." or ".."
            || category != Path.GetFileName(category))
            throw new ArgumentException("Invalid category", nameof(category));

        // Create category directory
        var categoryPath = Path.Combine(_basePath, category);
        Directory.CreateDirectory(categoryPath);

        // One file per entity: the name is deterministic, so regenerated audio
        // replaces the previous file (WriteAllBytesAsync overwrites existing files)
        var extension = ".wav"; // or detect from data
        var filename = $"{entityId}{extension}";
        var fullPath = Path.Combine(categoryPath, filename);

        // Save file
        await File.WriteAllBytesAsync(fullPath, audioData);

        // Return relative URL path
        return $"/audio/{category}/{filename}";
    }

    // The work is synchronous file I/O, so no async state machine is needed
    public Task CleanupOldFilesAsync(TimeSpan olderThan)
    {
        var cutoff = DateTime.UtcNow - olderThan;

        foreach (var categoryDir in Directory.GetDirectories(_basePath))
        {
            foreach (var file in Directory.GetFiles(categoryDir))
            {
                var fileInfo = new FileInfo(file);
                if (fileInfo.LastWriteTimeUtc < cutoff)
                {
                    try
                    {
                        File.Delete(file);
                        _logger.LogInformation("Deleted old audio file: {File}", file);
                    }
                    catch (Exception ex)
                    {
                        _logger.LogError(ex, "Failed to delete audio file: {File}", file);
                    }
                }
            }
        }

        return Task.CompletedTask;
    }
}

Rate Limiting Implementation

// Rate limiter for AI services
public class AiRateLimiter
{
    private readonly ConcurrentDictionary<string, RateLimitEntry> _limits = new();
    private readonly int _maxRequests;
    private readonly TimeSpan _window;
    
    public AiRateLimiter(int maxRequestsPerWindow, TimeSpan window)
    {
        _maxRequests = maxRequestsPerWindow;
        _window = window;
    }
    
    public bool TryAcquire(string serviceName)
    {
        var now = DateTime.UtcNow;
        
        var entry = _limits.GetOrAdd(serviceName, _ => new RateLimitEntry());
        
        lock (entry)
        {
            // Remove old requests
            entry.Requests.RemoveAll(r => now - r > _window);
            
            // Check if limit exceeded
            if (entry.Requests.Count >= _maxRequests)
                return false;
            
            // Add new request
            entry.Requests.Add(now);
            return true;
        }
    }
    
    private class RateLimitEntry
    {
        public List<DateTime> Requests { get; } = new();
    }
}

// Usage in controller
[HttpPost("recognize")]
public async Task<IActionResult> RecognizeSpeech([FromBody] AudioRequest request)
{
    if (!_rateLimiter.TryAcquire("Vosk"))
    {
        return StatusCode(429, "Too many requests");
    }
    
    // ... process request
}

📝 Notes & Decisions

Date	Decision	Rationale
May 31, 2025	Use Mistral-Medium	Best balance of quality and cost for this use case
May 31, 2025	Use Vosk for speech recognition	Open-source, supports German, self-hostable
May 31, 2025	Use Coqui TTS	Open-source, good quality, supports German
May 31, 2025	Self-host AI services	More control, no external API dependencies (except Mistral)
May 31, 2025	Use Python CLI wrappers	Easier integration with .NET, well-supported libraries

Technical Notes

Vosk Configuration

{
  "Vosk": {
    "PythonPath": "/usr/bin/python3",
    "ModelPath": "/models/vosk-model-de-0.22",
    "SampleRate": 16000
  }
}

Coqui TTS Configuration

{
  "Coqui": {
    "PythonPath": "/usr/bin/python3",
    "ModelName": "tts_models/deu/fairseq/vits",
    "AudioOutputFormat": "wav",
    "SampleRate": 22050
  }
}

Mistral Configuration

{
  "Mistral": {
    "ApiKey": "your-api-key",
    "BaseUrl": "https://api.mistral.ai/v1/",
    "DefaultModel": "mistral-medium",
    "TimeoutSeconds": 30,
    "MaxRetries": 3
  }
}
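
With these settings, a request to Mistral's chat completions endpoint could be assembled as in the sketch below. The `MistralRequestFactory` name is hypothetical, and the payload is reduced to the `model` and `messages` fields; the actual client may need further parameters.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// Hypothetical helper: builds the HTTP request but does not send it.
public static class MistralRequestFactory
{
    public static HttpRequestMessage CreateChatRequest(
        string baseUrl, string apiKey, string model, string prompt)
    {
        var payload = new
        {
            model,
            messages = new[] { new { role = "user", content = prompt } }
        };

        // BaseUrl must end with "/" so the relative path is appended, not substituted.
        var uri = new Uri(new Uri(baseUrl), "chat/completions");

        var request = new HttpRequestMessage(HttpMethod.Post, uri)
        {
            Content = new StringContent(
                JsonSerializer.Serialize(payload), Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        return request;
    }
}
```

Sending the message through an `HttpClient` whose `Timeout` is set from `TimeoutSeconds` would keep the timeout and retry settings in one place.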

Error Handling Strategy

Transient errors: Retry with exponential backoff
Rate limits: Return 429 to client, suggest retry
Service unavailable: Return 503, log error
Invalid response: Validate output, return meaningful error
Timeout: Return 504, suggest retry
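
The mapping above could be centralized in one place, as in this sketch. `AiErrorCode` appears in the retry helper earlier in this document with the members `RateLimited`, `Temporary`, and `Timeout`; the members `Unavailable` and `InvalidResponse`, the 502 status for invalid responses, and the 503 status for transient errors that exhaust their retries are assumptions.

```csharp
using System.Net;

// Unavailable and InvalidResponse are assumed members added for this sketch.
public enum AiErrorCode { RateLimited, Temporary, Timeout, Unavailable, InvalidResponse }

public static class AiErrorMapping
{
    public static HttpStatusCode ToStatusCode(AiErrorCode code) => code switch
    {
        AiErrorCode.RateLimited => HttpStatusCode.TooManyRequests,      // 429
        AiErrorCode.Timeout => HttpStatusCode.GatewayTimeout,           // 504
        AiErrorCode.Unavailable => HttpStatusCode.ServiceUnavailable,   // 503
        AiErrorCode.Temporary => HttpStatusCode.ServiceUnavailable,     // 503 once retries are exhausted (assumption)
        AiErrorCode.InvalidResponse => HttpStatusCode.BadGateway,       // 502 (assumption)
        _ => HttpStatusCode.InternalServerError
    };
}
```

A controller or exception filter could call `ToStatusCode` so every endpoint reports AI failures consistently.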

Caching Strategy

Mistral responses: Cache for 1 hour (stories unlikely to change)
TTS audio: Cache files permanently (regenerate only if text changes)
Vosk: No caching (each audio is unique)
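
A minimal in-memory version of the Mistral response cache is sketched below, assuming .NET 6 or later. The `PromptCache` class, the normalization (trim, lowercase, collapse whitespace), and the injectable clock are illustrative choices; a production setup might rather use `IMemoryCache` or a distributed cache.

```csharp
using System;
using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;

// Hypothetical TTL cache keyed by a hash of the normalized prompt.
public sealed class PromptCache
{
    private readonly ConcurrentDictionary<string, (string Value, DateTimeOffset ExpiresAt)> _entries = new();
    private readonly TimeSpan _ttl;
    private readonly Func<DateTimeOffset> _clock;

    public PromptCache(TimeSpan ttl, Func<DateTimeOffset>? clock = null)
    {
        _ttl = ttl;
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    // Prompts that differ only in case or whitespace share one key (assumed rule).
    public static string KeyFor(string prompt)
    {
        var words = prompt.Trim().ToLowerInvariant()
            .Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries);
        var normalized = string.Join(' ', words);
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(normalized)));
    }

    public void Set(string prompt, string value) =>
        _entries[KeyFor(prompt)] = (value, _clock() + _ttl);

    public bool TryGet(string prompt, out string? value)
    {
        var key = KeyFor(prompt);
        if (_entries.TryGetValue(key, out var entry) && entry.ExpiresAt > _clock())
        {
            value = entry.Value;
            return true;
        }

        // Expired or missing: drop any stale entry.
        _entries.TryRemove(key, out _);
        value = null;
        return false;
    }
}
```

Constructing the cache with `TimeSpan.FromHours(1)` matches the one-hour lifetime listed above; the clock parameter exists only so expiry can be tested without waiting.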

Gotchas

⚠️ Vosk model is ~500MB - ensure enough disk space
⚠️ Coqui model is ~1.5GB - ensure enough disk space
⚠️ Python processes may have memory leaks - monitor and restart
⚠️ AI services may fail silently - implement health checks
⚠️ Mistral API has costs - implement budget tracking
⚠️ Audio generation can be CPU-intensive - consider separate service
⚠️ Different Python versions may have compatibility issues

File Storage Structure

/public/
├── audio/
│   ├── vocabulary/       # Vocabulary word audio
│   │   └── {id}.wav
│   ├── story/           # Story segment audio
│   │   └── {levelId}-{order}.wav
│   └── quiz/            # Quiz question audio
│       └── {questionId}.wav
└── models/              # AI models
    ├── vosk/
    │   └── vosk-model-de-0.22/
    └── coqui/
        └── tts_models/

Performance Considerations

TTS generation: ~1-2 seconds per sentence
Speech recognition: ~1-3 seconds per audio clip
Mistral API: ~2-5 seconds per request
Consider async/background processing for batch operations

📊 Progress History

Date	Status Change	Notes
May 31, 2025	Created	Initial plan based on application-plan.md

📎 Related Files & Links

Architecture: Backend Structure
Architecture: Application Plan
Feature: Story Integration
Feature: Vocabulary System
Feature: Quiz System
Reference: Mistral AI API Docs
Reference: Vosk Documentation
Reference: Coqui TTS GitHub
Reference: vosk-model-de-0.22
Reference: Coqui German Model

Feature created from application-plan.md
