DeutschLernen/docs/features/vocabulary-system.md
Lasse Rune Hansen 76e8af4987 Add complete solution: documentation, frontend, and project files
- Add comprehensive documentation in docs/ (architecture, features, roadmap)
- Add german-app-frontend with Vite, TypeScript, ESLint configuration
- Add AGENTS.md and .gitignore

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
2026-05-31 18:20:53 +02:00

# Feature: Vocabulary System
> **Status**: ⏳ Planned
> **Priority**: High
> **Complexity**: High
> **Estimate**: 8-12 hours
> **Assignee**: -
> **Created**: May 31, 2025
> **Target Completion**: -
> **PR**: -
> **Related Features**: Infrastructure Setup, Lesson Management, AI Services (TTS)
---
## 📌 Overview
### Purpose
Implement a comprehensive vocabulary management system that includes word storage, retrieval, audio generation, and integration with lessons.
### User Story
As a learner, I want to study German vocabulary with translations, articles (der/die/das), and audio pronunciations so that I can build my word knowledge effectively.
### Acceptance Criteria
- [ ] Vocabulary words can be created, read, updated, and deleted (Admin)
- [ ] Each word has: German text, English translation, article (optional), audio URL
- [ ] Vocabulary is associated with specific lessons
- [ ] Words can be filtered by lesson, level, or source (Goethe/DW)
- [ ] Audio is generated for each word using Coqui TTS
- [ ] Users can practice vocabulary through various exercises
---
## 📋 Requirements
### Functional Requirements
| ID | Requirement | Priority |
|----|-------------|----------|
| FR-001 | CRUD operations for vocabulary words | High |
| FR-002 | Associate words with lessons | High |
| FR-003 | Store article information (der/die/das) | High |
| FR-004 | Generate audio for each word | High |
| FR-005 | Import vocabulary from Goethe Institut | High |
| FR-006 | Import vocabulary from DW Learn German | High |
| FR-007 | Filter vocabulary by lesson/level/source | Medium |
| FR-008 | Vocabulary exercises (flashcards, matching) | Medium |
| FR-009 | Admin bulk import functionality | Medium |
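For FR-008, a matching exercise could be derived from a lesson's word list roughly as follows. This is a minimal sketch, not the planned implementation: `VocabularyPair`, `MatchingExercise`, `buildMatchingExercise`, and `isCorrectMatch` are hypothetical names, and only the word/translation fields come from this document.

```typescript
// Hypothetical shapes for illustration; only word/translation come from the spec.
interface VocabularyPair {
  word: string;        // German text, e.g. "Buch"
  translation: string; // English translation, e.g. "book"
}

interface MatchingExercise {
  words: string[];        // left column, in original order
  translations: string[]; // right column, shuffled
}

// Fisher-Yates shuffle on a copy. `random` is injectable so tests can be deterministic.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = result[i];
    result[i] = result[j];
    result[j] = tmp;
  }
  return result;
}

function buildMatchingExercise(
  pairs: readonly VocabularyPair[],
  random: () => number = Math.random
): MatchingExercise {
  return {
    words: pairs.map((p) => p.word),
    translations: shuffle(pairs.map((p) => p.translation), random),
  };
}

// True when the learner paired a word with its own translation.
function isCorrectMatch(
  pairs: readonly VocabularyPair[],
  word: string,
  translation: string
): boolean {
  return pairs.some((p) => p.word === word && p.translation === translation);
}
```

Passing the random source as a parameter keeps the exercise reproducible in unit tests while defaulting to `Math.random` in the app.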
### Non-Functional Requirements
- Performance: Vocabulary listing responds in under 100 ms
- Storage: Audio files are roughly 50-100 KB per word
- Security: Only admins can create/edit/delete words
- Data Integrity: Word-article combinations must be valid
---
## 🏗️ Technical Design
### Components Involved
- **Backend**: VocabularyController, VocabularyService, VocabularyImportService
- **Database**: Vocabulary table
- **Models**: Vocabulary, VocabularyDto
- **External**: Coqui TTS for audio generation
- **Frontend**: VocabularyTab, VocabularyCard, VocabularyExercise components
### Data Flow
```
1. Admin imports vocabulary from Goethe/DW
2. System scrapes word list from source
3. For each word:
   a. Store in Vocabulary table
   b. Generate audio using Coqui TTS
   c. Save audio file and update AudioUrl
4. User views lesson vocabulary
5. Frontend displays words with audio playback
6. User can click to hear pronunciation
```
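Step 3 of the flow above can be sketched as a loop over injected dependencies. Everything named here (`ScrapedWord`, `ImportDeps`, `importWords`, `ImportReport`) is a hypothetical illustration rather than the planned service API, and the calls are synchronous only for brevity; a real service would most likely be asynchronous and run on the backend.

```typescript
// All names below are hypothetical; the spec only fixes the order of the steps.
interface ScrapedWord {
  word: string;
  translation: string;
  article: string; // "der", "die", "das", or "" for non-nouns
  source: "Goethe" | "DW";
}

interface ImportDeps {
  saveWord(word: ScrapedWord): number;             // step 3a: returns the new Id
  synthesize(text: string): Uint8Array;            // step 3b: may throw on TTS failure
  saveAudio(id: number, wav: Uint8Array): string;  // step 3c: returns the audio URL
  setAudioUrl(id: number, audioUrl: string): void; // step 3c: updates the stored row
}

interface ImportReport {
  importedIds: number[];
  audioFailedIds: number[]; // stored without audio; candidates for retry or fallback
}

function importWords(words: readonly ScrapedWord[], deps: ImportDeps): ImportReport {
  const report: ImportReport = { importedIds: [], audioFailedIds: [] };
  words.forEach((word) => {
    const id = deps.saveWord(word);
    report.importedIds.push(id);
    try {
      const wav = deps.synthesize(word.word);
      deps.setAudioUrl(id, deps.saveAudio(id, wav));
    } catch {
      // A TTS failure should not lose the word itself.
      report.audioFailedIds.push(id);
    }
  });
  return report;
}
```

Storing the word before generating audio means a failed synthesis leaves a usable entry with no `AudioUrl`, which a later background job could pick up.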
### API Endpoints
| Endpoint | Method | Description | Auth Required |
|----------|--------|-------------|----------------|
| `/api/vocabulary` | GET | List all vocabulary (with filters) | Yes |
| `/api/vocabulary/{id}` | GET | Get specific vocabulary word | Yes |
| `/api/vocabulary` | POST | Create new vocabulary word (Admin) | Yes |
| `/api/vocabulary/{id}` | PUT | Update vocabulary word (Admin) | Yes |
| `/api/vocabulary/{id}` | DELETE | Delete vocabulary word (Admin) | Yes |
| `/api/vocabulary/lesson/{lessonId}` | GET | Get vocabulary for a lesson | Yes |
| `/api/vocabulary/import` | POST | Import from Goethe/DW (Admin) | Yes |
| `/api/vocabulary/audio/{id}` | GET | Get audio file for word | Yes |
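On the frontend, the filtered list endpoint might be addressed with a small URL helper like the one below. The query parameter names (`lessonId`, `level`, `source`) are assumptions, since the table only says "with filters", and `buildVocabularyUrl` and `lessonVocabularyUrl` are hypothetical helper names.

```typescript
// Query parameter names are assumptions; the spec only says "with filters".
interface VocabularyFilter {
  lessonId?: number;
  level?: string; // e.g. "A1"
  source?: "Goethe" | "DW";
}

// Builds the URL for GET /api/vocabulary with optional filters.
function buildVocabularyUrl(filter: VocabularyFilter = {}): string {
  const params: string[] = [];
  if (filter.lessonId !== undefined) params.push("lessonId=" + filter.lessonId);
  if (filter.level !== undefined) params.push("level=" + encodeURIComponent(filter.level));
  if (filter.source !== undefined) params.push("source=" + encodeURIComponent(filter.source));
  return params.length > 0 ? "/api/vocabulary?" + params.join("&") : "/api/vocabulary";
}

// Dedicated lesson endpoint from the table above.
function lessonVocabularyUrl(lessonId: number): string {
  return `/api/vocabulary/lesson/${lessonId}`;
}
```

A hook such as the planned `useVocabulary` could pass the resulting string to whatever HTTP client the frontend settles on.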
### Database Schema (from application-plan.md)
```sql
CREATE TABLE Vocabulary (
    Id SERIAL PRIMARY KEY,
    LessonId INT REFERENCES Lessons(Id) ON DELETE CASCADE,
    Word VARCHAR(50) NOT NULL,
    Translation VARCHAR(100) NOT NULL,
    Article VARCHAR(10) CHECK (Article IN ('der', 'die', 'das', '')),
    AudioUrl VARCHAR(255),
    ImageUrl VARCHAR(255),
    Source VARCHAR(50) CHECK (Source IN ('Goethe', 'DW'))
);
```
### Vocabulary Word Structure
```csharp
public class Vocabulary
{
    public int Id { get; set; }
    public int LessonId { get; set; }
    public string Word { get; set; } = string.Empty;        // e.g., "Buch"
    public string Translation { get; set; } = string.Empty; // e.g., "book"
    public string? Article { get; set; }                    // "der", "die", "das", or null
    public string? AudioUrl { get; set; }                   // URL to audio file
    public string? ImageUrl { get; set; }                   // Optional image URL
    public string Source { get; set; } = string.Empty;      // "Goethe" or "DW"
}
```
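The length limits and CHECK constraints from the schema can also be mirrored on the client, for example in an admin form. The following is a sketch under assumptions: `VocabularyInput` and `validateVocabulary` are hypothetical names, the camelCase field names assume default JSON serialization of the entity, and treating a blank word or translation as an error is an assumption based on the "missing required fields" test case rather than on the NOT NULL constraint itself.

```typescript
// Allowed values copied from the CHECK constraints in the schema above.
const ARTICLES = ["der", "die", "das", ""];
const SOURCES = ["Goethe", "DW"];

// Hypothetical client-side shape; camelCase names are an assumption.
interface VocabularyInput {
  lessonId: number;
  word: string;
  translation: string;
  article?: string | null;
  audioUrl?: string | null;
  imageUrl?: string | null;
  source: string;
}

// Returns a list of human-readable problems; an empty list means the input is valid.
function validateVocabulary(input: VocabularyInput): string[] {
  const errors: string[] = [];
  const word = input.word.trim();
  const translation = input.translation.trim();

  if (word.length === 0) errors.push("Word is required");
  if (word.length > 50) errors.push("Word exceeds 50 characters");
  if (translation.length === 0) errors.push("Translation is required");
  if (translation.length > 100) errors.push("Translation exceeds 100 characters");

  // null/undefined and "" both mean "no article" (verbs, adjectives, ...).
  const article = input.article == null ? "" : input.article;
  if (ARTICLES.indexOf(article) === -1) errors.push("Article must be der, die, das, or empty");
  if (SOURCES.indexOf(input.source) === -1) errors.push("Source must be Goethe or DW");

  return errors;
}
```

Client-side checks like these only improve feedback; the database constraints and server-side validation remain the source of truth.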
---
## 🚀 Implementation Plan
### Phase 1: Database & Models (2 hours)
- [ ] Create Vocabulary entity
- [ ] Create VocabularyDto for API responses
- [ ] Create VocabularyRepository interface
- [ ] Create VocabularyRepository implementation
- [ ] Create migration for Vocabulary table
- [ ] Add vocabulary to Lesson entity (one-to-many relationship)
### Phase 2: Core CRUD Operations (2-3 hours)
- [ ] Create VocabularyService with basic CRUD
- [ ] Create VocabularyController
- [ ] Implement filtering (by lesson, level, source)
- [ ] Add validation for word data
- [ ] Add authorization (Admin for write operations)
- [ ] Write unit tests for VocabularyService
### Phase 3: Audio Generation (2-3 hours)
- [ ] Integrate with Coqui TTS service
- [ ] Create audio generation queue/background job
- [ ] Configure audio storage location
- [ ] Implement audio file serving endpoint
- [ ] Add audio URL to vocabulary DTOs
- [ ] Create migration to add AudioUrl column (skip if the Phase 1 migration already includes it, as in the schema above)
### Phase 4: Vocabulary Import (2-3 hours)
- [ ] Create VocabularyImportService
- [ ] Implement Goethe Institut scraper
- [ ] Implement DW Learn German scraper
- [ ] Create bulk import endpoint
- [ ] Add validation for imported words
- [ ] Generate audio for imported words
- [ ] Create admin UI for import (optional)
### Phase 5: Frontend Integration (1-2 hours)
- [ ] Create VocabularyTab component
- [ ] Create VocabularyCard component with audio playback
- [ ] Create VocabularyExercise component
- [ ] Add vocabulary to LessonPage
### Milestones
| Milestone | Date | Status |
|-----------|------|--------|
| Database & Models | - | |
| Core CRUD | - | |
| Audio Generation | - | |
| Vocabulary Import | - | |
| Frontend Integration | - | |
---
## ✅ Tasks
### Backend
- [ ] Create Domain/Entities/Vocabulary.cs
- [ ] Create Application/DTOs/VocabularyDto.cs
- [ ] Create Domain/Interfaces/IVocabularyRepository.cs
- [ ] Create Infrastructure/Data/Repositories/VocabularyRepository.cs
- [ ] Update Lesson entity to include Vocabulary collection
- [ ] Create Application/Services/VocabularyService.cs
- [ ] Create Presentation/Controllers/VocabularyController.cs
- [ ] Create VocabularyImportService
- [ ] Create endpoints for audio serving
- [ ] Integrate with Coqui TTS service
- [ ] Register services in Program.cs
- [ ] Write unit tests
- [ ] Write integration tests
### Database
- [ ] Create migration for Vocabulary table
- [ ] Add foreign key to Lessons table
- [ ] Add indexes for LessonId, Source
- [ ] Apply migration
### Audio Generation
- [ ] Set up Coqui TTS configuration
- [ ] Create audio file storage directory
- [ ] Implement audio generation for new words
- [ ] Implement background job for bulk audio generation
- [ ] Create audio file cleanup mechanism
### Vocabulary Import
- [ ] Research Goethe Institut vocabulary structure
- [ ] Research DW Learn German vocabulary structure
- [ ] Implement web scraping for Goethe
- [ ] Implement web scraping for DW
- [ ] Create bulk import API endpoint
- [ ] Add rate limiting to scrapers
### Frontend
- [ ] Create components/VocabularyTab.tsx
- [ ] Create components/VocabularyCard.tsx
- [ ] Create components/VocabularyExercise.tsx
- [ ] Create hooks/useVocabulary.ts
- [ ] Integrate with LessonPage
- [ ] Add audio playback functionality
---
## 🔗 Dependencies
### Feature Dependencies
- [Infrastructure Setup](infrastructure-setup.md) - Required
- [Lesson Management](lesson-management.md) - Required (vocabulary associated with lessons)
- [AI Services - TTS](ai-services.md) - Required (for audio generation)
### Technical Dependencies
- Coqui TTS Python library
- HTML Agility Pack or similar for web scraping
- AutoMapper (optional)
### Blockers
- [ ] Infrastructure Setup must be complete
- [ ] Lesson Management must be complete for vocabulary-lesson association
- [ ] Coqui TTS service must be configured
---
## ✅ Definition of Done
### General Criteria (All Features)
- [ ] All acceptance criteria met and verified
- [ ] All tasks in this document completed
- [ ] Code follows Clean Architecture principles
- [ ] Code reviewed and approved by at least 1 team member
- [ ] All tests passing (unit, integration)
- [ ] Documentation updated (README, AGENTS.md if applicable)
- [ ] Feature works in development environment
- [ ] Feature deployed to staging environment
- [ ] Performance meets defined targets
- [ ] Security review completed
- [ ] No critical bugs or blockers
### Vocabulary-Specific Criteria
- [ ] Vocabulary words can be created, read, updated, and deleted
- [ ] Each word has: German text, translation, article, audio URL
- [ ] Words are correctly associated with lessons
- [ ] Words can be filtered by lesson, level, source
- [ ] Audio is generated for all vocabulary words
- [ ] Audio files are accessible and playable
- [ ] Goethe Institut vocabulary import works
- [ ] DW Learn German vocabulary import works
- [ ] Vocabulary exercises (flashcards, matching) are functional
---
## 🧪 Testing Strategy
### Testing Approach
| Test Type | Coverage | Tools | Responsibility |
|-----------|----------|-------|----------------|
| Unit Tests | 80%+ code coverage | MSTest, Moq | Backend Dev |
| Integration Tests | All service interactions | MSTest, Testcontainers | Backend Dev |
| API Tests | All endpoints | MSTest, HttpClient | Backend Dev |
| Frontend Unit Tests | Component logic | Vitest | Frontend Dev |
| Frontend Integration | Service integration | Vitest | Frontend Dev |
| E2E Tests | Critical user journeys | Playwright | QA/Dev |
| Manual Testing | Exploratory, edge cases | BrowserStack | QA |
### Vocabulary-Specific Tests
#### Backend Tests
- [ ] Create word with valid data → success
- [ ] Create word with missing required fields → error
- [ ] Create word with invalid article → error
- [ ] Create word with invalid source → error
- [ ] Get word by ID returns correct word
- [ ] Get words by lesson returns correct list
- [ ] Get words by level returns correct list
- [ ] Get words by source returns correct list
- [ ] Update word → success
- [ ] Update word with invalid data → error
- [ ] Delete word → success
- [ ] Bulk import from Goethe creates words correctly
- [ ] Bulk import from DW creates words correctly
- [ ] Audio generation for word creates audio file
#### Audio Tests
- [ ] Audio file generated for new word
- [ ] Audio file accessible via endpoint
- [ ] Audio file is valid WAV format
- [ ] Audio file quality is acceptable
- [ ] Audio file size is within limits
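
The format and size checks above could start from a header inspection like this sketch. A RIFF/WAVE file begins with the ASCII tag `RIFF`, a 4-byte chunk size, and then the tag `WAVE`; the helper names `looksLikeWav` and `isWithinSizeLimit` are hypothetical, and the 100 KB default is an assumption taken from the upper end of the storage estimate in this document. The check does not validate the audio data itself.

```typescript
// Reads a 4-character ASCII tag starting at `offset`.
function readTag(bytes: Uint8Array, offset: number): string {
  return String.fromCharCode(
    bytes[offset],
    bytes[offset + 1],
    bytes[offset + 2],
    bytes[offset + 3]
  );
}

// Checks only the container header: "RIFF" at byte 0 and "WAVE" at byte 8.
function looksLikeWav(bytes: Uint8Array): boolean {
  if (bytes.length < 12) return false;
  return readTag(bytes, 0) === "RIFF" && readTag(bytes, 8) === "WAVE";
}

// Default limit is an assumption based on the ~50-100 KB storage estimate.
function isWithinSizeLimit(bytes: Uint8Array, maxBytes: number = 100 * 1024): boolean {
  return bytes.length > 0 && bytes.length <= maxBytes;
}
```

The same two checks translate directly to the backend test project, where the bytes would come from the audio endpoint response.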
#### Integration Tests
- [ ] Create lesson with vocabulary → both created
- [ ] Get lesson includes vocabulary
- [ ] Delete lesson cascades to vocabulary
- [ ] Import vocabulary → audio generated for all words
---
## 📝 Notes & Decisions
### Decisions Made
| Date | Decision | Rationale |
|------|----------|-----------|
| May 31, 2025 | Generate audio for all vocabulary | Essential for pronunciation practice |
| May 31, 2025 | Store audio files on filesystem | Simple for MVP, can migrate to CDN later |
| May 31, 2025 | Import from Goethe and DW | High-quality, trusted sources |
| May 31, 2025 | Include article information | Critical for German language learning |
### Technical Notes
- Audio files should be stored with consistent naming: `/audio/vocabulary/{id}.wav`
- Vocabulary table has CHECK constraint for Article (only der/die/das or empty)
- Vocabulary table has CHECK constraint for Source (only Goethe or DW)
- Consider adding phonetic transcription (IPA) in the future
- Audio generation can be resource-intensive - consider queueing for bulk operations
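
The naming convention and the queueing note can be combined in a small planning step that splits pending word IDs into fixed-size batches. This is only a sketch: `AudioJob`, `audioPathFor`, and `planAudioBatches` are hypothetical names, and the batch size is a tuning parameter this document does not specify.

```typescript
// One unit of work for the audio generation queue (hypothetical shape).
interface AudioJob {
  vocabularyId: number;
  outputPath: string;
}

// Naming convention from the technical notes: /audio/vocabulary/{id}.wav
function audioPathFor(vocabularyId: number): string {
  return `/audio/vocabulary/${vocabularyId}.wav`;
}

// Splits pending IDs into batches so a bulk import does not run all TTS jobs at once.
function planAudioBatches(vocabularyIds: readonly number[], batchSize: number): AudioJob[][] {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: AudioJob[][] = [];
  for (let i = 0; i < vocabularyIds.length; i += batchSize) {
    const batch = vocabularyIds
      .slice(i, i + batchSize)
      .map((id) => ({ vocabularyId: id, outputPath: audioPathFor(id) }));
    batches.push(batch);
  }
  return batches;
}
```

A background worker could then process one batch at a time, which bounds the load that bulk imports place on the TTS service.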
### Gotchas
- Coqui TTS may have issues with some German words - need fallback mechanism
- Web scraping may break if Goethe/DW change their HTML structure
- Audio files can be large - consider compression
- Need to handle duplicate words across different lessons
- Article may be empty for verbs, adjectives, etc.
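
For the duplicate-word gotcha, one possible approach is to group entries by a key of article plus word and flag groups that span more than one lesson. The names `LessonWord`, `duplicateKey`, and `findCrossLessonDuplicates` are hypothetical. The key is deliberately case-sensitive and includes the article, because German distinguishes pairs such as "das Essen" and "essen", or "der See" and "die See".

```typescript
// Minimal hypothetical shape; field names follow the Vocabulary entity in camelCase.
interface LessonWord {
  id: number;
  lessonId: number;
  word: string;
  article?: string | null;
}

interface DuplicateGroup {
  key: string;
  ids: number[];
  lessonIds: number[];
}

// Case-sensitive on purpose: "Essen" (noun) and "essen" (verb) are different entries.
function duplicateKey(entry: LessonWord): string {
  const article = entry.article == null ? "" : entry.article;
  return article + "|" + entry.word.trim();
}

// Returns only the groups whose entries appear in more than one lesson.
function findCrossLessonDuplicates(entries: readonly LessonWord[]): DuplicateGroup[] {
  const groups: { [key: string]: DuplicateGroup } = {};
  const order: string[] = [];
  entries.forEach((entry) => {
    const key = duplicateKey(entry);
    if (!groups[key]) {
      groups[key] = { key: key, ids: [], lessonIds: [] };
      order.push(key);
    }
    groups[key].ids.push(entry.id);
    if (groups[key].lessonIds.indexOf(entry.lessonId) === -1) {
      groups[key].lessonIds.push(entry.lessonId);
    }
  });
  return order.map((key) => groups[key]).filter((group) => group.lessonIds.length > 1);
}
```

Whether duplicates should then share one audio file or stay fully independent is a design decision this sketch leaves open.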
### Article Rules Reference
- **der**: Masculine nouns (e.g., der Mann, der Tag)
- **die**: Feminine nouns (e.g., die Frau, die Stadt)
- **das**: Neuter nouns (e.g., das Kind, das Haus)
- **empty**: Verbs (e.g., gehen, sein), adjectives, adverbs, prepositions
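
The rules above map directly onto two small helpers that a component such as `VocabularyCard` might use. `Gender`, `genderOf`, and `displayForm` are hypothetical names, and the mapping covers only the singular articles listed in this reference.

```typescript
type Gender = "masculine" | "feminine" | "neuter";

// Maps an article to the grammatical gender given in the reference above.
// Returns null for empty or missing articles (verbs, adjectives, ...).
function genderOf(article: string | null | undefined): Gender | null {
  switch (article) {
    case "der":
      return "masculine";
    case "die":
      return "feminine";
    case "das":
      return "neuter";
    default:
      return null;
  }
}

// "das" + "Buch" -> "das Buch"; words without an article are shown as-is.
function displayForm(word: string, article?: string | null): string {
  return article ? article + " " + word : word;
}
```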
---
## 📊 Progress History
| Date | Status Change | Notes |
|------|---------------|-------|
| May 31, 2025 | Created | Initial plan based on application-plan.md |
---
## 📎 Related Files & Links
- Architecture: [Backend Structure](../architecture/backend-structure.md)
- Architecture: [Application Plan](../architecture/application-plan.md)
- Database Schema: [Initial Database Schema](../database/initial-database-schema.sql)
- Feature: [Lesson Management](lesson-management.md)
- Feature: [AI Services](ai-services.md)
- Reference: [Goethe Institut Vocabulary](https://www.goethe.de/en/spr/ueb.html)
- Reference: [DW Learn German Vocabulary](https://learngerman.dw.com/en/learn-german/s-9528)
- Reference: [Coqui TTS](https://github.com/coqui-ai/TTS)
---
*Feature created from application-plan.md*