DeutschLernen/docs/features/vocabulary-system.md
Lasse Rune Hansen 76e8af4987 Add complete solution: documentation, frontend, and project files
2026-05-31 18:20:53 +02:00

Feature: Vocabulary System

Status: Planned
Priority: High
Complexity: High
Estimate: 9-13 hours (sum of the phase estimates in the Implementation Plan)
Assignee: -
Created: May 31, 2025
Target Completion: -
PR: -
Related Features: Infrastructure Setup, Lesson Management, AI Services (TTS)


📌 Overview

Purpose

Implement a comprehensive vocabulary management system that includes word storage, retrieval, audio generation, and integration with lessons.

User Story

As a learner, I want to study German vocabulary with translations, articles (der/die/das), and audio pronunciations so that I can build my word knowledge effectively.

Acceptance Criteria

  • Admins can create, update, and delete vocabulary words; all authenticated users can read them
  • Each word has: German text, English translation, article (optional), audio URL
  • Vocabulary is associated with specific lessons
  • Words can be filtered by lesson, level, or source (Goethe/DW)
  • Audio is generated for each word using Coqui TTS
  • Users can practice vocabulary through various exercises

📋 Requirements

Functional Requirements

| ID | Requirement | Priority |
| --- | --- | --- |
| FR-001 | CRUD operations for vocabulary words | High |
| FR-002 | Associate words with lessons | High |
| FR-003 | Store article information (der/die/das) | High |
| FR-004 | Generate audio for each word | High |
| FR-005 | Import vocabulary from Goethe Institut | High |
| FR-006 | Import vocabulary from DW Learn German | High |
| FR-007 | Filter vocabulary by lesson/level/source | Medium |
| FR-008 | Vocabulary exercises (flashcards, matching) | Medium |
| FR-009 | Admin bulk import functionality | Medium |

Non-Functional Requirements

  • Performance: Vocabulary listing responds in under 100 ms
  • Storage: Audio files are roughly 50-100 KB per word
  • Security: Only admins can create/edit/delete words
  • Data Integrity: Word-article combinations must be valid

🏗️ Technical Design

Components Involved

  • Backend: VocabularyController, VocabularyService, VocabularyImportService
  • Database: Vocabulary table
  • Models: Vocabulary, VocabularyDto
  • External: Coqui TTS for audio generation
  • Frontend: VocabularyTab, VocabularyCard, VocabularyExercise components

Data Flow

1. Admin imports vocabulary from Goethe/DW
2. System scrapes word list from source
3. For each word:
   a. Store in Vocabulary table
   b. Generate audio using Coqui TTS
   c. Save audio file and update AudioUrl
4. User views lesson vocabulary
5. Frontend displays words with audio playback
6. User can click to hear pronunciation
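
The import portion of this flow (steps 1-3) could be sketched roughly as follows. This is a minimal illustration, not the planned backend implementation: the `ImportDeps` functions are hypothetical stand-ins for the repository, the Coqui TTS call, and the file storage, and the choice to record audio failures rather than abort the import is an assumption.

```typescript
// Hypothetical shape of a word returned by a Goethe/DW scraper.
interface ScrapedWord {
  word: string;
  translation: string;
  article: string; // "" when the word has no article
}

// Hypothetical dependencies, injected so the pipeline can run without a real
// database or TTS service.
interface ImportDeps {
  saveWord(word: ScrapedWord, lessonId: number, source: "Goethe" | "DW"): Promise<number>;
  synthesize(text: string): Promise<Uint8Array>;
  saveAudio(id: number, wav: Uint8Array): Promise<string>;
  setAudioUrl(id: number, url: string): Promise<void>;
}

interface ImportResult {
  imported: number[];    // ids of all stored words
  audioFailed: number[]; // ids whose audio could not be generated
}

// For each word: store it (3a), generate audio (3b), save the file and
// update AudioUrl (3c). A TTS failure leaves the word stored without audio.
async function importWords(
  words: ScrapedWord[],
  lessonId: number,
  source: "Goethe" | "DW",
  deps: ImportDeps
): Promise<ImportResult> {
  const imported: number[] = [];
  const audioFailed: number[] = [];
  for (const w of words) {
    const id = await deps.saveWord(w, lessonId, source);
    imported.push(id);
    try {
      const wav = await deps.synthesize(w.word);
      const url = await deps.saveAudio(id, wav);
      await deps.setAudioUrl(id, url);
    } catch {
      audioFailed.push(id);
    }
  }
  return { imported, audioFailed };
}
```

Keeping the word when audio generation fails means a later background job can retry only the ids in `audioFailed`.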

API Endpoints

| Endpoint | Method | Description | Auth Required |
| --- | --- | --- | --- |
| /api/vocabulary | GET | List all vocabulary (with filters) | Yes |
| /api/vocabulary/{id} | GET | Get specific vocabulary word | Yes |
| /api/vocabulary | POST | Create new vocabulary word (Admin) | Yes |
| /api/vocabulary/{id} | PUT | Update vocabulary word (Admin) | Yes |
| /api/vocabulary/{id} | DELETE | Delete vocabulary word (Admin) | Yes |
| /api/vocabulary/lesson/{lessonId} | GET | Get vocabulary for a lesson | Yes |
| /api/vocabulary/import | POST | Import from Goethe/DW (Admin) | Yes |
| /api/vocabulary/audio/{id} | GET | Get audio file for word | Yes |
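
On the frontend, the filtered list request might be assembled with a small helper like the one below. The query parameter names (`lessonId`, `level`, `source`) are assumptions for illustration; this document does not define them.

```typescript
type VocabularySource = "Goethe" | "DW";

// Hypothetical filter object; every field is optional.
interface VocabularyFilter {
  lessonId?: number;
  level?: string;
  source?: VocabularySource;
}

// Builds the URL for GET /api/vocabulary, adding only the filters that are set.
function buildVocabularyUrl(filter: VocabularyFilter = {}): string {
  const parts: string[] = [];
  if (filter.lessonId !== undefined) {
    parts.push(`lessonId=${encodeURIComponent(String(filter.lessonId))}`);
  }
  if (filter.level !== undefined) {
    parts.push(`level=${encodeURIComponent(filter.level)}`);
  }
  if (filter.source !== undefined) {
    parts.push(`source=${encodeURIComponent(filter.source)}`);
  }
  return parts.length > 0 ? `/api/vocabulary?${parts.join("&")}` : "/api/vocabulary";
}
```

For example, `buildVocabularyUrl({ lessonId: 3, source: "DW" })` yields `/api/vocabulary?lessonId=3&source=DW`.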

Database Schema (from application-plan.md)

CREATE TABLE Vocabulary (
    Id SERIAL PRIMARY KEY,
    LessonId INT REFERENCES Lessons(Id) ON DELETE CASCADE,
    Word VARCHAR(50) NOT NULL,
    Translation VARCHAR(100) NOT NULL,
    Article VARCHAR(10) CHECK (Article IN ('der', 'die', 'das', '')),
    AudioUrl VARCHAR(255),
    ImageUrl VARCHAR(255),
    Source VARCHAR(50) CHECK (Source IN ('Goethe', 'DW'))
);
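
The constraints in this schema can also be checked before a request reaches the database. The sketch below mirrors the column lengths and CHECK constraints in a client-side validator; the `VocabularyInput` shape and the error messages are hypothetical.

```typescript
// Allowed values mirror the CHECK constraints on Article and Source.
const ALLOWED_ARTICLES = ["der", "die", "das", ""];
const ALLOWED_SOURCES = ["Goethe", "DW"];

// Hypothetical input shape for a create or update request.
interface VocabularyInput {
  word: string;
  translation: string;
  article: string; // "" for verbs, adjectives, and other words without an article
  source: string;
}

// Returns a list of problems; an empty list means the input passes all checks.
function validateVocabulary(input: VocabularyInput): string[] {
  const errors: string[] = [];
  const word = input.word.trim();
  const translation = input.translation.trim();

  if (word.length === 0) errors.push("word is required");
  else if (word.length > 50) errors.push("word exceeds 50 characters");

  if (translation.length === 0) errors.push("translation is required");
  else if (translation.length > 100) errors.push("translation exceeds 100 characters");

  if (ALLOWED_ARTICLES.indexOf(input.article) === -1) {
    errors.push("article must be der, die, das, or empty");
  }
  if (ALLOWED_SOURCES.indexOf(input.source) === -1) {
    errors.push("source must be Goethe or DW");
  }
  return errors;
}
```

Returning all problems at once, rather than throwing on the first, lets an admin form show every invalid field in a single pass.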

Vocabulary Word Structure

public class Vocabulary
{
    public int Id { get; set; }
    public int LessonId { get; set; }
    public string Word { get; set; } = string.Empty;         // e.g., "Buch"
    public string Translation { get; set; } = string.Empty;  // e.g., "book"
    public string? Article { get; set; }  // "der", "die", "das"; null or "" for words without an article
    public string? AudioUrl { get; set; }  // URL to audio file
    public string? ImageUrl { get; set; }  // Optional image URL
    public string Source { get; set; } = string.Empty;       // "Goethe" or "DW"
}
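
A matching frontend type might look like the sketch below. The camelCase property names assume the API serializes with the usual ASP.NET Core JSON defaults, and `formatHeadword` is a hypothetical helper for showing a noun together with its article.

```typescript
// Assumed JSON shape of a vocabulary word as returned by the API.
interface VocabularyDto {
  id: number;
  lessonId: number;
  word: string;
  translation: string;
  article: string | null;
  audioUrl: string | null;
  imageUrl: string | null;
  source: "Goethe" | "DW";
}

// Shows nouns with their article ("das Buch") and other words on their own.
// Both null and "" are treated as "no article".
function formatHeadword(v: Pick<VocabularyDto, "word" | "article">): string {
  return v.article ? `${v.article} ${v.word}` : v.word;
}
```

Treating `null` and `""` the same way in the UI avoids depending on which of the two the backend stores for words without an article.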

🚀 Implementation Plan

Phase 1: Database & Models (2 hours)

  • Create Vocabulary entity
  • Create VocabularyDto for API responses
  • Create VocabularyRepository interface
  • Create VocabularyRepository implementation
  • Create migration for Vocabulary table
  • Add vocabulary to Lesson entity (one-to-many relationship)

Phase 2: Core CRUD Operations (2-3 hours)

  • Create VocabularyService with basic CRUD
  • Create VocabularyController
  • Implement filtering (by lesson, level, source)
  • Add validation for word data
  • Add authorization (Admin for write operations)
  • Write unit tests for VocabularyService

Phase 3: Audio Generation (2-3 hours)

  • Integrate with Coqui TTS service
  • Create audio generation queue/background job
  • Configure audio storage location
  • Implement audio file serving endpoint
  • Add audio URL to vocabulary DTOs
  • Create migration to add AudioUrl column (skip if the Phase 1 migration already creates it, as in the schema above)

Phase 4: Vocabulary Import (2-3 hours)

  • Create VocabularyImportService
  • Implement Goethe Institut scraper
  • Implement DW Learn German scraper
  • Create bulk import endpoint
  • Add validation for imported words
  • Generate audio for imported words
  • Create admin UI for import (optional)

Phase 5: Frontend Integration (1-2 hours)

  • Create VocabularyTab component
  • Create VocabularyCard component with audio playback
  • Create VocabularyExercise component
  • Add vocabulary to LessonPage

Milestones

| Milestone | Date | Status |
| --- | --- | --- |
| Database & Models | - | |
| Core CRUD | - | |
| Audio Generation | - | |
| Vocabulary Import | - | |
| Frontend Integration | - | |

Tasks

Backend

  • Create Domain/Entities/Vocabulary.cs
  • Create Application/DTOs/VocabularyDto.cs
  • Create Domain/Interfaces/IVocabularyRepository.cs
  • Create Infrastructure/Data/Repositories/VocabularyRepository.cs
  • Update Lesson entity to include Vocabulary collection
  • Create Application/Services/VocabularyService.cs
  • Create Presentation/Controllers/VocabularyController.cs
  • Create VocabularyImportService
  • Create endpoints for audio serving
  • Integrate with Coqui TTS service
  • Register services in Program.cs
  • Write unit tests
  • Write integration tests

Database

  • Create migration for Vocabulary table
  • Add foreign key from Vocabulary.LessonId to Lessons(Id)
  • Add indexes for LessonId, Source
  • Apply migration

Audio Generation

  • Set up Coqui TTS configuration
  • Create audio file storage directory
  • Implement audio generation for new words
  • Implement background job for bulk audio generation
  • Create audio file cleanup mechanism

Vocabulary Import

  • Research Goethe Institut vocabulary structure
  • Research DW Learn German vocabulary structure
  • Implement web scraping for Goethe
  • Implement web scraping for DW
  • Create bulk import API endpoint
  • Add rate limiting to scrapers
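
One simple way to rate-limit the scrapers is to enforce a minimum interval between requests. The sketch below only computes how long a caller should wait; the class name and the interval value are assumptions, and the caller is expected to perform the actual delay.

```typescript
// Enforces a minimum gap between consecutive requests to one source.
class MinIntervalLimiter {
  private readonly minIntervalMs: number;
  private nextAllowedAt = 0;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Reserves the next request slot and returns how many milliseconds the
  // caller should wait (from nowMs) before sending the request.
  reserve(nowMs: number): number {
    const startAt = Math.max(nowMs, this.nextAllowedAt);
    this.nextAllowedAt = startAt + this.minIntervalMs;
    return startAt - nowMs;
  }
}
```

Passing the current time in as an argument keeps the limiter free of timers, so its behaviour can be unit-tested without waiting.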

Frontend

  • Create components/VocabularyTab.tsx
  • Create components/VocabularyCard.tsx
  • Create components/VocabularyExercise.tsx
  • Create hooks/useVocabulary.ts
  • Integrate with LessonPage
  • Add audio playback functionality

🔗 Dependencies

Feature Dependencies

  • Infrastructure Setup
  • Lesson Management
  • AI Services (TTS)

Technical Dependencies

  • Coqui TTS Python library
  • HTML Agility Pack or similar for web scraping
  • AutoMapper (optional)

Blockers

  • Infrastructure Setup must be complete
  • Lesson Management must be complete for vocabulary-lesson association
  • Coqui TTS service must be configured

Definition of Done

General Criteria (All Features)

  • All acceptance criteria met and verified
  • All tasks in this document completed
  • Code follows Clean Architecture principles
  • Code reviewed and approved by at least 1 team member
  • All tests passing (unit, integration)
  • Documentation updated (README, AGENTS.md if applicable)
  • Feature works in development environment
  • Feature deployed to staging environment
  • Performance meets defined targets
  • Security review completed
  • No critical bugs or blockers

Vocabulary-Specific Criteria

  • Vocabulary words can be created, read, updated, and deleted
  • Each word has: German text, translation, article, audio URL
  • Words are correctly associated with lessons
  • Words can be filtered by lesson, level, source
  • Audio is generated for all vocabulary words
  • Audio files are accessible and playable
  • Goethe Institut vocabulary import works
  • DW Learn German vocabulary import works
  • Vocabulary exercises (flashcards, matching) are functional

🧪 Testing Strategy

Testing Approach

| Test Type | Coverage | Tools | Responsibility |
| --- | --- | --- | --- |
| Unit Tests | 80%+ code coverage | MSTest, Moq | Backend Dev |
| Integration Tests | All service interactions | MSTest, Testcontainers | Backend Dev |
| API Tests | All endpoints | MSTest, HttpClient | Backend Dev |
| Frontend Unit Tests | Component logic | Vitest | Frontend Dev |
| Frontend Integration | Service integration | Vitest | Frontend Dev |
| E2E Tests | Critical user journeys | Playwright | QA/Dev |
| Manual Testing | Exploratory, edge cases | BrowserStack | QA |

Vocabulary-Specific Tests

Backend Tests

  • Create word with valid data → success
  • Create word with missing required fields → error
  • Create word with invalid article → error
  • Create word with invalid source → error
  • Get word by ID → returns correct word
  • Get words by lesson → returns correct list
  • Get words by level → returns correct list
  • Get words by source → returns correct list
  • Update word → success
  • Update word with invalid data → error
  • Delete word → success
  • Bulk import from Goethe → creates words correctly
  • Bulk import from DW → creates words correctly
  • Audio generation for word → creates audio file

Audio Tests

  • Audio file generated for new word
  • Audio file accessible via endpoint
  • Audio file is valid WAV format
  • Audio file quality is acceptable
  • Audio file size is within limits
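
The format and size checks above could start from a header test like the following sketch. It only inspects the RIFF/WAVE markers at the start of the file, which is a necessary but not sufficient condition for a playable WAV; the 100 KB default limit is an assumption taken from the storage estimate in the non-functional requirements.

```typescript
// A WAV file starts with "RIFF" at bytes 0-3 and "WAVE" at bytes 8-11.
function hasWavHeader(bytes: Uint8Array): boolean {
  if (bytes.length < 12) return false;
  const tag = (offset: number): string =>
    String.fromCharCode(
      bytes[offset],
      bytes[offset + 1],
      bytes[offset + 2],
      bytes[offset + 3]
    );
  return tag(0) === "RIFF" && tag(8) === "WAVE";
}

// Assumed limit: 100 KB, the upper end of the per-word storage estimate.
function isWithinSizeLimit(byteLength: number, maxBytes: number = 100 * 1024): boolean {
  return byteLength > 0 && byteLength <= maxBytes;
}
```

A fuller test would also decode the file, since a correct header does not guarantee valid audio data.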

Integration Tests

  • Create lesson with vocabulary → both created
  • Get lesson → includes vocabulary
  • Delete lesson → cascades to vocabulary
  • Import vocabulary → audio generated for all words

📝 Notes & Decisions

Decisions Made

| Date | Decision | Rationale |
| --- | --- | --- |
| May 31, 2025 | Generate audio for all vocabulary | Essential for pronunciation practice |
| May 31, 2025 | Store audio files on filesystem | Simple for MVP, can migrate to CDN later |
| May 31, 2025 | Import from Goethe and DW | High-quality, trusted sources |
| May 31, 2025 | Include article information | Critical for German language learning |

Technical Notes

  • Audio files should be stored with consistent naming: /audio/vocabulary/{id}.wav
  • Vocabulary table has CHECK constraint for Article (only der/die/das or empty)
  • Vocabulary table has CHECK constraint for Source (only Goethe or DW)
  • Consider adding phonetic transcription (IPA) in the future
  • Audio generation can be resource-intensive - consider queueing for bulk operations
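
The naming convention and the queueing note can be illustrated with two small helpers. This is only a sketch: `audioPathFor` and `toBatches` are hypothetical names, and the batch size is left to the caller because the document does not fix one.

```typescript
// Builds the storage path for a word's audio file: /audio/vocabulary/{id}.wav
function audioPathFor(id: number): string {
  if (id < 1 || Math.floor(id) !== id) {
    throw new RangeError("id must be a positive integer");
  }
  return `/audio/vocabulary/${id}.wav`;
}

// Splits a bulk job into fixed-size batches so a background worker can
// process a few words at a time instead of synthesizing everything at once.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Deriving the path from the id alone means the audio file can be located, regenerated, or cleaned up without reading AudioUrl first.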

Gotchas

  • ⚠️ Coqui TTS may have issues with some German words - need fallback mechanism
  • ⚠️ Web scraping may break if Goethe/DW change their HTML structure
  • ⚠️ Audio files can be large - consider compression
  • ⚠️ Need to handle duplicate words across different lessons
  • ⚠️ Article may be empty for verbs, adjectives, etc.
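
For the duplicate-word gotcha, one option is to detect words that share the same article and spelling so they can reuse a single audio file. The sketch below is one possible approach, not a decided design. It deliberately keeps the comparison case-sensitive, because capitalization distinguishes German words such as "das Essen" (the food) and "essen" (to eat).

```typescript
// Minimal fields needed to detect duplicates across lessons.
interface WordRef {
  id: number;
  word: string;
  article: string; // "" when the word has no article
}

// Case-sensitive key: article and word, trimmed but otherwise unchanged.
function duplicateKey(w: WordRef): string {
  return `${w.article.trim()}|${w.word.trim()}`;
}

// Returns groups of ids that refer to the same article + word combination.
// Words that occur only once are omitted.
function groupDuplicates(words: WordRef[]): number[][] {
  const groups: Record<string, number[]> = {};
  const order: string[] = [];
  words.forEach((w) => {
    const key = duplicateKey(w);
    if (groups[key] === undefined) {
      groups[key] = [];
      order.push(key);
    }
    groups[key].push(w.id);
  });
  return order.map((key) => groups[key]).filter((ids) => ids.length > 1);
}
```

Each returned group could then share the audio generated for its first id, which also reduces the number of TTS calls during bulk import.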

Article Rules Reference

  • der: Masculine nouns (e.g., der Mann, der Tag)
  • die: Feminine nouns (e.g., die Frau, die Stadt)
  • das: Neuter nouns (e.g., das Kind, das Haus)
  • empty: Verbs (e.g., gehen, sein), adjectives, adverbs, prepositions

📊 Progress History

| Date | Status Change | Notes |
| --- | --- | --- |
| May 31, 2025 | Created | Initial plan based on application-plan.md |


Feature created from application-plan.md