audio-transcription-mcp
The Audio Transcription MCP Server is designed to transcribe and analyze French audio files, aiding in language learning.
Audio Transcription MCP Server
An MCP (Model Context Protocol) server that enables Claude to transcribe and analyze audio files. It is designed specifically for French language learning exercises.
Features
- Download MP3 audio files from HTTP/HTTPS URLs
- Transcribe French audio to text using OpenAI Whisper
- Translate French text to English using GPT-4
- Analyze French imperative sentences to identify the implied subject pronoun (tu, vous, nous)
- Secure handling of temporary files with automatic cleanup
Prerequisites
- Bun runtime (latest stable version)
- OpenAI API key with access to Whisper and GPT-4
Installation
- Clone the repository and change into its directory (the path below is the author's local checkout; substitute your own):
cd /Users/joshnewton/Development/audio-transcription-mcp
- Install dependencies:
bun install
- Set up environment variables:
export OPENAI_API_KEY="your-openai-api-key"
export LOG_LEVEL="info" # Optional: debug, info, error
export TEMP_DIR="/tmp/audio-transcription" # Optional: custom temp directory
export MAX_FILE_SIZE="25000000" # Optional: max file size in bytes (default 25MB)
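The variables above could be read and validated once at startup. The following is a minimal sketch, not the project's actual code: the `ServerConfig` shape and `loadConfig` name are hypothetical, and the defaults for `LOG_LEVEL` and `TEMP_DIR` are assumptions taken from the example values (only the 25MB size default is stated in this README).

```typescript
// Hypothetical config shape; field names are illustrative, not the project's own.
type LogLevel = "debug" | "info" | "error";

interface ServerConfig {
  openaiApiKey: string;
  logLevel: LogLevel;
  tempDir: string;
  maxFileSize: number; // bytes
}

const LOG_LEVELS: readonly LogLevel[] = ["debug", "info", "error"];

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const openaiApiKey = env["OPENAI_API_KEY"];
  if (!openaiApiKey) {
    throw new Error("OPENAI_API_KEY is required");
  }

  // Assumed default: "info" (the README only lists it as an example value).
  const rawLevel = env["LOG_LEVEL"] ?? "info";
  if (!LOG_LEVELS.includes(rawLevel as LogLevel)) {
    throw new Error(`LOG_LEVEL must be one of: ${LOG_LEVELS.join(", ")}`);
  }

  // Documented default: 25MB.
  const maxFileSize = Number(env["MAX_FILE_SIZE"] ?? "25000000");
  if (!Number.isInteger(maxFileSize) || maxFileSize <= 0) {
    throw new Error("MAX_FILE_SIZE must be a positive integer (bytes)");
  }

  return {
    openaiApiKey,
    logLevel: rawLevel as LogLevel,
    // Assumed default: the example path shown above.
    tempDir: env["TEMP_DIR"] ?? "/tmp/audio-transcription",
    maxFileSize,
  };
}
```

At startup this would be called as `loadConfig(process.env)`, so a missing key or malformed size fails fast instead of surfacing during the first tool call.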
Usage
Starting the Server
bun run index.ts
Or with the scripts defined in package.json:
bun start # Production mode
bun dev # Development mode with auto-reload
Integration with Claude Code
Add the server to Claude Code:
claude mcp add /Users/joshnewton/Development/audio-transcription-mcp
Available Tools
1. transcribe_audio
Downloads and transcribes an audio file from a URL.
Input:
{
"url": "https://example.com/audio.mp3"
}
Output:
{
"transcription": "Original French text",
"translation": "English translation",
"language": "French"
}
2. analyze_french_imperative
Analyzes a French audio file to determine the imperative subject pronoun.
Input:
{
"url": "https://example.com/french-command.mp3"
}
Output:
{
"transcription": "Écoutez attentivement",
"translation": "Listen carefully",
"subject": "vous",
"analysis": "The verb 'écoutez' ends in -ez, which indicates the formal/plural 'vous' form"
}
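The ending-based reasoning in the `analysis` field can be illustrated with a small heuristic. This is a simplified sketch, not how the server works: it only inspects the ending of the first verb, and the function name `guessImperativeSubject` is hypothetical. Irregular vous forms such as "dites" and "faites" do not end in -ez and would be misclassified as "tu".

```typescript
type ImperativeSubject = "tu" | "vous" | "nous";

// Naive heuristic: classify by the ending of the first verb in the sentence.
function guessImperativeSubject(sentence: string): ImperativeSubject {
  const words = sentence
    .toLowerCase()
    .replace(/[.,!?;:«»"]/g, " ")
    .trim()
    .split(/\s+/);

  // Skip a leading negation particle ("ne parlez pas" -> "parlez").
  let first = words[0] === "ne" ? words[1] ?? "" : words[0] ?? "";
  // Strip an elided negation ("n'oubliez" -> "oubliez").
  first = first.replace(/^n['’]/, "");
  // Drop attached pronouns ("asseyez-vous" -> "asseyez").
  const verb = first.split("-")[0] ?? "";

  if (verb.endsWith("ez")) return "vous";
  if (verb.endsWith("ons")) return "nous";
  return "tu"; // fallback; also where irregular forms end up
}
```

A rule this small is useful mainly as a sanity check on a model-generated answer; a real analysis needs to handle irregular verbs and sentences that are not imperatives at all.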
Example Usage in Claude
User: "Listen to this audio and tell me who is being given the command: https://example.com/french-audio.mp3"
Claude: [Uses analyze_french_imperative tool]
Result: The command "Écoutez attentivement" is addressed to "vous" (formal or plural 'you')
Development
Running Tests
bun test
Project Structure
audio-transcription-mcp/
├── index.ts # Main entry point
├── src/
│   ├── server.ts # MCP server setup
│   ├── handlers/ # Tool request handlers
│   ├── services/ # Core services
│   └── utils/ # Utility functions
└── tests/ # Test files
Security Considerations
- Only HTTP/HTTPS URLs are accepted (no file:// or local paths)
- File size limited to 25MB by default
- Temporary files are automatically cleaned up
- API keys are never logged
- Input validation on all user-provided data
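The first two checks in this list could look roughly like the sketch below. The function names are hypothetical and the real validation code may differ; the sketch relies only on the standard `URL` class available in Bun and Node.

```typescript
const DEFAULT_MAX_FILE_SIZE = 25000000; // 25MB default from this README

// Accept only absolute http(s) URLs; local paths and file:// are rejected.
function validateAudioUrl(input: string): URL {
  let url: URL;
  try {
    url = new URL(input); // throws on relative paths such as "/tmp/a.mp3"
  } catch {
    throw new Error("Invalid URL");
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url;
}

// Reject downloads larger than the configured limit.
function assertWithinSizeLimit(
  byteLength: number,
  maxBytes: number = DEFAULT_MAX_FILE_SIZE,
): void {
  if (byteLength > maxBytes) {
    throw new Error(`File is ${byteLength} bytes; limit is ${maxBytes} bytes`);
  }
}
```

Checking the parsed `protocol` rather than matching the string prefix avoids edge cases with casing or leading whitespace, which `URL` normalizes.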
Multi-Language Support Implementation Plan
Current State
The server is currently optimized for French language learning with hardcoded language parameters in several components.
Changes Required for Multi-Language Support
Phase 1: Core Multi-Language Infrastructure
Service Layer Updates:
- transcriber.ts:34 - Remove hardcoded language: "fr" parameter
- analyzer.ts:29-42 - Replace French-specific imperative analysis with configurable grammar analysis
- translator.ts:25 - Make source/target languages configurable parameters
Tool Schema Changes:
- Replace analyze_french_imperative with generic analyze_grammar tool
- Add required language parameter to tool input schemas
- Add optional targetLanguage parameter for translation control
Handler Updates:
- Add language parameter validation against supported languages
- Pass language codes to all services
- Update response formatting to include language metadata
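As a rough illustration of the handler-side validation, the sketch below parses raw tool arguments and checks `language` against a supported list. Everything here is hypothetical: the `GrammarToolArgs` type, the `parseToolArgs` name, the hardcoded language list, and the `"en"` default for `targetLanguage` (assumed because the current server translates into English).

```typescript
const SUPPORTED_LANGUAGES = ["fr", "es", "de", "it"] as const;
type SupportedLanguage = (typeof SUPPORTED_LANGUAGES)[number];

interface GrammarToolArgs {
  url: string;
  language: SupportedLanguage;
  targetLanguage: string;
}

function isSupportedLanguage(code: string): code is SupportedLanguage {
  return (SUPPORTED_LANGUAGES as readonly string[]).includes(code);
}

function parseToolArgs(raw: unknown): GrammarToolArgs {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Arguments must be an object");
  }
  const { url, language, targetLanguage } = raw as Record<string, unknown>;

  if (typeof url !== "string") {
    throw new Error("url must be a string");
  }

  // Normalize case so "FR" and "fr" are treated alike.
  const code = typeof language === "string" ? language.toLowerCase() : "";
  if (!isSupportedLanguage(code)) {
    throw new Error(`language must be one of: ${SUPPORTED_LANGUAGES.join(", ")}`);
  }

  let target = "en"; // assumed default translation target
  if (targetLanguage !== undefined) {
    if (typeof targetLanguage !== "string") {
      throw new Error("targetLanguage must be a string");
    }
    target = targetLanguage;
  }

  return { url, language: code, targetLanguage: target };
}
```

Returning the normalized codes from one place means the services downstream can trust the values they receive and echo them back as language metadata in the response.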
Phase 2: Language-Specific Analysis Modules
New File Structure:
src/services/
├── analysis/
│   ├── french.ts # Imperative analysis (tu/vous/nous)
│   ├── spanish.ts # Ser vs Estar, subjunctive detection
│   ├── german.ts # Case analysis (Nominativ, Akkusativ, Dativ, Genitiv)
│   ├── italian.ts # Subjunctive mood, formal/informal register
│   └── base.ts # Common analysis interface
├── languages.ts # Language configurations and metadata
└── language-detector.ts # Auto-detection fallback logic
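One possible shape for the common interface in base.ts is sketched below, together with a small registry keyed by language code. The plan does not specify this interface, so every name here (`GrammarAnalyzer`, `GrammarAnalysis`, `registerAnalyzer`, `getAnalyzer`) is an assumption for illustration.

```typescript
// Hypothetical result shape shared by all language modules.
interface GrammarAnalysis {
  language: string; // ISO 639-1 code of the analyzed text
  feature: string; // what was analyzed, e.g. "imperative-subject"
  explanation: string;
}

// Hypothetical common interface each analysis module would implement.
interface GrammarAnalyzer {
  readonly language: string;
  analyze(transcription: string): GrammarAnalysis;
}

const analyzers = new Map<string, GrammarAnalyzer>();

function registerAnalyzer(analyzer: GrammarAnalyzer): void {
  analyzers.set(analyzer.language, analyzer);
}

function getAnalyzer(language: string): GrammarAnalyzer {
  const analyzer = analyzers.get(language);
  if (!analyzer) {
    throw new Error(`No grammar analyzer registered for "${language}"`);
  }
  return analyzer;
}

// Stand-in for french.ts; a real module would carry the imperative logic.
registerAnalyzer({
  language: "fr",
  analyze: (transcription) => ({
    language: "fr",
    feature: "imperative-subject",
    explanation: `Placeholder analysis of: ${transcription}`,
  }),
});
```

With a registry like this, adding a language becomes a matter of adding one module and one `registerAnalyzer` call, leaving the handlers unchanged.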
Language Configuration System:
interface LanguageConfig {
code: string; // ISO 639-1 code ('fr', 'es', 'de', etc.)
name: string; // Display name
whisperCode: string; // Whisper API language code
analysisRules: object; // Language-specific grammar rules
defaultTarget: string; // Default translation target
}
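To make the interface concrete, here is a hedged sketch of how languages.ts might populate and look up these configurations. The entries, the empty `analysisRules` placeholders, and the `getLanguageConfig` helper are illustrative assumptions; `whisperCode` reuses the ISO 639-1 code, the format in which the Whisper API accepts its language hint.

```typescript
interface LanguageConfig {
  code: string; // ISO 639-1 code ('fr', 'es', 'de', etc.)
  name: string; // Display name
  whisperCode: string; // Whisper API language code
  analysisRules: object; // Language-specific grammar rules
  defaultTarget: string; // Default translation target
}

// Illustrative entries only; analysisRules is left empty as a placeholder.
const LANGUAGE_CONFIGS = new Map<string, LanguageConfig>([
  ["fr", { code: "fr", name: "French", whisperCode: "fr", analysisRules: {}, defaultTarget: "en" }],
  ["es", { code: "es", name: "Spanish", whisperCode: "es", analysisRules: {}, defaultTarget: "en" }],
]);

// Hypothetical lookup helper; normalizes case before the lookup.
function getLanguageConfig(code: string): LanguageConfig {
  const config = LANGUAGE_CONFIGS.get(code.toLowerCase());
  if (!config) {
    throw new Error(`Unsupported language: ${code}`);
  }
  return config;
}
```

A `Map` is used instead of a plain object so that inputs like "constructor" cannot accidentally resolve to inherited properties.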
Phase 3: Advanced Features
- Auto-Detection: Use Whisper without language parameter, confidence scoring
- Environment Configuration: SUPPORTED_LANGUAGES=fr,es,de,it
- Fallback Handling: Default language and error recovery
- Performance: Language-specific caching and optimization
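The environment configuration and fallback items could be combined along these lines. This is a sketch under assumptions: the helper names are hypothetical, and falling back to French when nothing valid is configured is a guess based on the server's current focus.

```typescript
// Parse a comma-separated list such as "fr,es,de,it" into clean codes.
function parseSupportedLanguages(
  raw: string | undefined,
  fallback: readonly string[] = ["fr"], // assumed default: French only
): string[] {
  const codes = (raw ?? "")
    .split(",")
    .map((code) => code.trim().toLowerCase())
    .filter((code) => /^[a-z]{2}$/.test(code)); // keep two-letter codes only
  const unique = Array.from(new Set(codes));
  return unique.length > 0 ? unique : [...fallback];
}

// Use the requested language if supported, else the first configured one.
function resolveLanguage(
  requested: string | undefined,
  supported: readonly string[],
): string {
  const code = requested?.trim().toLowerCase();
  if (code && supported.includes(code)) {
    return code;
  }
  const first = supported[0];
  if (first === undefined) {
    throw new Error("No supported languages configured");
  }
  return first;
}
```

For example, `resolveLanguage(detected, parseSupportedLanguages(process.env.SUPPORTED_LANGUAGES))` would map an unsupported or missing detection result onto the first configured language instead of failing the request.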
Estimated Implementation Time: 3-4 days for core infrastructure, plus 1-2 days per additional language for its specialized analysis rules.
Roadmap & Feature Ideas
Language Expansion
- Multi-language support (Spanish, German, Italian, Portuguese)
- Auto-detect language in audio files
- Cross-language comparative analysis tools
- Language-specific linguistic analysis patterns
Enhanced French Analysis
- Verb tense identification and explanation
- Subjunctive mood detection
- Conditional vs. indicative mood analysis
- Grammar error detection and suggestions
- Vocabulary difficulty level assessment
- Formal vs informal register analysis beyond imperatives
Advanced Audio Processing
- Support for additional audio formats (WAV, M4A, AAC, FLAC)
- Audio enhancement and noise reduction
- Speaker diarization (multiple speakers)
- Audio speed/pitch adjustment for learning
- Batch processing of multiple audio files
- Audio segmentation by sentences/phrases
- Real-time streaming transcription
Language Learning Tools
- Vocabulary extraction with frequency analysis
- Automatic flashcard generation from transcriptions
- Pronunciation scoring and feedback
- IPA (International Phonetic Alphabet) notation
- Cultural context and idiom explanations
- Sentence complexity scoring
- Learning progress tracking and analytics
Output & Export Features
- Subtitle file generation (SRT, VTT, ASS)
- PDF study guides with translations and notes
- Anki deck export for spaced repetition
- Study worksheet generation
- Audio-synchronized text highlighting
- Integration with popular language learning apps
Performance & Technical Improvements
- Caching for repeated audio URLs
- Streaming support for large audio files
- Multiple AI model options (local vs cloud)
- Rate limiting and usage analytics
- WebSocket support for real-time features
- Database integration for user progress
- API versioning and backward compatibility
Specialized Analysis Tools
- Emotion/sentiment analysis in speech
- Speaking pace and fluency analysis
- Accent identification and analysis
- Conversation flow analysis
- Politeness markers detection
- Regional dialect identification
- Academic vs colloquial language detection
License
MIT