
Audio Transcription MCP Server

An MCP (Model Context Protocol) server that enables Claude to transcribe and analyze audio files, specifically designed for French language learning exercises.

Features

  • Download MP3 audio files from HTTP/HTTPS URLs
  • Transcribe French audio to text using OpenAI Whisper
  • Translate French text to English using GPT-4 (both steps are sketched after this list)
  • Analyze French imperative sentences to identify the subject pronoun (tu, vous, nous)
  • Secure handling of temporary files with automatic cleanup
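
Both steps go through the OpenAI API. A minimal sketch of the transcription-plus-translation flow, assuming the official openai Node package and an already-downloaded MP3 (file paths and prompt wording are illustrative, not the server's exact implementation):

import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Transcribe a downloaded MP3 with Whisper, forcing French.
async function transcribeFrench(filePath: string): Promise<string> {
  const result = await openai.audio.transcriptions.create({
    file: fs.createReadStream(filePath),
    model: "whisper-1",
    language: "fr",
  });
  return result.text;
}

// Translate the French transcription to English with GPT-4.
async function translateToEnglish(frenchText: string): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      { role: "system", content: "Translate the user's French text to English." },
      { role: "user", content: frenchText },
    ],
  });
  return completion.choices[0].message.content ?? "";
}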

Prerequisites

  • Bun runtime (latest stable version)
  • OpenAI API key with access to Whisper and GPT-4

Installation

  1. Clone the repository and change into the project directory:
cd /Users/joshnewton/Development/audio-transcription-mcp
  2. Install dependencies:
bun install
  3. Set up environment variables:
export OPENAI_API_KEY="your-openai-api-key"
export LOG_LEVEL="info"  # Optional: debug, info, error
export TEMP_DIR="/tmp/audio-transcription"  # Optional: custom temp directory
export MAX_FILE_SIZE="25000000"  # Optional: max file size in bytes (default 25MB)
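
A minimal sketch of how the server could read these variables with their documented defaults (the actual loading code may differ):

// Read configuration from the environment, falling back to the documented defaults.
const config = {
  openaiApiKey: process.env.OPENAI_API_KEY ?? "",                // required
  logLevel: process.env.LOG_LEVEL ?? "info",                     // debug | info | error
  tempDir: process.env.TEMP_DIR ?? "/tmp/audio-transcription",
  maxFileSize: Number(process.env.MAX_FILE_SIZE ?? 25_000_000),  // bytes (25MB default)
};

if (!config.openaiApiKey) {
  throw new Error("OPENAI_API_KEY is required");
}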

Usage

Starting the Server

bun run index.ts

Or via the package.json scripts:

bun start  # Production mode
bun dev    # Development mode with auto-reload

Integration with Claude Code

Add the server to Claude Code:

claude mcp add /Users/joshnewton/Development/audio-transcription-mcp

Available Tools

1. transcribe_audio

Downloads and transcribes an audio file from a URL.

Input:

{
  "url": "https://example.com/audio.mp3"
}

Output:

{
  "transcription": "Original French text",
  "translation": "English translation",
  "language": "French"
}

2. analyze_french_imperative

Analyzes a French audio file to determine the imperative subject pronoun.

Input:

{
  "url": "https://example.com/french-command.mp3"
}

Output:

{
  "transcription": "Γ‰coutez attentivement",
  "translation": "Listen carefully",
  "subject": "vous",
  "analysis": "The verb 'Γ©coutez' ends in -ez, which indicates the formal/plural 'vous' form"
}

Example Usage in Claude

User: "Listen to this audio and tell me who is being given the command: https://example.com/french-audio.mp3"
Claude: [Uses analyze_french_imperative tool]
Result: The command "Écoutez attentivement" is addressed to "vous" (formal or plural 'you')

Development

Running Tests

bun test

Project Structure

audio-transcription-mcp/
├── index.ts                 # Main entry point
├── src/
│   ├── server.ts           # MCP server setup
│   ├── handlers/           # Tool request handlers
│   ├── services/           # Core services
│   └── utils/              # Utility functions
└── tests/                  # Test files

Security Considerations

  • Only HTTP/HTTPS URLs are accepted (no file:// or local paths)
  • File size is limited to 25MB by default (both checks are sketched after this list)
  • Temporary files are automatically cleaned up
  • API keys are never logged
  • Input validation on all user-provided data
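
A minimal sketch of the first two checks, using the standard URL and Fetch APIs available in Bun (helper names are illustrative, not the server's actual code):

const MAX_FILE_SIZE = Number(process.env.MAX_FILE_SIZE ?? 25_000_000); // 25MB default

// Reject anything that is not plain HTTP/HTTPS (blocks file://, data:, and local paths).
function validateUrl(raw: string): URL {
  const url = new URL(raw); // throws on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url;
}

// Enforce the size limit before writing the download to the temp directory.
function assertWithinSizeLimit(response: Response): void {
  const length = Number(response.headers.get("content-length") ?? 0);
  if (length > MAX_FILE_SIZE) {
    throw new Error(`File exceeds ${MAX_FILE_SIZE} bytes`);
  }
}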

Multi-Language Support Implementation Plan

Current State

The server is currently optimized for French language learning with hardcoded language parameters in several components.

Changes Required for Multi-Language Support

Phase 1: Core Multi-Language Infrastructure

Service Layer Updates:

  • transcriber.ts:34 - Remove hardcoded language: "fr" parameter
  • analyzer.ts:29-42 - Replace French-specific imperative analysis with configurable grammar analysis
  • translator.ts:25 - Make source/target languages configurable parameters (sketched below)
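
For example, translator.ts could accept source and target languages as parameters instead of assuming French-to-English. A sketch of the intended shape, assuming the openai chat completions API (not the file's actual contents):

import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Source and target become parameters; English stays the default target.
export async function translate(
  text: string,
  sourceLanguage: string,
  targetLanguage = "en"
): Promise<string> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content: `Translate the user's text from ${sourceLanguage} to ${targetLanguage}.`,
      },
      { role: "user", content: text },
    ],
  });
  return completion.choices[0].message.content ?? "";
}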

Tool Schema Changes:

  • Replace analyze_french_imperative with a generic analyze_grammar tool (schema sketched below)
  • Add required language parameter to tool input schemas
  • Add optional targetLanguage parameter for translation control
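
A sketch of what the new tool definition could look like, following the MCP convention of a JSON Schema inputSchema (field names follow the plan above and are not final):

// Hypothetical definition for the planned analyze_grammar tool.
const analyzeGrammarTool = {
  name: "analyze_grammar",
  description: "Transcribe an audio file and run language-specific grammar analysis",
  inputSchema: {
    type: "object",
    properties: {
      url: { type: "string", description: "HTTP/HTTPS URL of the audio file" },
      language: { type: "string", description: "ISO 639-1 code, e.g. 'fr', 'es', 'de'" },
      targetLanguage: { type: "string", description: "Optional translation target (defaults to English)" },
    },
    required: ["url", "language"],
  },
};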

Handler Updates:

  • Add language parameter validation against supported languages (sketched below)
  • Pass language codes to all services
  • Update response formatting to include language metadata
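
A sketch of the validation step, assuming a supported-language set derived from configuration (names are illustrative):

const SUPPORTED_LANGUAGES = new Set(["fr", "es", "de", "it"]); // hypothetical; driven by config in practice

// Reject unsupported codes early, before any download or API call.
function assertSupportedLanguage(language: string): void {
  if (!SUPPORTED_LANGUAGES.has(language)) {
    throw new Error(
      `Unsupported language '${language}'. Supported: ${[...SUPPORTED_LANGUAGES].join(", ")}`
    );
  }
}
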
Phase 2: Language-Specific Analysis Modules

New File Structure:

src/services/
├── analysis/
│   ├── french.ts     # Imperative analysis (tu/vous/nous)
│   ├── spanish.ts    # Ser vs Estar, subjunctive detection
│   ├── german.ts     # Case analysis (Nominativ, Akkusativ, Dativ, Genitiv)
│   ├── italian.ts    # Subjunctive mood, formal/informal register
│   └── base.ts       # Common analysis interface
├── languages.ts      # Language configurations and metadata
└── language-detector.ts # Auto-detection fallback logic
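
A possible shape for the common interface in base.ts, which each language module would implement (type names are hypothetical):

// base.ts - shared contract for language-specific analysis modules.
export interface GrammarAnalysis {
  transcription: string;
  translation: string;
  findings: Record<string, string>; // e.g. { subject: "vous" } for French imperatives
}

export interface GrammarAnalyzer {
  readonly languageCode: string; // ISO 639-1, e.g. "fr"
  analyze(transcription: string): Promise<GrammarAnalysis>;
}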

Language Configuration System:

interface LanguageConfig {
  code: string;           // ISO 639-1 code ('fr', 'es', 'de', etc.)
  name: string;           // Display name
  whisperCode: string;    // Whisper API language code
  analysisRules: object;  // Language-specific grammar rules
  defaultTarget: string;  // Default translation target
}
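
For example, the French entry in languages.ts might look like this (values are illustrative):

const french: LanguageConfig = {
  code: "fr",
  name: "French",
  whisperCode: "fr",
  analysisRules: { imperativeSubjects: ["tu", "vous", "nous"] },
  defaultTarget: "en",
};
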
Phase 3: Advanced Features

  • Auto-Detection: Use Whisper without a language parameter, with confidence scoring
  • Environment Configuration: SUPPORTED_LANGUAGES=fr,es,de,it (parsing sketched after this list)
  • Fallback Handling: Default language and error recovery
  • Performance: Language-specific caching and optimization
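
A sketch of the environment-driven configuration and fallback handling, assuming the comma-separated format shown above:

// Parse SUPPORTED_LANGUAGES=fr,es,de,it into a list, falling back to French only.
const supported = (process.env.SUPPORTED_LANGUAGES ?? "fr")
  .split(",")
  .map((code) => code.trim().toLowerCase())
  .filter(Boolean);

// Fallback handling: unknown or missing codes drop back to the default language.
const DEFAULT_LANGUAGE = supported[0] ?? "fr";
function resolveLanguage(requested?: string): string {
  return requested && supported.includes(requested) ? requested : DEFAULT_LANGUAGE;
}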

Estimated Implementation Time: 3-4 days for core infrastructure, +1-2 days per additional language's specialized analysis rules.

Roadmap & Feature Ideas

Language Expansion

  • Multi-language support (Spanish, German, Italian, Portuguese)
  • Auto-detect language in audio files
  • Cross-language comparative analysis tools
  • Language-specific linguistic analysis patterns

Enhanced French Analysis

  • Verb tense identification and explanation
  • Subjunctive mood detection
  • Conditional vs. indicative mood analysis
  • Grammar error detection and suggestions
  • Vocabulary difficulty level assessment
  • Formal vs informal register analysis beyond imperatives

Advanced Audio Processing

  • Support for additional audio formats (WAV, M4A, AAC, FLAC)
  • Audio enhancement and noise reduction
  • Speaker diarization (multiple speakers)
  • Audio speed/pitch adjustment for learning
  • Batch processing of multiple audio files
  • Audio segmentation by sentences/phrases
  • Real-time streaming transcription

Language Learning Tools

  • Vocabulary extraction with frequency analysis
  • Automatic flashcard generation from transcriptions
  • Pronunciation scoring and feedback
  • IPA (International Phonetic Alphabet) notation
  • Cultural context and idiom explanations
  • Sentence complexity scoring
  • Learning progress tracking and analytics

Output & Export Features

  • Subtitle file generation (SRT, VTT, ASS)
  • PDF study guides with translations and notes
  • Anki deck export for spaced repetition
  • Study worksheet generation
  • Audio-synchronized text highlighting
  • Integration with popular language learning apps

Performance & Technical Improvements

  • Caching for repeated audio URLs
  • Streaming support for large audio files
  • Multiple AI model options (local vs cloud)
  • Rate limiting and usage analytics
  • WebSocket support for real-time features
  • Database integration for user progress
  • API versioning and backward compatibility

Specialized Analysis Tools

  • Emotion/sentiment analysis in speech
  • Speaking pace and fluency analysis
  • Accent identification and analysis
  • Conversation flow analysis
  • Politeness markers detection
  • Regional dialect identification
  • Academic vs colloquial language detection

License

MIT