audio-transcription-mcp

imjoshnewton/audio-transcription-mcp

Audio Transcription MCP Server

An MCP (Model Context Protocol) server that enables Claude to transcribe and analyze audio files, specifically designed for French language learning exercises.

Features

  • Download MP3 audio files from HTTP/HTTPS URLs
  • Transcribe French audio to text using OpenAI Whisper
  • Translate French text to English using GPT-4
  • Analyze French imperative sentences to identify the subject pronoun (tu, vous, nous)
  • Secure handling of temporary files with automatic cleanup
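
The temporary-file handling listed above can be pictured with the minimal sketch below. It assumes a hypothetical helper, withTempFile, that is not taken from the repository. The helper writes the downloaded bytes into a uniquely named directory under TEMP_DIR (or the OS temp directory) and passes the file path to a callback. It removes the directory in a finally block, so cleanup also happens when transcription fails.

```typescript
import { mkdir, mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper (not the project's API): writes `bytes` to a freshly
// created temp directory, runs `fn` with the file path, and always deletes
// the directory afterwards, even when `fn` throws.
async function withTempFile<T>(
  bytes: Uint8Array,
  fn: (path: string) => Promise<T>,
  baseDir: string = process.env.TEMP_DIR ?? tmpdir(),
): Promise<T> {
  // Make sure a custom TEMP_DIR exists before creating a subdirectory in it.
  await mkdir(baseDir, { recursive: true });
  // mkdtemp appends six random characters, so concurrent requests don't collide.
  const dir = await mkdtemp(join(baseDir, "audio-"));
  const path = join(dir, "input.mp3");
  try {
    await writeFile(path, bytes);
    return await fn(path);
  } finally {
    // recursive + force: remove the file and directory, ignoring "already gone".
    await rm(dir, { recursive: true, force: true });
  }
}
```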

Prerequisites

  • Bun runtime (latest stable version)
  • OpenAI API key with access to Whisper and GPT-4

Installation

  1. Clone the repository and change into the project directory:
cd /path/to/audio-transcription-mcp
  2. Install dependencies:
bun install
  3. Set up environment variables:
export OPENAI_API_KEY="your-openai-api-key"
export LOG_LEVEL="info"  # Optional: debug, info, error
export TEMP_DIR="/tmp/audio-transcription"  # Optional: custom temp directory
export MAX_FILE_SIZE="25000000"  # Optional: max file size in bytes (default 25MB)
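
The sketch below shows how these variables might be read at startup, assuming a hypothetical loadConfig function. The variable names and the 25MB default come from this README. The validation rules, and the fallback to the OS temp directory when TEMP_DIR is unset, are assumptions.

```typescript
import { tmpdir } from "node:os";

// Hypothetical loader: variable names come from this README, but the
// validation rules and the TEMP_DIR fallback (OS temp dir) are assumptions.
type LogLevel = "debug" | "info" | "error";

interface ServerConfig {
  openaiApiKey: string;
  logLevel: LogLevel;
  tempDir: string;
  maxFileSize: number; // bytes
}

const LOG_LEVELS: readonly string[] = ["debug", "info", "error"];
const DEFAULT_MAX_FILE_SIZE = 25_000_000; // 25 MB, the documented default

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const openaiApiKey = env.OPENAI_API_KEY;
  if (!openaiApiKey) throw new Error("OPENAI_API_KEY is required");

  const logLevel = env.LOG_LEVEL ?? "info";
  if (!LOG_LEVELS.includes(logLevel)) {
    throw new Error(`LOG_LEVEL must be one of: ${LOG_LEVELS.join(", ")}`);
  }

  const maxFileSize =
    env.MAX_FILE_SIZE === undefined ? DEFAULT_MAX_FILE_SIZE : Number(env.MAX_FILE_SIZE);
  if (!Number.isInteger(maxFileSize) || maxFileSize <= 0) {
    throw new Error("MAX_FILE_SIZE must be a positive integer (bytes)");
  }

  return {
    openaiApiKey,
    logLevel: logLevel as LogLevel,
    tempDir: env.TEMP_DIR ?? tmpdir(),
    maxFileSize,
  };
}

// Typical call site: const config = loadConfig(process.env);
```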

Usage

Starting the Server

bun run index.ts

Or with the package.json scripts:

bun start  # Production mode
bun dev    # Development mode with auto-reload

Integration with Claude Code

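Register the server with Claude Code by giving it a name (audio-transcription below) and the command that starts it. The -e flag passes the API key into the server's environment:

claude mcp add audio-transcription -e OPENAI_API_KEY=your-openai-api-key -- bun run /path/to/audio-transcription-mcp/index.ts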

Available Tools

1. transcribe_audio

Downloads an audio file from a URL, transcribes it, and translates the transcription into English.

Input:

{
  "url": "https://example.com/audio.mp3"
}

Output:

{
  "transcription": "Original French text",
  "translation": "English translation",
  "language": "French"
}
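
Under the hood, the transcription step amounts to a multipart upload to OpenAI's transcription endpoint. The sketch below assumes the whisper-1 model and uses hypothetical function names. By default it sends the "fr" language hint that the current version hardcodes. Passing null omits the hint so that Whisper detects the language itself.

```typescript
// Sketch only: the endpoint, the multipart fields and the "whisper-1" model
// follow OpenAI's public transcription API; function names are hypothetical.
const TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions";

// Builds the multipart body. Pass `null` as the language to omit it and let
// Whisper detect the spoken language on its own.
function buildTranscriptionForm(
  audio: ArrayBuffer,
  filename: string,
  language: string | null = "fr", // ISO 639-1 code
): FormData {
  const form = new FormData();
  form.append("file", new Blob([audio], { type: "audio/mpeg" }), filename);
  form.append("model", "whisper-1");
  if (language !== null) form.append("language", language);
  return form;
}

async function transcribe(audio: ArrayBuffer, apiKey: string): Promise<string> {
  const res = await fetch(TRANSCRIPTIONS_URL, {
    method: "POST",
    // No Content-Type header: fetch sets multipart/form-data with the boundary itself.
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(audio, "audio.mp3"),
  });
  if (!res.ok) throw new Error(`Transcription failed with HTTP ${res.status}`);
  // The default JSON response format returns { "text": "..." }.
  const data = (await res.json()) as { text: string };
  return data.text;
}
```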

2. analyze_french_imperative

Downloads and transcribes a French audio file containing a command. It then determines which imperative subject pronoun (tu, vous, or nous) the command is addressed to.

Input:

{
  "url": "https://example.com/french-command.mp3"
}

Output:

{
  "transcription": "Écoutez attentivement",
  "translation": "Listen carefully",
  "subject": "vous",
  "analysis": "The verb 'écoutez' ends in -ez, which indicates the formal/plural 'vous' form"
}
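
The analysis field relies on a regular pattern in French imperatives. An -ez ending marks vous, an -ons ending marks nous, and other forms are usually tu. The heuristic below is a swapped-in, rule-based illustration of that pattern, not the server's own analyzer, because the README does not describe how analyzer.ts reaches its answer. The function name is hypothetical. The heuristic misreads sentences where an object pronoun comes before the verb, such as "Ne me regardez pas".

```typescript
// Hypothetical heuristic, not the server's actual analyzer.
type ImperativeSubject = "tu" | "vous" | "nous";

function inferImperativeSubject(sentence: string): ImperativeSubject {
  // Lowercase and split on whitespace, apostrophes and hyphens, so that
  // "Asseyez-vous" -> ["asseyez", "vous"] and "N'oublie" -> ["n", "oublie"].
  const words = sentence
    .toLowerCase()
    .split(/[\s'’-]+/)
    .map((w) => w.replace(/[^\p{L}]/gu, "")) // strip punctuation such as "!" or "."
    .filter((w) => w.length > 0);

  // Skip the negation particle in "Ne parlez pas" / "N'oublie pas".
  const verb = words.find((w) => w !== "ne" && w !== "n") ?? "";

  if (verb.endsWith("ez")) return "vous"; // écoutez, finissez, prenez
  if (verb.endsWith("ons")) return "nous"; // écoutons, allons
  return "tu"; // écoute, finis, prends
}
```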

Example Usage in Claude

User: "Listen to this audio and tell me who is being given the command: https://example.com/french-audio.mp3"
Claude: [Uses analyze_french_imperative tool]
Result: The command "Écoutez attentivement" is addressed to "vous" (formal or plural 'you')

Development

Running Tests

bun test

Project Structure

audio-transcription-mcp/
├── index.ts                 # Main entry point
├── src/
│   ├── server.ts           # MCP server setup
│   ├── handlers/           # Tool request handlers
│   ├── services/           # Core services
│   └── utils/              # Utility functions
└── tests/                  # Test files

Security Considerations

  • Only HTTP/HTTPS URLs are accepted (no file:// or local paths)
  • File size limited to 25MB by default (configurable via MAX_FILE_SIZE)
  • Temporary files are automatically cleaned up
  • API keys are never logged
  • Input validation on all user-provided data
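
The URL and size checks above could be implemented along the lines of the minimal sketch below, which uses hypothetical function names. It accepts only http: and https: URLs and rejects a declared Content-Length over the limit. It also counts bytes while streaming, because that header can be missing or wrong. It does not block requests to private-network hosts.

```typescript
// Hypothetical helpers illustrating the checks listed above; they are not
// the project's actual implementation.
function validateAudioUrl(raw: string): URL {
  let url: URL | undefined;
  try {
    url = new URL(raw);
  } catch {
    // handled below
  }
  if (!url) throw new Error("Invalid URL");
  // Rejects file:, data:, ftp: and every other scheme.
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url;
}

async function downloadWithLimit(url: URL, maxBytes = 25_000_000): Promise<Uint8Array> {
  const res = await fetch(url);
  const body = res.body;
  if (!res.ok || body === null) throw new Error(`Download failed with HTTP ${res.status}`);

  // Fail fast when the server declares an oversized body...
  const declared = Number(res.headers.get("content-length") ?? "0");
  if (declared > maxBytes) throw new Error("File exceeds the size limit");

  // ...and still count bytes while streaming, since the header may be absent or wrong.
  const reader = body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    if (total > maxBytes) {
      await reader.cancel();
      throw new Error("File exceeds the size limit");
    }
    chunks.push(value);
  }

  // Concatenate the chunks into one buffer.
  const bytes = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return bytes;
}
```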

Multi-Language Support Implementation Plan

Current State

The server is currently optimized for French language learning with hardcoded language parameters in several components.

Changes Required for Multi-Language Support

Phase 1: Core Multi-Language Infrastructure

Service Layer Updates:

  • transcriber.ts:34 - Remove hardcoded language: "fr" parameter
  • analyzer.ts:29-42 - Replace French-specific imperative analysis with configurable grammar analysis
  • translator.ts:25 - Make source/target languages configurable parameters
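
In translator.ts, making the languages configurable could mean building the GPT-4 request from them instead of from fixed strings. The sketch below assumes the Chat Completions request shape. The function name, the prompt wording, the temperature, and the small code-to-name table are illustrative assumptions.

```typescript
// Hypothetical request builder for translator.ts. The body follows the
// Chat Completions request shape; prompt text and temperature are assumptions.
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface TranslationRequest {
  model: string;
  temperature: number;
  messages: ChatMessage[];
}

// A Map avoids accidental hits on inherited object keys such as "constructor".
const LANGUAGE_NAMES = new Map<string, string>([
  ["en", "English"],
  ["fr", "French"],
  ["es", "Spanish"],
  ["de", "German"],
  ["it", "Italian"],
]);

function buildTranslationRequest(text: string, source: string, target = "en"): TranslationRequest {
  const from = LANGUAGE_NAMES.get(source);
  const to = LANGUAGE_NAMES.get(target);
  if (from === undefined || to === undefined) {
    throw new Error(`Unsupported language pair: ${source} -> ${target}`);
  }
  return {
    model: "gpt-4",
    temperature: 0, // keep translations as consistent as possible across calls
    messages: [
      { role: "system", content: `Translate the user's ${from} text into ${to}. Reply with the translation only.` },
      { role: "user", content: text },
    ],
  };
}

// The result would be POSTed as JSON to https://api.openai.com/v1/chat/completions.
```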

Tool Schema Changes:

  • Replace analyze_french_imperative with generic analyze_grammar tool
  • Add required language parameter to tool input schemas
  • Add optional targetLanguage parameter for translation control

Handler Updates:

  • Add language parameter validation against supported languages
  • Pass language codes to all services
  • Update response formatting to include language metadata
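
The handler-side checks might look like the following sketch. It assumes a hypothetical parseToolInput function and field names that match the planned schema: url, a required language, and an optional targetLanguage. The supported set is the four languages named later in this plan. The sketch normalizes codes to lowercase and defaults the translation target to English, which mirrors the current behavior.

```typescript
// Hypothetical validation for the planned tool inputs; field names follow
// the schema changes listed above, the supported set follows this plan.
const SUPPORTED_SOURCE_LANGUAGES: ReadonlySet<string> = new Set(["fr", "es", "de", "it"]);

interface GrammarToolInput {
  url: string;
  language: string;
  targetLanguage: string;
}

function parseToolInput(
  input: unknown,
  supported: ReadonlySet<string> = SUPPORTED_SOURCE_LANGUAGES,
): GrammarToolInput {
  if (typeof input !== "object" || input === null) {
    throw new Error("Tool input must be an object");
  }
  const { url, language, targetLanguage } = input as Record<string, unknown>;

  if (typeof url !== "string" || url.length === 0) throw new Error("url is required");

  if (typeof language !== "string" || !supported.has(language.toLowerCase())) {
    throw new Error(`language must be one of: ${Array.from(supported).join(", ")}`);
  }

  // targetLanguage is optional; English matches the current behavior.
  let target = "en";
  if (targetLanguage !== undefined) {
    if (typeof targetLanguage !== "string") throw new Error("targetLanguage must be a string");
    target = targetLanguage.toLowerCase();
  }

  return { url, language: language.toLowerCase(), targetLanguage: target };
}
```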

Phase 2: Language-Specific Analysis Modules

New File Structure:

src/services/
├── analysis/
│   ├── french.ts     # Imperative analysis (tu/vous/nous)
│   ├── spanish.ts    # Ser vs Estar, subjunctive detection
│   ├── german.ts     # Case analysis (Nominativ, Akkusativ, Dativ, Genitiv)
│   ├── italian.ts    # Subjunctive mood, formal/informal register
│   └── base.ts       # Common analysis interface
├── languages.ts      # Language configurations and metadata
└── language-detector.ts # Auto-detection fallback logic

Language Configuration System:

interface LanguageConfig {
  code: string;           // ISO 639-1 code ('fr', 'es', 'de', etc.)
  name: string;           // Display name
  whisperCode: string;    // Whisper API language code
  analysisRules: object;  // Language-specific grammar rules
  defaultTarget: string;  // Default translation target
}
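
Concrete entries for this interface might look like the sketch below. The interface is repeated so the block is self-contained. The entries, the analysisRules contents, and the getLanguageConfig helper are assumptions. For these four languages the Whisper code is simply the ISO 639-1 code, since Whisper's language parameter accepts ISO 639-1 codes.

```typescript
// LanguageConfig as proposed in the plan, repeated so this compiles alone.
interface LanguageConfig {
  code: string;
  name: string;
  whisperCode: string;
  analysisRules: object;
  defaultTarget: string;
}

// Illustrative entries; the analysisRules contents are assumptions.
// A Map (not a plain object) keeps lookups like "constructor" from matching.
const LANGUAGES = new Map<string, LanguageConfig>([
  ["fr", {
    code: "fr", name: "French", whisperCode: "fr",
    analysisRules: { imperativeEndings: { ez: "vous", ons: "nous" } },
    defaultTarget: "en",
  }],
  ["es", { code: "es", name: "Spanish", whisperCode: "es", analysisRules: {}, defaultTarget: "en" }],
  ["de", { code: "de", name: "German", whisperCode: "de", analysisRules: {}, defaultTarget: "en" }],
  ["it", { code: "it", name: "Italian", whisperCode: "it", analysisRules: {}, defaultTarget: "en" }],
]);

function getLanguageConfig(code: string): LanguageConfig {
  const config = LANGUAGES.get(code.trim().toLowerCase());
  if (!config) throw new Error(`Unsupported language: ${code}`);
  return config;
}
```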

Phase 3: Advanced Features

  • Auto-Detection: Omit the language parameter so Whisper detects the spoken language itself, and add confidence scoring for the detection
  • Environment Configuration: SUPPORTED_LANGUAGES=fr,es,de,it
  • Fallback Handling: Fall back to a default language and recover gracefully from errors
  • Performance: Language-specific caching and optimization
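
The fallback order implied above is: explicit language first, then the detected language, then a default. It could be expressed as in the sketch below. SUPPORTED_LANGUAGES is the variable proposed in this phase. The parsing rules, the French default, and the function names are assumptions. The sketch also assumes the detector has already mapped Whisper's result to an ISO 639-1 code.

```typescript
// Hypothetical helpers for language-detector.ts.

// Parses e.g. "fr, es,de,it" into ["fr", "es", "de", "it"], dropping blanks
// and duplicates; falls back to French when the variable is unset or empty.
function parseSupportedLanguages(raw: string | undefined, fallback: string[] = ["fr"]): string[] {
  const codes = (raw ?? "")
    .split(",")
    .map((code) => code.trim().toLowerCase())
    .filter((code) => code.length > 0);
  return codes.length > 0 ? Array.from(new Set(codes)) : fallback;
}

interface ResolveOptions {
  requested?: string; // language passed by the caller, if any
  detected?: string; // ISO 639-1 code from auto-detection, if any
  supported: string[];
  defaultLanguage: string;
}

function resolveLanguage({ requested, detected, supported, defaultLanguage }: ResolveOptions): string {
  // 1. An explicit request wins, but it must be supported.
  if (requested !== undefined) {
    const code = requested.toLowerCase();
    if (!supported.includes(code)) throw new Error(`Unsupported language: ${requested}`);
    return code;
  }
  // 2. Otherwise use the detected language if the server supports it.
  const guess = detected?.toLowerCase();
  if (guess !== undefined && supported.includes(guess)) return guess;
  // 3. Fall back to the configured default.
  return defaultLanguage;
}

// Typical call site: parseSupportedLanguages(process.env.SUPPORTED_LANGUAGES)
```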

Estimated Implementation Time: 3-4 days for core infrastructure, +1-2 days per additional language's specialized analysis rules.

Roadmap & Feature Ideas

Language Expansion

  • Multi-language support (Spanish, German, Italian, Portuguese)
  • Auto-detect language in audio files
  • Cross-language comparative analysis tools
  • Language-specific linguistic analysis patterns

Enhanced French Analysis

  • Verb tense identification and explanation
  • Subjunctive mood detection
  • Conditional vs. indicative mood analysis
  • Grammar error detection and suggestions
  • Vocabulary difficulty level assessment
  • Formal vs. informal register analysis beyond imperatives

Advanced Audio Processing

  • Support for additional audio formats (WAV, M4A, AAC, FLAC)
  • Audio enhancement and noise reduction
  • Speaker diarization (multiple speakers)
  • Audio speed/pitch adjustment for learning
  • Batch processing of multiple audio files
  • Audio segmentation by sentences/phrases
  • Real-time streaming transcription

Language Learning Tools

  • Vocabulary extraction with frequency analysis
  • Automatic flashcard generation from transcriptions
  • Pronunciation scoring and feedback
  • IPA (International Phonetic Alphabet) notation
  • Cultural context and idiom explanations
  • Sentence complexity scoring
  • Learning progress tracking and analytics

Output & Export Features

  • Subtitle file generation (SRT, VTT, ASS)
  • PDF study guides with translations and notes
  • Anki deck export for spaced repetition
  • Study worksheet generation
  • Audio-synchronized text highlighting
  • Integration with popular language learning apps

Performance & Technical Improvements

  • Caching for repeated audio URLs
  • Streaming support for large audio files
  • Multiple AI model options (local vs cloud)
  • Rate limiting and usage analytics
  • WebSocket support for real-time features
  • Database integration for user progress
  • API versioning and backward compatibility

Specialized Analysis Tools

  • Emotion/sentiment analysis in speech
  • Speaking pace and fluency analysis
  • Accent identification and analysis
  • Conversation flow analysis
  • Politeness markers detection
  • Regional dialect identification
  • Academic vs colloquial language detection

License

MIT