# MCP Whisper Transcription Server
An MCP (Model Context Protocol) server for audio/video transcription using MLX-optimized Whisper models. Optimized for Apple Silicon devices with ultra-fast performance.
## Features
- **MLX-Optimized**: Leverages Apple Silicon for blazing-fast transcription (up to ~10× realtime)
- **Multiple Formats**: Supports txt, md, srt, and json output formats
- **Video Support**: Automatically extracts audio from video files (MP4, MOV, AVI, MKV)
- **Batch Processing**: Process multiple files in parallel with configurable workers
- **MCP Integration**: Full MCP protocol support with tools and resources
- **Performance Tracking**: Built-in performance monitoring and reporting
- **Flexible Models**: Choose from six Whisper models (tiny to large-v3-turbo)
- **Error Handling**: Robust error handling and validation
- **Concurrent Processing**: Thread-safe concurrent transcription support
- **Voice Activity Detection**: Optional VAD to remove silence and speed up processing
- **Hallucination Prevention**: Advanced filtering to remove common transcription artifacts
## Performance
- **Speed**: Up to 10× realtime transcription on Apple Silicon
- **Memory**: Optimized memory usage (< 500MB for most files)
- **Concurrent**: Handle multiple transcriptions simultaneously
- **Scalable**: Batch process hundreds of files efficiently
## Quick Start
### Prerequisites
- Apple Silicon Mac (M1, M2, M3, or later)
- Python 3.10+
- FFmpeg (for video support)
### Installation

1. Install FFmpeg (if not already installed):

   ```bash
   brew install ffmpeg
   ```

2. Clone the repository:

   ```bash
   git clone https://github.com/galacoder/mcp-whisper-transcription.git
   cd mcp-whisper-transcription
   ```

3. Install Poetry (if not already installed):

   ```bash
   curl -sSL https://install.python-poetry.org | python3 -
   ```

4. Install dependencies:

   ```bash
   poetry install
   ```

5. Test the installation:

   ```bash
   poetry run python src/whisper_mcp_server.py --help
   ```
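Optionally, you can sanity-check the MLX Whisper backend directly. A minimal sketch, assuming the `mlx-whisper` package installed by `poetry install` (the sample path is hypothetical; use any short audio file you have):

```python
# Hedged sanity check for the MLX Whisper backend.
# "examples/sample.wav" is a hypothetical path; substitute your own file.
import mlx_whisper

result = mlx_whisper.transcribe(
    "examples/sample.wav",
    path_or_hf_repo="mlx-community/whisper-tiny-mlx",  # smallest model, fastest download
)
print(result["text"])
```

Run it with `poetry run python` from the repository root; the first call downloads the model to the Hugging Face cache.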
## Configuration
### Environment Variables
Create a `.env` file to customize settings:

```bash
# Model Configuration
DEFAULT_MODEL=mlx-community/whisper-large-v3-turbo
OUTPUT_FORMATS=txt,md,srt,json

# Performance Settings
MAX_WORKERS=4
TEMP_DIR=./temp

# Optional: API keys for future cloud features
# OPENAI_API_KEY=your_key_here
```
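For illustration, these variables could be read as follows; a minimal sketch assuming python-dotenv, which may differ from the server's actual startup code:

```python
# Hedged sketch: reading the .env settings above.
# Assumes python-dotenv; the server's real loading logic may differ.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

DEFAULT_MODEL = os.getenv("DEFAULT_MODEL", "mlx-community/whisper-large-v3-turbo")
OUTPUT_FORMATS = os.getenv("OUTPUT_FORMATS", "txt,md,srt,json").split(",")
MAX_WORKERS = int(os.getenv("MAX_WORKERS", "4"))
TEMP_DIR = os.getenv("TEMP_DIR", "./temp")
```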
### Available Models
| Model | Parameters | Speed (× realtime) | Memory | Best For |
|---|---|---|---|---|
| `whisper-tiny-mlx` | 39M | ~10x | ~150MB | Quick drafts |
| `whisper-base-mlx` | 74M | ~7x | ~250MB | Balanced performance |
| `whisper-small-mlx` | 244M | ~5x | ~600MB | High quality |
| `whisper-medium-mlx` | 769M | ~3x | ~1.5GB | Professional use |
| `whisper-large-v3-mlx` | 1550M | ~2x | ~3GB | Maximum accuracy |
| `whisper-large-v3-turbo` | 809M | ~4x | ~1.6GB | Recommended |
## Usage
### Claude Desktop Integration
Add to your Claude Desktop configuration file:

```json
{
  "mcpServers": {
    "whisper-transcription": {
      "command": "poetry",
      "args": ["run", "python", "src/whisper_mcp_server.py"],
      "cwd": "/absolute/path/to/mcp-whisper-transcription"
    }
  }
}
```
Configuration file locations:

- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
### Standalone Usage

```bash
# Run the MCP server directly
poetry run python src/whisper_mcp_server.py

# Or use the development server
poetry run python -m src.whisper_mcp_server
```
## Available Tools & Resources
### MCP Tools
| Tool | Description | Key Parameters |
|---|---|---|
| `transcribe_file` | Transcribe a single audio/video file | `file_path`, `model`, `output_formats` |
| `batch_transcribe` | Process multiple files in a directory | `directory`, `pattern`, `max_workers` |
| `list_models` | Show available Whisper models | None |
| `get_model_info` | Get details about a specific model | `model_id` |
| `clear_cache` | Clear model cache | `model_id` (optional) |
| `estimate_processing_time` | Estimate transcription time | `file_path`, `model` |
| `validate_media_file` | Check file compatibility | `file_path` |
| `get_supported_formats` | List supported input/output formats | None |
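For example, the validation and estimation tools can be combined as a pre-flight check before a long job. A minimal sketch, assuming `client` is a connected MCP client as set up in the Quick Examples below:

```python
# Hedged sketch: validate a file and estimate runtime before transcribing.
# Assumes `client` is a connected MCP client (see Quick Examples below).
check = await client.call_tool("validate_media_file", {"file_path": "interview.mp4"})
estimate = await client.call_tool("estimate_processing_time", {
    "file_path": "interview.mp4",
    "model": "mlx-community/whisper-large-v3-turbo",
})
print(check, estimate)
```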
### MCP Resources
| Resource | Description | Data Provided |
|---|---|---|
| `transcription://history` | Recent transcriptions | List of all transcriptions |
| `transcription://history/{id}` | Specific transcription details | Full transcription metadata |
| `transcription://models` | Available models | Model specifications and status |
| `transcription://config` | Current configuration | Server settings and environment |
| `transcription://formats` | Supported formats | Input/output format details |
| `transcription://performance` | Performance statistics | Speed, memory, and uptime metrics |
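Resources are read rather than called as tools. A minimal sketch, again assuming a connected FastMCP `client` (see the Quick Examples below):

```python
# Hedged sketch: reading MCP resources with a connected FastMCP client.
# Assumes `client` is set up as shown in the Quick Examples below.
models = await client.read_resource("transcription://models")
history = await client.read_resource("transcription://history")
print(models, history)
```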
### Quick Examples
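The snippets below assume `client` is a connected MCP client. A minimal setup sketch, assuming the FastMCP Python client and running from the repository root inside the Poetry environment:

```python
# Hedged setup sketch: connect a FastMCP client to this server over stdio.
# Assumes the fastmcp package; run from the repository root.
import asyncio
from fastmcp import Client

async def main():
    async with Client("src/whisper_mcp_server.py") as client:
        models = await client.call_tool("list_models", {})
        print(models)

asyncio.run(main())
```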
```python
# Single file transcription
result = await client.call_tool("transcribe_file", {
    "file_path": "interview.mp4",
    "output_formats": "txt,srt",
    "model": "mlx-community/whisper-large-v3-turbo"
})

# Transcription with Voice Activity Detection
result = await client.call_tool("transcribe_file", {
    "file_path": "long_interview.mp4",
    "output_formats": "txt,srt",
    "use_vad": True  # remove silence for faster processing
})

# Batch processing
result = await client.call_tool("batch_transcribe", {
    "directory": "./podcasts",
    "pattern": "*.mp3",
    "max_workers": 4
})

# Check supported formats
formats = await client.call_tool("get_supported_formats", {})
```
## Development
### Running Tests
```bash
# Run all tests
poetry run pytest

# Run with coverage
poetry run pytest --cov=src --cov-report=html

# Run a specific test file
poetry run pytest tests/test_mcp_tools.py -v
```
### Code Quality
```bash
# Format code
poetry run black .
poetry run isort .

# Type checking (optional)
poetry run mypy src/

# Lint code
poetry run flake8 src/
```
### Project Structure
```
mcp-whisper-transcription/
├── src/
│   └── whisper_mcp_server.py   # Main MCP server
├── tests/                      # Comprehensive test suite
├── examples/                   # Usage examples and test files
├── transcribe_mlx.py           # MLX Whisper integration
├── whisper_utils.py            # Utility functions
└── pyproject.toml              # Project configuration
```
## Performance Benchmarks
### Test Results (Apple M3 Max)
| Model | Audio Duration | Processing Time | Speed (× realtime) | Memory |
|---|---|---|---|---|
| tiny | 10 minutes | 1.2 minutes | 8.3x | 150MB |
| base | 10 minutes | 1.8 minutes | 5.6x | 250MB |
| small | 10 minutes | 2.5 minutes | 4.0x | 600MB |
| medium | 10 minutes | 4.2 minutes | 2.4x | 1.5GB |
| large-v3 | 10 minutes | 5.8 minutes | 1.7x | 3GB |
| large-v3-turbo | 10 minutes | 3.1 minutes | 3.2x | 1.6GB |

Speed is audio duration divided by processing time (e.g. 10 / 1.2 ≈ 8.3×).
## Troubleshooting
### Common Issues
1. **FFmpeg not found**

   ```bash
   brew install ffmpeg
   ```

2. **Model download slow**
   - Models are cached in `~/.cache/huggingface/`
   - The first download can be slow, but subsequent runs are fast

3. **Memory issues**
   - Use smaller models (tiny/base) for large files
   - Reduce `MAX_WORKERS` for concurrent processing

4. **Permission errors**
   - Ensure proper file permissions
   - Check write access to the output directory
## Requirements
- Python 3.10+
- Apple Silicon Mac (M1, M2, M3, or later)
- FFmpeg (for video file support)
- 4GB+ RAM (8GB+ recommended for large models)
- 2GB+ free disk space (for model cache)
## License
MIT License - see the LICENSE file for details.
## Contributing
Contributions are welcome! Please see the contributing guidelines for details.
## Acknowledgments
- Built with FastMCP - Modern MCP server framework
- Powered by MLX Whisper - Apple Silicon optimization
- Original Whisper by OpenAI - Revolutionary speech recognition
- Thanks to the MLX team at Apple for the incredible performance optimizations