MCP Whisper Transcription Server

An MCP (Model Context Protocol) server for audio/video transcription using MLX-optimized Whisper models, tuned for ultra-fast performance on Apple Silicon devices.

✨ Features

  • 🚀 MLX-Optimized: Leverages Apple Silicon for blazing-fast transcription (up to 10x real-time)
  • 🎯 Multiple Formats: Supports txt, md, srt, and json output formats
  • 🎬 Video Support: Automatically extracts audio from video files (MP4, MOV, AVI, MKV)
  • 📦 Batch Processing: Process multiple files in parallel with configurable workers
  • 🔧 MCP Integration: Full MCP protocol support with tools and resources
  • 📊 Performance Tracking: Built-in performance monitoring and reporting
  • 🎛️ Flexible Models: Choose from 6 different Whisper models (tiny to large-v3-turbo)
  • 🛠️ Error Handling: Robust error handling and validation
  • 📈 Concurrent Processing: Thread-safe concurrent transcription support
  • 🔇 Voice Activity Detection: Optional VAD to remove silence and speed up processing
  • 🧹 Hallucination Prevention: Advanced filtering to remove common transcription artifacts

๐Ÿ† Performance

  • Speed: Up to 10x real-time transcription on Apple Silicon
  • Memory: Optimized memory usage (< 500MB for most files)
  • Concurrent: Handle multiple transcriptions simultaneously
  • Scalable: Batch process hundreds of files efficiently

🚀 Quick Start

Prerequisites

  • Apple Silicon Mac (M1, M2, M3, or later)
  • Python 3.10+
  • FFmpeg (for video support)

Installation

  1. Install FFmpeg (if not already installed):

    brew install ffmpeg
    
  2. Clone the repository:

    git clone https://github.com/galacoder/mcp-whisper-transcription.git
    cd mcp-whisper-transcription
    
  3. Install Poetry (if not already installed):

    curl -sSL https://install.python-poetry.org | python3 -
    
  4. Install dependencies:

    poetry install
    
  5. Test the installation:

    poetry run python src/whisper_mcp_server.py --help
    

📋 Configuration

Environment Variables

Create a .env file to customize settings:

# Model Configuration
DEFAULT_MODEL=mlx-community/whisper-large-v3-turbo
OUTPUT_FORMATS=txt,md,srt,json

# Performance Settings
MAX_WORKERS=4
TEMP_DIR=./temp

# Optional: API Keys for future cloud features
# OPENAI_API_KEY=your_key_here
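
For illustration, the sketch below shows how a server like this might read those settings at startup. It assumes python-dotenv is available (a common companion for this kind of Poetry setup; the server's actual loading logic may differ) and uses only the variable names from the .env above:

```python
# Minimal sketch of loading the .env settings above (assumes python-dotenv).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

DEFAULT_MODEL = os.getenv("DEFAULT_MODEL", "mlx-community/whisper-large-v3-turbo")
OUTPUT_FORMATS = os.getenv("OUTPUT_FORMATS", "txt,md,srt,json").split(",")
MAX_WORKERS = int(os.getenv("MAX_WORKERS", "4"))
TEMP_DIR = os.getenv("TEMP_DIR", "./temp")
```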

Available Models

| Model | Size | Speed | Memory | Best For |
|-------|------|-------|--------|----------|
| whisper-tiny-mlx | 39M | ~10x | ~150MB | Quick drafts |
| whisper-base-mlx | 74M | ~7x | ~250MB | Balanced performance |
| whisper-small-mlx | 244M | ~5x | ~600MB | High quality |
| whisper-medium-mlx | 769M | ~3x | ~1.5GB | Professional use |
| whisper-large-v3-mlx | 1550M | ~2x | ~3GB | Maximum accuracy |
| whisper-large-v3-turbo | 809M | ~4x | ~1.6GB | Recommended |

🔧 Usage

Claude Desktop Integration

Add to your Claude Desktop configuration file:

{
  "mcpServers": {
    "whisper-transcription": {
      "command": "poetry",
      "args": ["run", "python", "src/whisper_mcp_server.py"],
      "cwd": "/absolute/path/to/mcp-whisper-transcription"
    }
  }
}

๐Ÿ“ Configuration File Locations:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Standalone Usage

# Run the MCP server directly
poetry run python src/whisper_mcp_server.py

# Or use the development server
poetry run python -m src.whisper_mcp_server
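
Once the server is running, you can exercise it from Python. Below is a minimal sketch using the FastMCP client (this server is built with FastMCP, per the acknowledgments); the script path is an assumption, so point it at your checkout:

```python
# Sketch: spawn the server over stdio and list its tools (assumes the
# fastmcp package; Client infers a stdio transport from a .py path).
import asyncio
from fastmcp import Client

async def main():
    async with Client("src/whisper_mcp_server.py") as client:
        tools = await client.list_tools()
        print([tool.name for tool in tools])

asyncio.run(main())
```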

🛠️ Available Tools & Resources

MCP Tools

| Tool | Description | Key Parameters |
|------|-------------|----------------|
| transcribe_file | Transcribe a single audio/video file | file_path, model, output_formats |
| batch_transcribe | Process multiple files in a directory | directory, pattern, max_workers |
| list_models | Show available Whisper models | None |
| get_model_info | Get details about a specific model | model_id |
| clear_cache | Clear model cache | model_id (optional) |
| estimate_processing_time | Estimate transcription time | file_path, model |
| validate_media_file | Check file compatibility | file_path |
| get_supported_formats | List supported input/output formats | None |
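
For example, a pre-flight check before committing to a long run. This is a sketch that assumes a connected client as in the Standalone Usage section; the exact shape of the returned payloads depends on the server version:

```python
# Sketch: validate a file and estimate its transcription time first
# (assumes an active FastMCP `client`; the file name is illustrative).
check = await client.call_tool("validate_media_file", {"file_path": "interview.mp4"})

estimate = await client.call_tool("estimate_processing_time", {
    "file_path": "interview.mp4",
    "model": "mlx-community/whisper-large-v3-turbo",
})
```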

MCP Resources

| Resource | Description | Data Provided |
|----------|-------------|---------------|
| transcription://history | Recent transcriptions | List of all transcriptions |
| transcription://history/{id} | Specific transcription details | Full transcription metadata |
| transcription://models | Available models | Model specifications and status |
| transcription://config | Current configuration | Server settings and environment |
| transcription://formats | Supported formats | Input/output format details |
| transcription://performance | Performance statistics | Speed, memory, and uptime metrics |
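
Resources are read rather than called. A sketch, again assuming a connected FastMCP client, using URIs from the table above:

```python
# Sketch: read server resources by URI (assumes an active FastMCP `client`).
history = await client.read_resource("transcription://history")
stats = await client.read_resource("transcription://performance")
```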

Quick Examples
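
These snippets assume an already-connected MCP client (for example, the FastMCP client from the Standalone Usage sketch).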

# Single file transcription
result = await client.call_tool("transcribe_file", {
    "file_path": "interview.mp4",
    "output_formats": "txt,srt",
    "model": "mlx-community/whisper-large-v3-turbo"
})

# Transcription with Voice Activity Detection
result = await client.call_tool("transcribe_file", {
    "file_path": "long_interview.mp4",
    "output_formats": "txt,srt",
    "use_vad": True  # Remove silence for faster processing
})

# Batch processing
result = await client.call_tool("batch_transcribe", {
    "directory": "./podcasts",
    "pattern": "*.mp3",
    "max_workers": 4
})

# Check supported formats
formats = await client.call_tool("get_supported_formats", {})

🧪 Development

Running Tests

# Run all tests
poetry run pytest

# Run with coverage
poetry run pytest --cov=src --cov-report=html

# Run specific test file
poetry run pytest tests/test_mcp_tools.py -v

Code Quality

# Format code
poetry run black .
poetry run isort .

# Type checking (optional)
poetry run mypy src/

# Lint code
poetry run flake8 src/

Project Structure

mcp-whisper-transcription/
├── src/
│   └── whisper_mcp_server.py    # Main MCP server
├── tests/                       # Comprehensive test suite
├── examples/                    # Usage examples and test files
├── transcribe_mlx.py            # MLX Whisper integration
├── whisper_utils.py             # Utility functions
└── pyproject.toml               # Project configuration

📊 Performance Benchmarks

Test Results (Apple M3 Max)

| Model | Audio Duration | Processing Time | Speed | Memory |
|-------|----------------|-----------------|-------|--------|
| tiny | 10 minutes | 1.2 minutes | 8.3x | 150MB |
| base | 10 minutes | 1.8 minutes | 5.6x | 250MB |
| small | 10 minutes | 2.5 minutes | 4.0x | 600MB |
| medium | 10 minutes | 4.2 minutes | 2.4x | 1.5GB |
| large-v3 | 10 minutes | 5.8 minutes | 1.7x | 3GB |
| large-v3-turbo | 10 minutes | 3.1 minutes | 3.2x | 1.6GB |

🔧 Troubleshooting

Common Issues

  1. FFmpeg not found

    brew install ffmpeg
    
  2. Model download slow

    • Models are cached in ~/.cache/huggingface/
    • The first download can be slow, but subsequent runs are fast
  3. Memory issues

    • Use smaller models (tiny/base) for large files
    • Reduce MAX_WORKERS for concurrent processing
    • Evict cached models with the clear_cache tool (see the sketch after this list)
  4. Permission errors

    • Ensure proper file permissions
    • Check output directory write access
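
The cache eviction mentioned under item 3, as a sketch; it assumes a connected FastMCP client as in the earlier examples. Per the tools table, model_id is optional, and omitting it should clear the whole cache:

```python
# Sketch: free memory held by cached models (assumes an active FastMCP `client`).
await client.call_tool("clear_cache", {})

# Or evict a single model; the id below is illustrative, pick one from list_models.
await client.call_tool("clear_cache", {"model_id": "mlx-community/whisper-large-v3-mlx"})
```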

See the project's documentation for more detailed solutions.

📋 Requirements

  • Python 3.10+
  • Apple Silicon Mac (M1, M2, M3, or later)
  • FFmpeg (for video file support)
  • 4GB+ RAM (8GB+ recommended for large models)
  • 2GB+ free disk space (for model cache)

📄 License

MIT License - see the LICENSE file for details.

๐Ÿค Contributing

Contributions are welcome! Please see the repository's contributing guidelines.

๐Ÿ™ Acknowledgments

  • Built with FastMCP - Modern MCP server framework
  • Powered by MLX Whisper - Apple Silicon optimization
  • Original Whisper by OpenAI - Revolutionary speech recognition
  • Thanks to the MLX team at Apple for the incredible performance optimizations