SmartLittleApps/local-stt-mcp
Local Speech-to-Text MCP Server
A high-performance Model Context Protocol (MCP) server providing local speech-to-text transcription using whisper.cpp, optimized for Apple Silicon.
Features
- 100% Local Processing: No cloud APIs, complete privacy
- Apple Silicon Optimized: 15x+ real-time transcription speed
- Speaker Diarization: Identify and separate multiple speakers
- Universal Audio Support: Automatic conversion from MP3, M4A, FLAC, and more
- Multiple Output Formats: txt, json, vtt, srt, csv
- Low Memory Footprint: <2GB memory usage
- TypeScript: Full type safety and modern development
Quick Start
Prerequisites
- Node.js 18+
- whisper.cpp (brew install whisper-cpp)
- For audio format conversion: ffmpeg (brew install ffmpeg) - automatically handles MP3, M4A, FLAC, OGG, etc.
- For speaker diarization: Python 3.8+ and a HuggingFace token (free)
Supported Audio Formats
- Native whisper.cpp formats: WAV, FLAC
- Auto-converted formats: MP3, M4A, AAC, OGG, WMA, and more
- Automatic conversion: Powered by ffmpeg with 16kHz/mono optimization for whisper.cpp
- Format detection: Automatic format detection and conversion when needed
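Under the hood, the conversion amounts to resampling the input to 16 kHz mono WAV before it is handed to whisper.cpp. The snippet below is an illustrative sketch of that step (the helper name and file paths are made up for the example; it is not the server's actual code):

import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Illustrative only: convert any ffmpeg-readable file (MP3, M4A, FLAC, OGG, ...)
// into the 16 kHz mono WAV layout that whisper.cpp expects.
async function toWhisperWav(inputPath: string, outputPath: string): Promise<string> {
  await execFileAsync("ffmpeg", [
    "-i", inputPath,  // source file in any supported format
    "-ar", "16000",   // resample to 16 kHz
    "-ac", "1",       // downmix to mono
    "-y",             // overwrite the output if it already exists
    outputPath,
  ]);
  return outputPath;
}

// Example: await toWhisperWav("interview.m4a", "interview.wav");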
Installation
git clone https://github.com/SmartLittleApps/local-stt-mcp.git
cd local-stt-mcp/mcp-server
npm install
npm run build
# Download whisper models
npm run setup:models
# For speaker diarization, set HuggingFace token
export HF_TOKEN="your_token_here" # Get free token from huggingface.co
Speaker Diarization Note: Requires a HuggingFace account and acceptance of the pyannote/speaker-diarization-3.1 license.
MCP Client Configuration
Add to your MCP client configuration:
{
  "mcpServers": {
    "whisper-mcp": {
      "command": "node",
      "args": ["path/to/local-stt-mcp/mcp-server/dist/index.js"]
    }
  }
}
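If you plan to use transcribe_with_speakers, many MCP clients (Claude Desktop, for example) also accept a per-server env map, which is a convenient place to supply the HuggingFace token instead of exporting it in your shell. This is a sketch, not a requirement; check your client's documentation for the exact field:

{
  "mcpServers": {
    "whisper-mcp": {
      "command": "node",
      "args": ["path/to/local-stt-mcp/mcp-server/dist/index.js"],
      "env": {
        "HF_TOKEN": "your_token_here"
      }
    }
  }
}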
Available Tools
| Tool | Description |
|---|---|
| transcribe | Basic audio transcription with automatic format conversion |
| transcribe_long | Long audio file processing with chunking and format conversion |
| transcribe_with_speakers | Speaker diarization and transcription with format support |
| list_models | Show available whisper models |
| health_check | System diagnostics |
| version | Server version information |
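To exercise these tools outside of a full MCP client, the server can be driven from a short script using the official MCP TypeScript SDK over stdio. The sketch below assumes @modelcontextprotocol/sdk is installed; the argument name passed to transcribe (audio_path) and the sample file path are illustrative guesses, so treat the input schema reported by listTools() as the source of truth:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function main() {
  // Spawn the server over stdio, the same way an MCP client would.
  const transport = new StdioClientTransport({
    command: "node",
    args: ["path/to/local-stt-mcp/mcp-server/dist/index.js"],
  });
  const client = new Client({ name: "stt-smoke-test", version: "0.0.1" });
  await client.connect(transport);

  // Inspect the tool list and input schemas the server actually advertises.
  console.log(JSON.stringify(await client.listTools(), null, 2));

  // Call the basic transcription tool. The argument name is an assumption
  // for illustration; check the schema from listTools() for the real one.
  const result = await client.callTool({
    name: "transcribe",
    arguments: { audio_path: "samples/meeting.wav" },
  });
  console.log(result.content);

  await client.close();
}

main().catch(console.error);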
Performance
Apple Silicon Benchmarks:
- Processing Speed: 15.8x real-time (vs WhisperX 5.5x)
- Memory Usage: <2GB (vs WhisperX ~4GB)
- GPU Acceleration: Apple Neural Engine
- Setup: Medium complexity but superior performance
See /benchmarks/ for detailed performance comparisons.
Project Structure
mcp-server/
├── src/                  # TypeScript source code
│   ├── tools/            # MCP tool implementations
│   ├── whisper/          # whisper.cpp integration
│   ├── utils/            # Speaker diarization & utilities
│   └── types/            # Type definitions
├── dist/                 # Compiled JavaScript
└── python/               # Python dependencies
Development
# Build
npm run build
# Development mode (watch)
npm run dev
# Linting & formatting
npm run lint
npm run format
# Type checking
npm run type-check
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
License
MIT License - see LICENSE file for details.
Acknowledgments
- whisper.cpp for optimized inference
- OpenAI Whisper for the original models
- Model Context Protocol for the framework
- Pyannote.audio for speaker diarization