SmartLittleApps/local-stt-mcp

Local Speech-to-Text MCP Server

A high-performance Model Context Protocol (MCP) server providing local speech-to-text transcription using whisper.cpp, optimized for Apple Silicon.

🎯 Features

  • šŸ  100% Local Processing: No cloud APIs, complete privacy
  • šŸš€ Apple Silicon Optimized: 15x+ real-time transcription speed
  • šŸŽ¤ Speaker Diarization: Identify and separate multiple speakers
  • šŸŽµ Universal Audio Support: Automatic conversion from MP3, M4A, FLAC, and more
  • šŸ“ Multiple Output Formats: txt, json, vtt, srt, csv
  • šŸ’¾ Low Memory Footprint: <2GB memory usage
  • šŸ”§ TypeScript: Full type safety and modern development
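As a concrete illustration of two of the output formats listed above, the snippet below shows the one real difference between SRT and WebVTT cue timestamps: SRT separates milliseconds with a comma, WebVTT with a period. The helpers are hypothetical sketches, not the server's API.

```typescript
// Hypothetical helpers illustrating srt vs vtt timestamp formatting;
// not part of the server's actual code.
function pad(n: number, width = 2): string {
  return String(n).padStart(width, "0");
}

/** Format seconds as an SRT ("00:01:02,500") or VTT ("00:01:02.500") cue timestamp. */
export function cueTimestamp(seconds: number, format: "srt" | "vtt"): string {
  const ms = Math.round((seconds % 1) * 1000);
  const s = Math.floor(seconds) % 60;
  const m = Math.floor(seconds / 60) % 60;
  const h = Math.floor(seconds / 3600);
  const sep = format === "srt" ? "," : "."; // the only difference between the two
  return `${pad(h)}:${pad(m)}:${pad(s)}${sep}${pad(ms, 3)}`;
}
```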

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • whisper.cpp (brew install whisper-cpp)
  • For audio format conversion: ffmpeg (brew install ffmpeg) - automatically handles MP3, M4A, FLAC, OGG, etc.
  • For speaker diarization: Python 3.8+ and HuggingFace token (free)

Supported Audio Formats

  • Native whisper.cpp formats: WAV, FLAC
  • Auto-converted formats: MP3, M4A, AAC, OGG, WMA, and more
  • Automatic conversion: Powered by ffmpeg with 16kHz/mono optimization for whisper.cpp
  • Format detection: Automatic format detection and conversion when needed
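The detect-then-convert step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's actual implementation; `isNativeFormat` and `buildConversionArgs` are hypothetical helper names, but the 16 kHz/mono ffmpeg flags (`-ar 16000 -ac 1`) match the optimization described above.

```typescript
// Sketch of the format-detection/conversion step (hypothetical helpers).
import { extname } from "node:path";

// Formats whisper.cpp reads directly, per the list above.
const NATIVE_FORMATS = new Set([".wav", ".flac"]);

/** True when whisper.cpp can read the file without conversion. */
export function isNativeFormat(file: string): boolean {
  return NATIVE_FORMATS.has(extname(file).toLowerCase());
}

/** ffmpeg argument list for the 16 kHz / mono conversion whisper.cpp expects. */
export function buildConversionArgs(input: string, output: string): string[] {
  return ["-i", input, "-ar", "16000", "-ac", "1", output];
}
```

To actually convert, one would spawn `ffmpeg` with these arguments (e.g. via `node:child_process`).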

Installation

git clone https://github.com/SmartLittleApps/local-stt-mcp.git
cd local-stt-mcp/mcp-server
npm install
npm run build

# Download whisper models
npm run setup:models

# For speaker diarization, set HuggingFace token
export HF_TOKEN="your_token_here"  # Get free token from huggingface.co

Speaker Diarization Note: Requires HuggingFace account and accepting pyannote/speaker-diarization-3.1 license.

MCP Client Configuration

Add to your MCP client configuration:

{
  "mcpServers": {
    "whisper-mcp": {
      "command": "node",
      "args": ["path/to/local-stt-mcp/mcp-server/dist/index.js"]
    }
  }
}
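Once configured, the client drives the server's tools over JSON-RPC using the MCP `tools/call` method. The sketch below shows that request shape; the `file` argument name is an illustrative assumption only (query the server's `tools/list` for the actual input schema).

```typescript
// Shape of an MCP tool-invocation request (JSON-RPC "tools/call").
// The "file" argument in the usage example is a hypothetical parameter name.
type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

export function makeToolCall(
  id: number,
  tool: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
}

// e.g. makeToolCall(1, "transcribe", { file: "meeting.m4a" })
```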

🛠️ Available Tools

Tool                       Description
transcribe                 Basic audio transcription with automatic format conversion
transcribe_long            Long audio file processing with chunking and format conversion
transcribe_with_speakers   Speaker diarization and transcription with format support
list_models                Show available whisper models
health_check               System diagnostics
version                    Server version information

📊 Performance

Apple Silicon Benchmarks:

  • Processing Speed: 15.8x real-time (vs WhisperX 5.5x)
  • Memory Usage: <2GB (vs WhisperX ~4GB)
  • GPU Acceleration: ✅ Apple Neural Engine
  • Setup: Moderately complex, but with superior performance

See /benchmarks/ for detailed performance comparisons.
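To put the real-time factors above in concrete terms: processing time is audio duration divided by the factor, so a one-hour recording takes roughly 3.8 minutes at 15.8x versus roughly 10.9 minutes at WhisperX's 5.5x. A sketch of that arithmetic:

```typescript
// Real-time factor arithmetic: processing time = audio duration / factor.
export function processingMinutes(audioMinutes: number, rtFactor: number): number {
  return audioMinutes / rtFactor;
}

// A one-hour recording at the benchmarked rates:
// processingMinutes(60, 15.8)  // ≈ 3.8 min with this server
// processingMinutes(60, 5.5)   // ≈ 10.9 min with WhisperX
```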

šŸ—ļø Project Structure

mcp-server/
├── src/                   # TypeScript source code
│   ├── tools/             # MCP tool implementations
│   ├── whisper/           # whisper.cpp integration
│   ├── utils/             # Speaker diarization & utilities
│   └── types/             # Type definitions
├── dist/                  # Compiled JavaScript
└── python/                # Python dependencies

🔧 Development

# Build
npm run build

# Development mode (watch)
npm run dev

# Linting & formatting
npm run lint
npm run format

# Type checking
npm run type-check

šŸ¤ Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details.

šŸ™ Acknowledgments