michaelyuwh/mcp-text-to-speech
The MCP Text-to-Speech Server is a robust, offline-first TTS solution designed for privacy-conscious applications, offering both local and cloud-based synthesis options.
MCP Text-to-Speech Server
A powerful, offline-first Text-to-Speech (TTS) MCP server that works completely locally without requiring internet connectivity or API keys. Perfect for privacy-conscious applications, development environments, and production deployments where data sovereignty is important.
Key Features
- Completely Offline: Works without internet connection using local TTS engines
- Online Fallback: Supports cloud TTS services when needed
- Multiple Engines: pyttsx3, espeak, festival, Coqui TTS, gTTS, Azure, Polly, Watson
- Auto-Detection: Automatically selects the best available TTS engine
- Docker Ready: Optimized multi-platform Docker containers
- Multiple Formats: Supports WAV, MP3, and other audio formats
- Multi-Language: Supports dozens of languages and accents
- High Performance: Optimized for speed and low resource usage
- Easy Integration: Simple MCP protocol integration
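The auto-detection feature listed above can be pictured as a priority-ordered probe over the engines. The following is a minimal sketch, not the server's actual implementation: the function name `detect_engine`, the priority order, and the probe strategy (an import check for Python packages, a PATH lookup for command-line engines) are all assumptions made for illustration.

```python
import importlib.util
import shutil
from typing import Callable, Optional

# Hypothetical priority order: offline engines first, online fallback last.
# Each entry is (engine name, probe kind, probe target).
ENGINE_PROBES = [
    ("pyttsx3", "module", "pyttsx3"),
    ("espeak", "binary", "espeak"),
    ("festival", "binary", "festival"),
    ("gtts", "module", "gtts"),
]


def _module_available(name: str) -> bool:
    """Return True if the top-level Python module can be found (no import is performed)."""
    return importlib.util.find_spec(name) is not None


def _binary_available(name: str) -> bool:
    """Return True if an executable with this name is on PATH."""
    return shutil.which(name) is not None


def detect_engine(
    has_module: Callable[[str], bool] = _module_available,
    has_binary: Callable[[str], bool] = _binary_available,
) -> Optional[str]:
    """Return the first available engine in priority order, or None if none is found."""
    for engine, kind, target in ENGINE_PROBES:
        probe = has_module if kind == "module" else has_binary
        if probe(target):
            return engine
    return None


if __name__ == "__main__":
    print(detect_engine())
```

The probes are passed in as parameters so the selection logic can be exercised without installing any engine.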
Supported TTS Engines
Offline Engines (No Internet Required)
- pyttsx3: Cross-platform offline TTS with system voices
- espeak: Lightweight, open-source TTS for Linux
- festival: High-quality speech synthesis for Linux
- Coqui TTS: AI-based neural TTS with excellent quality
Online Services (Internet Required)
- Google TTS (gTTS): Free, high-quality synthesis
- Azure Cognitive Services: Premium neural voices
- Amazon Polly: Professional-grade TTS with neural voices
- IBM Watson: Enterprise-level speech synthesis
Quick Start
Option 1: Python Installation (Recommended for Development)
# Clone the repository
git clone <repository-url>
cd mcp-text-to-speech
# Install with uv (recommended)
pip install uv
uv pip install .
# Or install with pip
pip install .
# Run with auto-detection
python -m mcp_text_to_speech
# Check available engines
python -m mcp_text_to_speech --info
Option 2: Docker (Recommended for Production)
Docker Hub (Optimized Images)
# Ultra-slim production image (406MB, optimized)
docker pull michaelyuwh/mcp-text-to-speech:slim
docker run -p 8000:8000 michaelyuwh/mcp-text-to-speech:slim
# Versioned slim image
docker pull michaelyuwh/mcp-text-to-speech:v1.0.0-slim
docker run -p 8000:8000 -v ./output:/app/output michaelyuwh/mcp-text-to-speech:v1.0.0-slim
# GitHub Container Registry (alternative)
docker pull ghcr.io/michaelyuwh/mcp-text-to-speech:slim
docker run -p 8000:8000 ghcr.io/michaelyuwh/mcp-text-to-speech:slim
Build from Source
# Quick start with Docker Compose
docker-compose up -d
# Build standard image
docker build -t mcp-text-to-speech .
# Build slim image (optimized)
docker build -f Dockerfile.slim -t mcp-text-to-speech:slim .
# Run with audio output
docker run -it --rm \
  -v ./output:/app/output \
  --device /dev/snd:/dev/snd \
  mcp-text-to-speech:slim
Option 3: Development Setup
# Clone and setup development environment
git clone <repository-url>
cd mcp-text-to-speech
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
# Install offline TTS engines
sudo apt-get install espeak festival # Linux
# On macOS: brew install espeak festival
# Run development server
python -m mcp_text_to_speech --debug
Usage Examples
Basic Text-to-Speech
# Using MCP protocol
{
  "method": "tools/call",
  "params": {
    "name": "synthesize_speech",
    "arguments": {
      "text": "Hello, this is a test of text-to-speech synthesis!",
      "engine": "auto",
      "language": "en"
    }
  }
}
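MCP messages travel as JSON-RPC 2.0, so on the wire the request above also carries `jsonrpc` and `id` fields. A minimal sketch of building that envelope in Python follows. The helper name `build_tool_call` is hypothetical, and in practice an MCP client library normally constructs this message for you.

```python
import itertools
import json
from typing import Any, Dict

# Simple monotonically increasing request ids (an illustrative choice).
_request_ids = itertools.count(1)


def build_tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a tool invocation in a JSON-RPC 2.0 request envelope."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_tool_call(
        "synthesize_speech",
        {
            "text": "Hello, this is a test of text-to-speech synthesis!",
            "engine": "auto",
            "language": "en",
        },
    )
    print(json.dumps(request, indent=2))
```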
Advanced Configuration
# High-quality synthesis with specific voice
{
  "method": "tools/call",
  "params": {
    "name": "synthesize_speech",
    "arguments": {
      "text": "Welcome to our application",
      "engine": "pyttsx3",
      "voice": "female",
      "speed": 180,
      "language": "en",
      "output_file": "/app/output/welcome.wav"
    }
  }
}
Batch Processing
# Convert multiple texts at once
{
  "method": "tools/call",
  "params": {
    "name": "batch_synthesize",
    "arguments": {
      "texts": [
        "Welcome to our service",
        "Please select an option",
        "Thank you for your choice"
      ],
      "engine": "auto",
      "output_dir": "/app/output/batch"
    }
  }
}
Language Support
Enhanced Chinese & Cantonese Support
Perfect for Hong Kong users and Cantonese speakers:
Cantonese (粵語)
- Offline: macOS Sinji voice (zh-HK) - Native Hong Kong Cantonese
- Online: gTTS Cantonese (yue) - High-quality synthesis
- Smart Mapping: zh-HK, cantonese → Auto-selects best Cantonese voice
Mandarin Chinese (普通話)
- Simplified Chinese: zh-CN - Mainland China
- Traditional Chinese: zh-TW - Taiwan
- Generic Chinese: zh - Default Mandarin
Language Usage Examples
# Hong Kong Cantonese (Offline)
{"language": "zh-HK", "engine": "pyttsx3"} # → Sinji voice
# Cantonese (Online)
{"language": "yue", "engine": "gtts"} # → gTTS Cantonese
# Auto-detection
{"language": "cantonese", "engine": "auto"} # → Best available
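The mapping described above, where zh-HK and cantonese both resolve to a Cantonese voice, might be implemented as a two-step lookup: normalize the user's input to a canonical tag, then translate that tag to whatever code the chosen engine expects. This is an illustrative sketch only. The alias table, the per-engine code table, and the function `resolve_language` are assumptions, not the project's actual code.

```python
from typing import Dict

# Hypothetical aliases: user-facing names -> canonical language tag.
ALIASES: Dict[str, str] = {
    "cantonese": "zh-HK",
    "yue": "zh-HK",
    "mandarin": "zh-CN",
    "zh": "zh-CN",
}

# Hypothetical per-engine overrides: canonical tag -> code that engine expects.
ENGINE_CODES: Dict[str, Dict[str, str]] = {
    "gtts": {"zh-HK": "yue"},
}


def resolve_language(language: str, engine: str) -> str:
    """Map a user-supplied language name or tag to the code used by `engine`."""
    key = language.strip()
    # Aliases are matched case-insensitively; unknown tags pass through unchanged.
    canonical = ALIASES.get(key.lower(), key)
    return ENGINE_CODES.get(engine, {}).get(canonical, canonical)


if __name__ == "__main__":
    for lang, engine in [("cantonese", "pyttsx3"), ("zh-HK", "gtts"), ("en-GB", "gtts")]:
        print(lang, engine, resolve_language(lang, engine))
```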
Other Supported Languages
The server supports numerous languages depending on the engine:
- English: en, en-US, en-GB, en-AU
- Spanish: es, es-ES, es-MX, es-AR
- French: fr, fr-FR, fr-CA
- German: de, de-DE, de-AT
- Italian: it, it-IT
- Portuguese: pt, pt-PT, pt-BR
- Russian: ru
- Japanese: ja
- Korean: ko
- Chinese: zh, zh-CN, zh-TW, yue (Cantonese)
- And many more...
Configuration
Environment Variables
# Force specific mode
export TTS_MODE=offline # or 'online' or 'auto'
# Cache and output directories
export TTS_CACHE_DIR=/tmp/tts_cache
export TTS_OUTPUT_DIR=/app/output
# Online service credentials (optional)
export AZURE_SPEECH_KEY=your_key
export AZURE_SPEECH_REGION=eastus
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export IBM_WATSON_APIKEY=your_key
export IBM_WATSON_URL=your_url
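As a rough illustration of how the non-secret variables above could be consumed, the sketch below reads them into a small config object and rejects an unknown mode. The `TTSConfig` class, the `load_config` function, and the fallback values (taken from the example values shown above) are assumptions, not the server's real defaults.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping

VALID_MODES = ("offline", "online", "auto")


@dataclass(frozen=True)
class TTSConfig:
    mode: str
    cache_dir: Path
    output_dir: Path


def load_config(env: Mapping[str, str] = os.environ) -> TTSConfig:
    """Build a TTSConfig from environment-style key/value pairs."""
    mode = env.get("TTS_MODE", "auto").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"TTS_MODE must be one of {VALID_MODES}, got {mode!r}")
    return TTSConfig(
        mode=mode,
        # Fallback paths mirror the examples above; they are assumed, not confirmed.
        cache_dir=Path(env.get("TTS_CACHE_DIR", "/tmp/tts_cache")),
        output_dir=Path(env.get("TTS_OUTPUT_DIR", "/app/output")),
    )


if __name__ == "__main__":
    print(load_config())
```

Accepting any mapping rather than reading `os.environ` directly keeps the function easy to test with a plain dictionary.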
Command Line Options
# Auto-detection (default)
python -m mcp_text_to_speech
# Force offline mode
python -m mcp_text_to_speech --offline
# Force online mode
python -m mcp_text_to_speech --online
# Show environment info
python -m mcp_text_to_speech --info
# Debug mode with detailed logging
python -m mcp_text_to_speech --debug
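For readers curious how flags like these are typically wired up, here is a minimal `argparse` sketch. It assumes that `--offline` and `--online` are mutually exclusive and that the mode falls back to `auto` when neither is given. The real entry point may be structured differently.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Create a parser mirroring the flags documented above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="mcp_text_to_speech")
    # --offline and --online write to the same destination and cannot be combined.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--offline", dest="mode", action="store_const",
                      const="offline", default="auto", help="force offline engines")
    mode.add_argument("--online", dest="mode", action="store_const",
                      const="online", default="auto", help="force online services")
    parser.add_argument("--info", action="store_true", help="show environment info")
    parser.add_argument("--debug", action="store_true", help="enable detailed logging")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (defaults to sys.argv[1:] when None)."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    print(parse_args())
```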
Docker Deployment
Production Deployment (Recommended)
# docker-compose.yml - Using optimized slim image
version: '3.8'
services:
  mcp-tts:
    image: michaelyuwh/mcp-text-to-speech:slim # 406MB optimized image
    restart: unless-stopped
    volumes:
      - ./output:/app/output
      - tts_cache:/tmp/tts_cache
    environment:
      - TTS_MODE=offline
    devices:
      - /dev/snd:/dev/snd
    ports:
      - "8000:8000"
volumes:
  tts_cache:
Docker Image Options
Image Type | Size | Use Case | Command |
---|---|---|---|
Slim | 406MB | Production | michaelyuwh/mcp-text-to-speech:slim |
Standard | ~800MB | Development | Build from source |
Latest | Variable | Testing | michaelyuwh/mcp-text-to-speech:latest |
Multi-Platform Build
# Build for multiple architectures
docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64 -t mcp-text-to-speech:latest .
Available MCP Tools
1. get_available_engines
Get list of available TTS engines and their capabilities.
2. synthesize_speech
Convert text to speech with customizable options:
- Text content
- Engine selection
- Voice selection
- Speed/rate control
- Language selection
- Output format
3. list_voices
List available voices for each engine with details:
- Voice IDs and names
- Supported languages
- Gender information
4. play_audio
Play generated audio files through the system audio.
5. batch_synthesize
Convert multiple texts to speech files efficiently.
6. Online Service Tools
- get_available_services: List online TTS services
- synthesize_speech_online: Use cloud TTS services
- list_online_voices: Browse cloud voice options
- get_service_limits: Check API usage and limits
System Requirements
Minimum Requirements
- Python 3.9+
- 512MB RAM
- 100MB disk space
- Audio output capability
Recommended for Production
- Python 3.11+
- 1GB RAM
- 1GB disk space
- Linux with audio system (ALSA/PulseAudio)
Dependencies
Core Dependencies:
- mcp >= 1.0.0
- pyttsx3 >= 2.90 (cross-platform TTS)
- pygame >= 2.0.0 (audio playback)
Optional TTS Engines:
- gtts >= 2.3.0 (Google TTS)
- TTS >= 0.22.0 (Coqui AI TTS)
- azure-cognitiveservices-speech (Azure)
- boto3 (Amazon Polly)
- ibm-watson (IBM Watson)
System Dependencies (Linux):
sudo apt-get install espeak festival alsa-utils pulseaudio sox ffmpeg
Performance Optimization
Speed Optimizations
- Engine Selection: pyttsx3 for speed, Coqui for quality
- Caching: Automatic caching of generated audio
- Batch Processing: Efficient multi-text synthesis
- Resource Management: Memory-efficient streaming
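One common way to implement the audio caching mentioned above is to derive a deterministic file name from every parameter that affects the output, so an identical request can reuse an existing file. The sketch below shows that idea. The functions `cache_key` and `cache_path` and the choice of SHA-256 are assumptions, since the project's actual caching scheme is not documented here.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def cache_key(
    text: str,
    engine: str,
    language: str = "en",
    voice: Optional[str] = None,
    speed: Optional[int] = None,
) -> str:
    """Return a stable hex digest identifying one synthesis request."""
    payload = json.dumps(
        {"text": text, "engine": engine, "language": language,
         "voice": voice, "speed": speed},
        sort_keys=True,       # key order must not change the digest
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cache_path(cache_dir: Path, key: str, ext: str = "wav") -> Path:
    """Location of the cached audio file for a given key."""
    return cache_dir / f"{key}.{ext}"


if __name__ == "__main__":
    key = cache_key("Welcome to our application", "pyttsx3", voice="female", speed=180)
    path = cache_path(Path("/tmp/tts_cache"), key)
    print(path, "(cached)" if path.exists() else "(not cached yet)")
```

Any change to the text, engine, voice, speed, or language yields a different key, so stale audio is never returned for a modified request.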
Resource Usage
- Offline Mode: ~100-500MB RAM
- Online Mode: ~50-200MB RAM
- Disk Cache: ~10MB per hour of audio
- CPU: Low usage except during synthesis
Troubleshooting
Common Issues
No TTS engines available:
# Install offline engines
pip install pyttsx3 gtts
sudo apt-get install espeak # Linux
# Check environment
python -m mcp_text_to_speech --info
Audio playback issues:
# Check audio system
pulseaudio --check -v
aplay -l
# Configure Docker audio
docker run --device /dev/snd:/dev/snd mcp-text-to-speech
Online service errors:
# Check credentials
export AZURE_SPEECH_KEY=your_key
export AZURE_SPEECH_REGION=your_region
# Check that gTTS is installed (this import alone does not test network connectivity)
python -c "from gtts import gTTS; print('gTTS works')"
Debug Mode
# Run with detailed logging
python -m mcp_text_to_speech --debug
# Check specific engine
python -c "import pyttsx3; engine = pyttsx3.init(); print('pyttsx3 works')"
Integration Examples
n8n Integration
// n8n workflow node
{
  "nodes": [
    {
      "name": "Text to Speech",
      "type": "mcp-text-to-speech",
      "parameters": {
        "text": "{{ $json.message }}",
        "engine": "auto",
        "language": "en"
      }
    }
  ]
}
Claude Desktop Integration
// claude_desktop_config.json
{
  "mcpServers": {
    "text-to-speech": {
      "command": "python",
      "args": ["-m", "mcp_text_to_speech"],
      "cwd": "/path/to/mcp-text-to-speech"
    }
  }
}
Privacy & Security
- Data Privacy: In offline mode, all text processing happens locally; online engines send the text to the selected cloud service
- No Telemetry: No data is sent to external services in offline mode
- Secure Defaults: Non-root Docker containers
- Credential Management: Environment-based configuration
- Audit Trail: Comprehensive logging available
Benchmarks
Engine | Quality | Speed | Memory | Offline |
---|---|---|---|---|
pyttsx3 | Good | Fast | 100MB | Yes |
espeak | Basic | Very Fast | 50MB | Yes |
Coqui TTS | Excellent | Medium | 500MB | Yes |
gTTS | Excellent | Fast | 100MB | No |
Azure | Excellent | Fast | 150MB | No |
Roadmap
- WebRTC Integration: Real-time streaming synthesis
- Voice Cloning: Custom voice model support
- SSML Support: Advanced speech markup language
- Emotion Control: Emotional expression in synthesis
- Multilingual Models: Advanced language switching
- Performance Dashboard: Real-time monitoring
- Plugin System: Custom engine integration
Contributing
We welcome contributions! Please see our contributing guidelines for details.
Development Setup
# Fork and clone the repository
git clone https://github.com/yourusername/mcp-text-to-speech.git
cd mcp-text-to-speech
# Setup development environment
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
# Run tests
pytest tests/
# Code formatting
black src/
isort src/
# Type checking
mypy src/
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- pyttsx3 - Cross-platform TTS library
- gTTS - Google Text-to-Speech wrapper
- Coqui TTS - Advanced neural TTS
- MCP Protocol - Model Context Protocol specification
- espeak - Compact open source TTS
- Festival - Speech synthesis system
Support
- Documentation: Full documentation
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: support@yourcompany.com
Made with ❤️ for the MCP community