mcp-text-to-speech

michaelyuwh/mcp-text-to-speech

The MCP Text-to-Speech Server is a robust, offline-first TTS solution designed for privacy-conscious applications, offering both local and cloud-based synthesis options.

MCP Text-to-Speech Server

A powerful, offline-first Text-to-Speech (TTS) MCP server that can run entirely locally, with no internet connectivity or API keys required. It suits privacy-conscious applications, development environments, and production deployments where data sovereignty is important.

🎯 Key Features

  • 🔒 Completely Offline: Works without an internet connection using local TTS engines
  • 🌐 Online Fallback: Supports cloud TTS services when needed
  • 🤖 Multiple Engines: pyttsx3, espeak, festival, Coqui TTS, gTTS, Azure, Polly, Watson
  • 🚀 Auto-Detection: Automatically selects the best available TTS engine
  • 🐳 Docker Ready: Optimized multi-platform Docker containers
  • 🎵 Multiple Formats: Supports WAV, MP3, and other audio formats
  • 🌍 Multi-Language: Supports dozens of languages and accents
  • ⚡ High Performance: Optimized for speed and low resource usage
  • 🔧 Easy Integration: Simple MCP protocol integration

🛠️ Supported TTS Engines

Offline Engines (No Internet Required)

  • pyttsx3: Cross-platform offline TTS with system voices
  • espeak: Lightweight, open-source TTS for Linux
  • festival: High-quality speech synthesis for Linux
  • Coqui TTS: AI-based neural TTS with excellent quality

Online Services (Internet Required)

  • Google TTS (gTTS): Free, high-quality synthesis
  • Azure Cognitive Services: Premium neural voices
  • Amazon Polly: Professional-grade TTS with neural voices
  • IBM Watson: Enterprise-level speech synthesis
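
Auto-detection across these engines can be approximated by probing for each engine's Python package or system binary. The sketch below is illustrative only: the candidate list, its ordering, and the function names are assumptions, not the server's actual selection logic.

```python
import importlib.util
import shutil

# Assumed preference order (not taken from the project): offline engines first.
# Each entry is (engine name, kind of probe, probe target).
CANDIDATES = [
    ("pyttsx3", "module", "pyttsx3"),
    ("espeak", "binary", "espeak"),
    ("festival", "binary", "festival"),
    ("coqui", "module", "TTS"),
    ("gtts", "module", "gtts"),
]


def has_module(name: str) -> bool:
    """Return True if a top-level Python module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def has_binary(name: str) -> bool:
    """Return True if an executable with this name is on PATH."""
    return shutil.which(name) is not None


def detect_engines(module_probe=has_module, binary_probe=has_binary) -> list[str]:
    """List engines whose Python package or system binary is present."""
    found = []
    for engine, kind, target in CANDIDATES:
        probe = module_probe if kind == "module" else binary_probe
        if probe(target):
            found.append(engine)
    return found
```

Calling `detect_engines()` with no arguments probes the current machine. The probes are injectable so the logic can be exercised without installing any engine.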

📦 Quick Start

Option 1: Python Installation (Recommended for Development)

# Clone the repository
git clone <repository-url>
cd mcp-text-to-speech

# Install with uv (recommended)
pip install uv
uv pip install .

# Or install with pip
pip install .

# Run with auto-detection
python -m mcp_text_to_speech

# Check available engines
python -m mcp_text_to_speech --info

Option 2: Docker (Recommended for Production)

Docker Hub (Optimized Images)

# Ultra-slim production image (406MB, optimized)
docker pull michaelyuwh/mcp-text-to-speech:slim
docker run -p 8000:8000 michaelyuwh/mcp-text-to-speech:slim

# Versioned slim image
docker pull michaelyuwh/mcp-text-to-speech:v1.0.0-slim
docker run -p 8000:8000 -v ./output:/app/output michaelyuwh/mcp-text-to-speech:v1.0.0-slim

# GitHub Container Registry (alternative)
docker pull ghcr.io/michaelyuwh/mcp-text-to-speech:slim
docker run -p 8000:8000 ghcr.io/michaelyuwh/mcp-text-to-speech:slim

Build from Source

# Quick start with Docker Compose
docker-compose up -d

# Build standard image
docker build -t mcp-text-to-speech .

# Build slim image (optimized)
docker build -f Dockerfile.slim -t mcp-text-to-speech:slim .

# Run with audio output
docker run -it --rm \
  -v ./output:/app/output \
  --device /dev/snd:/dev/snd \
  mcp-text-to-speech:slim

Option 3: Development Setup

# Clone and setup development environment
git clone <repository-url>
cd mcp-text-to-speech

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Install offline TTS engines
sudo apt-get install espeak festival  # Linux
# On macOS: brew install espeak festival

# Run development server
python -m mcp_text_to_speech --debug

🎛️ Usage Examples

Basic Text-to-Speech

# Using MCP protocol
{
  "method": "tools/call",
  "params": {
    "name": "synthesize_speech",
    "arguments": {
      "text": "Hello, this is a test of text-to-speech synthesis!",
      "engine": "auto",
      "language": "en"
    }
  }
}
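
The request above shows only `method` and `params`. MCP messages are JSON-RPC 2.0, so a request on the wire also carries a `jsonrpc` version and an `id`. A minimal sketch of building that envelope follows; the helper name `build_tool_call` and the counter-based ids are illustrative assumptions, and in practice an MCP client library normally handles this step.

```python
import json
from itertools import count

_next_id = count(1)  # simple monotonically increasing request ids (an assumption)


def build_tool_call(name: str, arguments: dict) -> dict:
    """Wrap a tool invocation in a JSON-RPC 2.0 request envelope."""
    return {
        "jsonrpc": "2.0",
        "id": next(_next_id),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "synthesize_speech",
    {
        "text": "Hello, this is a test of text-to-speech synthesis!",
        "engine": "auto",
        "language": "en",
    },
)
print(json.dumps(request, indent=2))
```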

Advanced Configuration

# High-quality synthesis with specific voice
{
  "method": "tools/call",
  "params": {
    "name": "synthesize_speech",
    "arguments": {
      "text": "Welcome to our application",
      "engine": "pyttsx3",
      "voice": "female",
      "speed": 180,
      "language": "en",
      "output_file": "/app/output/welcome.wav"
    }
  }
}

Batch Processing

# Convert multiple texts at once
{
  "method": "tools/call",
  "params": {
    "name": "batch_synthesize",
    "arguments": {
      "texts": [
        "Welcome to our service",
        "Please select an option",
        "Thank you for your choice"
      ],
      "engine": "auto",
      "output_dir": "/app/output/batch"
    }
  }
}
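
The source does not say how `batch_synthesize` names the files it writes. If predictable file names matter, one client-side alternative is to issue one `synthesize_speech` call per text with an explicit `output_file`. The sketch below only plans the argument dictionaries; the helper name, the numbering scheme, and the `.wav` extension are assumptions.

```python
from pathlib import PurePosixPath


def plan_batch_calls(texts, output_dir, engine="auto"):
    """Return one synthesize_speech argument dict per text, with numbered files."""
    # Zero-pad to at least three digits so names sort in input order.
    width = max(3, len(str(len(texts))))
    calls = []
    for index, text in enumerate(texts, start=1):
        output_file = PurePosixPath(output_dir) / f"{index:0{width}d}.wav"
        calls.append({"text": text, "engine": engine, "output_file": str(output_file)})
    return calls


calls = plan_batch_calls(
    ["Welcome to our service", "Please select an option", "Thank you for your choice"],
    "/app/output/batch",
)
for arguments in calls:
    print(arguments["output_file"], "<-", arguments["text"])
```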

🌍 Language Support

Enhanced Chinese & Cantonese Support 🇭🇰

Perfect for Hong Kong users and Cantonese speakers:

Cantonese (粵語)

  • Offline: macOS Sinji voice (zh-HK) - Native Hong Kong Cantonese
  • Online: gTTS Cantonese (yue) - High-quality synthesis
  • Smart Mapping: zh-HK, cantonese → Auto-selects best Cantonese voice

Mandarin Chinese (普通話)

  • Simplified Chinese: zh-CN - Mainland China
  • Traditional Chinese: zh-TW - Taiwan
  • Generic Chinese: zh - Default Mandarin

Language Usage Examples

# Hong Kong Cantonese (Offline)
{"language": "zh-HK", "engine": "pyttsx3"}  # → Sinji voice

# Cantonese (Online)  
{"language": "yue", "engine": "gtts"}       # → gTTS Cantonese

# Auto-detection
{"language": "cantonese", "engine": "auto"} # → Best available
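
The mapping described above (several Cantonese spellings resolving to an engine-specific code) could look roughly like the following. This is a sketch under assumptions: the alias set, the fallback to `zh-HK` for other engines, and the function name are illustrative, and only the `zh-HK` / `yue` codes come from the examples above.

```python
# Inputs treated as "Cantonese", compared case-insensitively (assumed alias set).
CANTONESE_INPUTS = {"zh-hk", "yue", "cantonese"}

# Engine-specific codes taken from the examples above.
CANTONESE_CODES = {"pyttsx3": "zh-HK", "gtts": "yue"}


def resolve_language(language: str, engine: str) -> str:
    """Map a user-supplied language value to the code a given engine expects."""
    key = language.strip().lower()
    if key in CANTONESE_INPUTS:
        # Fall back to zh-HK for engines not listed (an assumption).
        return CANTONESE_CODES.get(engine, "zh-HK")
    return language.strip()


print(resolve_language("cantonese", "gtts"))
print(resolve_language("zh-HK", "pyttsx3"))
```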

Other Supported Languages

The server supports numerous languages depending on the engine:

  • English: en, en-US, en-GB, en-AU
  • Spanish: es, es-ES, es-MX, es-AR
  • French: fr, fr-FR, fr-CA
  • German: de, de-DE, de-AT
  • Italian: it, it-IT
  • Portuguese: pt, pt-PT, pt-BR
  • Russian: ru
  • Japanese: ja
  • Korean: ko
  • Chinese: zh, zh-CN, zh-TW, yue (Cantonese)
  • And many more...

⚙️ Configuration

Environment Variables

# Force specific mode
export TTS_MODE=offline  # or 'online' or 'auto'

# Cache and output directories
export TTS_CACHE_DIR=/tmp/tts_cache
export TTS_OUTPUT_DIR=/app/output

# Online service credentials (optional)
export AZURE_SPEECH_KEY=your_key
export AZURE_SPEECH_REGION=eastus
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export IBM_WATSON_APIKEY=your_key
export IBM_WATSON_URL=your_url
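
For reference, a process could read these variables as sketched below. The variable names come from the list above. The `TTSSettings` class, the `load_settings` function, and the default values (borrowed from the example values shown) are assumptions, not the server's documented behavior.

```python
import os
from dataclasses import dataclass

VALID_MODES = {"offline", "online", "auto"}


@dataclass(frozen=True)
class TTSSettings:
    mode: str
    cache_dir: str
    output_dir: str


def load_settings(env=None) -> TTSSettings:
    """Read TTS_* variables from a mapping, falling back to assumed defaults."""
    env = os.environ if env is None else env
    mode = env.get("TTS_MODE", "auto").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"TTS_MODE must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return TTSSettings(
        mode=mode,
        cache_dir=env.get("TTS_CACHE_DIR", "/tmp/tts_cache"),
        output_dir=env.get("TTS_OUTPUT_DIR", "/app/output"),
    )


print(load_settings({"TTS_MODE": "offline"}))
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test. Rejecting unknown modes early avoids silently falling back to a mode the operator did not intend.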

Command Line Options

# Auto-detection (default)
python -m mcp_text_to_speech

# Force offline mode
python -m mcp_text_to_speech --offline

# Force online mode
python -m mcp_text_to_speech --online

# Show environment info
python -m mcp_text_to_speech --info

# Debug mode with detailed logging
python -m mcp_text_to_speech --debug
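
The flags above map naturally onto `argparse`, with `--offline` and `--online` as mutually exclusive choices. The following is a sketch of how such a command line might be parsed, not the project's actual entry point; the function names and help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four flags listed above."""
    parser = argparse.ArgumentParser(prog="mcp_text_to_speech")
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--offline", action="store_true", help="use only local engines")
    mode.add_argument("--online", action="store_true", help="use only cloud services")
    parser.add_argument("--info", action="store_true", help="print environment info and exit")
    parser.add_argument("--debug", action="store_true", help="enable detailed logging")
    return parser


def resolve_mode(args: argparse.Namespace) -> str:
    """Translate the parsed flags into a single mode string."""
    if args.offline:
        return "offline"
    if args.online:
        return "online"
    return "auto"  # neither flag given: auto-detection


print(resolve_mode(build_parser().parse_args(["--offline", "--debug"])))
```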

🐳 Docker Deployment

Production Deployment (Recommended)

# docker-compose.yml - Using optimized slim image
version: '3.8'
services:
  mcp-tts:
    image: michaelyuwh/mcp-text-to-speech:slim  # 406MB optimized image
    restart: unless-stopped
    volumes:
      - ./output:/app/output
      - tts_cache:/tmp/tts_cache
    environment:
      - TTS_MODE=offline
    devices:
      - /dev/snd:/dev/snd
    ports:
      - "8000:8000"

volumes:
  tts_cache:

Docker Image Options

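| Image Type | Size     | Use Case    | Command                               |
|------------|----------|-------------|---------------------------------------|
| Slim       | 406MB    | Production  | michaelyuwh/mcp-text-to-speech:slim   |
| Standard   | ~800MB   | Development | Build from source                     |
| Latest     | Variable | Testing     | michaelyuwh/mcp-text-to-speech:latest |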

Multi-Platform Build

# Build for multiple architectures
docker buildx create --use
docker buildx build --platform linux/amd64,linux/arm64 -t mcp-text-to-speech:latest .

🛠️ Available MCP Tools

1. get_available_engines

Get the list of available TTS engines and their capabilities.

2. synthesize_speech

Convert text to speech with customizable options:

  • Text content
  • Engine selection
  • Voice selection
  • Speed/rate control
  • Language selection
  • Output format

3. list_voices

List available voices for each engine with details:

  • Voice IDs and names
  • Supported languages
  • Gender information

4. play_audio

Play generated audio files through the system audio.

5. batch_synthesize

Convert multiple texts to speech files efficiently.

6. Online Service Tools

  • get_available_services: List online TTS services
  • synthesize_speech_online: Use cloud TTS services
  • list_online_voices: Browse cloud voice options
  • get_service_limits: Check API usage and limits

🔧 System Requirements

Minimum Requirements

  • Python 3.9+
  • 512MB RAM
  • 100MB disk space
  • Audio output capability

Recommended for Production

  • Python 3.11+
  • 1GB RAM
  • 1GB disk space
  • Linux with audio system (ALSA/PulseAudio)

Dependencies

Core Dependencies:

  • mcp >= 1.0.0
  • pyttsx3 >= 2.90 (cross-platform TTS)
  • pygame >= 2.0.0 (audio playback)

Optional TTS Engines:

  • gtts >= 2.3.0 (Google TTS)
  • TTS >= 0.22.0 (Coqui AI TTS)
  • azure-cognitiveservices-speech (Azure)
  • boto3 (Amazon Polly)
  • ibm-watson (IBM Watson)

System Dependencies (Linux):

sudo apt-get install espeak festival alsa-utils pulseaudio sox ffmpeg

🚀 Performance Optimization

Speed Optimizations

  • Engine Selection: pyttsx3 for speed, Coqui for quality
  • Caching: Automatic caching of generated audio
  • Batch Processing: Efficient multi-text synthesis
  • Resource Management: Memory-efficient streaming
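
The source does not describe how the audio cache is keyed. One common approach, sketched here as an assumption, is to hash every parameter that affects the generated audio so that identical requests map to the same cached file. The function name and the choice of SHA-256 are illustrative.

```python
import hashlib
import json


def cache_key(text, engine, voice=None, speed=None, language="en"):
    """Derive a stable hex digest from everything that affects the audio."""
    payload = json.dumps(
        {"text": text, "engine": engine, "voice": voice, "speed": speed, "language": language},
        sort_keys=True,       # stable field order, so equal inputs hash equally
        ensure_ascii=False,   # keep non-ASCII text (e.g. Cantonese) as-is before encoding
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key = cache_key("Welcome to our application", "pyttsx3", voice="female", speed=180)
print(f"{key}.wav")
```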

Resource Usage

  • Offline Mode: ~100-500MB RAM
  • Online Mode: ~50-200MB RAM
  • Disk Cache: ~10MB per hour of audio
  • CPU: Low usage except during synthesis
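
As a sanity check on the disk cache figure, the arithmetic below converts between storage per hour and bitrate. It suggests that about 10MB per hour corresponds to compressed audio, since uncompressed WAV is far larger. The 22,050 Hz, 16-bit, mono format is an assumption chosen for illustration, and decimal megabytes are used.

```python
SECONDS_PER_HOUR = 3600


def mb_per_hour(bytes_per_second: float) -> float:
    """Storage needed for one hour of audio, in decimal megabytes."""
    return bytes_per_second * SECONDS_PER_HOUR / 1_000_000


def kbit_per_second(mb_per_hr: float) -> float:
    """Average bitrate implied by a storage-per-hour figure."""
    return mb_per_hr * 1_000_000 * 8 / SECONDS_PER_HOUR / 1000


# Uncompressed PCM: sample rate x bytes per sample x channels (assumed format).
pcm_bytes_per_second = 22_050 * 2 * 1

print(round(mb_per_hour(pcm_bytes_per_second), 1))  # 158.8 MB per hour of WAV
print(round(kbit_per_second(10), 1))                # 22.2 kbit/s for 10 MB per hour
```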

🔍 Troubleshooting

Common Issues

No TTS engines available:

# Install TTS packages (pyttsx3 works offline; gTTS requires internet)
pip install pyttsx3 gtts
sudo apt-get install espeak  # Linux

# Check environment
python -m mcp_text_to_speech --info

Audio playback issues:

# Check audio system
pulseaudio --check -v
aplay -l

# Configure Docker audio
docker run --device /dev/snd:/dev/snd mcp-text-to-speech

Online service errors:

# Make sure credentials are set for the service you use (Azure shown)
export AZURE_SPEECH_KEY=your_key
export AZURE_SPEECH_REGION=your_region

# Confirm gTTS is installed (import check only; this does not contact the service)
python -c "from gtts import gTTS; print('gTTS works')"

Debug Mode

# Run with detailed logging
python -m mcp_text_to_speech --debug

# Check specific engine
python -c "import pyttsx3; engine = pyttsx3.init(); print('pyttsx3 works')"

🤝 Integration Examples

n8n Integration

// n8n workflow node
{
  "nodes": [
    {
      "name": "Text to Speech",
      "type": "mcp-text-to-speech",
      "parameters": {
        "text": "{{ $json.message }}",
        "engine": "auto",
        "language": "en"
      }
    }
  ]
}

Claude Desktop Integration

// claude_desktop_config.json
{
  "mcpServers": {
    "text-to-speech": {
      "command": "python",
      "args": ["-m", "mcp_text_to_speech"],
      "cwd": "/path/to/mcp-text-to-speech"
    }
  }
}
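
When a Claude Desktop configuration already lists other servers, the entry above has to be merged in rather than pasted over the file. A minimal sketch of that merge on an in-memory dictionary follows; the function name is an assumption, and reading and writing the actual config file is left out because its location varies by platform.

```python
import json


def add_server(config: dict, name: str, project_dir: str) -> dict:
    """Return a copy of config with the TTS server entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers[name] = {
        "command": "python",
        "args": ["-m", "mcp_text_to_speech"],
        "cwd": project_dir,
    }
    merged["mcpServers"] = servers
    return merged


existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
updated = add_server(existing, "text-to-speech", "/path/to/mcp-text-to-speech")
print(json.dumps(updated, indent=2))
```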

🔒 Privacy & Security

  • Data Privacy: In offline mode, all text processing happens locally; online engines send text to the selected cloud provider
  • No Telemetry: No data sent to external services (offline mode)
  • Secure Defaults: Non-root Docker containers
  • Credential Management: Environment-based configuration
  • Audit Trail: Comprehensive logging available

📊 Benchmarks

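| Engine    | Quality   | Speed     | Memory | Offline |
|-----------|-----------|-----------|--------|---------|
| pyttsx3   | Good      | Fast      | 100MB  | ✅      |
| espeak    | Basic     | Very Fast | 50MB   | ✅      |
| Coqui TTS | Excellent | Medium    | 500MB  | ✅      |
| gTTS      | Excellent | Fast      | 100MB  | ❌      |
| Azure     | Excellent | Fast      | 150MB  | ❌      |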
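
The benchmark rows above can drive a simple engine choice under constraints. The sketch below encodes the table as data and picks the highest-quality engine that fits a memory budget and an offline requirement. The selection rule and the function name are assumptions; only the figures come from the table.

```python
# Rows copied from the benchmark table above.
ENGINES = [
    {"name": "pyttsx3", "quality": "Good", "speed": "Fast", "memory_mb": 100, "offline": True},
    {"name": "espeak", "quality": "Basic", "speed": "Very Fast", "memory_mb": 50, "offline": True},
    {"name": "Coqui TTS", "quality": "Excellent", "speed": "Medium", "memory_mb": 500, "offline": True},
    {"name": "gTTS", "quality": "Excellent", "speed": "Fast", "memory_mb": 100, "offline": False},
    {"name": "Azure", "quality": "Excellent", "speed": "Fast", "memory_mb": 150, "offline": False},
]

QUALITY_RANK = {"Basic": 0, "Good": 1, "Excellent": 2}


def pick_engine(offline_only: bool, max_memory_mb: int):
    """Return the best-quality engine name that meets the constraints, or None."""
    candidates = [
        e for e in ENGINES
        if e["memory_mb"] <= max_memory_mb and (e["offline"] or not offline_only)
    ]
    if not candidates:
        return None
    # max() keeps the first of equally ranked engines, so table order breaks ties.
    return max(candidates, key=lambda e: QUALITY_RANK[e["quality"]])["name"]


print(pick_engine(offline_only=True, max_memory_mb=200))
print(pick_engine(offline_only=True, max_memory_mb=1024))
```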

🗺️ Roadmap

  • WebRTC Integration: Real-time streaming synthesis
  • Voice Cloning: Custom voice model support
  • SSML Support: Advanced speech markup language
  • Emotion Control: Emotional expression in synthesis
  • Multilingual Models: Advanced language switching
  • Performance Dashboard: Real-time monitoring
  • Plugin System: Custom engine integration

🤝 Contributing

We welcome contributions! Please see our contributing guidelines for details.

Development Setup

# Fork and clone the repository
git clone https://github.com/yourusername/mcp-text-to-speech.git
cd mcp-text-to-speech

# Setup development environment
python -m venv venv
source venv/bin/activate
pip install -e ".[dev]"

# Run tests
pytest tests/

# Code formatting
black src/
isort src/

# Type checking
mypy src/

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Made with ❤️ for the MCP community