advanced-tts-mcp

Advanced TTS MCP Server

A high-quality, feature-rich Text-to-Speech MCP server with a native TypeScript implementation. It is designed for professional applications that need natural, expressive speech synthesis with advanced controls and zero external dependencies.

✨ Features

🎯 Advanced Voice Control

  • 10 High-Quality Voices - Male and female voices with distinct personalities
  • Emotion Control - Neutral, happy, excited, calm, serious, casual, confident
  • Dynamic Pacing - Natural, conversational, presentation, tutorial, narrative, fast, and slow modes
  • Speed & Volume - Precise control from 0.25x to 3.0x speed, 0.1x to 2.0x volume

🚀 Professional Capabilities

  • Streaming Audio - Real-time synthesis and playback
  • Batch Processing - Handle multiple text segments efficiently
  • Multiple Formats - WAV, MP3, FLAC, OGG output support
  • Natural Speech Enhancement - Automatic pause insertion and emotion markers
  • Queue Management - Handle multiple concurrent requests

🔧 MCP Integration

  • 6 Powerful Tools - Complete synthesis, batch processing, voice management
  • 2 Rich Resources - Voice capabilities and usage examples
  • Real-time Status - Track processing progress and manage requests
  • File Management - Save, list, and organize audio outputs

🚀 Quick Start

Option 1: Deploy to Smithery.ai (Recommended)

🎯 One-Click Deployment to Smithery Platform

  1. Deploy Now: Visit Smithery.ai and import this repository
  2. Configure: Set your preferred voice and speech settings
  3. Use Instantly: Access via Claude Desktop or any MCP-compatible client

Benefits:

  • ✅ Zero setup required
  • ✅ Automatic scaling and updates
  • ✅ No model downloads needed
  • ✅ Enterprise-grade hosting

Option 2: Local Installation

Prerequisites:

  • Node.js 18+

Installation:

  1. Clone the repository
git clone https://github.com/samihalawa/advanced-tts-mcp.git
cd advanced-tts-mcp
  2. Install dependencies
npm install
  3. Configure Claude Desktop

Add to your claude_desktop_config.json, replacing /path/to/advanced-tts-mcp with the directory you cloned into:

{
  "mcpServers": {
    "advanced-tts": {
      "command": "node",
      "args": ["dist/index.js"],
      "cwd": "/path/to/advanced-tts-mcp"
    }
  }
}
  4. Build and start the server
# Build TypeScript
npm run build

# Start server
npm start

Restart Claude Desktop and start synthesizing with natural, expressive voices.

🎙️ Available Voices

| Voice ID | Name | Gender | Description |
|---|---|---|---|
| af_heart | Heart | Female | Warm, friendly voice (default) |
| af_sky | Sky | Female | Clear, bright voice |
| af_bella | Bella | Female | Elegant, sophisticated voice |
| af_sarah | Sarah | Female | Professional, confident voice |
| af_nicole | Nicole | Female | Gentle, soothing voice |
| am_adam | Adam | Male | Strong, authoritative voice |
| am_michael | Michael | Male | Friendly, approachable voice |
| bf_emma | Emma | Female | Young, energetic voice |
| bf_isabella | Isabella | Female | Mature, expressive voice |
| bm_lewis | Lewis | Male | Deep, resonant voice |

📚 Usage Examples

Basic Synthesis

# Simple text-to-speech
await synthesize_speech(
    text="Hello! Welcome to Advanced TTS.",
    voice_id="af_heart"
)

Emotional Expression

# Excited announcement
await synthesize_speech(
    text="This is amazing news! You're going to love this new feature!",
    voice_id="af_heart",
    emotion="excited",
    pacing="conversational",
    speed=1.1
)

Professional Presentation

# Tutorial narration
await synthesize_speech(
    text="Step one: Open your browser. Step two: Navigate to the website.",
    voice_id="am_adam", 
    emotion="calm",
    pacing="tutorial",
    speed=0.9
)

Batch Processing

# Multiple segments with pauses
await batch_synthesize(
    segments=[
        "Welcome to our presentation.",
        "Today we'll cover three main topics.", 
        "Let's begin with the first topic."
    ],
    voice_id="af_sarah",
    emotion="confident",
    pacing="presentation",
    merge_output=True,
    segment_pause=1.0,
    save_file=True
)
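
The calls above are written in function-call shorthand. Over MCP, a client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and an `arguments` object. The following is a minimal sketch of building such a request; `build_tool_call` is a hypothetical helper that is not part of this server, and the transport details (stdio framing, the initialization handshake) are left out.

```python
import json
from itertools import count

# Hypothetical helper: this server does not ship it.
_request_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> dict:
    """Wrap a tool name and its arguments in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Same arguments as the "Basic Synthesis" example.
request = build_tool_call(
    "synthesize_speech",
    {"text": "Hello! Welcome to Advanced TTS.", "voice_id": "af_heart"},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library or Claude Desktop builds this message for you; the sketch only shows how the keyword arguments in the examples map onto the request.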

🛠️ Available Tools

synthesize_speech

Convert text to natural speech with full control over voice characteristics.

Parameters:

  • text - Text to synthesize (max 10,000 chars)
  • voice_id - Voice selection (see table above)
  • speed - Speech rate (0.25-3.0)
  • emotion - Voice emotion (neutral, happy, excited, calm, serious, casual, confident)
  • pacing - Speech style (natural, conversational, presentation, tutorial, narrative, fast, slow)
  • volume - Audio volume (0.1-2.0)
  • output_format - File format (wav, mp3, flac, ogg)
  • save_file - Save to file (boolean)
  • filename - Custom filename
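
The limits listed above can be checked on the client side before a request is sent. The sketch below is illustrative and is not the server's own validation code: the `SynthesisRequest` class and its `validate` method are hypothetical, and every default other than the documented default voice (`af_heart`) is an assumption.

```python
from dataclasses import dataclass

# Allowed values taken from the parameter list and the voice table.
VOICE_IDS = {
    "af_heart", "af_sky", "af_bella", "af_sarah", "af_nicole",
    "am_adam", "am_michael", "bf_emma", "bf_isabella", "bm_lewis",
}
EMOTIONS = {"neutral", "happy", "excited", "calm", "serious", "casual", "confident"}
PACING = {"natural", "conversational", "presentation", "tutorial", "narrative", "fast", "slow"}
FORMATS = {"wav", "mp3", "flac", "ogg"}
MAX_TEXT_CHARS = 10_000


@dataclass
class SynthesisRequest:
    text: str
    voice_id: str = "af_heart"   # documented default voice
    speed: float = 1.0           # assumed default
    emotion: str = "neutral"     # assumed default
    pacing: str = "natural"      # assumed default
    volume: float = 1.0          # assumed default
    output_format: str = "wav"   # assumed default

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the request is within limits."""
        errors: list[str] = []
        if not self.text.strip():
            errors.append("text must not be empty")
        elif len(self.text) > MAX_TEXT_CHARS:
            errors.append(f"text exceeds {MAX_TEXT_CHARS} characters")
        if self.voice_id not in VOICE_IDS:
            errors.append(f"unknown voice_id: {self.voice_id}")
        if not 0.25 <= self.speed <= 3.0:
            errors.append("speed must be between 0.25 and 3.0")
        if self.emotion not in EMOTIONS:
            errors.append(f"unknown emotion: {self.emotion}")
        if self.pacing not in PACING:
            errors.append(f"unknown pacing: {self.pacing}")
        if not 0.1 <= self.volume <= 2.0:
            errors.append("volume must be between 0.1 and 2.0")
        if self.output_format not in FORMATS:
            errors.append(f"unknown output_format: {self.output_format}")
        return errors


print(SynthesisRequest(text="Hello!", speed=5.0).validate())
```

Collecting all problems in a list, rather than raising on the first one, lets a caller report every out-of-range parameter at once.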

batch_synthesize

Process multiple text segments efficiently with optional merging.

Parameters:

  • segments - List of text segments
  • merge_output - Combine into single file
  • segment_pause - Pause between segments (0.0-5.0s)
  • All synthesis parameters from above
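
To make `merge_output` and `segment_pause` concrete, the sketch below joins several WAV segments into one file with silence between them, using only Python's standard `wave` module. It illustrates the idea and is not the server's implementation: `merge_wav_segments` is a hypothetical name, and the code assumes all segments are signed PCM (16-bit or wider, where zero bytes are silence) with identical channel count, sample width, and sample rate.

```python
import io
import wave


def merge_wav_segments(segments: list[bytes], pause_seconds: float = 1.0) -> bytes:
    """Concatenate WAV byte strings, inserting silence between consecutive segments."""
    if not segments:
        raise ValueError("at least one segment is required")
    if not 0.0 <= pause_seconds <= 5.0:
        raise ValueError("pause_seconds must be between 0.0 and 5.0")

    frames = []
    params = None
    for data in segments:
        with wave.open(io.BytesIO(data), "rb") as reader:
            current = (reader.getnchannels(), reader.getsampwidth(), reader.getframerate())
            if params is None:
                params = current
            elif current != params:
                raise ValueError("segments must share one audio format")
            frames.append(reader.readframes(reader.getnframes()))

    channels, sample_width, rate = params
    # Zero bytes are silence for signed PCM (assumed here, see lead-in).
    silence = b"\x00" * (int(rate * pause_seconds) * channels * sample_width)

    out = io.BytesIO()
    with wave.open(out, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)
        writer.setframerate(rate)
        # bytes.join places the silence only between segments, not at the ends.
        writer.writeframes(silence.join(frames))
    return out.getvalue()
```

With three segments and `pause_seconds=1.0`, the merged audio is the sum of the segment durations plus two seconds of silence.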

get_voices

Retrieve complete voice information and capabilities.

get_status

Check processing status for synthesis requests.

cancel_request

Cancel active synthesis operations.

list_output_files

Browse saved audio files with metadata.
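
As a rough picture of what a file listing with metadata could contain, the sketch below scans an output directory for audio files and reports name, format, size, and modification time. The returned field names and the default directory are assumptions for illustration; the tool's actual response shape is not documented here.

```python
from datetime import datetime, timezone
from pathlib import Path

AUDIO_SUFFIXES = {".wav", ".mp3", ".flac", ".ogg"}


def list_output_files(output_dir: str = "./audio_output") -> list[dict]:
    """Return metadata for audio files directly inside output_dir, sorted by name."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    entries = []
    for path in sorted(root.iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES:
            stat = path.stat()
            entries.append({
                "name": path.name,
                "format": path.suffix.lstrip(".").lower(),
                "size_bytes": stat.st_size,
                "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
            })
    return entries


for entry in list_output_files():
    print(entry["name"], entry["size_bytes"], entry["modified"])
```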

🎛️ Voice Controls

Emotions

  • Neutral - Standard, professional tone
  • Happy - Upbeat, cheerful expression
  • Excited - Enthusiastic, energetic delivery
  • Calm - Relaxed, soothing tone
  • Serious - Formal, authoritative delivery
  • Casual - Relaxed, conversational style
  • Confident - Assured, professional tone

Pacing Styles

  • Natural - Balanced, human-like rhythm
  • Conversational - Casual discussion pace
  • Presentation - Professional speaking rhythm
  • Tutorial - Educational, clear delivery
  • Narrative - Storytelling pace
  • Fast - Quick delivery (1.2x base speed)
  • Slow - Deliberate delivery (0.8x base speed)
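
The Fast and Slow styles are described as multipliers on the base speed. One plausible reading, sketched below, is that the pacing multiplier scales the requested `speed` and the result is clamped to the supported 0.25-3.0 range. This combination rule is an assumption, as is treating every other pacing style as a multiplier of 1.0; `effective_speed` is a hypothetical helper.

```python
# Multipliers stated in the list above; other styles are assumed to leave speed unchanged.
PACING_MULTIPLIER = {"fast": 1.2, "slow": 0.8}
MIN_SPEED, MAX_SPEED = 0.25, 3.0


def effective_speed(speed: float, pacing: str = "natural") -> float:
    """Scale the requested speed by the pacing multiplier and clamp to the valid range."""
    scaled = speed * PACING_MULTIPLIER.get(pacing, 1.0)
    return max(MIN_SPEED, min(MAX_SPEED, scaled))


print(effective_speed(1.0, "fast"))
print(effective_speed(3.0, "fast"))  # clamped to the maximum
```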

🎵 Audio Formats

| Format | Quality | Use Case |
|---|---|---|
| WAV | Uncompressed | Highest quality, editing |
| MP3 | Compressed | Web, streaming, sharing |
| FLAC | Lossless | Archival, high-quality storage |
| OGG | Compressed | Open source alternative |
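
When saving or serving the output, the chosen format usually determines the file extension and MIME type. The sketch below shows one way to derive both. The MIME types are the commonly used ones for these formats; `output_path` is a hypothetical helper, and how the server actually names its files is not specified here.

```python
from pathlib import Path

# Commonly used MIME types for the four supported formats.
MIME_TYPES = {
    "wav": "audio/wav",
    "mp3": "audio/mpeg",
    "flac": "audio/flac",
    "ogg": "audio/ogg",
}


def output_path(output_dir: str, filename: str, output_format: str) -> Path:
    """Build the output path, forcing the extension to match the chosen format."""
    fmt = output_format.lower()
    if fmt not in MIME_TYPES:
        raise ValueError(f"unsupported format: {output_format}")
    # .name drops any directory components so the file stays inside output_dir.
    safe_name = Path(filename).with_suffix(f".{fmt}").name
    return Path(output_dir) / safe_name


path = output_path("./audio_output", "intro", "mp3")
print(path, MIME_TYPES[path.suffix.lstrip(".")])
```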

🔧 Configuration

Environment Variables

# Model paths (optional)
KOKORO_MODEL_PATH=./kokoro-v1.0.onnx
KOKORO_VOICES_PATH=./voices-v1.0.bin

# Output settings
TTS_OUTPUT_DIR=./audio_output
TTS_MAX_QUEUE_SIZE=100

# Audio settings  
TTS_DEFAULT_VOICE=af_heart
TTS_ENABLE_STREAMING=true

Server Configuration

config = ServerConfig(
    model_path="./kokoro-v1.0.onnx",
    voices_path="./voices-v1.0.bin", 
    output_dir="./audio_output",
    max_queue_size=100,
    enable_streaming=True,
    default_voice="af_heart"
)
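
The environment variables and the `ServerConfig` fields above line up one to one. The sketch below shows how such a mapping might look; it is not the server's actual code. The `ServerConfig` dataclass here is a stand-in whose fields mirror the call above, `config_from_env` is a hypothetical helper, and the set of strings accepted as "true" is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    # Stand-in definition; defaults copied from the example values above.
    model_path: str = "./kokoro-v1.0.onnx"
    voices_path: str = "./voices-v1.0.bin"
    output_dir: str = "./audio_output"
    max_queue_size: int = 100
    enable_streaming: bool = True
    default_voice: str = "af_heart"


def config_from_env(env=None) -> ServerConfig:
    """Build a ServerConfig from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    defaults = ServerConfig()
    streaming = str(env.get("TTS_ENABLE_STREAMING", defaults.enable_streaming))
    return ServerConfig(
        model_path=env.get("KOKORO_MODEL_PATH", defaults.model_path),
        voices_path=env.get("KOKORO_VOICES_PATH", defaults.voices_path),
        output_dir=env.get("TTS_OUTPUT_DIR", defaults.output_dir),
        max_queue_size=int(env.get("TTS_MAX_QUEUE_SIZE", defaults.max_queue_size)),
        # Assumed truthy spellings; anything else disables streaming.
        enable_streaming=streaming.strip().lower() in {"1", "true", "yes"},
        default_voice=env.get("TTS_DEFAULT_VOICE", defaults.default_voice),
    )


print(config_from_env())
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real process environment.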

🏗️ Architecture

├── src/advanced_tts/
│   ├── __init__.py          # Package initialization
│   ├── server.py            # MCP server implementation
│   ├── engine.py            # Kokoro TTS engine wrapper
│   ├── models.py            # Data models and validation
│   └── utils.py             # Utility functions
├── pyproject.toml           # Project configuration
├── README.md                # Documentation
└── LICENSE                  # MIT License

🤝 Contributing

Contributions welcome! Areas for improvement:

  • Additional voice models
  • Real-time streaming synthesis
  • Advanced audio effects
  • Multi-language support
  • Performance optimizations

📄 License

MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Kokoro TTS - High-quality neural voice synthesis
  • MCP Protocol - Seamless AI model integration
  • FastMCP - Efficient server framework

Developed by Sami Halawa

Transform your text into natural, expressive speech with Advanced TTS MCP Server.