tts-mcp-server

WaterTaoMind/tts-mcp-server

A comprehensive Text-to-Speech Model Context Protocol (MCP) server that provides advanced speech synthesis capabilities through Azure Cognitive Services and OpenAI APIs.

TTS MCP Server

Features

🎯 Multi-Provider Support

  • Azure Cognitive Services Speech: Full SSML support, neural voices, multi-language
  • OpenAI TTS: High-quality synthesis with natural voices
  • Automatic provider fallback and load balancing

🗣️ Advanced Voice Control

  • 400+ neural voices across 100+ languages
  • Real-time voice discovery and caching
  • Voice recommendation system for different use cases
  • Gender, language, and style filtering

🎙️ Podcast Generation

  • Multi-speaker conversation synthesis
  • Automatic speaker detection from scripts
  • SSML optimization for natural dialogue
  • Support for 10+ speakers per episode

🛠️ MCP Integration

  • Tools: synthesize-text, synthesize-ssml, generate-podcast, list-voices
  • Resources: voice catalog (tts://voice-catalog) and server status with provider health and performance metrics (tts://status)
  • Full TypeScript SDK compatibility

Quick Start

Prerequisites

  • Node.js 18+
  • Azure Speech Services subscription (needed only for the Azure provider)
  • OpenAI API key (needed only for the OpenAI provider)

Installation

# From a clone of the repository, enter the server directory and install dependencies
cd src/mcp-servers/azure-tts
npm install

# Build the server
npm run build

# Set environment variables
export AZURE_SPEECH_KEY="your_azure_key"
export AZURE_SPEECH_REGION="eastus"
export OPENAI_API_KEY="your_openai_key"
export TTS_OUTPUT_DIR="/path/to/audio/output"

# Start the server
npm start

MCP Configuration

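Add the server entry to your MCP client configuration (many clients, Claude Desktop among them, expect it inside a top-level "mcpServers" object):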

{
  "tts-server": {
    "command": "node",
    "args": ["/path/to/tts-mcp-server/dist/server.js"],
    "env": {
      "AZURE_SPEECH_KEY": "your_azure_key",
      "AZURE_SPEECH_REGION": "eastus",
      "OPENAI_API_KEY": "your_openai_key",
      "TTS_OUTPUT_DIR": "/tmp/tts-output",
      "TTS_DEFAULT_PROVIDER": "azure",
      "TTS_MAX_CONCURRENT": "3"
    }
  }
}

Tools

synthesize-text

Convert plain text to speech with voice and parameter control.

{
  "text": "Hello, this is a test of text-to-speech synthesis.",
  "voice": "en-US-JennyNeural",
  "provider": "azure",
  "outputFormat": "wav",
  "speed": 1.0,
  "pitch": 0
}
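
On the wire, an MCP client sends these arguments inside a JSON-RPC 2.0 request whose method is tools/call. The sketch below builds such a request for synthesize-text. The SynthesizeTextArgs type and the buildToolCall helper are illustrative names, and which fields are optional is an assumption, since the source only shows one full example.

```typescript
// Assumed argument shape, inferred from the example above; optionality is a guess.
interface SynthesizeTextArgs {
  text: string;
  voice?: string;
  provider?: "azure" | "openai";
  outputFormat?: "wav" | "mp3" | "opus";
  speed?: number;
  pitch?: number;
}

// Envelope of an MCP tools/call request (JSON-RPC 2.0).
interface ToolCallRequest<A> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: A };
}

// Hypothetical helper: wraps tool arguments in the JSON-RPC envelope.
function buildToolCall<A>(id: number, name: string, args: A): ToolCallRequest<A> {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const synthRequest = buildToolCall<SynthesizeTextArgs>(1, "synthesize-text", {
  text: "Hello, this is a test of text-to-speech synthesis.",
  voice: "en-US-JennyNeural",
  provider: "azure",
  outputFormat: "wav",
  speed: 1.0,
  pitch: 0,
});

console.log(JSON.stringify(synthRequest, null, 2));
```

In practice an MCP client library builds this envelope for you; the sketch only makes the request shape visible.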

synthesize-ssml

Advanced SSML markup synthesis with validation and optimization.

{
  "ssml": "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"><voice name=\"en-US-AriaNeural\">Hello world!</voice></speak>",
  "provider": "azure",
  "validateSSML": true
}
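
When the SSML string is assembled from user text, special characters have to be escaped or the markup becomes invalid. The following is a minimal sketch of that step, not the server's own validation logic; escapeXml and buildSsml are hypothetical helper names. Note that the SSML namespace URI defined by the W3C uses the http scheme.

```typescript
// Escape the five XML special characters so user text cannot break the markup.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: wraps one utterance in a single-voice SSML document.
function buildSsml(voice: string, text: string, lang: string = "en-US"): string {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">` +
    `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}

console.log(buildSsml("en-US-AriaNeural", "Tom & Jerry say \"hello\""));
```

The result can be passed as the ssml argument of synthesize-ssml.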

generate-podcast

Multi-speaker podcast generation from script.

{
  "script": "Host: Welcome to our show!\nGuest: Thanks for having me!",
  "speakers": [
    {"name": "Host", "voice": "en-US-JennyNeural"},
    {"name": "Guest", "voice": "en-US-GuyNeural"}
  ],
  "provider": "azure",
  "title": "Episode 1"
}
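
The script in this example uses one "Name: text" turn per line. A rough sketch of how speakers could be detected from that format follows. It assumes that a turn starts with a short name followed by a colon and that any other non-empty line continues the previous turn; the server's actual parsing rules are not documented here, and all function names are illustrative.

```typescript
interface ScriptTurn {
  speaker: string;
  text: string;
}

// Assumed turn format: "Name: text", with a name of at most 40 characters.
const TURN_PATTERN = /^\s*([^:]{1,40}?)\s*:\s*(\S.*)$/;

function parseScript(script: string): ScriptTurn[] {
  const turns: ScriptTurn[] = [];
  for (const rawLine of script.split(/\r?\n/)) {
    const match = TURN_PATTERN.exec(rawLine);
    if (match) {
      turns.push({ speaker: match[1], text: match[2].trim() });
    } else if (rawLine.trim() !== "" && turns.length > 0) {
      // Lines without a "Name:" prefix extend the previous turn.
      turns[turns.length - 1].text += " " + rawLine.trim();
    }
  }
  return turns;
}

// Unique speaker names in order of first appearance.
function detectSpeakers(turns: ScriptTurn[]): string[] {
  const names: string[] = [];
  for (const turn of turns) {
    if (names.indexOf(turn.speaker) === -1) names.push(turn.speaker);
  }
  return names;
}

// Speakers that appear in the script but have no voice assigned.
function unassignedSpeakers(
  turns: ScriptTurn[],
  speakers: Array<{ name: string; voice: string }>,
): string[] {
  const assigned = speakers.map((s) => s.name);
  return detectSpeakers(turns).filter((name) => assigned.indexOf(name) === -1);
}

const demoTurns = parseScript("Host: Welcome to our show!\nGuest: Thanks for having me!");
console.log(detectSpeakers(demoTurns));
```

Checking unassignedSpeakers before synthesis gives a clear error when the speakers array does not cover every name used in the script.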

list-voices

Discover and filter available voices.

{
  "provider": "azure",
  "language": "en-US",
  "gender": "Female",
  "refresh": false
}
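
The refresh flag suggests that voice lists are served from a cache unless a reload is requested. Here is a small sketch of filtering combined with a time-limited cache. The VoiceInfo shape, the VoiceCache class, and the synchronous loader are assumptions made for illustration; a real loader would fetch the catalog from the provider asynchronously.

```typescript
interface VoiceInfo {
  name: string;
  provider: "azure" | "openai";
  locale: string;
  gender: string;
}

interface VoiceQuery {
  provider?: "azure" | "openai";
  language?: string;
  gender?: string;
}

const VOICE_TTL_MS = 60 * 60 * 1000; // one hour

class VoiceCache {
  voices: VoiceInfo[] | null = null;
  fetchedAt = 0;

  // Reloads when the cache is empty, expired, or a refresh is forced.
  get(load: () => VoiceInfo[], refresh: boolean, now: number = Date.now()): VoiceInfo[] {
    if (refresh || this.voices === null || now - this.fetchedAt >= VOICE_TTL_MS) {
      this.voices = load();
      this.fetchedAt = now;
    }
    return this.voices;
  }
}

// Case-insensitive match on every field that the query specifies.
function filterVoices(voices: VoiceInfo[], query: VoiceQuery): VoiceInfo[] {
  return voices.filter(
    (v) =>
      (query.provider === undefined || v.provider === query.provider) &&
      (query.language === undefined || v.locale.toLowerCase() === query.language.toLowerCase()) &&
      (query.gender === undefined || v.gender.toLowerCase() === query.gender.toLowerCase()),
  );
}

const sampleVoices: VoiceInfo[] = [
  { name: "en-US-JennyNeural", provider: "azure", locale: "en-US", gender: "Female" },
  { name: "en-US-GuyNeural", provider: "azure", locale: "en-US", gender: "Male" },
  { name: "de-DE-KatjaNeural", provider: "azure", locale: "de-DE", gender: "Female" },
];

const voiceCache = new VoiceCache();
const catalog = voiceCache.get(() => sampleVoices, false);
console.log(filterVoices(catalog, { language: "en-US", gender: "Female" }).map((v) => v.name));
```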

Resources

tts://voice-catalog

Comprehensive voice catalog with filtering and search capabilities.

tts://status

Real-time server status, provider health, and performance metrics.

Configuration

Environment Variables

Variable               Description                             Default
AZURE_SPEECH_KEY       Azure Speech Services subscription key  Required for Azure
AZURE_SPEECH_REGION    Azure region (e.g., "eastus")           Required for Azure
AZURE_SPEECH_ENDPOINT  Custom Azure endpoint                   Auto-generated
OPENAI_API_KEY         OpenAI API key                          Required for OpenAI
OPENAI_TTS_MODEL       OpenAI TTS model                        tts-1
TTS_DEFAULT_PROVIDER   Default provider                        azure
TTS_OUTPUT_DIR         Audio output directory                  /tmp/tts-output
TTS_MAX_CONCURRENT     Max concurrent requests                 3
TTS_ENABLE_LLM         Enable LLM optimizations                false
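
To make the defaults concrete, here is a sketch of how these variables could be read into a typed configuration object. The variable names and default values are the ones listed above; the TtsConfig shape, the loadTtsConfig function, the rule that at least one provider must be configured, and treating only the string "true" as enabling TTS_ENABLE_LLM are assumptions.

```typescript
type TtsProvider = "azure" | "openai";

interface TtsConfig {
  azure?: { key: string; region: string; endpoint?: string };
  openai?: { apiKey: string; model: string };
  defaultProvider: TtsProvider;
  outputDir: string;
  maxConcurrent: number;
  enableLlm: boolean;
}

function isTtsProvider(value: string): value is TtsProvider {
  return value === "azure" || value === "openai";
}

// Hypothetical loader: takes the environment as a plain object so it is easy to test.
function loadTtsConfig(env: Record<string, string | undefined>): TtsConfig {
  const defaultProvider = env["TTS_DEFAULT_PROVIDER"] ?? "azure";
  if (!isTtsProvider(defaultProvider)) {
    throw new Error(`Unsupported TTS_DEFAULT_PROVIDER: ${defaultProvider}`);
  }
  const maxConcurrent = Number(env["TTS_MAX_CONCURRENT"] ?? "3");
  if (!Number.isInteger(maxConcurrent) || maxConcurrent < 1) {
    throw new Error("TTS_MAX_CONCURRENT must be a positive integer");
  }

  const config: TtsConfig = {
    defaultProvider,
    outputDir: env["TTS_OUTPUT_DIR"] ?? "/tmp/tts-output",
    maxConcurrent,
    enableLlm: env["TTS_ENABLE_LLM"] === "true",
  };

  const azureKey = env["AZURE_SPEECH_KEY"];
  const azureRegion = env["AZURE_SPEECH_REGION"];
  if (azureKey && azureRegion) {
    config.azure = { key: azureKey, region: azureRegion, endpoint: env["AZURE_SPEECH_ENDPOINT"] };
  }
  const openaiKey = env["OPENAI_API_KEY"];
  if (openaiKey) {
    config.openai = { apiKey: openaiKey, model: env["OPENAI_TTS_MODEL"] ?? "tts-1" };
  }
  // Assumption: the server cannot do anything useful without at least one provider.
  if (!config.azure && !config.openai) {
    throw new Error("Set AZURE_SPEECH_KEY and AZURE_SPEECH_REGION, or OPENAI_API_KEY");
  }
  return config;
}

console.log(loadTtsConfig({ AZURE_SPEECH_KEY: "your_azure_key", AZURE_SPEECH_REGION: "eastus" }));
```

In a Node.js process the function would be called as loadTtsConfig(process.env).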

Supported Audio Formats

  • WAV: Uncompressed, high quality
  • MP3: Compressed, widely compatible
  • Opus: High compression, web optimized
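
Each provider names these formats differently in its own API. The table below is a sketch of one possible mapping. The Azure strings are valid values of the X-Microsoft-OutputFormat header and the OpenAI strings are valid response_format values, but the specific sample rates and bitrates are an assumption; the source does not say which ones the server requests.

```typescript
type AudioFormat = "wav" | "mp3" | "opus";

interface FormatInfo {
  extension: string;
  azureOutputFormat: string; // X-Microsoft-OutputFormat header value
  openaiResponseFormat: string; // response_format request parameter
}

// Assumed mapping; the server may use different sample rates or bitrates.
const AUDIO_FORMATS: Record<AudioFormat, FormatInfo> = {
  wav: {
    extension: ".wav",
    azureOutputFormat: "riff-24khz-16bit-mono-pcm",
    openaiResponseFormat: "wav",
  },
  mp3: {
    extension: ".mp3",
    azureOutputFormat: "audio-24khz-48kbitrate-mono-mp3",
    openaiResponseFormat: "mp3",
  },
  opus: {
    extension: ".opus",
    azureOutputFormat: "ogg-24khz-16bit-mono-opus",
    openaiResponseFormat: "opus",
  },
};

function isAudioFormat(value: string): value is AudioFormat {
  return value === "wav" || value === "mp3" || value === "opus";
}

// Hypothetical helper: resolves a user-supplied outputFormat, ignoring case.
function resolveAudioFormat(requested: string): FormatInfo {
  const key = requested.toLowerCase();
  if (!isAudioFormat(key)) {
    throw new Error(`Unsupported output format: ${requested}`);
  }
  return AUDIO_FORMATS[key];
}

console.log(resolveAudioFormat("mp3"));
```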

Architecture

TTS MCP Server
├── Tools/               # MCP tool implementations
│   ├── SynthesizeTextTool
│   ├── SynthesizeSSMLTool
│   ├── GeneratePodcastTool
│   └── ListVoicesTool
├── Resources/           # MCP resource providers
│   ├── VoiceCatalogResource
│   └── TTSStatusResource
├── Services/            # TTS provider implementations
│   ├── AzureTTSService
│   ├── OpenAITTSService
│   └── TTSServiceFactory
└── Types/               # TypeScript definitions
    └── TTSTypes

Examples

Basic Text Synthesis

# Using MCP client
mcp call tts-server synthesize-text '{
  "text": "Welcome to our podcast about AI development",
  "voice": "en-US-JennyNeural",
  "outputFormat": "wav"
}'

Multi-Speaker Podcast

mcp call tts-server generate-podcast '{
  "script": "Host: Today we discuss AI ethics.\nExpert: This is a crucial topic.\nHost: What are the main challenges?",
  "speakers": [
    {"name": "Host", "voice": "en-US-AriaNeural"},
    {"name": "Expert", "voice": "en-US-BrianNeural"}
  ],
  "title": "AI Ethics Discussion"
}'

Voice Discovery

# List all available voices
mcp call tts-server list-voices '{}'

# Filter by language and gender
mcp call tts-server list-voices '{
  "language": "en-US",
  "gender": "Female",
  "provider": "azure"
}'

Performance

  • Concurrent Requests: Up to 3 simultaneous syntheses by default (configurable via TTS_MAX_CONCURRENT)
  • Voice Caching: 1-hour TTL for voice catalogs
  • Audio Streaming: Direct provider streaming
  • Format Support: WAV, MP3, Opus with automatic conversion
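
The concurrency cap can be pictured as a small promise-based limiter: tasks beyond the limit wait in a queue and start as earlier ones finish. This is a generic sketch of that technique under the default limit of 3, not the server's actual scheduler; SynthesisLimiter and fakeSynthesis are hypothetical names.

```typescript
// At most maxConcurrent tasks run at once; the rest wait in FIFO order.
class SynthesisLimiter {
  active = 0;
  waiting: Array<() => void> = [];
  maxConcurrent: number;

  constructor(maxConcurrent: number = 3) {
    this.maxConcurrent = maxConcurrent;
  }

  run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = (): void => {
        this.active++;
        const release = (): void => {
          this.active--;
          const next = this.waiting.shift();
          if (next) next();
        };
        // Promise.resolve().then(task) also captures tasks that throw synchronously.
        Promise.resolve()
          .then(task)
          .then(
            (value) => { release(); resolve(value); },
            (error) => { release(); reject(error); },
          );
      };
      if (this.active < this.maxConcurrent) start();
      else this.waiting.push(start);
    });
  }
}

// Stand-in for a real synthesis call: resolves with its label after 10 ms.
const fakeSynthesis = (label: string) => (): Promise<string> =>
  new Promise<string>((resolve) => setTimeout(() => resolve(label), 10));

const demoLimiter = new SynthesisLimiter(3);
for (const label of ["a", "b", "c", "d", "e"]) {
  demoLimiter.run(fakeSynthesis(label)).then((done) => console.log("finished", done));
}
console.log("active:", demoLimiter.active, "queued:", demoLimiter.waiting.length);
```

Right after the loop, three tasks are active and two are queued; the queued ones start as soon as a slot frees up.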

Error Handling

The server includes comprehensive error handling:

  • Provider failover and retry logic
  • SSML validation and automatic fixing
  • Rate limiting and queue management
  • Detailed error reporting with context
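
Provider failover with retries can be sketched as a loop over providers in preference order, moving to the next one only after the current one has exhausted its attempts. The SpeechProvider interface, both function names, and the default of one retry per provider are assumptions for illustration; the source does not specify the server's retry policy.

```typescript
type ProviderName = "azure" | "openai";

interface SpeechProvider {
  name: ProviderName;
  synthesize(text: string): Promise<Uint8Array>;
}

// Preferred provider first (if it is available), then the remaining ones.
function providerOrder(preferred: ProviderName, available: ProviderName[]): ProviderName[] {
  const rest = available.filter((p) => p !== preferred);
  return available.indexOf(preferred) === -1 ? rest : [preferred, ...rest];
}

// Tries each provider in order; each gets 1 + retriesPerProvider attempts.
async function synthesizeWithFallback(
  text: string,
  providers: SpeechProvider[],
  retriesPerProvider: number = 1,
): Promise<{ provider: ProviderName; audio: Uint8Array }> {
  const errors: string[] = [];
  for (const provider of providers) {
    for (let attempt = 0; attempt <= retriesPerProvider; attempt++) {
      try {
        const audio = await provider.synthesize(text);
        return { provider: provider.name, audio };
      } catch (error) {
        // Keep the context so the final error explains every failed attempt.
        errors.push(`${provider.name} attempt ${attempt + 1}: ${String(error)}`);
      }
    }
  }
  throw new Error(`All providers failed:\n${errors.join("\n")}`);
}

console.log(providerOrder("openai", ["azure", "openai"]));
```

A production version would typically add a delay between retries and skip retrying errors that cannot succeed, such as invalid credentials.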

Development

# Install dependencies
npm install

# Development mode with hot reload
npm run dev

# Type checking
npm run typecheck

# Build for production
npm run build

# Clean build artifacts
npm run clean

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new functionality
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Support

  • Issues: GitHub Issues
  • Documentation: In-code documentation and examples
  • Community: MCP Discord server

Built with the Model Context Protocol SDK for seamless AI tool integration.