tts-mcp-server by WaterTaoMind - MCP Server

TTS MCP Server

A comprehensive Text-to-Speech Model Context Protocol (MCP) server that provides advanced speech synthesis capabilities through Azure Cognitive Services and OpenAI APIs.

Features

🎯 Multi-Provider Support

Azure Cognitive Services Speech: Full SSML support, neural voices, multi-language
OpenAI TTS: High-quality synthesis with natural voices
Automatic provider fallback and load balancing

🗣️ Advanced Voice Control

400+ neural voices across 100+ languages
Real-time voice discovery and caching
Voice recommendation system for different use cases
Gender, language, and style filtering

🎙️ Podcast Generation

Multi-speaker conversation synthesis
Automatic speaker detection from scripts
SSML optimization for natural dialogue
Support for 10+ speakers per episode

🛠️ MCP Integration

Tools: synthesize-text, synthesize-ssml, generate-podcast, list-voices
Resources: Voice catalog, server status, performance metrics
Full TypeScript SDK compatibility

Quick Start

Prerequisites

Node.js 18+
Azure Speech Services subscription (optional)
OpenAI API key (optional)

Installation

# After cloning the repository, install dependencies
cd src/mcp-servers/azure-tts
npm install

# Build the server
npm run build

# Set environment variables
export AZURE_SPEECH_KEY="your_azure_key"
export AZURE_SPEECH_REGION="eastus"
export OPENAI_API_KEY="your_openai_key"
export TTS_OUTPUT_DIR="/path/to/audio/output"

# Start the server
npm start

MCP Configuration

Add to your MCP client configuration:

{
  "tts-server": {
    "command": "node",
    "args": ["/path/to/tts-mcp-server/dist/server.js"],
    "env": {
      "AZURE_SPEECH_KEY": "your_azure_key",
      "AZURE_SPEECH_REGION": "eastus",
      "OPENAI_API_KEY": "your_openai_key",
      "TTS_OUTPUT_DIR": "/tmp/tts-output",
      "TTS_DEFAULT_PROVIDER": "azure",
      "TTS_MAX_CONCURRENT": "3"
    }
  }
}

Tools

`synthesize-text`

Convert plain text to speech with voice and parameter control.

{
  "text": "Hello, this is a test of text-to-speech synthesis.",
  "voice": "en-US-JennyNeural",
  "provider": "azure",
  "outputFormat": "wav",
  "speed": 1.0,
  "pitch": 0
}

`synthesize-ssml`

Advanced SSML markup synthesis with validation and optimization.

{
  "ssml": "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"><voice name=\"en-US-AriaNeural\">Hello world!</voice></speak>",
  "provider": "azure",
  "validateSSML": true
}
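
When the `ssml` value is assembled from user text, the text needs XML escaping before it is wrapped in `<speak>` and `<voice>` elements. The following is a minimal sketch of that step; `escapeXml` and `buildSsml` are hypothetical helpers written for illustration and are not part of this server's API.

```typescript
// Hypothetical helpers for illustration; not part of the server's code.

// Escape the five characters that are special in XML text and attributes.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wrap plain text in a single-voice SSML document.
// The namespace URI is the W3C SSML 1.0 namespace.
function buildSsml(voice: string, text: string, lang: string = "en-US"): string {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">` +
    `<voice name="${escapeXml(voice)}">${escapeXml(text)}</voice>` +
    `</speak>`
  );
}

// Example arguments object for the synthesize-ssml tool.
const ssmlArgs = {
  ssml: buildSsml("en-US-AriaNeural", "Hello world!"),
  provider: "azure",
  validateSSML: true,
};
```

Escaping the ampersand first matters: doing it later would re-escape the entities produced by the other replacements.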

`generate-podcast`

Multi-speaker podcast generation from script.

{
  "script": "Host: Welcome to our show!\nGuest: Thanks for having me!",
  "speakers": [
    {"name": "Host", "voice": "en-US-JennyNeural"},
    {"name": "Guest", "voice": "en-US-GuyNeural"}
  ],
  "provider": "azure",
  "title": "Episode 1"
}
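
The `script` field uses a `Name: line` convention, one turn per line. As a rough sketch of how such a script could be split into speaker turns (the server's actual parser may differ), the hypothetical `parseScript` and `detectSpeakers` functions below assume that a speaker label is everything before the first colon and that unlabeled lines continue the previous turn.

```typescript
// Hypothetical parser for illustration; the server's real logic may differ.
interface Turn {
  speaker: string;
  text: string;
}

// Split a script into turns. A line of the form "Name: words" starts a turn;
// a line without a label is appended to the previous turn.
// Limitation: an unlabeled line containing a colon (e.g. "at 10:30") would be
// misread as a new speaker.
function parseScript(script: string): Turn[] {
  const turns: Turn[] = [];
  for (const rawLine of script.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    const match = /^([^:]{1,40}):\s*(.+)$/.exec(line);
    if (match) {
      turns.push({ speaker: match[1].trim(), text: match[2].trim() });
    } else if (turns.length > 0) {
      turns[turns.length - 1].text += " " + line;
    }
  }
  return turns;
}

// Return the distinct speaker names in order of first appearance.
function detectSpeakers(turns: Turn[]): string[] {
  const seen: string[] = [];
  for (const turn of turns) {
    if (seen.indexOf(turn.speaker) === -1) seen.push(turn.speaker);
  }
  return seen;
}
```

Each detected name can then be matched against the `name` field of an entry in `speakers` to pick the voice for that turn.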

`list-voices`

Discover and filter available voices.

{
  "provider": "azure",
  "language": "en-US",
  "gender": "Female",
  "refresh": false
}
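
The filter fields combine with AND semantics in the sketch below: a voice is returned only if it matches every field that is set. The `Voice` shape and the `filterVoices` function are assumptions made for illustration; the fields actually returned by the server are not documented here.

```typescript
// Hypothetical voice record and filter; field names are assumptions.
interface Voice {
  name: string;
  locale: string;
  gender: "Female" | "Male" | "Neutral";
  provider: "azure" | "openai";
}

interface VoiceFilter {
  provider?: Voice["provider"];
  language?: string;
  gender?: Voice["gender"];
}

// Keep voices that match every filter field that is present.
function filterVoices(voices: Voice[], filter: VoiceFilter): Voice[] {
  return voices.filter(
    (v) =>
      (filter.provider === undefined || v.provider === filter.provider) &&
      (filter.language === undefined || v.locale.toLowerCase() === filter.language.toLowerCase()) &&
      (filter.gender === undefined || v.gender === filter.gender),
  );
}

// A small sample catalog for demonstration only.
const sampleVoices: Voice[] = [
  { name: "en-US-JennyNeural", locale: "en-US", gender: "Female", provider: "azure" },
  { name: "en-US-GuyNeural", locale: "en-US", gender: "Male", provider: "azure" },
  { name: "en-US-AriaNeural", locale: "en-US", gender: "Female", provider: "azure" },
  { name: "de-DE-KatjaNeural", locale: "de-DE", gender: "Female", provider: "azure" },
];

const femaleUsVoices = filterVoices(sampleVoices, {
  provider: "azure",
  language: "en-US",
  gender: "Female",
});
```

An empty filter object returns the whole catalog, which mirrors calling `list-voices` with `{}`.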

Resources

`tts://voice-catalog`

Comprehensive voice catalog with filtering and search capabilities.

`tts://status`

Real-time server status, provider health, and performance metrics.

Configuration

Environment Variables

Variable	Description	Default
`AZURE_SPEECH_KEY`	Azure Speech Services subscription key	Required for Azure
`AZURE_SPEECH_REGION`	Azure region (e.g., "eastus")	Required for Azure
`AZURE_SPEECH_ENDPOINT`	Custom Azure endpoint	Auto-generated
`OPENAI_API_KEY`	OpenAI API key	Required for OpenAI
`OPENAI_TTS_MODEL`	OpenAI TTS model	`tts-1`
`TTS_DEFAULT_PROVIDER`	Default provider	`azure`
`TTS_OUTPUT_DIR`	Audio output directory	`/tmp/tts-output`
`TTS_MAX_CONCURRENT`	Max concurrent requests	`3`
`TTS_ENABLE_LLM`	Enable LLM optimizations	`false`
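
The defaults in the table can be read as a small configuration loader. The sketch below is one way to express them; `TtsConfig`, `loadConfig`, and `configuredProviders` are hypothetical names, and treating only the string `"true"` as enabling `TTS_ENABLE_LLM` is an assumption.

```typescript
// Illustrative config loader; names are assumptions, not the server's internals.
type Provider = "azure" | "openai";

interface TtsConfig {
  azureKey?: string;
  azureRegion?: string;
  azureEndpoint?: string;
  openaiKey?: string;
  openaiModel: string;
  defaultProvider: Provider;
  outputDir: string;
  maxConcurrent: number;
  enableLlm: boolean;
}

type Env = Record<string, string | undefined>;

function isProvider(value: string): value is Provider {
  return value === "azure" || value === "openai";
}

// Apply the documented defaults to whatever the environment provides.
function loadConfig(env: Env): TtsConfig {
  const defaultProvider = env["TTS_DEFAULT_PROVIDER"] ?? "azure";
  if (!isProvider(defaultProvider)) {
    throw new Error(`Unsupported TTS_DEFAULT_PROVIDER: ${defaultProvider}`);
  }
  const maxConcurrent = Number(env["TTS_MAX_CONCURRENT"] ?? "3");
  if (!Number.isInteger(maxConcurrent) || maxConcurrent < 1) {
    throw new Error("TTS_MAX_CONCURRENT must be a positive integer");
  }
  return {
    azureKey: env["AZURE_SPEECH_KEY"],
    azureRegion: env["AZURE_SPEECH_REGION"],
    azureEndpoint: env["AZURE_SPEECH_ENDPOINT"],
    openaiKey: env["OPENAI_API_KEY"],
    openaiModel: env["OPENAI_TTS_MODEL"] ?? "tts-1",
    defaultProvider,
    outputDir: env["TTS_OUTPUT_DIR"] ?? "/tmp/tts-output",
    maxConcurrent,
    enableLlm: env["TTS_ENABLE_LLM"] === "true", // assumption: only "true" enables it
  };
}

// A provider counts as configured once its required credentials are present.
function configuredProviders(config: TtsConfig): Provider[] {
  const providers: Provider[] = [];
  if (config.azureKey && config.azureRegion) providers.push("azure");
  if (config.openaiKey) providers.push("openai");
  return providers;
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.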

Supported Audio Formats

WAV: Uncompressed, high quality
MP3: Compressed, widely compatible
Opus: High compression, web optimized

Architecture

TTS MCP Server
├── Tools/               # MCP tool implementations
│   ├── SynthesizeTextTool
│   ├── SynthesizeSSMLTool
│   ├── GeneratePodcastTool
│   └── ListVoicesTool
├── Resources/           # MCP resource providers
│   ├── VoiceCatalogResource
│   └── TTSStatusResource
├── Services/            # TTS provider implementations
│   ├── AzureTTSService
│   ├── OpenAITTSService
│   └── TTSServiceFactory
└── Types/               # TypeScript definitions
    └── TTSTypes

Examples

Basic Text Synthesis

# Using MCP client
mcp call tts-server synthesize-text '{
  "text": "Welcome to our podcast about AI development",
  "voice": "en-US-JennyNeural",
  "outputFormat": "wav"
}'

Multi-Speaker Podcast

mcp call tts-server generate-podcast '{
  "script": "Host: Today we discuss AI ethics.\nExpert: This is a crucial topic.\nHost: What are the main challenges?",
  "speakers": [
    {"name": "Host", "voice": "en-US-AriaNeural"},
    {"name": "Expert", "voice": "en-US-BrianNeural"}
  ],
  "title": "AI Ethics Discussion"
}'

Voice Discovery

# List all available voices
mcp call tts-server list-voices '{}'

# Filter by language and gender
mcp call tts-server list-voices '{
  "language": "en-US",
  "gender": "Female",
  "provider": "azure"
}'

Performance

Concurrent Requests: Up to 3 simultaneous syntheses by default (configurable via `TTS_MAX_CONCURRENT`)
Voice Caching: 1-hour TTL for voice catalogs
Audio Streaming: Direct provider streaming
Format Support: WAV, MP3, Opus with automatic conversion
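
To illustrate the 1-hour voice catalog TTL, here is a minimal time-based cache sketch. `TtlCache` is a hypothetical class, not the server's actual cache; the injectable clock is an assumption added so expiry can be exercised without waiting.

```typescript
// Hypothetical TTL cache for illustration; not the server's implementation.
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();

  constructor(
    private readonly ttlMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  // Store a value that expires ttlMs milliseconds from now.
  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Return the cached value, or undefined if it is missing or expired.
  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // drop stale entries lazily on read
      return undefined;
    }
    return entry.value;
  }
}

// One hour, matching the documented voice catalog TTL.
const ONE_HOUR_MS = 60 * 60 * 1000;
const voiceCatalogCache = new TtlCache<string[]>(ONE_HOUR_MS);
voiceCatalogCache.set("azure", ["en-US-JennyNeural", "en-US-GuyNeural"]);
```

A `refresh: true` request to `list-voices` would correspond to bypassing `get` and calling `set` with a freshly fetched catalog.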

Error Handling

The server includes comprehensive error handling:

Provider failover and retry logic
SSML validation and automatic fixing
Rate limiting and queue management
Detailed error reporting with context
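
Provider failover with retries can be pictured as trying each provider in order, retrying a fixed number of times before moving on. The sketch below is a simplified illustration under that assumption; `ProviderEntry` and `synthesizeWithFailover` are hypothetical names, and the real server may add backoff or error classification.

```typescript
// Hypothetical failover helper for illustration; not the server's code.
interface ProviderEntry {
  name: string;
  synthesize: (text: string) => Promise<Uint8Array>;
}

interface FailoverResult {
  provider: string;
  audio: Uint8Array;
}

// Try providers in order. Each provider gets 1 + retriesPerProvider attempts
// before the next one is tried. Errors are collected so the final failure
// reports what went wrong with every attempt.
async function synthesizeWithFailover(
  providers: ProviderEntry[],
  text: string,
  retriesPerProvider: number = 1,
): Promise<FailoverResult> {
  const errors: string[] = [];
  for (const provider of providers) {
    for (let attempt = 0; attempt <= retriesPerProvider; attempt++) {
      try {
        const audio = await provider.synthesize(text);
        return { provider: provider.name, audio };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        errors.push(`${provider.name} attempt ${attempt + 1}: ${message}`);
      }
    }
  }
  throw new Error(`All providers failed:\n${errors.join("\n")}`);
}
```

Listing the default provider first and the other provider second gives the fallback order implied by `TTS_DEFAULT_PROVIDER`.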

Development

# Install dependencies
npm install

# Development mode with hot reload
npm run dev

# Type checking
npm run typecheck

# Build for production
npm run build

# Clean build artifacts
npm run clean

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Submit a pull request

License

MIT License - see LICENSE file for details.

Support

Issues: GitHub Issues
Documentation: In-code documentation and examples
Community: MCP Discord server

Built with the Model Context Protocol SDK for seamless AI tool integration.