WaterTaoMind/tts-mcp-server
TTS MCP Server
A comprehensive Text-to-Speech Model Context Protocol (MCP) server that provides advanced speech synthesis capabilities through Azure Cognitive Services and OpenAI APIs.
Features
🎯 Multi-Provider Support
- Azure Cognitive Services Speech: Full SSML support, neural voices, multi-language
- OpenAI TTS: High-quality synthesis with natural voices
- Automatic provider fallback and load balancing
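The fallback behaviour listed above can be pictured with a small sketch. The README does not show the server's internal interfaces, so `TTSProvider`, `orderProviders`, and `synthesizeWithFallback` are hypothetical names used only for illustration: the preferred provider is tried first, and the remaining ones are used if it fails.

```typescript
// Hypothetical types: the real server's interfaces are not shown in this README.
type ProviderName = "azure" | "openai";

interface TTSProvider {
  name: ProviderName;
  synthesize(text: string): Promise<Uint8Array>;
}

// Put the preferred provider first and keep the others as fallbacks.
function orderProviders(providers: TTSProvider[], preferred: ProviderName): TTSProvider[] {
  return [
    ...providers.filter((p) => p.name === preferred),
    ...providers.filter((p) => p.name !== preferred),
  ];
}

// Try each provider in order and return the first successful result.
async function synthesizeWithFallback(
  providers: TTSProvider[],
  preferred: ProviderName,
  text: string,
): Promise<{ provider: ProviderName; audio: Uint8Array }> {
  const errors: string[] = [];
  for (const provider of orderProviders(providers, preferred)) {
    try {
      return { provider: provider.name, audio: await provider.synthesize(text) };
    } catch (err) {
      errors.push(`${provider.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed (${errors.join("; ")})`);
}
```

Collecting the per-provider error messages keeps the final error useful when every provider is unavailable.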
🗣️ Advanced Voice Control
- 400+ neural voices across 100+ languages
- Real-time voice discovery and caching
- Voice recommendation system for different use cases
- Gender, language, and style filtering
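As a rough illustration of voice-catalog caching, the sketch below keeps one value and reloads it after a time-to-live expires. `TtlCache` is a hypothetical helper, not the server's actual class, and it uses a synchronous loader for brevity where a real voice lookup would be asynchronous.

```typescript
// Hypothetical helper: caches one value and reloads it after ttlMs.
class TtlCache<T> {
  private value: T | undefined;
  private expiresAt = 0;
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so the expiry logic can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value, or call load() when empty, expired, or forced.
  get(load: () => T, refresh = false): T {
    if (refresh || this.value === undefined || this.now() >= this.expiresAt) {
      this.value = load();
      this.expiresAt = this.now() + this.ttlMs;
    }
    return this.value;
  }
}

// One hour, matching the voice-catalog TTL stated in the Performance section.
const ONE_HOUR_MS = 60 * 60 * 1000;
```

The `refresh` argument plays the same role as the `refresh` flag accepted by list-voices: it forces a reload even when the cached entry is still fresh.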
🎙️ Podcast Generation
- Multi-speaker conversation synthesis
- Automatic speaker detection from scripts
- SSML optimization for natural dialogue
- Support for 10+ speakers per episode
🛠️ MCP Integration
- Tools: synthesize-text, synthesize-ssml, generate-podcast, list-voices
- Resources: Voice catalog, server status, performance metrics
- Full TypeScript SDK compatibility
Quick Start
Prerequisites
- Node.js 18+
- Azure Speech Services subscription (optional)
- OpenAI API key (optional)
Installation
# Install dependencies (run from the cloned repository)
cd src/mcp-servers/azure-tts
npm install
# Build the server
npm run build
# Set environment variables
export AZURE_SPEECH_KEY="your_azure_key"
export AZURE_SPEECH_REGION="eastus"
export OPENAI_API_KEY="your_openai_key"
export TTS_OUTPUT_DIR="/path/to/audio/output"
# Start the server
npm start
MCP Configuration
Add to your MCP client configuration:
{
"tts-server": {
"command": "node",
"args": ["/path/to/tts-mcp-server/dist/server.js"],
"env": {
"AZURE_SPEECH_KEY": "your_azure_key",
"AZURE_SPEECH_REGION": "eastus",
"OPENAI_API_KEY": "your_openai_key",
"TTS_OUTPUT_DIR": "/tmp/tts-output",
"TTS_DEFAULT_PROVIDER": "azure",
"TTS_MAX_CONCURRENT": "3"
}
}
}
Tools
synthesize-text
Convert plain text to speech with voice and parameter control.
{
"text": "Hello, this is a test of text-to-speech synthesis.",
"voice": "en-US-JennyNeural",
"provider": "azure",
"outputFormat": "wav",
"speed": 1.0,
"pitch": 0
}
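For the Azure provider, plain-text arguments like these could plausibly be turned into SSML before synthesis. The sketch below is one way to do that, not the server's confirmed behaviour: `toSsml` and `escapeXml` are hypothetical helpers, and the units are assumptions (speed as a multiplier of the normal rate, pitch as a relative percentage).

```typescript
// Hypothetical request shape, mirroring the example arguments above.
interface SynthesizeTextArgs {
  text: string;
  voice: string;
  speed?: number; // assumed: multiplier, 1.0 = normal rate
  pitch?: number; // assumed: relative change in percent, 0 = unchanged
}

// Escape characters that are special in XML text and attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wrap the text in <speak>/<voice>/<prosody> with relative rate and pitch.
function toSsml(args: SynthesizeTextArgs): string {
  const rate = Math.round(((args.speed ?? 1.0) - 1) * 100);
  const pitch = Math.round(args.pitch ?? 0);
  const signed = (n: number): string => `${n >= 0 ? "+" : ""}${n}%`;
  // Assumes locale-prefixed voice names such as "en-US-JennyNeural".
  const lang = args.voice.split("-").slice(0, 2).join("-");
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">` +
    `<voice name="${escapeXml(args.voice)}">` +
    `<prosody rate="${signed(rate)}" pitch="${signed(pitch)}">${escapeXml(args.text)}</prosody>` +
    `</voice></speak>`
  );
}
```

Escaping the text matters because characters such as `&` or `<` in user input would otherwise produce malformed SSML.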
synthesize-ssml
Synthesize speech from SSML markup, with optional validation and optimization.
{
"ssml": "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"><voice name=\"en-US-AriaNeural\">Hello world!</voice></speak>",
"provider": "azure",
"validateSSML": true
}
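What validateSSML checks is not documented here. A minimal structural check might look like the sketch below, where `checkSsml` is a hypothetical function that verifies the root element, its required attributes, and tag balance. It is not a full XML parser. The SSML namespace is the literal string `http://www.w3.org/2001/10/synthesis`; since XML namespaces are compared as exact strings, an `https://` variant is treated as a mismatch.

```typescript
interface SsmlCheck {
  valid: boolean;
  errors: string[];
}

const SSML_NS = "http://www.w3.org/2001/10/synthesis";

// Structural checks only: root element, required attributes, balanced tags.
function checkSsml(ssml: string): SsmlCheck {
  const errors: string[] = [];
  const open: string[] = [];
  let root: string | undefined;
  // Matches opening, closing, and self-closing tags; text content is skipped.
  const tagPattern = /<(\/?)([A-Za-z][\w:.-]*)((?:"[^"]*"|'[^']*'|[^>"'])*?)(\/?)>/g;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(ssml)) !== null) {
    const [, slash, name, attrs, selfClose] = match;
    if (root === undefined) {
      root = name;
      if (slash || name !== "speak") errors.push("root element must be <speak>");
      if (!/\bversion\s*=/.test(attrs)) errors.push("<speak> is missing the version attribute");
      if (!attrs.includes(`"${SSML_NS}"`) && !attrs.includes(`'${SSML_NS}'`)) {
        errors.push(`<speak> must declare xmlns="${SSML_NS}"`);
      }
    }
    if (slash) {
      if (open.pop() !== name) errors.push(`unexpected closing tag </${name}>`);
    } else if (!selfClose) {
      open.push(name);
    }
  }
  if (root === undefined) errors.push("no SSML elements found");
  for (const name of open) errors.push(`unclosed tag <${name}>`);
  return { valid: errors.length === 0, errors };
}
```

Returning a list of messages rather than a boolean gives the caller enough detail to report the problem or attempt a repair.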
generate-podcast
Generate a multi-speaker podcast from a script whose lines are prefixed with speaker names.
{
"script": "Host: Welcome to our show!\nGuest: Thanks for having me!",
"speakers": [
{"name": "Host", "voice": "en-US-JennyNeural"},
{"name": "Guest", "voice": "en-US-GuyNeural"}
],
"provider": "azure",
"title": "Episode 1"
}
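The script format above ("Name: line") suggests how speaker turns could be extracted and matched to voices. The sketch below is an assumption about that step, with hypothetical names `parseScript` and `assignVoices`. The label heuristic is deliberately simple: any line that starts with a short prefix followed by a colon is treated as a new turn.

```typescript
interface Speaker {
  name: string;
  voice: string;
}

interface Turn {
  speaker: string;
  text: string;
}

// Split a "Name: line" script into turns; unlabeled lines continue the previous turn.
function parseScript(script: string): Turn[] {
  const turns: Turn[] = [];
  for (const rawLine of script.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    const match = /^([^:]{1,40}):\s*(.*)$/.exec(line);
    if (match) {
      turns.push({ speaker: match[1].trim(), text: match[2] });
    } else if (turns.length > 0) {
      turns[turns.length - 1].text += ` ${line}`;
    }
  }
  return turns;
}

// Attach the configured voice to each turn; fail loudly on unknown speakers.
function assignVoices(turns: Turn[], speakers: Speaker[]): Array<Turn & { voice: string }> {
  const voices = new Map<string, string>();
  for (const speaker of speakers) voices.set(speaker.name, speaker.voice);
  return turns.map((turn) => {
    const voice = voices.get(turn.speaker);
    if (voice === undefined) {
      throw new Error(`No voice configured for speaker "${turn.speaker}"`);
    }
    return { ...turn, voice };
  });
}
```

Each resulting turn can then be synthesized with its own voice and the audio segments concatenated in order.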
list-voices
Discover and filter available voices.
{
"provider": "azure",
"language": "en-US",
"gender": "Female",
"refresh": false
}
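Filtering of this kind can be sketched as a plain predicate over catalog entries. The `Voice` shape and `filterVoices` function below are hypothetical, and prefix matching on the language (so that "en" also matches "en-US") is an assumption rather than documented behaviour.

```typescript
// Hypothetical catalog entry; the real catalog schema is not shown in this README.
interface Voice {
  name: string;
  provider: "azure" | "openai";
  locale: string;
  gender: "Female" | "Male" | "Neutral";
}

interface VoiceFilter {
  provider?: Voice["provider"];
  language?: string;
  gender?: Voice["gender"];
}

// Keep voices that match every filter field that was supplied.
function filterVoices(voices: Voice[], filter: VoiceFilter): Voice[] {
  const language = filter.language?.toLowerCase();
  return voices.filter(
    (voice) =>
      (filter.provider === undefined || voice.provider === filter.provider) &&
      (language === undefined || voice.locale.toLowerCase().startsWith(language)) &&
      (filter.gender === undefined || voice.gender === filter.gender),
  );
}
```

An empty filter returns the whole catalog, which corresponds to calling list-voices with no arguments.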
Resources
tts://voice-catalog
Comprehensive voice catalog with filtering and search capabilities.
tts://status
Real-time server status, provider health, and performance metrics.
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| AZURE_SPEECH_KEY | Azure Speech Services subscription key | Required for Azure |
| AZURE_SPEECH_REGION | Azure region (e.g., "eastus") | Required for Azure |
| AZURE_SPEECH_ENDPOINT | Custom Azure endpoint | Auto-generated |
| OPENAI_API_KEY | OpenAI API key | Required for OpenAI |
| OPENAI_TTS_MODEL | OpenAI TTS model | tts-1 |
| TTS_DEFAULT_PROVIDER | Default provider | azure |
| TTS_OUTPUT_DIR | Audio output directory | /tmp/tts-output |
| TTS_MAX_CONCURRENT | Max concurrent requests | 3 |
| TTS_ENABLE_LLM | Enable LLM optimizations | false |
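The defaults in the table can be applied when reading the environment, as in the sketch below. `TTSConfig` and `loadConfig` are hypothetical names, and the validation rules (rejecting unknown providers and non-positive concurrency) are assumptions about sensible behaviour rather than a description of the server's code.

```typescript
interface TTSConfig {
  azureKey: string | undefined;
  azureRegion: string | undefined;
  azureEndpoint: string | undefined;
  openaiKey: string | undefined;
  openaiModel: string;
  defaultProvider: "azure" | "openai";
  outputDir: string;
  maxConcurrent: number;
  enableLLM: boolean;
}

// Read settings from an environment-like object, applying the documented defaults.
function loadConfig(env: Record<string, string | undefined>): TTSConfig {
  const defaultProvider = env.TTS_DEFAULT_PROVIDER ?? "azure";
  if (defaultProvider !== "azure" && defaultProvider !== "openai") {
    throw new Error(`Unsupported TTS_DEFAULT_PROVIDER: ${defaultProvider}`);
  }
  const maxConcurrent = Number(env.TTS_MAX_CONCURRENT ?? "3");
  if (!Number.isInteger(maxConcurrent) || maxConcurrent < 1) {
    throw new Error("TTS_MAX_CONCURRENT must be a positive integer");
  }
  return {
    azureKey: env.AZURE_SPEECH_KEY,
    azureRegion: env.AZURE_SPEECH_REGION,
    azureEndpoint: env.AZURE_SPEECH_ENDPOINT,
    openaiKey: env.OPENAI_API_KEY,
    openaiModel: env.OPENAI_TTS_MODEL ?? "tts-1",
    defaultProvider,
    outputDir: env.TTS_OUTPUT_DIR ?? "/tmp/tts-output",
    maxConcurrent,
    enableLLM: (env.TTS_ENABLE_LLM ?? "false").toLowerCase() === "true",
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.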
Supported Audio Formats
- WAV: Uncompressed, high quality
- MP3: Compressed, widely compatible
- Opus: High compression, web optimized
Architecture
TTS MCP Server
├── Tools/ # MCP tool implementations
│ ├── SynthesizeTextTool
│ ├── SynthesizeSSMLTool
│ ├── GeneratePodcastTool
│ └── ListVoicesTool
├── Resources/ # MCP resource providers
│ ├── VoiceCatalogResource
│ └── TTSStatusResource
├── Services/ # TTS provider implementations
│ ├── AzureTTSService
│ ├── OpenAITTSService
│ └── TTSServiceFactory
└── Types/ # TypeScript definitions
└── TTSTypes
Examples
Basic Text Synthesis
# Using MCP client
mcp call tts-server synthesize-text '{
"text": "Welcome to our podcast about AI development",
"voice": "en-US-JennyNeural",
"outputFormat": "wav"
}'
Multi-Speaker Podcast
mcp call tts-server generate-podcast '{
"script": "Host: Today we discuss AI ethics.\nExpert: This is a crucial topic.\nHost: What are the main challenges?",
"speakers": [
{"name": "Host", "voice": "en-US-AriaNeural"},
{"name": "Expert", "voice": "en-US-BrianNeural"}
],
"title": "AI Ethics Discussion"
}'
Voice Discovery
# List all available voices
mcp call tts-server list-voices '{}'
# Filter by language and gender
mcp call tts-server list-voices '{
"language": "en-US",
"gender": "Female",
"provider": "azure"
}'
Performance
- Concurrent Requests: Up to 3 simultaneous syntheses by default (configurable via TTS_MAX_CONCURRENT)
- Voice Caching: 1-hour TTL for voice catalogs
- Audio Streaming: Direct provider streaming
- Format Support: WAV, MP3, Opus with automatic conversion
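The concurrency cap can be illustrated with a small queueing limiter. `ConcurrencyLimiter` is a hypothetical class, shown only to make the idea concrete: at most `limit` tasks run at once, and a finished task hands its slot to the next waiting one.

```typescript
// Hypothetical limiter: runs at most `limit` tasks at once and queues the rest.
class ConcurrencyLimiter {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.limit) {
      // All slots are taken: wait until a finishing task hands one over.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // pass this slot directly to the next waiter
      } else {
        this.active--;
      }
    }
  }
}
```

Handing the slot over directly, instead of decrementing and re-incrementing the counter, prevents a newly arriving task from jumping ahead of queued ones.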
Error Handling
The server includes comprehensive error handling:
- Provider failover and retry logic
- SSML validation and automatic fixing
- Rate limiting and queue management
- Detailed error reporting with context
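Retry logic of the kind listed above is often implemented with exponential backoff. The sketch below shows that pattern. `withRetry` and `RetryOptions` are hypothetical names, and the backoff schedule is an assumption, since the README does not describe the server's actual policy.

```typescript
interface RetryOptions {
  attempts: number; // total tries, including the first one
  baseDelayMs: number; // delay before the first retry
  isRetryable?: (err: unknown) => boolean; // defaults to retrying every error
}

// Run an operation, retrying failures with exponentially growing delays.
async function withRetry<T>(operation: () => Promise<T>, options: RetryOptions): Promise<T> {
  if (options.attempts < 1) throw new RangeError("attempts must be at least 1");
  let lastError: unknown;
  for (let attempt = 0; attempt < options.attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      const retryable = options.isRetryable?.(err) ?? true;
      if (!retryable || attempt === options.attempts - 1) break;
      // Backoff schedule: baseDelayMs, then 2x, 4x, and so on.
      const delayMs = options.baseDelayMs * 2 ** attempt;
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

The `isRetryable` hook lets a caller stop early on errors that will not succeed on a second try, such as invalid credentials or malformed SSML.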
Development
# Install dependencies
npm install
# Development mode with hot reload
npm run dev
# Type checking
npm run typecheck
# Build for production
npm run build
# Clean build artifacts
npm run clean
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
License
MIT License - see LICENSE file for details.
Support
- Issues: GitHub Issues
- Documentation: In-code documentation and examples
- Community: MCP Discord server
Built with the Model Context Protocol SDK for seamless AI tool integration.