JosefGold/tts-mcp
The Text to Speech MCP Server is a sophisticated text-to-speech server that transforms written text into audible speech using OpenAI's TTS models.
Text to Speech MCP Server
Where your agent finally learns to speak up for itself
Welcome to the Text to Speech (TTS) MCP Server, a sophisticated yet charmingly chaotic text-to-speech MCP server that transforms your boring written words into magnificent audible experiences.
Because who needs human vocal cords when you have Python and some very fancy AI models?
What Does This Do?
This delightful contraption takes your text and makes it speak through your computer's speakers using OpenAI's cutting-edge TTS models. It's like having a personal narrator, except they never get tired, never ask for coffee breaks, and never judge your terrible programming jokes.
Features That Actually Matter
- Speak MCP Tool: Gives your agent the ability to voice any given text in one of several available voices
- Instructions for Delivery: Provide optional `instructions` to guide delivery, character, pacing, tone, and emotion
- Model Selection: The OpenAI TTS model can be configured via environment variables (default: `gpt-4o-mini-tts`)
- Blocking/Non-Blocking Mode: Speak commands can either return immediately for continued agent operation while sound is playing (default) or return only after the sound finishes for a more controlled workflow
- Queue-Based Audio Playback: Agents can queue up messages to wait patiently in line and be played in sequence
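To make the blocking and queueing behavior concrete, here is a minimal sketch of how a single background worker can play messages in order while letting callers choose whether to wait. It is not the server's actual code: `SpeechQueue` and `play_fn` are hypothetical names, and `play_fn` stands in for whatever synthesizes and plays one message.

```python
import logging
import queue
import threading


class SpeechQueue:
    """Plays queued messages one at a time on a background thread."""

    def __init__(self, play_fn):
        # play_fn is a hypothetical callable that plays one message and
        # returns only when playback has finished.
        self._play_fn = play_fn
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # A single worker guarantees first-in, first-out playback.
        while True:
            text, done = self._queue.get()
            try:
                self._play_fn(text)
            except Exception:
                logging.exception("Playback failed for %r", text)
            finally:
                done.set()  # wake a blocking caller even if playback failed
                self._queue.task_done()

    def speak(self, text, blocking=False):
        """Queue text; wait for it to finish only when blocking is True."""
        done = threading.Event()
        self._queue.put((text, done))
        if blocking:
            done.wait()
            return "played"
        return "queued"
```

A non-blocking call returns as soon as the message is queued, while a blocking call returns after that message, and everything queued before it, has been played.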
Installation & Setup
Prerequisites
- Python 3.10+
- An OpenAI API key (the magic ingredient)
- PortAudio (required for PyAudio to work properly)
- A sense of humor (optional but recommended)
Quick Start
- Install PortAudio:
      # macOS
      brew install portaudio
      # Linux (Debian/Ubuntu)
      sudo apt-get install portaudio19-dev
      # Windows
      pip install pipwin && pipwin install pyaudio
- Clone this repository:
      git clone <your-repo-url>
      cd tts-mcp
- Create a virtual environment (because global installs are for rebels):
      python -m venv .venv
      source .venv/bin/activate  # On Windows: .venv\Scripts\activate
- Install dependencies:
      pip install -r requirements.txt
- Set up your environment variables:
      cp env.template .env
      # Edit .env and add your OpenAI API key
  Or set directly:
      export OPENAI_API_KEY="your-secret-key-here"
- Configure MCP in your Cursor settings with the provided `mcp-config.json`. Example:
      {
        "mcpServers": {
          "tts-server": {
            "command": "/absolute/path/to/tts-mcp/.venv/bin/python",
            "args": ["/absolute/path/to/tts-mcp/tts_mcp_server.py"],
            "cwd": "/absolute/path/to/tts-mcp",
            "env": {
              "PYTHONPATH": "/absolute/path/to/tts-mcp"
            }
          }
        }
      }
  Replace the paths with those of your local repo and venv.
- Start making your computer talk!
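If you would rather not hand-edit the absolute paths in the MCP configuration, the entry can be generated from the repository location. This is a small sketch, not a script shipped with the project: `build_mcp_config` is a hypothetical helper, and it assumes the POSIX venv layout shown in the example (`.venv/bin/python`).

```python
import json
from pathlib import PurePosixPath


def build_mcp_config(repo_dir):
    """Return the mcpServers entry for a checkout located at repo_dir."""
    repo = PurePosixPath(repo_dir)
    return {
        "mcpServers": {
            "tts-server": {
                # Interpreter inside the virtual environment created earlier
                "command": str(repo / ".venv" / "bin" / "python"),
                "args": [str(repo / "tts_mcp_server.py")],
                "cwd": str(repo),
                "env": {"PYTHONPATH": str(repo)},
            }
        }
    }


# Print a config fragment for a placeholder checkout path.
print(json.dumps(build_mcp_config("/absolute/path/to/tts-mcp"), indent=2))
```

On Windows the venv interpreter lives at `.venv\Scripts\python.exe`, so the `command` path would need adjusting there.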
Voice Options
Choose your narrator wisely:
- alloy: Neutral, balanced tone (default)
- ash: warm, expressive; friendly support vibes
- ballad: smooth narrator; long-form storytelling
- coral: bright, upbeat; cheerful promos
- echo: Clear and professional, like a news anchor
- fable: Warm and storytelling, perfect for bedtime code reviews
- onyx: Deep and authoritative, for when your code needs to sound important
- nova: Bright and energetic, like your enthusiasm before debugging
- sage: calm, measured; helpful explainer
- shimmer: Soft and gentle, for when you need to break bad news about production bugs
- verse: dramatic, theatrical; trailer read
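A caller may want to check a voice name before sending a request. The sketch below validates against the voices listed above and falls back to the default. Whether the server itself validates this way is an assumption, and `resolve_voice` is a hypothetical helper name.

```python
# Voice names taken from the list above; "alloy" is the documented default.
VOICES = (
    "alloy", "ash", "ballad", "coral", "echo", "fable",
    "onyx", "nova", "sage", "shimmer", "verse",
)
DEFAULT_VOICE = "alloy"


def resolve_voice(requested=None):
    """Return a normalized voice name, or the default when none is given."""
    if requested is None:
        return DEFAULT_VOICE
    name = requested.strip().lower()
    if name not in VOICES:
        raise ValueError(
            f"Unknown voice {requested!r}; choose one of: {', '.join(VOICES)}"
        )
    return name
```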
Usage Examples
Basic Usage
# Non-blocking (default) - returns immediately
speak("Hello, world! I'm now audible!")
# Blocking - waits for completion
speak("This message will finish before I return", blocking=True)
# With specific voice
speak("I'm feeling dramatic today!", voice="fable")
# With delivery instructions
speak(
    "You're doing great, let's take this one step at a time.",
    voice="shimmer",
    instructions="Speak in a warm, reassuring and unhurried tone and pace"
)
In Cursor with MCP
Just tell Cursor to use the `speak` tool in your conversations.
You can suggest a voice and style instructions for maintaining a consistent character.
Configuration
Environment variables:
- `OPENAI_API_KEY` (required): Your OpenAI API key
- `TTS_MODEL` (optional): Defaults to `gpt-4o-mini-tts`. Other options include `tts-1` and `tts-1-hd` (though `instructions` and some of the voices are not supported on those)
- `LOG_LEVEL` (optional): `DEBUG`, `INFO` (default), `WARNING`, `ERROR`
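The variables above could be read and applied roughly as follows. This is a sketch under stated assumptions rather than the server's real implementation: `load_settings` and `build_speech_request` are hypothetical names, and the request keys (`model`, `voice`, `input`, `instructions`) mirror the parameters of the OpenAI Python SDK's speech endpoint.

```python
import os

DEFAULT_MODEL = "gpt-4o-mini-tts"
# Per the note above, these models do not support "instructions".
MODELS_WITHOUT_INSTRUCTIONS = {"tts-1", "tts-1-hd"}


def load_settings(env=None):
    """Read the documented environment variables, applying their defaults."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it")
    return {
        "api_key": api_key,
        "model": env.get("TTS_MODEL", DEFAULT_MODEL),
        "log_level": env.get("LOG_LEVEL", "INFO").upper(),
    }


def build_speech_request(text, settings, voice="alloy", instructions=None):
    """Assemble request arguments, dropping instructions where unsupported."""
    request = {"model": settings["model"], "voice": voice, "input": text}
    if instructions and settings["model"] not in MODELS_WITHOUT_INSTRUCTIONS:
        request["instructions"] = instructions
    return request
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.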
Troubleshooting
- No audio / no default output device:
  - Set a system default output device and restart the MCP server.
  - macOS: System Settings → Sound → Output.
- PyAudio install issues:
  - macOS: `brew install portaudio`, then `pip install -r requirements.txt`
  - Linux (Debian/Ubuntu): `sudo apt-get install portaudio19-dev`, then `pip install pyaudio`
  - Windows: `pip install pipwin && pipwin install pyaudio`
- Missing API key:
  - Ensure `.env` contains `OPENAI_API_KEY=...` or export it in your shell.
- High latency or choppy audio:
  - Close other audio apps; verify system output device; keep `blocking=False` if you need responsiveness.
- Logs:
  - Logs stream to stderr and to `tts_mcp_server.log`. Tail with: `tail -f tts_mcp_server.log`
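For the "no default output device" case, a quick check with PyAudio can confirm whether the system exposes an output device at all. The helper below is a diagnostic sketch, not part of the server: `default_output_device_name` is a hypothetical name, and the optional `pa` argument exists only so the function can be exercised without audio hardware.

```python
def default_output_device_name(pa=None):
    """Return the default output device's name, or None if there is none."""
    owns_pa = pa is None
    if owns_pa:
        import pyaudio  # imported lazily; requires PortAudio to be installed
        pa = pyaudio.PyAudio()
    try:
        info = pa.get_default_output_device_info()
    except OSError:
        # PyAudio raises IOError (an alias of OSError) when no device exists.
        return None
    finally:
        if owns_pa:
            pa.terminate()
    return info["name"]
```

Calling `default_output_device_name()` with no arguments inside the project's virtual environment returns `None` when no default device is configured.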
Acknowledgments
- Cursor for writing 95% of the code here
- Coffee, for making everything else possible
Remember: With great text-to-speech power comes great responsibility. Use your new vocal abilities wisely, and try not to annoy your coworkers too much.
Pro tip: If your computer starts talking back to you without being prompted, it might be time to take a break. Or update your Python version. Probably the latter.
This project is licensed under the BSD 3-Clause License. See the license file in the repository for details.