sanastasiou/claude-code-voice-mcp-server
The Claude Voice TTS MCP Server is a high-quality text-to-speech server utilizing the Kokoro-82M model, known for its superior performance in TTS Arena.
Claude Voice TTS MCP Server
High-quality text-to-speech MCP server using the Kokoro-82M model (#1 ranked in TTS Arena) with voice blending support and Claude Desktop integration.
Features
- Best-in-Class Quality: Kokoro-82M outperforms XTTS v2, MetaVoice, and Fish Speech in blind tests
- Fast: 100-300ms latency with GPU acceleration (35-100x real-time)
- Voice Blending: Mix multiple voices with custom ratios (e.g., af_bella(2)+af_sky(1))
- MCP Integration: Seamless integration with Claude Desktop/Code
- GPU Accelerated: NVIDIA CUDA support with automatic CPU fallback
- One-Command Install: Automated installer handles everything
Quick Start
# Clone or download this repository
git clone https://github.com/your-username/claude-code-voice-mcp-server.git
cd claude-code-voice-mcp-server
# Run the installer (handles everything automatically)
./install.sh
# Start the service
tts start
# Test it's working
tts test
That's it! The MCP server is now available in Claude Desktop.
Requirements
- OS: Linux (Ubuntu, Debian, Fedora, RHEL, Arch, SUSE) or macOS
- GPU: NVIDIA GPU with CUDA 12.3+ (recommended, CPU fallback available)
- Note: macOS does not support NVIDIA GPUs and will use the CPU automatically
- Docker: Will be installed automatically if missing
- Disk: ~4GB for Docker image and models
- RAM: 4GB minimum, 8GB recommended
Installation
The installer (install.sh) automatically:
- Checks system requirements (GPU, Docker, etc.)
- Installs all dependencies (Docker, NVIDIA Container Toolkit, Python packages)
- Sets up Python environment with conda/venv
- Pulls Docker image for Kokoro TTS
- Creates systemd service for auto-start
- Installs MCP server and CLI tools
- Configures Claude Desktop
- Tests the installation
Manual Installation (Advanced)
If you prefer manual control:
# 1. Install dependencies (Debian/Ubuntu)
sudo apt-get update && sudo apt-get install -y docker.io python3 python3-pip
# Or Fedora/RHEL
# sudo dnf install -y docker python3 python3-pip
# Or Arch
# sudo pacman -Sy docker python python-pip
# Or macOS
# brew install docker python
# 2. Install NVIDIA Container Toolkit (Linux with NVIDIA GPU only)
# Debian/Ubuntu:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://nvidia.github.io/libnvidia-container/stable/deb/\$(ARCH) /" | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
# Fedora/RHEL:
# curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# sudo dnf install -y nvidia-container-toolkit
# sudo systemctl restart docker
# 3. Pull Docker image
docker pull ghcr.io/remsky/kokoro-fastapi-gpu:latest # GPU
# docker pull ghcr.io/remsky/kokoro-fastapi-cpu:latest # CPU
# 4. Start container
docker run -d --name claude-voice-tts --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:latest # GPU
# docker run -d --name claude-voice-tts -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest # CPU
# 5. Install MCP server
pip install uv
uv pip install -e .
# 6. Configure Claude Desktop
# Edit ~/.config/claude/claude_desktop_config.json
# Add claude-voice-tts MCP server configuration (see Configuration section)
Usage
Service Control
# Start/stop service
tts start
tts stop
tts restart
# Check status
tts status
# View logs
tts logs # Last 50 lines
tts logs -f # Follow logs
# Enable/disable auto-start
tts enable # Start on login
tts disable # Don't start on login
# Test service
tts test
Claude Desktop Integration
After installation, restart Claude Desktop. The MCP server automatically provides these tools:
generate_speech
Generate speech from text with optional voice blending.
Parameters:
- text (required): Text to convert to speech
- voice (optional): Voice name or blended voice (default: af_bella)
- speed (optional): Speech speed 0.5-2.0 (default: 1.0)
- output_format (optional): Audio format: mp3, wav, opus (default: mp3)
- save_to_file (optional): Save to file or return base64 (default: true)
Example:
Claude, generate speech saying "Hello, this is a test of voice blending" using a blend of af_bella and af_sky voices.
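The parameter rules above can be sketched as a small validation layer. This is an illustrative sketch only: SpeechRequest, ALLOWED_FORMATS and to_payload are hypothetical names, and the actual server may validate differently. The payload field names follow the Direct API Usage examples in this README.

```python
from dataclasses import dataclass

# Formats listed in the parameter description above.
ALLOWED_FORMATS = {"mp3", "wav", "opus"}


@dataclass(frozen=True)
class SpeechRequest:
    """Hypothetical container mirroring the documented generate_speech parameters."""

    text: str
    voice: str = "af_bella"
    speed: float = 1.0
    output_format: str = "mp3"
    save_to_file: bool = True

    def __post_init__(self) -> None:
        # Reject values outside the documented ranges before any network call.
        if not self.text.strip():
            raise ValueError("text is required and must not be empty")
        if not 0.5 <= self.speed <= 2.0:
            raise ValueError(f"speed must be within 0.5-2.0, got {self.speed}")
        if self.output_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported output_format: {self.output_format}")

    def to_payload(self) -> dict:
        """Build a JSON body shaped like the /v1/audio/speech curl examples."""
        return {
            "model": "kokoro",
            "input": self.text,
            "voice": self.voice,
            "speed": self.speed,
            "response_format": self.output_format,
        }
```

Validating up front means a bad speed or format fails with a clear message instead of an HTTP error from the backend.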
list_voices
List all available voices and voice blending information.
Example:
Claude, what voices are available?
check_status
Check if Kokoro TTS service is running and accessible.
Example:
Claude, check if the TTS service is working.
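A status check of this kind could be as simple as probing the voices endpoint. The sketch below is an assumption about how such a check might work, not the server's actual code; service_is_up and its opener parameter are hypothetical names, and only the /v1/audio/voices path comes from this README.

```python
import urllib.request


def service_is_up(base_url="http://localhost:8880", timeout=5.0,
                  opener=urllib.request.urlopen) -> bool:
    """Return True if the voices endpoint answers with HTTP 200.

    `opener` is injectable so the check can be exercised without a running
    service; by default it is urllib.request.urlopen.
    """
    url = f"{base_url.rstrip('/')}/v1/audio/voices"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, connection refusals and timeouts all derive
        # from OSError, so any of them means "not reachable".
        return False
```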
Direct API Usage
The Kokoro TTS backend exposes an OpenAI-compatible API on port 8880:
# List voices
curl http://localhost:8880/v1/audio/voices
# Generate speech
curl -X POST http://localhost:8880/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "kokoro",
"input": "Hello, this is a test.",
"voice": "af_bella",
"speed": 1.0,
"response_format": "mp3"
}' \
-o output.mp3
# Voice blending (2 parts Bella + 1 part Sky)
curl -X POST http://localhost:8880/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "kokoro",
"input": "Testing voice blending.",
"voice": "af_bella(2)+af_sky(1)",
"speed": 1.0
}' \
-o blended.mp3
# Play the audio
mpg123 output.mp3 # or ffplay output.mp3
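The same calls can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl examples above; build_speech_request and synthesize_to_file are hypothetical helper names, and the 30-second timeout is an assumption taken from the TIMEOUT default in the Configuration section.

```python
import json
import urllib.request


def build_speech_request(text, voice="af_bella", speed=1.0,
                         response_format="mp3",
                         base_url="http://localhost:8880"):
    """Prepare a POST request equivalent to the curl examples above."""
    body = json.dumps({
        "model": "kokoro",
        "input": text,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, path, timeout=30, **kwargs):
    """Send the request and write the returned audio bytes to `path`."""
    request = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(path, "wb") as handle:
        handle.write(audio)
    return path
```

With the service running, synthesize_to_file("Testing voice blending.", "blended.mp3", voice="af_bella(2)+af_sky(1)") should produce the same file as the blending curl call.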
Available Voices
| Voice | Gender | Accent | Description |
|---|---|---|---|
| af_bella | Female | American | Bella |
| af_sky | Female | American | Sky |
| af_nicole | Female | American | Nicole |
| am_adam | Male | American | Adam |
| am_michael | Male | American | Michael |
| bf_emma | Female | British | Emma |
| bf_isabella | Female | British | Isabella |
| bm_george | Male | British | George |
| bm_lewis | Male | British | Lewis |
Voice Blending
Create custom voices by blending multiple voices:
# Syntax: voice1(weight1)+voice2(weight2)+...
af_bella(2)+af_sky(1) # 2 parts Bella, 1 part Sky
am_adam(3)+am_michael(1) # 3 parts Adam, 1 part Michael
bf_emma(1)+bf_isabella(1) # Equal mix of Emma and Isabella
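To make the ratio syntax concrete, the sketch below parses a blend string into normalized weights. It only illustrates how "parts" translate into proportions; parse_blend is a hypothetical helper, and how the Kokoro backend combines the voices internally is not described in this README.

```python
import re

# One component: a voice name, optionally followed by a weight in parentheses.
_COMPONENT = re.compile(r"^([A-Za-z0-9_]+)(?:\((\d+(?:\.\d+)?)\))?$")


def parse_blend(spec: str) -> dict[str, float]:
    """Turn 'voice1(w1)+voice2(w2)' into {voice: share}, with shares summing to 1."""
    weights: dict[str, float] = {}
    for part in spec.split("+"):
        match = _COMPONENT.match(part.strip())
        if match is None:
            raise ValueError(f"invalid blend component: {part!r}")
        name = match.group(1)
        # Assumption for this sketch: a missing weight counts as 1 part.
        weight = float(match.group(2) or 1)
        if weight <= 0:
            raise ValueError(f"weight must be positive in {part!r}")
        weights[name] = weights.get(name, 0.0) + weight
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}
```

For af_bella(2)+af_sky(1) this gives shares of two thirds for af_bella and one third for af_sky; am_adam(3)+am_michael(1) gives 0.75 and 0.25.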
Configuration
Environment Variables
Create a .env file in the installation directory (~/.local/share/claude-code-voice-mcp-server/):
KOKORO_BASE_URL=http://localhost:8880
DEFAULT_VOICE=af_bella
DEFAULT_SPEED=1.0
OUTPUT_DIR=~/tts_output
TIMEOUT=30
LOG_LEVEL=INFO
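A server could read these variables roughly as follows. This is a sketch under stated assumptions: Settings and load_settings are hypothetical names, the fallback values simply repeat the .env example above, and TIMEOUT is assumed to be in seconds.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed view of the variables in the .env example."""

    kokoro_base_url: str
    default_voice: str
    default_speed: float
    output_dir: Path
    timeout: float
    log_level: str


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        kokoro_base_url=env.get("KOKORO_BASE_URL", "http://localhost:8880"),
        default_voice=env.get("DEFAULT_VOICE", "af_bella"),
        default_speed=float(env.get("DEFAULT_SPEED", "1.0")),
        # Expand "~" so the path is usable when opening files.
        output_dir=Path(env.get("OUTPUT_DIR", "~/tts_output")).expanduser(),
        timeout=float(env.get("TIMEOUT", "30")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```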
Claude Desktop Configuration
The installer automatically configures Claude Desktop, but you can manually edit ~/.config/claude/claude_desktop_config.json:
{
"mcpServers": {
"claude-voice-tts": {
"command": "uv",
"args": [
"--directory",
"/home/USERNAME/.local/share/claude-code-voice-mcp-server",
"run",
"claude-voice-mcp"
],
"env": {
"KOKORO_BASE_URL": "http://localhost:8880"
}
}
}
}
Replace USERNAME with your actual username.
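If the config file already lists other MCP servers, the entry has to be merged rather than pasted over them. The sketch below shows one way to do that; build_mcp_entry and merge_into_config are hypothetical helpers, not part of the installer, and the install directory is the one named in this README.

```python
import json
from pathlib import Path


def build_mcp_entry(install_dir: Path, base_url: str = "http://localhost:8880") -> dict:
    """Return the claude-voice-tts entry shown above for a given install directory."""
    return {
        "claude-voice-tts": {
            "command": "uv",
            "args": ["--directory", str(install_dir), "run", "claude-voice-mcp"],
            "env": {"KOKORO_BASE_URL": base_url},
        }
    }


def merge_into_config(existing: dict, install_dir: Path) -> dict:
    """Add or replace the TTS entry while keeping any other configured servers."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers.update(build_mcp_entry(install_dir))
    merged["mcpServers"] = servers
    return merged


# Print the resulting config for the current user without touching any file.
install_dir = Path.home() / ".local/share/claude-code-voice-mcp-server"
print(json.dumps(merge_into_config({}, install_dir), indent=2))
```

Using Path.home() avoids the manual USERNAME substitution; writing the result back to the config file is left out on purpose so the sketch has no side effects.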
GPU vs CPU Performance
GPU (NVIDIA with CUDA 12.3+)
- Latency: 100-300ms
- Speed: 35-100x real-time
- VRAM: ~2-3GB
- Recommended: RTX 3060 or better
CPU
- Latency: 1-3.5s
- Speed: about 0.3-1x real-time
- RAM: ~4GB
- Works: Any modern CPU
The installer automatically detects your hardware and uses the appropriate configuration.
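The installer's detection logic is not shown in this README. As a rough illustration, the sketch below treats the presence of nvidia-smi on the PATH as a heuristic for an NVIDIA GPU and builds the matching docker run command from the Manual Installation section; nvidia_gpu_available and docker_run_args are hypothetical names.

```python
import shutil

# Image names taken from the Manual Installation section.
GPU_IMAGE = "ghcr.io/remsky/kokoro-fastapi-gpu:latest"
CPU_IMAGE = "ghcr.io/remsky/kokoro-fastapi-cpu:latest"


def nvidia_gpu_available(which=shutil.which) -> bool:
    """Heuristic: assume an NVIDIA GPU is usable if nvidia-smi is on the PATH."""
    return which("nvidia-smi") is not None


def docker_run_args(has_gpu: bool, name: str = "claude-voice-tts",
                    port: int = 8880) -> list[str]:
    """Build the docker run command for the GPU or CPU image."""
    args = ["docker", "run", "-d", "--name", name]
    if has_gpu:
        args += ["--gpus", "all"]
    # Map the chosen host port to the container's port 8880.
    args += ["-p", f"{port}:8880", GPU_IMAGE if has_gpu else CPU_IMAGE]
    return args
```

The resulting list can be passed to subprocess.run. A driver that is installed but broken would still pass this heuristic, so running nvidia-smi as described under Troubleshooting remains the more reliable check.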
Coexistence with Other Services
Kokoro TTS uses minimal resources and can run alongside other GPU services:
- VRAM: ~2-3GB (RTX 3090 has 24GB total)
- Example: Run Kokoro TTS + Whisper STT simultaneously
- Port: 8880 (configurable)
Troubleshooting
Service won't start
# Check Docker status
docker ps
systemctl --user status claude-voice-tts
# Check logs
tts logs
# Restart Docker
sudo systemctl restart docker
tts restart
API not responding
# Test connection
curl http://localhost:8880/v1/audio/voices
# Check if port is in use
netstat -tulpn | grep 8880
# Restart service
tts restart
GPU not detected
# Check NVIDIA driver
nvidia-smi
# Test NVIDIA Container Toolkit
docker run --rm --gpus all nvidia/cuda:12.3.0-base-ubuntu22.04 nvidia-smi
# Reinstall toolkit
sudo apt-get install --reinstall nvidia-container-toolkit
sudo systemctl restart docker
Claude Desktop not showing MCP tools
- Check MCP server is configured: cat ~/.config/claude/claude_desktop_config.json
- Restart Claude Desktop completely
- Check Claude Desktop logs for errors
- Test MCP server directly: uv run claude-voice-mcp (should start without errors)
Audio files not playing
# Install audio player
sudo apt-get install mpg123 ffmpeg
# Test playback
mpg123 ~/tts_output/your-file.mp3
ffplay ~/tts_output/your-file.mp3
Development
Run MCP server in debug mode
cd ~/.local/share/claude-code-voice-mcp-server
LOG_LEVEL=DEBUG uv run claude-voice-mcp
Run tests
# Full test suite
pytest tests/
# Specific tests
pytest tests/test_mcp.py -v
# Coverage
pytest --cov=src tests/
Modify and reload
# Edit MCP server
vim ~/.local/share/claude-code-voice-mcp-server/src/claude_voice_mcp.py
# Restart Claude Desktop to reload MCP server
# Or test directly:
uv run claude-voice-mcp
Architecture
┌───────────────────────┐
│    Claude Desktop     │
│     (MCP Client)      │
└───────────┬───────────┘
            │ stdio
            ▼
┌───────────────────────┐
│      MCP Server       │
│ (claude_voice_mcp.py) │
└───────────┬───────────┘
            │ HTTP
            ▼
┌───────────────────────┐       ┌──────────────┐
│   Docker Container    │◄──────┤   systemd    │
│    Kokoro-FastAPI     │       │   service    │
│      (port 8880)      │       └──────────────┘
└───────────┬───────────┘
            │ GPU
            ▼
┌───────────────────────┐
│      Kokoro-82M       │
│       TTS Model       │
│     (~2-3GB VRAM)     │
└───────────────────────┘
Performance Benchmarks
| Configuration | Latency | VRAM | Speed |
|---|---|---|---|
| RTX 4090 GPU | 100ms | 2.5GB | 100x RT |
| RTX 3090 GPU | 150ms | 2.8GB | 70x RT |
| RTX 3060 GPU | 250ms | 3.0GB | 40x RT |
| CPU (i7-12700) | 3.5s | N/A | 0.3x RT |
| CPU (M3 Pro) | 1.0s | N/A | 1.0x RT |
RT = Real-time (1x = same duration as audio length)
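The Speed column translates into processing time by simple division: processing time = audio duration / real-time factor. The helper below is a hypothetical illustration of that arithmetic using figures from the table. The Latency column is a separate measurement and is not derived from this formula.

```python
def synthesis_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate processing time for a clip, given a real-time factor (e.g. 40 for 40x RT)."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# One minute of audio on an RTX 3060 (40x RT in the table above):
print(synthesis_seconds(60, 40))   # 1.5
# The same clip on the i7-12700 row (0.3x RT) takes roughly 200 seconds:
print(round(synthesis_seconds(60, 0.3)))
```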
License
MIT License - See LICENSE file for details.
Credits
- Kokoro TTS: github.com/remsky/Kokoro-FastAPI
- Original Kokoro Model: hexgrad/Kokoro-82M (based on the StyleTTS 2 architecture)
- MCP Framework: FastMCP
Support
For issues, questions, or contributions:
- GitHub Issues: [your-repo-url/issues]
- Documentation: See CLAUDE.md for developer guidance
Roadmap
- Add streaming audio support
- Support for additional languages (Japanese, Chinese)
- Voice cloning from audio samples
- Web UI for voice testing
- Real-time voice morphing
- Integration with more MCP clients