converse
A lightweight Model Context Protocol (MCP) server that provides voice capabilities using remote OpenAI-compatible TTS/STT APIs. Enables AI assistants like Claude to speak and listen through your system's audio devices.
Features
- Text-to-Speech (TTS): Convert text to natural-sounding speech
- Speech-to-Text (STT): Transcribe spoken audio to text
- Two-Way Conversation: Speak and listen in a single interaction
- OpenAI-Compatible: Works with any OpenAI-compatible TTS/STT API
- System Audio: Uses standard Linux audio utilities (aplay, arecord)
Installation
npm install -g @fwdslsh/converse
Quick Start
1. Configure Environment Variables
The converse server requires API endpoints for TTS and STT services:
export STT_API_URL="https://your-api.example.com/v1/audio/transcriptions"
export STT_API_KEY="your-stt-api-key"
export TTS_API_URL="https://your-api.example.com/v1/audio/speech"
export TTS_API_KEY="your-tts-api-key"
export TTS_VOICE="alloy" # Optional: default voice
export TTS_MODEL="tts-1" # Optional: default model
2. Add to MCP Client Configuration
For Claude Code, add to your .mcp.json:
{
"mcpServers": {
"converse": {
"command": "converse",
"env": {
// Optionally set configuration here instead of env vars.
// Note: remove these comment lines before saving; .mcp.json must be valid JSON.
// "STT_API_URL": "https://your-api.example.com/v1/audio/transcriptions",
// "STT_API_KEY": "your-stt-api-key",
// "TTS_API_URL": "https://your-api.example.com/v1/audio/speech",
// "TTS_API_KEY": "your-tts-api-key",
// "TTS_VOICE": "alloy",
// "TTS_MODEL": "tts-1"
}
}
}
}
3. Start Your MCP Client
The converse server will automatically start when your MCP client connects.
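You can also exercise the server outside a client. For example, the MCP Inspector (an optional development tool, not something converse requires) can launch and introspect the globally installed binary:
npx @modelcontextprotocol/inspector converse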
Available Tools
speak_text
Convert text to speech and play through speakers.
Parameters:
- text (string, required): The message to speak
- voiceId (string, optional): Voice identifier (default: "alloy")
- language (string, optional): Language hint for TTS
Example:
await mcp.call('speak_text', {
text: 'Hello! I can speak now.',
voiceId: 'alloy'
});
listen_for_speech
Record audio from microphone and transcribe to text.
Parameters:
- maxDurationSeconds (number, optional): Maximum recording length in seconds (default: 5, max: 60)
- language (string, optional): Language hint for STT (e.g., 'en')
Returns:
- text (string): Transcribed text
- language (string, optional): Detected language
Example:
const result = await mcp.call('listen_for_speech', {
maxDurationSeconds: 10,
language: 'en'
});
console.log('You said:', result.text);
converse
Speak a message and optionally listen for a response (two-way conversation).
Parameters:
- message (string, required): The message to speak
- wait_for_response (boolean, optional): Whether to listen after speaking (default: true)
- listen_duration (number, optional): Max listen time in seconds (default: 30, max: 120)
- voice (string, optional): Voice identifier
- tts_model (string, optional): TTS model hint
- tts_provider (string, optional): TTS provider hint
Returns:
- spoken_message (string): The message that was spoken
- heard_text (string, optional): Transcribed response (if wait_for_response is true)
- heard_language (string, optional): Detected language of response
Example:
const result = await mcp.call('converse', {
message: 'Can you hear me?',
wait_for_response: true,
listen_duration: 15
});
console.log('I said:', result.spoken_message);
console.log('You said:', result.heard_text);
Environment Variables
Required
- TTS_API_URL: OpenAI-compatible TTS API endpoint (e.g., https://api.example.com/v1/audio/speech)
- STT_API_URL: OpenAI-compatible STT API endpoint (e.g., https://api.example.com/v1/audio/transcriptions)
Optional
- TTS_API_KEY: Bearer token for TTS API authentication
- STT_API_KEY: Bearer token for STT API authentication
- TTS_VOICE: Default voice for TTS (default: "alloy")
- TTS_MODEL: Default TTS model to use (default: "tts-1")
- PLAY_CMD: Custom audio playback command (default: aplay -q)
- RECORD_CMD: Custom audio recording command (default: arecord -f cd -t wav -q)
- STT_MODE: STT engine mode (default: "remote"; only "remote" is currently supported)
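For example, on systems running PulseAudio or PipeWire you may prefer the native utilities over the ALSA defaults (a sketch; adjust the sample format and rate to your setup):
export PLAY_CMD="paplay"
export RECORD_CMD="parecord --file-format=wav --format=s16le --rate=44100"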
System Requirements
Linux
- Audio Playback: aplay (from alsa-utils) or a custom command via PLAY_CMD
- Audio Recording: arecord (from alsa-utils) or a custom command via RECORD_CMD
- Node.js: Version 18 or higher
Installation on Ubuntu/Debian
sudo apt-get install alsa-utils
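On Fedora and other RPM-based distributions, the package has the same name:
sudo dnf install alsa-utils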
macOS/Windows
Audio recording and playback require system-specific commands. Configure via environment variables:
macOS:
export PLAY_CMD="afplay"
export RECORD_CMD="sox -d -t wav -"
API Compatibility
This server expects OpenAI-compatible API endpoints:
TTS Request Format
{
"model": "tts-1",
"input": "Text to speak",
"voice": "alloy",
"response_format": "wav"
}
STT Request Format
The STT endpoint expects a multipart/form-data request (not JSON) with these fields:
file: <audio file> (binary upload)
model: "whisper-1"
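You can exercise an STT endpoint by hand with curl's multipart support (assuming a short test.wav in the current directory):
curl -X POST "$STT_API_URL" \
  -H "Authorization: Bearer $STT_API_KEY" \
  -F file=@test.wav \
  -F model=whisper-1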
Compatible services include:
- OpenAI API
- LocalAI
- FastAPI-based TTS/STT servers
- Any OpenAI-compatible speech service
Example: Using with LocalAI
LocalAI provides OpenAI-compatible TTS/STT endpoints that run locally on your machine.
1. Install and Start LocalAI
# Using Docker
docker run -p 8080:8080 -v $PWD/models:/models localai/localai:latest
# Or install locally
# See https://localai.io/basics/getting_started/
2. Configure Converse for LocalAI
export STT_API_URL="http://localhost:8080/v1/audio/transcriptions"
export TTS_API_URL="http://localhost:8080/v1/audio/speech"
export TTS_MODEL="tts-1" # Or your LocalAI TTS model name
export TTS_VOICE="alloy"
# No API keys needed for local deployment
3. Update .mcp.json
{
"mcpServers": {
"converse": {
"command": "converse",
"env": {
"STT_API_URL": "http://localhost:8080/v1/audio/transcriptions",
"TTS_API_URL": "http://localhost:8080/v1/audio/speech",
"TTS_MODEL": "tts-1",
"TTS_VOICE": "alloy"
}
}
}
}
4. Test the Setup
# Test TTS endpoint
curl -X POST http://localhost:8080/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"input": "Hello from LocalAI",
"voice": "alloy",
"response_format": "wav"
}' \
--output test.wav && aplay test.wav
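To test the STT side as well, post the generated test.wav back (this assumes a Whisper-compatible model named whisper-1 is installed in LocalAI):
curl -X POST http://localhost:8080/v1/audio/transcriptions \
  -F file=@test.wav \
  -F model=whisper-1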
Note: Make sure LocalAI has the appropriate TTS and STT models installed. See the LocalAI models documentation for setup instructions.
Finding Available Voices in LocalAI
LocalAI supports multiple TTS backends (Piper, Coqui, etc.), each with different voices:
List All Available Models:
# Query LocalAI's models endpoint (adjust port if needed)
curl http://localhost:8080/v1/models | jq -r '.data[].id'
# Filter for TTS models only
curl http://localhost:8080/v1/models | jq -r '.data[].id' | grep -i tts
Common Voice Naming Patterns:
- Kokoro TTS: af_heart, am_heart, bf_heart, bm_heart, af_sky, af_bella (see the Kokoro TTS documentation for the complete voice list)
- Piper TTS: en_US_lessac_medium, en_GB_alan_low, en_US_amy_low (see Piper Voices for samples)
- Coqui TTS: depends on installed models; check your LocalAI config or the Coqui TTS models list
Test a Specific Voice:
# Replace "kokoro" and "af_heart" with your model and voice names
curl -X POST http://localhost:8080/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "kokoro",
"input": "Testing this voice",
"voice": "af_heart",
"response_format": "wav"
}' \
--output test.wav && aplay test.wav
Configure Multiple Voices:
You can switch voices dynamically per conversation, or set a default; here the default is a Piper voice:
{
"mcpServers": {
"converse": {
"command": "converse",
"env": {
"STT_API_URL": "http://localhost:8080/v1/audio/transcriptions",
"TTS_API_URL": "http://localhost:8080/v1/audio/speech",
"TTS_VOICE": "en_US_lessac_medium", // Your preferred Piper voice
"TTS_MODEL": "tts-1"
}
}
}
}
Note: The backend (Piper vs. Coqui) is determined automatically by LocalAI based on the voice name you specify, as each voice is tied to a specific backend in LocalAI's configuration.
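To override the voice per call rather than via TTS_VOICE, pass voiceId to speak_text (af_heart is just an example Kokoro voice; substitute one your LocalAI instance actually serves):
await mcp.call('speak_text', {
  text: 'Testing a per-call voice override.',
  voiceId: 'af_heart'
});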
Troubleshooting
No audio playback
- Verify aplay works: aplay test.wav
- Check audio devices: aplay -l
- Try a different device: export PLAY_CMD="aplay -D hw:0,0 -q"
No audio recording
- Verify arecord works: arecord -d 5 test.wav && aplay test.wav
- Check input devices: arecord -l
- Try a different device: export RECORD_CMD="arecord -D hw:1,0 -f cd -t wav -q"
API connection errors
- Verify API URLs are correct and accessible
- Check that API keys are valid
- Test endpoints directly with curl:
curl -X POST "$TTS_API_URL" \
  -H "Authorization: Bearer $TTS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"tts-1","input":"test","voice":"alloy","response_format":"wav"}' \
  --output test.wav
Development
Build from source
git clone https://github.com/fwdslsh/converse.git
cd converse
npm install
npm run build
Run locally
export STT_API_URL="..."
export TTS_API_URL="..."
npm start
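To run your local build from an MCP client, npm link can put the converse command on your PATH (assuming the package declares a bin entry, which the global-install instructions imply):
npm link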
License
CC-BY
Contributing
Contributions welcome! Please open an issue or PR at https://github.com/fwdslsh/converse
Acknowledgments
Built with the Model Context Protocol SDK