joelfuller2016/claude-desktop-realtime-audio-mcp-python
A Python-based Model Context Protocol (MCP) server for real-time audio input on Claude Desktop, leveraging Python's audio processing capabilities.
Claude Desktop Real-time Audio MCP Server (Python Implementation)
A Python-based Model Context Protocol (MCP) server that enables real-time microphone input for Claude Desktop on Windows. This implementation leverages Python's superior audio processing ecosystem to provide robust voice-driven conversations with Claude through WASAPI audio capture and multiple speech recognition engines.
Key Advantages of Python Implementation
- Mature Audio Ecosystem: Leverages sounddevice, webrtcvad, and specialized Windows audio libraries
- Multiple STT Engines: OpenAI Whisper (local/API), Azure Speech, Google Speech-to-Text
- FastMCP Framework: High-level Pythonic interface for rapid MCP development
- Easy Configuration: JSON/YAML configuration with environment variable support
- Better Debugging: Comprehensive logging and performance monitoring
- Async Architecture: Non-blocking operations with asyncio
Architecture
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│     Claude      │    │  FastMCP Server  │    │  Audio Capture  │
│     Desktop     │◄──►│     (Python)     │◄──►│  (sounddevice)  │
│                 │    │                  │    │    + WASAPI     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │                       │
                                ▼                       ▼
                       ┌──────────────────┐    ┌─────────────────┐
                       │   STT Engines    │    │ Voice Activity  │
                       │  • Whisper       │    │    Detection    │
                       │  • Azure Speech  │    │   (webrtcvad)   │
                       │  • Google Speech │    │                 │
                       └──────────────────┘    └─────────────────┘
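The flow in the diagram can be pictured as an asyncio producer/consumer pipeline. The following is a minimal sketch, not the project's code: `capture_chunks`, `is_speech`, and `transcribe` are hypothetical stand-ins for the capture, VAD, and STT stages, and the chunks are plain lists of integers rather than real audio.

```python
import asyncio


async def capture_chunks(queue, chunks):
    """Hypothetical capture stage: push audio chunks, then a None sentinel."""
    for chunk in chunks:
        await queue.put(chunk)
    await queue.put(None)


def is_speech(chunk):
    """Stand-in VAD: treat any chunk containing a non-zero sample as speech."""
    return any(sample != 0 for sample in chunk)


async def transcribe(chunk):
    """Stand-in STT stage: report the chunk length instead of real text."""
    await asyncio.sleep(0)  # yield control, as a real engine call would
    return f"<{len(chunk)} samples>"


async def run_pipeline(chunks):
    queue = asyncio.Queue(maxsize=8)  # bounded queue applies back-pressure
    producer = asyncio.create_task(capture_chunks(queue, chunks))
    results = []
    while True:
        chunk = await queue.get()
        if chunk is None:  # sentinel: capture has finished
            break
        if is_speech(chunk):  # silent chunks never reach the STT stage
            results.append(await transcribe(chunk))
    await producer
    return results


if __name__ == "__main__":
    demo = [[0, 0, 0], [1, -2, 3], [0, 0], [5, 5]]
    print(asyncio.run(run_pipeline(demo)))
```

The bounded queue is one possible design choice: if transcription falls behind, the capture side waits instead of letting memory grow without limit.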
Prerequisites
- Windows 10/11 (Windows 7+ with WASAPI support)
- Python 3.8+
- Claude Desktop (latest version)
Quick Start
1. Installation
# Clone the repository
git clone https://github.com/joelfuller2016/claude-desktop-realtime-audio-mcp-python.git
cd claude-desktop-realtime-audio-mcp-python
# Create virtual environment
python -m venv venv
venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
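# Install with GPU support (optional)
# Note: pip extras such as [gpu] attach to a package name, not to a requirements file,
# so this command only works if the repository ships a file with exactly this name
pip install -r requirements.txt[gpu]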
2. Configuration
Create a configuration file or use environment variables:
# Set OpenAI API key (for Whisper API)
set OPENAI_API_KEY=your_api_key_here
# Set Azure Speech key (optional)
set AZURE_SPEECH_KEY=your_azure_key
set AZURE_SPEECH_REGION=eastus
# Set Google credentials (optional)
set GOOGLE_CREDENTIALS_PATH=path/to/credentials.json
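As an illustration of how a server could consume these variables, here is a small sketch. The variable names come from the commands above; the helper names `load_credentials` and `available_engines`, and the `eastus` fallback for the region, are assumptions made for this example rather than the project's actual code.

```python
import os


def load_credentials(environ=None):
    """Collect STT credentials from environment variables (names from this README)."""
    env = os.environ if environ is None else environ
    return {
        "openai_api_key": env.get("OPENAI_API_KEY"),
        "azure_speech_key": env.get("AZURE_SPEECH_KEY"),
        # Fallback region is an assumption taken from the example value above.
        "azure_speech_region": env.get("AZURE_SPEECH_REGION", "eastus"),
        "google_credentials_path": env.get("GOOGLE_CREDENTIALS_PATH"),
    }


def available_engines(creds):
    """Local Whisper needs no key; cloud engines need their credentials."""
    engines = ["whisper"]
    if creds["azure_speech_key"]:
        engines.append("azure")
    if creds["google_credentials_path"]:
        engines.append("google")
    return engines


if __name__ == "__main__":
    print(available_engines(load_credentials()))
```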
3. Test Audio Setup
# Test your microphone and audio devices
python -m audio.test_setup
4. Run the MCP Server
# Start the server
python main.py
# Or with debug logging
python main.py --debug
5. Configure Claude Desktop
Add to your Claude Desktop configuration file (claude_desktop_config.json):
{
"mcpServers": {
"claude-audio": {
"command": "python",
"args": [
"C:\\full\\path\\to\\main.py"
],
"env": {
"OPENAI_API_KEY": "your_api_key_here"
}
}
}
}
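Hand-writing Windows paths in JSON is error-prone because every backslash must be doubled. The sketch below generates the same entry with `json.dumps`, which handles the escaping. `build_server_entry` is a hypothetical helper, and defaulting to `sys.executable` is an assumption: pointing Claude Desktop at the virtual environment's interpreter is one way to make sure the installed dependencies are found.

```python
import json
import sys


def build_server_entry(main_py, api_key=None, python_exe=None):
    """Build the mcpServers entry shown above for a given path to main.py."""
    entry = {
        # Assumption: default to the interpreter running this script,
        # e.g. the one inside the activated virtual environment.
        "command": python_exe or sys.executable,
        "args": [str(main_py)],
    }
    if api_key:
        entry["env"] = {"OPENAI_API_KEY": api_key}
    return {"mcpServers": {"claude-audio": entry}}


if __name__ == "__main__":
    config = build_server_entry(r"C:\full\path\to\main.py", python_exe="python")
    # json.dumps doubles the backslashes, as JSON requires.
    print(json.dumps(config, indent=2))
```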
MCP Tools Available
Audio Control
- start_recording() - Start real-time audio capture
- stop_recording() - Stop audio capture
- get_recording_status() - Get current status and config
- test_audio_capture(duration=3.0) - Test microphone
Device Management
- list_audio_devices() - List all audio input devices
- set_audio_device(device_id) - Set audio input device
- configure_audio_settings() - Adjust sample rate, channels, etc.
Speech Recognition
- set_stt_engine(engine) - Switch between whisper/azure/google
- View available engines and status
Resources
- audio://devices - Available audio devices
- audio://config - Current audio configuration
- stt://engines - STT engines status
Configuration
Audio Settings
{
"audio": {
"sample_rate": 16000,
"channels": 1,
"chunk_size": 1024,
"device_id": null,
"use_wasapi_exclusive": false,
"low_latency": true
}
}
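To see what these numbers mean for latency, the arithmetic can be worked through in a few lines. This sketch assumes that `chunk_size` counts samples per channel and that samples are 16-bit PCM; neither is stated in the configuration itself.

```python
def chunk_duration_ms(chunk_size, sample_rate):
    """Time span covered by one chunk, in milliseconds."""
    return 1000.0 * chunk_size / sample_rate


def chunk_bytes(chunk_size, channels=1, bytes_per_sample=2):
    """Size of one chunk in bytes, assuming 16-bit samples by default."""
    return chunk_size * channels * bytes_per_sample


if __name__ == "__main__":
    for size in (512, 1024, 2048):
        print(size, chunk_duration_ms(size, 16000), chunk_bytes(size))
```

At 16 kHz, the default 1024-sample chunk spans 64 ms, so each chunk adds at least that much buffering delay before any processing starts; halving the chunk size to 512 halves that delay to 32 ms.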
Voice Activity Detection
{
"vad": {
"mode": "hybrid",
"webrtc_aggressiveness": 2,
"energy_threshold": 0.01,
"min_speech_duration": 0.1,
"min_silence_duration": 0.3
}
}
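One plausible reading of these parameters is an energy gate with hysteresis: speech must persist for `min_speech_duration` before it is reported, and silence must persist for `min_silence_duration` before speech is considered over. The sketch below implements only that energy part in pure Python. `EnergyGate` is a hypothetical name, and how the project's "hybrid" mode combines this with webrtcvad is not specified here. Note that webrtcvad itself accepts only 10, 20, or 30 ms frames of 16-bit mono PCM at 8, 16, 32, or 48 kHz, so a 64 ms capture chunk would need to be re-framed before being passed to it.

```python
import math


def rms(frame):
    """Root-mean-square level of a frame of float samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class EnergyGate:
    """Energy-based speech gate with minimum speech and silence durations."""

    def __init__(self, energy_threshold=0.01, min_speech_duration=0.1,
                 min_silence_duration=0.3):
        self.energy_threshold = energy_threshold
        self.min_speech_duration = min_speech_duration
        self.min_silence_duration = min_silence_duration
        self.speaking = False
        self._run = 0.0  # seconds of consecutive frames contradicting the state

    def update(self, frame, frame_seconds):
        """Feed one frame; return True while the gate considers speech active."""
        loud = rms(frame) >= self.energy_threshold
        if loud != self.speaking:
            self._run += frame_seconds
            needed = (self.min_silence_duration if self.speaking
                      else self.min_speech_duration)
            if self._run >= needed - 1e-9:  # tolerance for float accumulation
                self.speaking = not self.speaking
                self._run = 0.0
        else:
            self._run = 0.0  # the frame agrees with the state; reset the run
        return self.speaking
```

With the defaults above and 50 ms frames, the gate opens on the second loud frame and closes on the sixth quiet one, which filters out short clicks and avoids cutting off speech during brief pauses.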
Speech-to-Text Engines
{
"stt": {
"default_engine": "whisper",
"whisper": {
"model_size": "base",
"use_api": false,
"language": null
},
"azure": {
"enabled": false,
"api_key": null,
"region": "eastus"
},
"google": {
"enabled": false,
"credentials_path": null
}
}
}
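A user configuration file usually overrides only a few of these keys. One common way to handle that is a recursive merge over built-in defaults, sketched below. `DEFAULTS` mirrors the STT block above; `deep_merge` and `load_config` are hypothetical helpers and not necessarily how the project loads its configuration.

```python
import copy
import json

# Defaults copied from the STT configuration shown above.
DEFAULTS = {
    "stt": {
        "default_engine": "whisper",
        "whisper": {"model_size": "base", "use_api": False, "language": None},
        "azure": {"enabled": False, "api_key": None, "region": "eastus"},
        "google": {"enabled": False, "credentials_path": None},
    }
}


def deep_merge(base, override):
    """Return a copy of base with override applied, recursing into dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_config(text):
    """Parse a JSON string and layer it over the defaults."""
    return deep_merge(DEFAULTS, json.loads(text))


if __name__ == "__main__":
    cfg = load_config('{"stt": {"whisper": {"model_size": "small"}}}')
    print(cfg["stt"]["whisper"])
```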
Advanced Usage
Using Local Whisper Models
# Different model sizes (tiny, base, small, medium, large)
config.stt.whisper.model_size = "small" # Faster
config.stt.whisper.model_size = "large" # More accurate
Optimizing for Real-time Performance
# Low-latency settings
config.audio.chunk_size = 512
config.audio.sample_rate = 16000
config.vad.min_speech_duration = 0.1
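Using Cloud STT Services
# The export syntax below is for bash-style shells such as Git Bash;
# in cmd.exe use the set NAME=value form shown in Quick Start
# Azure Speech
export AZURE_SPEECH_KEY="your_key"
export AZURE_SPEECH_REGION="eastus"
# Google Speech
export GOOGLE_CREDENTIALS_PATH="/path/to/credentials.json"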
Testing
Test Audio Devices
python -c "from audio.capture import list_audio_devices; print(list_audio_devices())"
Test Voice Activity Detection
python -c "from audio.vad import create_vad; vad = create_vad('hybrid')"
Benchmark STT Engines
python -m stt.benchmark --audio test_audio.wav
Performance Monitoring
The server includes comprehensive logging and performance monitoring:
- Audio Processing: Chunk processing times, dropouts, queue status
- VAD Performance: Speech detection accuracy, false positives
- STT Metrics: Transcription latency, confidence scores, accuracy
- System Resources: Memory usage, CPU utilization
Enable debug logging for detailed metrics:
python main.py --debug
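Per-stage timings of the kind listed above can be gathered with a small context manager. This is a generic sketch using only the standard library; `StageTimer` and the logger name `audio.perf` are illustrative names, not the project's actual monitoring code.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("audio.perf")


class StageTimer:
    """Record wall-clock durations per pipeline stage."""

    def __init__(self):
        self.samples = {}  # stage name -> list of durations in milliseconds

    @contextmanager
    def measure(self, stage):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.samples.setdefault(stage, []).append(elapsed_ms)
            logger.debug("%s took %.2f ms", stage, elapsed_ms)

    def summary(self):
        """Return count, mean, and maximum duration for each stage."""
        return {
            stage: {
                "count": len(values),
                "mean_ms": sum(values) / len(values),
                "max_ms": max(values),
            }
            for stage, values in self.samples.items()
        }


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    timer = StageTimer()
    with timer.measure("vad"):
        sum(range(10000))  # placeholder for real work
    print(timer.summary())
```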
Troubleshooting
Common Issues
1. No audio devices detected
# Check if sounddevice can see your devices
python -c "import sounddevice as sd; print(sd.query_devices())"
2. High latency
# Reduce chunk size and enable low latency
config.audio.chunk_size = 512
config.audio.low_latency = True
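3. Whisper model loading errors
# Reinstall the Whisper package
pip uninstall openai-whisper
pip install openai-whisper
# Reinstalling does not remove downloaded models; if a model file is corrupted,
# delete it from the model cache (by default the .cache\whisper folder in your user profile)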
4. WASAPI permissions on Windows
- Check microphone privacy settings
- Run as administrator if needed
- Ensure Claude Desktop has microphone permissions
Debug Mode
# Enable comprehensive logging
python main.py --debug
# Or set environment variable
set LOG_LEVEL=DEBUG
python main.py
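The two ways of enabling debug output could be reconciled as in the sketch below. The precedence (the `--debug` flag wins over `LOG_LEVEL`) and the fallback to INFO for unknown values are assumptions made for this example; `resolve_log_level` is a hypothetical helper.

```python
import logging
import os
import sys


def resolve_log_level(argv, environ=None):
    """Pick a logging level from a --debug flag or the LOG_LEVEL variable."""
    env = os.environ if environ is None else environ
    if "--debug" in argv:
        return logging.DEBUG
    name = env.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, name, None)
    # Unknown or non-level names fall back to INFO.
    return level if isinstance(level, int) else logging.INFO


if __name__ == "__main__":
    logging.basicConfig(level=resolve_log_level(sys.argv))
    logging.getLogger(__name__).debug("debug logging is enabled")
```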
Security Considerations
- API keys are stored in environment variables or secure config files
- Audio data is processed locally by default (Whisper local models)
- Cloud STT services can be disabled for maximum privacy
- No audio data is permanently stored
Performance Optimization
For Low Latency (<200ms)
{
"audio": {
"chunk_size": 512,
"sample_rate": 16000
},
"stt": {
"whisper": {
"model_size": "tiny",
"fp16": true
}
}
}
For High Accuracy
{
"audio": {
"chunk_size": 2048
},
"stt": {
"whisper": {
"model_size": "large",
"beam_size": 10
}
}
}
Contributing
We welcome contributions! Areas where help is needed:
- Additional STT engines (AssemblyAI, Rev.ai, etc.)
- Audio preprocessing (noise reduction, normalization)
- Cross-platform support (macOS, Linux)
- Testing frameworks (automated audio testing)
- Documentation (tutorials, examples)
License
This project is licensed under the MIT License; see the license file in the repository for details.
Acknowledgments
- Anthropic for Claude and the Model Context Protocol
- OpenAI for Whisper speech recognition
- Python Audio Community for excellent libraries
- FastMCP for the high-level MCP framework
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Wiki
Star this repository if you find it useful!
This Python implementation provides a more maintainable and feature-rich alternative to the original TypeScript version, with better audio processing capabilities and easier extensibility.