Badcat MCP Server
A TypeScript implementation of Pipecat's core audio processing pipeline as a Model Context Protocol (MCP) server. This server provides real-time audio-to-audio conversation capabilities through a standardized MCP interface.
Features
- 🎙️ Audio Stream Processing: Handle chunked audio input/output with real-time processing
- 🧠 AI Pipeline: Integrated Speech-to-Text → Large Language Model → Text-to-Speech pipeline
- 🔧 Modular Architecture: Pluggable services for STT, LLM, and TTS providers
- 📦 MCP Compatible: Standard Model Context Protocol interface for easy integration
- 🧪 Comprehensive Testing: Unit tests and integration tests with mock services
- ⚡ High Performance: Efficient audio buffering and streaming capabilities
- 🔊 Audio Processing: Built-in audio format conversion, resampling, and chunking
Architecture
The server implements Pipecat's core pipeline concepts in TypeScript:
graph TD
A[Audio Input Chunks] --> B[Audio Buffer Manager]
B --> C[Frame Processor Pipeline]
C --> D[STT Service]
D --> E[LLM Service]
E --> F[TTS Service]
F --> G[Audio Output Manager]
G --> H[Audio Output Chunks]
subgraph "MCP Server"
I[Tool Handler]
J[Conversation Context]
K[Service Registry]
end
I --> C
C --> J
K --> D
K --> E
K --> F
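To connect the diagram to the exported API, the sketch below shows how a base64 audio chunk received by the Tool Handler can become a typed frame for the Frame Processor Pipeline. It reuses utilities documented later in this README; chunkToFrame itself is a hypothetical helper, not part of the package API, and the 24 kHz mono settings are just the documented defaults.
import {
  AudioFormatConverter,
  InputAudioRawFrame,
} from 'badcat-mcp-server';
// Hypothetical glue code mirroring the left side of the diagram: a base64
// chunk from an MCP tool call becomes a frame the pipeline can consume.
function chunkToFrame(base64Chunk: string): InputAudioRawFrame {
  const bytes = Buffer.from(base64Chunk, 'base64');            // decode the MCP payload
  const samples = AudioFormatConverter.bufferToFloat32(bytes); // PCM bytes -> Float32Array
  return new InputAudioRawFrame(samples, 24000, 1, 'user');    // 24 kHz, mono, user audio
}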
Installation
cd badcat-mcp-server
npm install
Quick Start
Basic Usage
import {
createMockBadcatServer,
createTestAudio,
audioToBase64,
} from 'badcat-mcp-server';
// Create server with mock services
const server = createMockBadcatServer({
sampleRate: 24000,
channels: 1,
debug: true,
});
await server.start();
// Process audio through MCP interface
const mcpServer = server.getMCPServer();
const testAudio = createTestAudio(1.0, 24000, 440); // 1 second, 440Hz
const audioBase64 = audioToBase64(testAudio);
const response = await mcpServer.request({
method: 'tools/call',
params: {
name: 'process_audio_stream',
arguments: {
audioChunks: [audioBase64],
},
},
});
console.log('Processing result:', response.content[0]);
await server.stop();
Available MCP Tools
The server provides these MCP tools (example calls are sketched after the list):
- process_audio_stream - Process audio chunks through the AI pipeline
- get_conversation_context - Retrieve conversation history and state
- configure_services - Configure AI service providers
- clear_conversation - Clear conversation history
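The remaining tools follow the same tools/call request pattern shown in Basic Usage. The sketch below reuses the mcpServer handle from that example; the empty arguments objects are an assumption about the tool schemas, not a confirmed contract.
// Fetch the current conversation state (response shape shown in the API Reference).
const context = await mcpServer.request({
  method: 'tools/call',
  params: { name: 'get_conversation_context', arguments: {} },
});
// Reset the conversation history before starting a new session.
await mcpServer.request({
  method: 'tools/call',
  params: { name: 'clear_conversation', arguments: {} },
});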
Development
Running Tests
# Run all tests
npm test
# Run only unit tests
npm run test:unit
# Run only integration tests
npm run test:integration
# Run tests with coverage
npm run test:coverage
# Watch mode for development
npm run test:watch
Linting and Formatting
# Lint code
npm run lint
# Fix linting issues
npm run lint:fix
# Type checking
npm run typecheck
Running Examples
# Basic usage example
npm run dev
Core Components
Frame System
The frame system provides typed data containers for audio and control data:
import {
InputAudioRawFrame,
OutputAudioRawFrame,
TextFrame,
} from 'badcat-mcp-server';
// Create audio frame
const audioData = new Float32Array(1024);
const frame = new InputAudioRawFrame(audioData, 24000, 1, 'user');
// Audio properties
console.log(frame.getDurationMs()); // Duration in milliseconds
console.log(frame.getRMSAmplitude()); // Audio amplitude
console.log(frame.isSilent()); // Silence detection
Pipeline Architecture
Build custom processing pipelines with frame processors:
import { Pipeline, FrameProcessor, TransformProcessor, TextFrame } from 'badcat-mcp-server';
// Custom processor
class EchoProcessor extends FrameProcessor {
async process(frame) {
if (frame instanceof TextFrame) {
return [new TextFrame(`Echo: ${frame.text}`)];
}
return [frame];
}
}
// Create pipeline
const pipeline = new Pipeline([
new EchoProcessor(),
new TransformProcessor((frame) => frame), // identity transform; replace with real logic
]);
await pipeline.start();
const results = await pipeline.processFrame(inputFrame);
await pipeline.stop();
Audio Processing
Handle audio format conversion and buffering:
import {
AudioChunkManager,
AudioFormatConverter,
CircularAudioBuffer,
} from 'badcat-mcp-server';
// Chunk management
const chunkManager = new AudioChunkManager(24000, 1, 20, 1000); // sample rate, channels, chunk size (ms), buffer size (ms)
// Process variable-sized chunks into fixed frames
for await (const frame of chunkManager.processChunk(audioData)) {
// Process frame
}
// Format conversion
const buffer = Buffer.from(base64Audio, 'base64');
const samples = AudioFormatConverter.bufferToFloat32(buffer);
const backToBuffer = AudioFormatConverter.float32ToBuffer(samples);
Service Integration
Register and manage AI services:
import { ServiceRegistry, MockServiceFactory } from 'badcat-mcp-server';
const registry = new ServiceRegistry();
const services = MockServiceFactory.createAll({
stt: { language: 'en-US' },
llm: { temperature: 0.7 },
tts: { voice: 'neural-voice' },
});
registry.register('stt', services.stt);
registry.register('llm', services.llm);
registry.register('tts', services.tts);
await registry.initializeAll();
Testing
The project includes comprehensive test coverage:
- Unit Tests: Individual component testing with Vitest
- Integration Tests: End-to-end pipeline testing
- Mock Services: Realistic service implementations for testing
- Performance Tests: Load and concurrency testing
Test Structure
tests/
├── setup.ts # Global test configuration
└── integration/
├── audio-pipeline.test.ts # End-to-end pipeline tests
└── mcp-server.test.ts # MCP server integration tests
src/
├── frames/__tests__/ # Frame system tests
├── audio/__tests__/ # Audio processing tests
├── pipeline/__tests__/ # Pipeline architecture tests
└── services/__tests__/ # Service system tests
Example Test
it('should process audio through complete pipeline', async () => {
const pipeline = new Pipeline([audioProcessor]);
await pipeline.start();
const audioFrame = new InputAudioRawFrame(testAudio, 24000, 1);
const results = await pipeline.processFrame(audioFrame);
expect(results).toHaveLength(3); // Transcription, Response, Audio
expect(results[2]).toBeInstanceOf(TTSAudioRawFrame);
await pipeline.cleanup();
});
Configuration
Server Configuration
interface BadcatMCPConfig {
sampleRate?: number; // Audio sample rate (default: 24000)
channels?: number; // Audio channels (default: 1)
targetChunkSizeMs?: number; // Target chunk size (default: 20ms)
bufferSizeMs?: number; // Buffer size (default: 1000ms)
debug?: boolean; // Enable debug logging
defaultProviders?: {
// Default service providers
stt?: string;
llm?: string;
tts?: string;
};
}
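A minimal sketch of passing this configuration to the mock-server factory from the Quick Start. The provider names are placeholders taken from the mock metadata shown later in this README, not providers the package necessarily ships.
import { createMockBadcatServer } from 'badcat-mcp-server';
// Object shape follows BadcatMCPConfig above.
const server = createMockBadcatServer({
  sampleRate: 24000,     // 24 kHz audio (default)
  channels: 1,           // mono (default)
  targetChunkSizeMs: 20, // frame size fed to the pipeline
  bufferSizeMs: 1000,    // one second of buffered audio
  debug: false,
  defaultProviders: { stt: 'mock-stt', llm: 'mock-llm', tts: 'mock-tts' },
});
await server.start();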
Service Configuration
// STT Configuration
interface STTConfig {
language?: string;
sampleRate?: number;
enablePunctuation?: boolean;
enableWordTimestamps?: boolean;
interimResults?: boolean;
}
// LLM Configuration
interface LLMConfig {
temperature?: number;
maxTokens?: number;
topP?: number;
systemPrompt?: string;
}
// TTS Configuration
interface TTSConfig {
voice?: string;
sampleRate?: number;
speed?: number;
pitch?: number;
format?: 'wav' | 'mp3' | 'pcm';
}
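These shapes can also be supplied at runtime through the configure_services tool. The argument layout below is a sketch of one plausible shape that mirrors the interfaces above, reusing the mcpServer handle from Basic Usage; it is not a confirmed contract.
// Hypothetical configure_services call; field names come from the config interfaces above.
await mcpServer.request({
  method: 'tools/call',
  params: {
    name: 'configure_services',
    arguments: {
      stt: { language: 'en-US', interimResults: false },
      llm: { temperature: 0.7, maxTokens: 256, systemPrompt: 'You are a helpful voice assistant.' },
      tts: { voice: 'neural-voice', speed: 1.0, format: 'pcm' },
    },
  },
});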
API Reference
MCP Tool: process_audio_stream
Process audio chunks through the AI pipeline.
Input:
{
"audioChunks": ["base64-audio-data", ...],
"config": {
"sttProvider": "string",
"llmProvider": "string",
"ttsProvider": "string",
"sampleRate": 24000,
"channels": 1
}
}
Output:
{
"audioChunks": ["base64-audio-output", ...],
"metadata": {
"inputDuration": 1000,
"outputDuration": 1200,
"processingTime": 500,
"transcription": "Hello world",
"responseText": "Hi there! How can I help?",
"servicesUsed": {
"stt": "mock-stt",
"llm": "mock-llm",
"tts": "mock-tts"
}
}
}
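On the client side, the returned base64 chunks can be decoded back into samples with the converter utilities shown earlier. A minimal sketch, assuming result holds the parsed tool output above; playback wiring is omitted.
import { AudioFormatConverter } from 'badcat-mcp-server';
// Decode each returned chunk into Float32 samples for playback or analysis.
const outputSamples = result.audioChunks.map((chunk: string) =>
  AudioFormatConverter.bufferToFloat32(Buffer.from(chunk, 'base64')),
);
console.log('Decoded output chunks:', outputSamples.length);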
MCP Tool: get_conversation_context
Retrieve current conversation state.
Output:
{
"messages": [
{
"role": "user",
"content": "Hello",
"timestamp": "2024-01-01T12:00:00Z",
"audioMetadata": {
"duration": 1000,
"sampleRate": 24000
}
}
],
"userId": "optional",
"sessionId": "optional"
}
Performance
The server is designed for real-time audio processing:
- Latency: Target <200ms end-to-end processing
- Throughput: Handles concurrent audio streams
- Memory: Efficient circular buffering for audio data
- CPU: Optimized frame processing pipeline
Benchmarks
Typical performance on modern hardware:
- Audio Processing: ~50ms for 1-second audio chunk
- Pipeline Latency: 100-200ms end-to-end
- Memory Usage: ~10MB base + audio buffers
- Concurrent Streams: 10+ simultaneous conversations
Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make changes with tests: npm test
- Lint and format: npm run lint:fix
- Commit changes: git commit -m "Description"
- Push the branch: git push origin feature-name
- Create a pull request
Development Guidelines
- Tests Required: All new features must include tests
- Type Safety: Full TypeScript typing required
- Documentation: Update README and code comments
- Performance: Consider impact on real-time processing
- Compatibility: Maintain MCP protocol compliance
License
See the LICENSE file.
Related Projects
- Pipecat (Python) - Original Python implementation
- Model Context Protocol - Protocol specification
- MCP Servers - Official MCP server implementations
Support
- Issues: GitHub Issues