Badcat MCP Server

A TypeScript implementation of Pipecat's core audio processing pipeline as a Model Context Protocol (MCP) server. This server provides real-time audio-to-audio conversation capabilities through a standardized MCP interface.

Features

  • 🎙️ Audio Stream Processing: Handle chunked audio input/output with real-time processing
  • 🧠 AI Pipeline: Integrated Speech-to-Text → Large Language Model → Text-to-Speech pipeline
  • 🔧 Modular Architecture: Pluggable services for STT, LLM, and TTS providers
  • 📦 MCP Compatible: Standard Model Context Protocol interface for easy integration
  • 🧪 Comprehensive Testing: Unit tests and integration tests with mock services
  • ⚡ High Performance: Efficient audio buffering and streaming capabilities
  • 🔊 Audio Processing: Built-in audio format conversion, resampling, and chunking

Architecture

The server implements Pipecat's core pipeline concepts in TypeScript:

graph TD
    A[Audio Input Chunks] --> B[Audio Buffer Manager]
    B --> C[Frame Processor Pipeline]
    C --> D[STT Service]
    D --> E[LLM Service]
    E --> F[TTS Service]
    F --> G[Audio Output Manager]
    G --> H[Audio Output Chunks]

    subgraph "MCP Server"
        I[Tool Handler]
        J[Conversation Context]
        K[Service Registry]
    end

    I --> C
    C --> J
    K --> D
    K --> E
    K --> F

Installation

cd badcat-mcp-server
npm install

Quick Start

Basic Usage

import {
  createMockBadcatServer,
  createTestAudio,
  audioToBase64,
} from 'badcat-mcp-server';

// Create server with mock services
const server = createMockBadcatServer({
  sampleRate: 24000,
  channels: 1,
  debug: true,
});

await server.start();

// Process audio through MCP interface
const mcpServer = server.getMCPServer();
const testAudio = createTestAudio(1.0, 24000, 440); // 1 second, 440Hz
const audioBase64 = audioToBase64(testAudio);

const response = await mcpServer.request({
  method: 'tools/call',
  params: {
    name: 'process_audio_stream',
    arguments: {
      audioChunks: [audioBase64],
    },
  },
});

console.log('Processing result:', response.content[0]);

await server.stop();

Available MCP Tools

The server provides these MCP tools:

  1. process_audio_stream - Process audio chunks through the AI pipeline
  2. get_conversation_context - Retrieve conversation history and state
  3. configure_services - Configure AI service providers
  4. clear_conversation - Clear conversation history
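
The process_audio_stream tool is demonstrated in the Basic Usage snippet above. The conversation tools can be called with the same request pattern; a minimal sketch (mcpServer comes from the Basic Usage example, and the empty arguments objects are an assumption):

// Sketch: calling the conversation tools via the same MCP request pattern
const context = await mcpServer.request({
  method: 'tools/call',
  params: {
    name: 'get_conversation_context',
    arguments: {},
  },
});
console.log(context.content[0]); // see API Reference for the returned shape

await mcpServer.request({
  method: 'tools/call',
  params: {
    name: 'clear_conversation',
    arguments: {},
  },
});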

Development

Running Tests

# Run all tests
npm test

# Run only unit tests
npm run test:unit

# Run only integration tests
npm run test:integration

# Run tests with coverage
npm run test:coverage

# Watch mode for development
npm run test:watch

Linting and Formatting

# Lint code
npm run lint

# Fix linting issues
npm run lint:fix

# Type checking
npm run typecheck

Running Examples

# Basic usage example
npm run dev

Core Components

Frame System

The frame system provides typed data containers for audio and control data:

import {
  InputAudioRawFrame,
  OutputAudioRawFrame,
  TextFrame,
} from 'badcat-mcp-server';

// Create audio frame
const audioData = new Float32Array(1024);
const frame = new InputAudioRawFrame(audioData, 24000, 1, 'user');

// Audio properties
console.log(frame.getDurationMs()); // Duration in milliseconds
console.log(frame.getRMSAmplitude()); // Audio amplitude
console.log(frame.isSilent()); // Silence detection
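
TextFrame works the same way for text payloads; the constructor and text property shown here are the same ones used by the EchoProcessor example below:

// Text frame carrying transcribed or generated text between pipeline stages
const textFrame = new TextFrame('Hello, how can I help?');
console.log(textFrame.text); // "Hello, how can I help?"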

Pipeline Architecture

Build custom processing pipelines with frame processors:

import { Pipeline, FrameProcessor, TransformProcessor } from 'badcat-mcp-server';

// Custom processor
class EchoProcessor extends FrameProcessor {
  async process(frame) {
    if (frame instanceof TextFrame) {
      return [new TextFrame(`Echo: ${frame.text}`)];
    }
    return [frame];
  }
}

// Create pipeline
const pipeline = new Pipeline([
  new EchoProcessor(),
  // TransformProcessor wraps a function that maps one frame to another
  new TransformProcessor(frame => frame /* replace with real transform logic */),
]);

await pipeline.start();
const results = await pipeline.processFrame(inputFrame);
await pipeline.stop();

Audio Processing

Handle audio format conversion and buffering:

import {
  AudioChunkManager,
  AudioFormatConverter,
  CircularAudioBuffer,
} from 'badcat-mcp-server';

// Chunk management (sample rate, channels, target chunk size in ms, buffer size in ms)
const chunkManager = new AudioChunkManager(24000, 1, 20, 1000);

// Process variable-sized chunks into fixed frames
// (incomingAudio is a Float32Array of raw samples from your audio source)
for await (const frame of chunkManager.processChunk(incomingAudio)) {
  // Process frame
}

// Format conversion
const buffer = Buffer.from(base64Audio, 'base64');
const audioData = AudioFormatConverter.bufferToFloat32(buffer);
const backToBuffer = AudioFormatConverter.float32ToBuffer(audioData);

Service Integration

Register and manage AI services:

import { ServiceRegistry, MockServiceFactory } from 'badcat-mcp-server';

const registry = new ServiceRegistry();
const services = MockServiceFactory.createAll({
  stt: { language: 'en-US' },
  llm: { temperature: 0.7 },
  tts: { voice: 'neural-voice' },
});

registry.register('stt', services.stt);
registry.register('llm', services.llm);
registry.register('tts', services.tts);

await registry.initializeAll();

Testing

The project includes comprehensive test coverage:

  • Unit Tests: Individual component testing with Vitest
  • Integration Tests: End-to-end pipeline testing
  • Mock Services: Realistic service implementations for testing
  • Performance Tests: Load and concurrency testing

Test Structure

tests/
├── setup.ts                    # Global test configuration
└── integration/
    ├── audio-pipeline.test.ts  # End-to-end pipeline tests
    └── mcp-server.test.ts      # MCP server integration tests

src/
├── frames/__tests__/           # Frame system tests
├── audio/__tests__/            # Audio processing tests
├── pipeline/__tests__/         # Pipeline architecture tests
└── services/__tests__/         # Service system tests

Example Test

it('should process audio through complete pipeline', async () => {
  const pipeline = new Pipeline([audioProcessor]);
  await pipeline.start();

  const audioFrame = new InputAudioRawFrame(testAudio, 24000, 1);
  const results = await pipeline.processFrame(audioFrame);

  expect(results).toHaveLength(3); // Transcription, Response, Audio
  expect(results[2]).toBeInstanceOf(TTSAudioRawFrame);

  await pipeline.cleanup();
});

Configuration

Server Configuration

interface BadcatMCPConfig {
  sampleRate?: number; // Audio sample rate (default: 24000)
  channels?: number; // Audio channels (default: 1)
  targetChunkSizeMs?: number; // Target chunk size (default: 20ms)
  bufferSizeMs?: number; // Buffer size (default: 1000ms)
  debug?: boolean; // Enable debug logging
  defaultProviders?: {
    // Default service providers
    stt?: string;
    llm?: string;
    tts?: string;
  };
}
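
These options map directly onto the factory call from Quick Start; a fuller sketch (the provider names are the mock identifiers shown in the API Reference output, standing in for real providers):

// Sketch: a complete configuration passed to the Quick Start factory
const server = createMockBadcatServer({
  sampleRate: 24000,
  channels: 1,
  targetChunkSizeMs: 20,
  bufferSizeMs: 1000,
  debug: false,
  defaultProviders: {
    stt: 'mock-stt',
    llm: 'mock-llm',
    tts: 'mock-tts',
  },
});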

Service Configuration

// STT Configuration
interface STTConfig {
  language?: string;
  sampleRate?: number;
  enablePunctuation?: boolean;
  enableWordTimestamps?: boolean;
  interimResults?: boolean;
}

// LLM Configuration
interface LLMConfig {
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  systemPrompt?: string;
}

// TTS Configuration
interface TTSConfig {
  voice?: string;
  sampleRate?: number;
  speed?: number;
  pitch?: number;
  format?: 'wav' | 'mp3' | 'pcm';
}
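
These are the per-service option shapes passed to MockServiceFactory.createAll in the Service Integration example; a slightly fuller sketch (the specific values are illustrative):

// Sketch: richer service options for the factory shown under Service Integration
const services = MockServiceFactory.createAll({
  stt: { language: 'en-US', enablePunctuation: true, interimResults: false },
  llm: { temperature: 0.5, maxTokens: 256, systemPrompt: 'You are a helpful voice assistant.' },
  tts: { voice: 'neural-voice', speed: 1.0, format: 'pcm' },
});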

API Reference

MCP Tool: process_audio_stream

Process audio chunks through the AI pipeline.

Input:

{
  "audioChunks": ["base64-audio-data", ...],
  "config": {
    "sttProvider": "string",
    "llmProvider": "string",
    "ttsProvider": "string",
    "sampleRate": 24000,
    "channels": 1
  }
}

Output:

{
  "audioChunks": ["base64-audio-output", ...],
  "metadata": {
    "inputDuration": 1000,
    "outputDuration": 1200,
    "processingTime": 500,
    "transcription": "Hello world",
    "responseText": "Hi there! How can I help?",
    "servicesUsed": {
      "stt": "mock-stt",
      "llm": "mock-llm",
      "tts": "mock-tts"
    }
  }
}
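
On the client side the output can be decoded back to raw audio with the converters from the Audio Processing section. A sketch, assuming the JSON above is returned as text in the first content item of the tool result (as logged in the Basic Usage snippet):

// Sketch: decoding process_audio_stream output (field names follow the schema above)
const result = JSON.parse(response.content[0].text);
console.log('Heard:', result.metadata.transcription);
console.log('Replied:', result.metadata.responseText);

for (const chunk of result.audioChunks) {
  const samples = AudioFormatConverter.bufferToFloat32(Buffer.from(chunk, 'base64'));
  // play back or persist `samples` (Float32Array)
}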

MCP Tool: get_conversation_context

Retrieve current conversation state.

Output:

{
  "messages": [
    {
      "role": "user",
      "content": "Hello",
      "timestamp": "2024-01-01T12:00:00Z",
      "audioMetadata": {
        "duration": 1000,
        "sampleRate": 24000
      }
    }
  ],
  "userId": "optional",
  "sessionId": "optional"
}

Performance

The server is designed for real-time audio processing:

  • Latency: Target <200ms end-to-end processing
  • Throughput: Handles concurrent audio streams
  • Memory: Efficient circular buffering for audio data
  • CPU: Optimized frame processing pipeline

Benchmarks

Typical performance on modern hardware:

  • Audio Processing: ~50ms for 1-second audio chunk
  • Pipeline Latency: 100-200ms end-to-end
  • Memory Usage: ~10MB base + audio buffers
  • Concurrent Streams: 10+ simultaneous conversations

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make changes with tests: npm test
  4. Lint and format: npm run lint:fix
  5. Commit changes: git commit -m "Description"
  6. Push branch: git push origin feature-name
  7. Create pull request

Development Guidelines

  • Tests Required: All new features must include tests
  • Type Safety: Full TypeScript typing required
  • Documentation: Update README and code comments
  • Performance: Consider impact on real-time processing
  • Compatibility: Maintain MCP protocol compliance

License

See the LICENSE file.

Related Projects

  • Pipecat - the Python framework whose core audio pipeline this server reimplements in TypeScript
  • Model Context Protocol (MCP) - the protocol specification this server's tool interface follows

Support