Deepgram-MCP by reddheeraj - MCP Server

Deepgram MCP Server

A Model Context Protocol (MCP) server that provides access to Deepgram's speech recognition and text-to-speech capabilities.

Features

Audio Transcription: Convert audio to text with high accuracy
Text-to-Speech: Generate natural-sounding speech from text with automatic compression
Audio Analysis: Extract insights like sentiment, topics, intents, and entities
Speaker Diarization: Identify different speakers in audio
Language Detection: Automatically detect the language of audio
Multiple Models: Support for various Deepgram models optimized for different use cases
Smart Audio Compression: Automatically compresses generated audio files for efficient transfer

Installation

Clone this repository
Install dependencies:
```
npm install
```

Copy the example environment file and add your API keys:

cp env.example .env
# Edit .env and set the keys you plan to use: DEEPGRAM_API_KEY, OPENAI_API_KEY, or GROQ_API_KEY

Build the project:
```
npm run build
```

Usage

HTTP Transport (Recommended for Production)

npm start
# or
node dist/index.js

The server will start on port 8080 by default. You can specify a different port:

node dist/index.js --port 8081

STDIO Transport (For Development)

npm run start:stdio
# or
node dist/index.js --stdio --port 8081
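
To make the flag behavior above concrete, here is a minimal sketch of how `--port` and `--stdio` could be interpreted. It is an illustration only, not the project's actual argument parser: `parseArgs` and `ServerOptions` are hypothetical names, and the only facts taken from this README are the two flags and the default port of 8080.

```typescript
// Hypothetical option shape; the real server may track more settings.
interface ServerOptions {
  transport: "http" | "stdio";
  port: number;
}

// Interpret the two flags shown above. HTTP on port 8080 is the documented default.
function parseArgs(argv: string[]): ServerOptions {
  const options: ServerOptions = { transport: "http", port: 8080 };
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--stdio") {
      options.transport = "stdio";
    } else if (argv[i] === "--port") {
      const value = Number(argv[i + 1]);
      if (!Number.isInteger(value) || value < 1 || value > 65535) {
        throw new Error(`Invalid port: ${argv[i + 1]}`);
      }
      options.port = value;
      i++; // skip the value that was just consumed
    }
  }
  return options;
}

console.log(parseArgs(["--port", "8081"]));
console.log(parseArgs(["--stdio", "--port", "8081"]));
```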

Available Tools

1. transcribe_audio

Transcribe audio to text with various options for customization.

Parameters:

audioUrl or audioData: Audio source (a URL or base64-encoded audio data)
model: Deepgram model to use (default: "nova-2-general")
language: Language code (default: "en")
punctuate: Add punctuation (default: true)
diarize: Speaker identification (default: false)
sentiment: Sentiment analysis (default: false)
And many more options...
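
The parameters above travel as the `arguments` of a standard MCP `tools/call` JSON-RPC request. The sketch below builds such a request. The field names come from the list above, but the exact schema the server validates is not shown in this README, so treat `TranscribeArgs`, `buildTranscribeRequest`, and the "exactly one audio source" check as assumptions. The URL is a placeholder.

```typescript
// Field names follow the parameter list above; the full schema is an assumption.
interface TranscribeArgs {
  audioUrl?: string;
  audioData?: string; // base64-encoded audio
  model?: string;
  language?: string;
  punctuate?: boolean;
  diarize?: boolean;
  sentiment?: boolean;
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build an MCP `tools/call` request for the transcribe_audio tool.
function buildTranscribeRequest(id: number, args: TranscribeArgs): ToolCallRequest {
  // ASSUMPTION: exactly one of audioUrl / audioData should be supplied.
  if ((args.audioUrl === undefined) === (args.audioData === undefined)) {
    throw new Error("Provide exactly one of audioUrl or audioData");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "transcribe_audio", arguments: { ...args } },
  };
}

const request = buildTranscribeRequest(1, {
  audioUrl: "https://example.com/meeting.wav", // placeholder URL
  diarize: true,
});
console.log(JSON.stringify(request, null, 2));
```

Options left out of `arguments` fall back to the defaults listed above, such as `model: "nova-2-general"` and `language: "en"`.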

2. text_to_speech

Convert text to speech using Deepgram's TTS models with automatic compression.

Parameters:

text: Text to convert to speech (required)
model: TTS model to use (default: "aura-asteria-en")
voice: Voice selection
format: Output format (default: "mp3")
speed: Speech speed (default: 1.0)

Output:

Original audio file saved to generated_audio/ folder
Compressed audio data saved to compressed_audio/ folder
Response includes file paths and compression metadata
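
As a rough illustration of how the documented defaults combine with caller-supplied values, the sketch below resolves a set of `text_to_speech` arguments. The server applies its own defaults; `TtsArgs`, `TTS_DEFAULTS`, and `resolveTtsArgs` are hypothetical names used only to show the effective values a call would carry.

```typescript
interface TtsArgs {
  text: string;
  model?: string;
  voice?: string;
  format?: string;
  speed?: number;
}

// Defaults copied from the parameter list above.
const TTS_DEFAULTS = { model: "aura-asteria-en", format: "mp3", speed: 1.0 };

// Merge caller arguments over the documented defaults.
function resolveTtsArgs(args: TtsArgs) {
  if (args.text.trim().length === 0) {
    throw new Error("text is required");
  }
  return { ...TTS_DEFAULTS, ...args };
}

const effective = resolveTtsArgs({ text: "Hello from Deepgram", speed: 1.25 });
console.log(effective);
```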

3. analyze_audio

Perform advanced audio analysis including sentiment, topics, intents, and entities.

Parameters:

audioUrl or audioData: Audio source (a URL or base64-encoded audio data)
features: Analysis features to enable
model: Model for analysis

4. get_models

Get information about available Deepgram models.

Parameters:

model_type: Filter by model type ("transcription", "tts", or "all")

Client Configuration

For MCP clients connecting over the HTTP transport, use this configuration (adjust the port if you changed it):

{
  "mcpServers": {
    "deepgram": {
      "url": "http://localhost:8080/mcp"
    }
  }
}
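
If the server runs on a non-default port, the URL in the configuration has to change with it. The helper below is a small sketch for generating the same structure; `mcpClientConfig` is a hypothetical name, and the `/mcp` path and `deepgram` key are taken from the configuration above.

```typescript
// Produce the client configuration shown above for a given host and port.
function mcpClientConfig(port: number = 8080, host: string = "localhost") {
  return {
    mcpServers: {
      deepgram: { url: `http://${host}:${port}/mcp` },
    },
  };
}

// Example: configuration for a server started with `--port 8081`.
console.log(JSON.stringify(mcpClientConfig(8081), null, 2));
```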

Development

# Watch mode for development
npm run watch

# Development with STDIO
npm run dev:stdio

# Development with HTTP
npm run dev

API Key

Get your Deepgram API key from the Deepgram Console.

Audio Compression System

The TTS functionality includes an intelligent compression system that:

Automatically compresses generated audio files with gzip
Saves the compressed data to separate files so that agent responses stay small
Provides decompression tools for extracting the audio files
Preserves audio quality, since gzip is lossless, while reducing file sizes by 2-4x
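
The compression step described above can be sketched with Node's built-in `zlib` module. This is a minimal sketch, not the server's implementation: the `CompressedAudioRecord` layout (gzip bytes stored as base64 alongside size metadata) and the function name are assumptions, and the sample buffer is a repetitive stand-in rather than real audio, so its compression ratio is not representative.

```typescript
import { gunzipSync, gzipSync } from "node:zlib";

// ASSUMED record layout; the server's real JSON fields may differ.
interface CompressedAudioRecord {
  encoding: "gzip+base64";
  originalSize: number;
  compressedSize: number;
  data: string;
}

// Gzip the audio bytes and wrap them, base64-encoded, with size metadata.
function compressAudio(audio: Buffer): CompressedAudioRecord {
  const compressed = gzipSync(audio);
  return {
    encoding: "gzip+base64",
    originalSize: audio.length,
    compressedSize: compressed.length,
    data: compressed.toString("base64"),
  };
}

const sample = Buffer.alloc(4096, 0x55); // repetitive stand-in, not real audio
const record = compressAudio(sample);

// gzip is lossless: decompressing returns the original bytes exactly.
const restored = gunzipSync(Buffer.from(record.data, "base64"));
console.log(record.originalSize, record.compressedSize, restored.equals(sample));
```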

File Structure

generated_audio/          # Original audio files
├── tts_2025-01-16T...mp3

compressed_audio/         # Compressed audio data
├── compressed_audio_2025-01-16T...json

decompressed_audio/       # Decompressed audio files (after extraction)
├── decompressed_2025-01-16T...mp3

Decompression Tools

Python Script (Recommended):

python decompress_audio.py <response_file_or_compressed_file>

Node.js Script:

npm run decompress <compressed_data_file>
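
For readers who want to see what a decompression step involves, here is a hedged sketch of the idea behind these tools. It does not reproduce either bundled script. It assumes the compressed file is JSON with gzip bytes stored as base64 under a `data` key, which should be checked against a real file in compressed_audio/, and `decompressAudioFile` is a hypothetical name. The demo runs against a throwaway record in a temporary directory.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { basename, join } from "node:path";
import { gunzipSync, gzipSync } from "node:zlib";

// ASSUMPTION: the JSON file holds base64-encoded gzip bytes under `data`.
function decompressAudioFile(jsonPath: string, outDir: string): string {
  const record = JSON.parse(readFileSync(jsonPath, "utf8")) as { data: string };
  const audio = gunzipSync(Buffer.from(record.data, "base64"));
  mkdirSync(outDir, { recursive: true });
  const outPath = join(outDir, `${basename(jsonPath, ".json")}.mp3`);
  writeFileSync(outPath, audio);
  return outPath;
}

// Demo with stand-in bytes (not real audio) in a temp directory.
const workDir = mkdtempSync(join(tmpdir(), "deepgram-mcp-"));
const originalBytes = Buffer.from("stand-in audio bytes");
const recordPath = join(workDir, "compressed_audio_demo.json");
writeFileSync(
  recordPath,
  JSON.stringify({ data: gzipSync(originalBytes).toString("base64") }),
);

const outPath = decompressAudioFile(recordPath, join(workDir, "decompressed_audio"));
console.log(`Wrote ${outPath}`);
```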

Agno Integration

This MCP server also includes integration with Agno, a high-performance runtime for multi-agent systems.

Agno Tests

# Text-to-Speech test (saves audio to generated_audio/ and compressed_audio/)
npm run test:agno:tts

# Speech-to-Text test (transcribes sample audio)
npm run test:agno:stt

The TTS test will:

Generate audio with automatic compression
Save the response to tts_response.json
Decompress the audio file to generated_audio/

License

MIT

Developer

Dheeraj Mudireddy (meetdheerajreddy@gmail.com)