Deepgram-MCP

reddheeraj/Deepgram-MCP


The Deepgram MCP Server provides access to advanced speech recognition and text-to-speech capabilities through a Model Context Protocol server.

Deepgram MCP Server

A Model Context Protocol (MCP) server that provides access to Deepgram's speech recognition and text-to-speech capabilities.

Features

  • Audio Transcription: Convert audio to text with high accuracy
  • Text-to-Speech: Generate natural-sounding speech from text with automatic compression
  • Audio Analysis: Extract insights like sentiment, topics, intents, and entities
  • Speaker Diarization: Identify different speakers in audio
  • Language Detection: Automatically detect the language of audio
  • Multiple Models: Support for various Deepgram models optimized for different use cases
  • Smart Audio Compression: Automatically compresses generated audio files for efficient transfer

Installation

  1. Clone this repository
  2. Install dependencies:
    npm install
    
  3. Copy the environment file and add your API keys:
    cp env.example .env
    # Edit .env and set DEEPGRAM_API_KEY, plus OPENAI_API_KEY or GROQ_API_KEY if you plan to use them
    
  4. Build the project:
    npm run build
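
Before starting the server, it can help to confirm that the keys you need are actually set in `.env`. The following is a minimal Python sketch, assuming `.env` holds plain `KEY=value` lines. `parse_env` and `missing_keys` are hypothetical helper names and are not part of this repository; the sample text stands in for a real file.

```python
from typing import Dict, Iterable, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not env.get(key)]


# Stand-in for the contents of .env (placeholder values only).
sample = """
# Deepgram key
DEEPGRAM_API_KEY=your-key-here
OPENAI_API_KEY=
"""

env = parse_env(sample)
print("Missing:", missing_keys(env, ["DEEPGRAM_API_KEY"]))
```

To check a real file, pass the text of `.env` to `parse_env` instead of `sample`.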
    

Usage

HTTP Transport (Recommended for Production)

npm start
# or
node dist/index.js

The server will start on port 8080 by default. You can specify a different port:

node dist/index.js --port 8081

STDIO Transport (For Development)

npm run start:stdio
# or
node dist/index.js --stdio --port 8081

Available Tools

1. transcribe_audio

Transcribe audio to text with various options for customization.

Parameters:

  • audioUrl or audioData: Audio source (URL or base64)
  • model: Deepgram model to use (default: "nova-2-general")
  • language: Language code (default: "en")
  • punctuate: Add punctuation (default: true)
  • diarize: Speaker identification (default: false)
  • sentiment: Sentiment analysis (default: false)
  • And many more options...
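
Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that request body for `transcribe_audio`, using the parameter names listed above. `build_tool_call` is a hypothetical helper and the audio URL is a placeholder. In practice your MCP client library sends this for you after its initialization handshake.

```python
import json
from typing import Any, Dict


def build_tool_call(name: str, arguments: Dict[str, Any], request_id: int = 1) -> Dict[str, Any]:
    """Wrap tool arguments in an MCP JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "transcribe_audio",
    {
        "audioUrl": "https://example.com/sample.wav",  # placeholder URL
        "model": "nova-2-general",
        "language": "en",
        "punctuate": True,
        "diarize": True,  # override the documented default (false)
    },
)
print(json.dumps(request, indent=2))
```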

2. text_to_speech

Convert text to speech using Deepgram's TTS models with automatic compression.

Parameters:

  • text: Text to convert to speech (required)
  • model: TTS model to use (default: "aura-asteria-en")
  • voice: Voice selection
  • format: Output format (default: "mp3")
  • speed: Speech speed (default: 1.0)

Output:

  • Original audio file saved to generated_audio/ folder
  • Compressed audio data saved to compressed_audio/ folder
  • Response includes file paths and compression metadata
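
To keep calls consistent, a client can fill in the documented defaults before sending `text_to_speech` arguments. This is a hedged sketch: `build_tts_arguments` is a hypothetical client-side helper, and the server presumably applies the same defaults itself when a parameter is omitted.

```python
from typing import Any, Dict

# Defaults as documented in the parameter list above.
TTS_DEFAULTS: Dict[str, Any] = {
    "model": "aura-asteria-en",
    "format": "mp3",
    "speed": 1.0,
}
ALLOWED_OVERRIDES = {"model", "voice", "format", "speed"}


def build_tts_arguments(text: str, **overrides: Any) -> Dict[str, Any]:
    """Merge caller overrides over the documented defaults; `text` is required."""
    if not text or not text.strip():
        raise ValueError("text is required and must not be empty")
    unknown = set(overrides) - ALLOWED_OVERRIDES
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"text": text, **TTS_DEFAULTS, **overrides}


args = build_tts_arguments("Hello from Deepgram", speed=1.2)
print(args)
```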

3. analyze_audio

Perform advanced audio analysis including sentiment, topics, intents, and entities.

Parameters:

  • audioUrl or audioData: Audio source
  • features: Analysis features to enable
  • model: Model for analysis

4. get_models

Get information about available Deepgram models.

Parameters:

  • model_type: Filter by model type ("transcription", "tts", or "all")

Client Configuration

For MCP clients, use this configuration:

{
  "mcpServers": {
    "deepgram": {
      "url": "http://localhost:8080/mcp"
    }
  }
}
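
If you start the server on a non-default port, the URL in this configuration has to change with it. The sketch below generates the same structure for any port, assuming the `/mcp` path stays the same. `client_config` is a hypothetical helper name.

```python
import json


def client_config(port: int = 8080, host: str = "localhost", name: str = "deepgram") -> dict:
    """Build the MCP client configuration for a server on the given port."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return {"mcpServers": {name: {"url": f"http://{host}:{port}/mcp"}}}


# Matches a server started with: node dist/index.js --port 8081
print(json.dumps(client_config(8081), indent=2))
```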

Development

# Watch mode for development
npm run watch

# Development with STDIO
npm run dev:stdio

# Development with HTTP
npm run dev

API Key

Get your Deepgram API key from Deepgram Console.


Audio Compression System

The TTS functionality includes an intelligent compression system that:

  • Automatically compresses generated audio files using gzip compression
  • Saves compressed data to separate files to avoid large agent responses
  • Provides decompression tools for easy audio file extraction
  • Preserves audio quality, since gzip is lossless, while reducing file sizes by 2-4x
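
The general idea can be sketched in a few lines of Python: gzip the audio bytes, base64-encode the result so it fits in JSON, and reverse both steps to recover the original. This is an illustration under stated assumptions, not the server's actual code. The field names (`compressed_data`, `original_size`, `compressed_size`) are hypothetical, and the sample bytes are a synthetic stand-in for real audio.

```python
import base64
import gzip
import json


def compress_audio(audio: bytes) -> dict:
    """Gzip the audio bytes and base64-encode them so they fit in JSON."""
    packed = gzip.compress(audio)
    return {
        # Hypothetical field names, not the server's actual schema.
        "compressed_data": base64.b64encode(packed).decode("ascii"),
        "original_size": len(audio),
        "compressed_size": len(packed),
    }


def decompress_audio(record: dict) -> bytes:
    """Reverse compress_audio: base64-decode, then gunzip."""
    return gzip.decompress(base64.b64decode(record["compressed_data"]))


audio = b"\x00\x01" * 5000  # synthetic, highly repetitive stand-in bytes
record = compress_audio(audio)
restored = decompress_audio(json.loads(json.dumps(record)))  # survives a JSON round trip
print(record["original_size"], record["compressed_size"], restored == audio)
```

Because gzip is lossless, the restored bytes are identical to the input. The ratio achieved depends on the data, and this synthetic sample is not representative of real audio.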

File Structure

generated_audio/          # Original audio files
├── tts_2025-01-16T...mp3

compressed_audio/         # Compressed audio data
├── compressed_audio_2025-01-16T...json

decompressed_audio/       # Decompressed audio files (after extraction)
├── decompressed_2025-01-16T...mp3

Decompression Tools

Python Script (Recommended):

python decompress_audio.py <response_file_or_compressed_file>

Node.js Script:

npm run decompress <compressed_data_file>
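
For orientation, here is a rough Python sketch of what such a decompression step might look like. It is not the bundled `decompress_audio.py`. It assumes the compressed file is JSON with gzip data stored as base64 under a hypothetical `compressed_data` field, that the output is MP3, and that output names follow the file structure shown earlier. The demo writes a throwaway file so that it runs without the server.

```python
import base64
import gzip
import json
import tempfile
from pathlib import Path


def extract_audio(compressed_file: Path, out_dir: Path, field: str = "compressed_data") -> Path:
    """Read a compressed JSON file and write the decoded audio into out_dir."""
    record = json.loads(compressed_file.read_text(encoding="utf-8"))
    audio = gzip.decompress(base64.b64decode(record[field]))
    out_dir.mkdir(parents=True, exist_ok=True)
    # compressed_audio_<stamp>.json -> decompressed_<stamp>.mp3 (assumed naming)
    stem = compressed_file.stem.replace("compressed_audio", "decompressed", 1)
    out_path = out_dir / f"{stem}.mp3"
    out_path.write_bytes(audio)
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "compressed_audio" / "compressed_audio_demo.json"
    src.parent.mkdir()
    payload = base64.b64encode(gzip.compress(b"fake mp3 bytes")).decode("ascii")
    src.write_text(json.dumps({"compressed_data": payload}), encoding="utf-8")

    out = extract_audio(src, Path(tmp) / "decompressed_audio")
    out_name = out.name
    out_bytes = out.read_bytes()

print(out_name, len(out_bytes))
```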

Agno Integration

This MCP server also includes integration with Agno, a high-performance runtime for multi-agent systems.

Agno Tests

# Text-to-Speech test (saves audio to generated_audio/ and compressed_audio/)
npm run test:agno:tts

# Speech-to-Text test (transcribes sample audio)
npm run test:agno:stt

The TTS test will:

  1. Generate audio with automatic compression
  2. Save the response to tts_response.json
  3. Decompress the audio file to generated_audio/

License

MIT
