discord-mcp by mrrustybutter - MCP Server

Discord Bot MCP

A Model Context Protocol (MCP) server that provides Discord bot functionality with voice transcription (using Google Gemini) and text-to-speech (using ElevenLabs).

Features

🤖 Full Discord bot functionality through MCP
🎤 Voice channel transcription using Google Gemini
🔊 Text-to-speech in voice channels using ElevenLabs
💬 Text channel messaging
📝 Transcript logging
🔄 Real-time voice activity detection

Prerequisites

- Discord Bot
  - Create a bot on the Discord Developer Portal
  - Enable the following intents:
    - GUILD_MESSAGES
    - MESSAGE_CONTENT
    - GUILD_VOICE_STATES
  - Get your bot token and client ID
- Google Gemini API
  - Get an API key from Google AI Studio
- ElevenLabs API
  - Get an API key from ElevenLabs
  - Note your preferred voice ID

Installation

npm install

Configuration

Copy .env.example to .env:
```
cp .env.example .env
```

Fill in your credentials:

# Discord Bot Configuration
DISCORD_BOT_TOKEN=your_discord_bot_token_here
DISCORD_CLIENT_ID=your_discord_client_id_here

# Google Gemini API Configuration
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-1.5-flash-002

# ElevenLabs Configuration
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
ELEVENLABS_VOICE_ID=Au8OOcCmvsCaQpmULvvQ

Usage

Development Mode

npm run dev

Production Mode

npm run build
npm start

SSE Mode (for HTTP transport)

npm run dev:sse    # Development
npm run start:sse  # Production

MCP Tools

Bot Management

- `bot_connect` - Connect the Discord bot
- `bot_disconnect` - Disconnect the Discord bot
- `bot_status` - Get bot status and connection info

Voice Channel

- `join_voice_channel` - Join a voice channel
- `leave_voice_channel` - Leave the current voice channel
- `speak_in_voice` - Use TTS to speak in voice channel
- `start_transcription` - Start transcribing voice
- `stop_transcription` - Stop transcribing voice

Text Channel

- `send_message` - Send a message to a text channel

Guild Management

- `list_guilds` - List all guilds the bot is in
- `list_channels` - List all channels in a guild
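
As a quick reference, the tools listed above can be captured in a typed lookup table. This is only an illustrative sketch: `TOOL_GROUPS` and `groupOf` are hypothetical names, not part of the server's code, and the table contains only the tool names documented in this section.

```typescript
// Hypothetical grouping of the documented tool names (not taken from the server source).
const TOOL_GROUPS = {
  botManagement: ["bot_connect", "bot_disconnect", "bot_status"],
  voiceChannel: [
    "join_voice_channel",
    "leave_voice_channel",
    "speak_in_voice",
    "start_transcription",
    "stop_transcription",
  ],
  textChannel: ["send_message"],
  guildManagement: ["list_guilds", "list_channels"],
} as const;

type ToolGroup = keyof typeof TOOL_GROUPS;

// Return the group a tool name belongs to, or undefined if it is not listed.
function groupOf(tool: string): ToolGroup | undefined {
  const groups = Object.keys(TOOL_GROUPS) as ToolGroup[];
  for (const group of groups) {
    const names: readonly string[] = TOOL_GROUPS[group];
    if (names.indexOf(tool) !== -1) return group;
  }
  return undefined;
}
```

A client could use a table like this to reject unknown tool names before sending a request.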

Voice Transcription

The bot uses:

- Voice Activity Detection (VAD) to detect when users are speaking
- Google Gemini for accurate speech-to-text transcription
- Automatic silence detection to segment speech

Transcripts are saved to the ./transcripts directory in JSON format.
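
This README does not describe how the bot implements VAD or silence detection. As a rough illustration of the general idea, the sketch below uses a simple energy threshold: a frame counts as speech when its RMS energy reaches `threshold`, and a segment closes after `maxSilentFrames` consecutive quiet frames. All names and parameters here are assumptions for illustration, not the project's actual code.

```typescript
// Root-mean-square energy of one frame of PCM samples normalized to [-1, 1].
function rms(frame: number[]): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (const sample of frame) sum += sample * sample;
  return Math.sqrt(sum / frame.length);
}

// A speech segment as a half-open range of frame indices: [startFrame, endFrame).
interface Segment {
  startFrame: number;
  endFrame: number;
}

// Group frames into speech segments using an energy threshold (hypothetical approach).
function segmentSpeech(
  frames: number[][],
  threshold: number,
  maxSilentFrames: number
): Segment[] {
  const segments: Segment[] = [];
  let start = -1;      // first frame of the open segment, -1 when no segment is open
  let lastVoiced = -1; // most recent frame classified as speech

  frames.forEach((frame, i) => {
    if (rms(frame) >= threshold) {
      if (start === -1) start = i;
      lastVoiced = i;
    } else if (start !== -1 && i - lastVoiced >= maxSilentFrames) {
      // Enough consecutive silence: close the segment at the last voiced frame.
      segments.push({ startFrame: start, endFrame: lastVoiced + 1 });
      start = -1;
    }
  });

  // Flush a segment that was still open when the audio ended.
  if (start !== -1) segments.push({ startFrame: start, endFrame: lastVoiced + 1 });
  return segments;
}
```

Each resulting segment could then be sent to the transcription model as one utterance. A larger `maxSilentFrames` merges speech separated by short pauses into a single segment.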

Text-to-Speech

The bot can speak in voice channels using:

- ElevenLabs API for natural-sounding speech
- Configurable voice selection
- Streaming audio playback
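
To make the TTS flow more concrete, here is a minimal sketch of how a streaming request to ElevenLabs might be assembled. It assumes the public ElevenLabs HTTP endpoint `POST /v1/text-to-speech/{voice_id}/stream` with an `xi-api-key` header, so check the ElevenLabs API reference before relying on it. `buildTtsRequest` and `TtsRequest` are hypothetical names, and the project's `elevenlabs.ts` service may work differently (for example, through an SDK).

```typescript
// Plain description of an HTTP request; nothing is sent by this sketch.
interface TtsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assemble a streaming text-to-speech request (assumed endpoint shape, see lead-in).
function buildTtsRequest(
  apiKey: string,
  voiceId: string,
  text: string,
  modelId?: string
): TtsRequest {
  const payload: { text: string; model_id?: string } = { text };
  if (modelId !== undefined) payload.model_id = modelId;

  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}/stream`,
    method: "POST",
    headers: {
      "xi-api-key": apiKey,
      "Content-Type": "application/json",
      Accept: "audio/mpeg",
    },
    body: JSON.stringify(payload),
  };
}
```

The returned object can be passed to an HTTP client, with `ELEVENLABS_API_KEY` and `ELEVENLABS_VOICE_ID` supplying the first two arguments. Piping the audio response into the Discord voice connection is not shown.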

Architecture

discord-bot-mcp/
├── src/
│   ├── index.ts          # MCP server entry point
│   ├── config.ts         # Configuration management
│   ├── services/
│   │   ├── discord-bot.ts    # Discord bot service
│   │   ├── transcription.ts  # Voice transcription service
│   │   └── elevenlabs.ts     # TTS service
│   └── utils/
│       ├── logger.ts          # Logging utility
│       └── transcript-logger.ts # Transcript file management

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| `DISCORD_BOT_TOKEN` | Discord bot token | Required |
| `DISCORD_CLIENT_ID` | Discord application client ID | Required |
| `GEMINI_API_KEY` | Google Gemini API key | Required |
| `ELEVENLABS_API_KEY` | ElevenLabs API key | Required |
| `ELEVENLABS_VOICE_ID` | Voice ID for TTS | `Au8OOcCmvsCaQpmULvvQ` |
| `GEMINI_MODEL` | Gemini model to use | `gemini-1.5-flash-002` |
| `LOG_LEVEL` | Logging level | `info` |
| `ENABLE_TRANSCRIPT_LOGGING` | Save transcripts to files | `true` |
| `TRANSCRIPT_DIR` | Directory for transcripts | `./transcripts` |
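
The variables above could be loaded and validated roughly as follows. This is a sketch under stated assumptions, not the contents of `src/config.ts`: the `BotConfig` field names, the `requireVar` helper, and the rule that only the string `false` disables transcript logging are all hypothetical. The defaults are the ones listed in the table.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface BotConfig {
  discordBotToken: string;
  discordClientId: string;
  geminiApiKey: string;
  geminiModel: string;
  elevenLabsApiKey: string;
  elevenLabsVoiceId: string;
  logLevel: string;
  enableTranscriptLogging: boolean;
  transcriptDir: string;
}

type Env = Record<string, string | undefined>;

// Return a required variable, or throw an error that names the missing key.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Build the config, applying the documented defaults for optional variables.
function loadConfig(env: Env): BotConfig {
  return {
    discordBotToken: requireVar(env, "DISCORD_BOT_TOKEN"),
    discordClientId: requireVar(env, "DISCORD_CLIENT_ID"),
    geminiApiKey: requireVar(env, "GEMINI_API_KEY"),
    geminiModel: env["GEMINI_MODEL"] ?? "gemini-1.5-flash-002",
    elevenLabsApiKey: requireVar(env, "ELEVENLABS_API_KEY"),
    elevenLabsVoiceId: env["ELEVENLABS_VOICE_ID"] ?? "Au8OOcCmvsCaQpmULvvQ",
    logLevel: env["LOG_LEVEL"] ?? "info",
    // Assumption: any value other than "false" (case-insensitive) enables logging.
    enableTranscriptLogging:
      (env["ENABLE_TRANSCRIPT_LOGGING"] ?? "true").toLowerCase() !== "false",
    transcriptDir: env["TRANSCRIPT_DIR"] ?? "./transcripts",
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)` at startup, so a missing credential fails fast with a clear message.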

Testing

Quick Test with Curl (Stateless)

# Start the server
npm run dev:sse

# Test with curl (no session needed)
curl -X POST http://localhost:3003/message \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "bot_status",
      "arguments": {}
    },
    "id": 1
  }'

# List available tools
curl -X POST http://localhost:3003/message \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "method": "tools/list", "params": {}, "id": 1}'

# Check server status
curl http://localhost:3003/status
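
The JSON-RPC payloads used in the curl commands above can also be built programmatically. The helpers below are a small sketch: `toolCall` and `toolsList` are hypothetical names, and the payload shape simply mirrors the request bodies shown in this section.

```typescript
// Shape of the JSON-RPC 2.0 requests shown in the curl examples.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  method: string;
  params: Record<string, unknown>;
  id: number;
}

// Build a tools/call request for the named tool with optional arguments.
function toolCall(
  id: number,
  name: string,
  args: Record<string, unknown> = {}
): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    method: "tools/call",
    params: { name, arguments: args },
    id,
  };
}

// Build a tools/list request.
function toolsList(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", method: "tools/list", params: {}, id };
}
```

Serializing `toolCall(1, "bot_status")` with `JSON.stringify` yields the same body as the first curl command, ready to POST to `http://localhost:3003/message` with a `Content-Type: application/json` header.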

Stateful Operations with Sessions

For stateful operations (such as setting the current server or channel), create a session:

# Create a session
SESSION_ID=$(curl -s http://localhost:3003/session | python3 -c "import sys, json; print(json.load(sys.stdin)['sessionId'])")

# Use the session for stateful operations
curl -X POST "http://localhost:3003/message?sessionId=$SESSION_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "view_server",
      "arguments": {"server_name": "RustyButter"}
    },
    "id": 1
  }'

# list_channels now returns channels from the current server
curl -X POST "http://localhost:3003/message?sessionId=$SESSION_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "list_channels",
      "arguments": {}
    },
    "id": 2
  }'

Sessions are kept server-side and expire after 30 minutes of inactivity.
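
The README does not show how the server tracks sessions. As an illustration of inactivity-based expiry, here is a minimal in-memory sketch in which each access refreshes a timestamp and a session is discarded once it has been idle longer than the timeout. `SessionStore`, its methods, and the `state` field are hypothetical; only the 30-minute figure comes from this document.

```typescript
const SESSION_TTL_MS = 30 * 60 * 1000; // 30 minutes, as stated above

interface Session {
  id: string;
  lastSeen: number;               // timestamp of the last access, in milliseconds
  state: Record<string, unknown>; // e.g. the currently selected server or channel
}

// Hypothetical in-memory store; timestamps are passed in to keep it testable.
class SessionStore {
  private sessions: Record<string, Session | undefined> = {};
  private ttlMs: number;

  constructor(ttlMs: number = SESSION_TTL_MS) {
    this.ttlMs = ttlMs;
  }

  // Register a new session that starts its inactivity timer at `now`.
  create(id: string, now: number): Session {
    const session: Session = { id, lastSeen: now, state: {} };
    this.sessions[id] = session;
    return session;
  }

  // Return the session and refresh its timer, or undefined if unknown or expired.
  touch(id: string, now: number): Session | undefined {
    const session = this.sessions[id];
    if (session === undefined) return undefined;
    if (now - session.lastSeen > this.ttlMs) {
      delete this.sessions[id]; // idle for too long: drop it
      return undefined;
    }
    session.lastSeen = now;
    return session;
  }
}
```

In a running server, `now` would come from `Date.now()`, and `touch` would be called for every request that carries a `sessionId` query parameter.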

License

MIT