elevenlabs-podcast-mcp by adamanz - MCP Server

ElevenLabs Podcast MCP Server

A Model Context Protocol (MCP) server for generating professional podcasts using the ElevenLabs v3 Text-to-Speech API with Audio Tags support.

🎯 Key Features

🎙️ Multi-speaker dialogue with natural interruptions and overlapping speech
🏷️ Audio Tags for emotional control [excited], delivery [whispers], and effects [laughs]
⏱️ Smart duration control - content-aware with 10-minute maximum
🎨 Multiple podcast styles - interview, narrative, discussion, educational, comedy
🎭 Tone presets - professional, casual, excited, calm, dramatic
🌍 70+ language support with consistent voice quality
📝 AI script generation with Audio Tags
🔄 Batch processing for long-form content
🔊 High-quality audio output (up to 192 kbps MP3)
🚀 Built with FastMCP for easy integration

📦 Installation

Clone the repository:

git clone <repository-url>
cd elevenlabs-podcast-mcp

Install dependencies:

pip install -r requirements.txt

Set up environment variables:

cp .env.example .env
# Edit .env and add your ElevenLabs API key

🚀 Quick Start

Running the Server

Development mode:

fastmcp dev server.py

Production mode:

fastmcp run server.py --transport sse

🛠️ Available Tools

Core Tools

`generate_podcast`

Generate a complete podcast from a script with Audio Tags, with configurable style and tone. `style` accepts interview, narrative, discussion, educational, or comedy. `tone` accepts professional, casual, excited, calm, or dramatic. Leave `duration_minutes` as null to have the duration calculated from the content (maximum 10 minutes).

{
    "script": "Host: [excitedly] Welcome! Guest: [thoughtfully] Great to be here!",
    "style": "interview",
    "tone": "professional",
    "duration_minutes": null,
    "auto_duration": true,
    "voice_mapping": {"Host": "voice_id_1", "Guest": "voice_id_2"},
    "output_path": "output/episode.mp3"
}
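To make the `Speaker: [tag] text` script format concrete, here is a minimal sketch of how such a script could be split into speaker turns. It is not the server's actual parser, which this README does not show. The helper name `split_turns` and the rule that a speaker label is a single capitalised word followed by a colon are assumptions for illustration.

```python
import re

# Hypothetical helper, not taken from server.py.
# Assumption: a speaker label is one capitalised word followed by ": ".
SPEAKER_RE = re.compile(r"\b([A-Z][A-Za-z0-9_]*):\s")


def split_turns(script: str) -> list[tuple[str, str]]:
    """Split 'Host: ... Guest: ...' into ordered (speaker, text) pairs."""
    matches = list(SPEAKER_RE.finditer(script))
    turns = []
    for i, match in enumerate(matches):
        # A turn runs until the next speaker label, or to the end of the script.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(script)
        turns.append((match.group(1), script[match.end():end].strip()))
    return turns


turns = split_turns(
    "Host: [excitedly] Welcome! Guest: [thoughtfully] Great to be here!"
)
for speaker, text in turns:
    print(f"{speaker} -> {text}")
```

A pattern this simple would also split on any capitalised word followed by a colon inside the spoken text (for example "Note: "), so a real parser would likely restrict matches to the speakers named in `voice_mapping`.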

`generate_script`

AI-powered script generation. Set `include_tags` to true to include Audio Tags for emotions.

{
    "topic": "Artificial Intelligence",
    "style": "interview",
    "duration_minutes": 5,
    "include_tags": true
}

Example output:

Host: [excitedly] Welcome to Tech Talks! Today we're exploring AI.
Guest: [thoughtfully] This technology is transforming everything.
Host: [interrupting] —That's exactly what our listeners want to know!

`generate_long_podcast`

Handle long-form content: scripts longer than 3000 characters are split into batches automatically.

{
    "script": "Very long podcast script...",
    "style": "narrative",
    "tone": "dramatic",
    "output_path": "output/long_episode.mp3"
}
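The batching idea can be sketched as follows. This is an illustration under assumptions, not the server's implementation: the function name `chunk_script` is hypothetical, and splitting on line boundaries (so a speaker turn is not cut in half) is one reasonable strategy among several.

```python
MAX_CHARS = 3000  # per-request character limit stated in this README


def chunk_script(script: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole lines into chunks no longer than max_chars."""
    chunks: list[str] = []
    current = ""
    for line in script.splitlines():
        # A single line longer than the limit is hard-split into pieces.
        pieces = [line[i:i + max_chars] for i in range(0, len(line), max_chars)] or [""]
        for piece in pieces:
            candidate = piece if not current else current + "\n" + piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                # Adding this piece would exceed the limit: start a new chunk.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


long_script = "\n".join(f"Host: line {i} " + "x" * 90 for i in range(100))
batches = chunk_script(long_script)
print(len(batches), [len(b) for b in batches])
```

Each batch would then be sent as its own request and the resulting audio segments joined in order.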

`preview_podcast`

Quick preview generation for testing voices and tones.

{
    "text": "[whispers] Testing the preview feature",
    "voice_id": "21m00Tcm4TlvDq8ikWAM",
    "tone": "dramatic"
}

Voice Management

`list_voices`

List all available voices from your ElevenLabs account.

Utility Tools

`create_podcast_project`

Create a structured project directory.

{
    "project_name": "MyPodcast",
    "description": "Weekly tech discussions"
}

🏷️ Audio Tags Reference

Audio Tags are wrapped in square brackets and control voice performance:

Emotions

[excited], [happy], [sad], [angry], [thoughtfully], [nervously]

Delivery

[whispers], [shouts], [quietly], [loudly]
[pause], [stammers], [rushed]

Reactions

[laughs], [sighs], [gasps], [clears throat], [chuckles]

Dialogue Dynamics

[interrupting], [overlapping], [jumping in]

Accents

[British accent], [French accent], [Australian accent]

Example Script with Audio Tags

Host: [excitedly] Welcome to our show! [pause] Today's topic is fascinating.
Guest: [thoughtfully] Indeed. [sighs] Let me explain why...
Host: [interrupting] —Actually, that reminds me of something!
Guest: [laughs] You always do that! [continuing] As I was saying...
Host: [whispers] Sorry, go ahead.
Guest: [normal voice] The key point is... [dramatically] Everything changes now!
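Because tags are plain bracketed text, they are easy to inspect before sending a script. The sketch below lists the tags in a line and produces a tag-free version, for example to count spoken words. Both helper names are hypothetical and not part of this server.

```python
import re

# Matches one bracketed tag such as [laughs] or [British accent].
TAG_RE = re.compile(r"\[([^\[\]]+)\]")


def extract_tags(line: str) -> list[str]:
    """Return the tag names found in a line, in order of appearance."""
    return [tag.strip() for tag in TAG_RE.findall(line)]


def strip_tags(line: str) -> str:
    """Remove tags and collapse the whitespace they leave behind."""
    return re.sub(r"\s+", " ", TAG_RE.sub("", line)).strip()


line = "Guest: [laughs] You always do that! [continuing] As I was saying..."
print(extract_tags(line))
print(strip_tags(line))
```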

🎨 Podcast Styles

Interview

Professional Q&A format with host and guest dynamics.

Narrative

Storytelling format with dramatic elements.

Discussion

Multi-participant roundtable with natural interruptions.

Educational

Clear, structured learning content.

Comedy

Humorous delivery with timing and sarcasm.

🎭 Tone Presets

Each tone adjusts voice parameters:

Professional: Balanced, clear delivery (stability: 0.7)
Casual: Relaxed, conversational (stability: 0.4)
Excited: High energy, enthusiastic (stability: 0.3)
Calm: Soothing, measured pace (stability: 0.8)
Dramatic: Theatrical, expressive (stability: 0.5)
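As a rough sketch, the presets above amount to a lookup table from tone name to a stability value. The stability numbers come from the list above. The dictionary name, the helper, and the fallback to the professional preset for unknown tones are assumptions, and the server may adjust other voice parameters as well.

```python
# Stability values copied from the tone list in this README.
TONE_STABILITY = {
    "professional": 0.7,
    "casual": 0.4,
    "excited": 0.3,
    "calm": 0.8,
    "dramatic": 0.5,
}


def stability_for(tone: str, default: str = "professional") -> float:
    """Look up a tone's stability; unknown tones fall back to the default (an assumption)."""
    return TONE_STABILITY.get(tone.strip().lower(), TONE_STABILITY[default])


print(stability_for("Dramatic"))
```

Lower stability values generally give a more variable, expressive delivery, which matches the ordering here: excited is lowest and calm is highest.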

📚 Available Resources

voices://presets - Preset voice configurations
config://settings - Server configuration
templates://podcast-scripts - Script templates with Audio Tags

💡 Usage Examples

Simple Podcast with Emotion

client.call_tool("generate_podcast", {
    "script": "Host: [excitedly] Breaking news everyone!",
    "style": "interview",
    "tone": "excited"
})

Multi-Speaker with Interruptions

script = """
Host: [starting] So the main issue is—
Guest: [interrupting] —Actually, I disagree!
Host: [surprised] Oh? Tell me more.
Guest: [explaining] Well, when you consider...
"""

client.call_tool("generate_podcast", {
    "script": script,
    "style": "discussion"
})

Auto-Generated Script

# First generate the script
script = client.call_tool("generate_script", {
    "topic": "Space Exploration",
    "style": "narrative",
    "include_tags": True
})

# Then create the podcast (depending on your client, you may first need
# to extract the script text from the tool result)
client.call_tool("generate_podcast", {
    "script": script,
    "auto_duration": True
})

⚙️ Configuration

Environment Variables

ELEVENLABS_API_KEY=your_api_key_here
ELEVENLABS_MODEL=eleven_v3  # ALWAYS use v3 for Audio Tags
MAX_DURATION_MINUTES=10
DEFAULT_SPEAKING_RATE=150
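One way the last two settings could combine into content-aware duration control is sketched below. It assumes `DEFAULT_SPEAKING_RATE` is measured in words per minute and that Audio Tags do not count as spoken words. Neither assumption is confirmed by this README, and `estimate_minutes` is a hypothetical name.

```python
import re

MAX_DURATION_MINUTES = 10
DEFAULT_SPEAKING_RATE = 150  # assumed unit: words per minute


def estimate_minutes(
    script: str,
    rate: int = DEFAULT_SPEAKING_RATE,
    cap: float = MAX_DURATION_MINUTES,
) -> float:
    """Estimate spoken duration from word count, capped at the maximum."""
    # Drop bracketed Audio Tags so they are not counted as spoken words.
    spoken = re.sub(r"\[[^\[\]]*\]", " ", script)
    words = len(spoken.split())
    return min(words / rate, cap)


print(estimate_minutes("word " * 300))
```

Speaker labels such as "Host:" are counted as words in this sketch, so the estimate runs slightly high for dialogue-heavy scripts.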

Voice Defaults

Host: Rachel (21m00Tcm4TlvDq8ikWAM)
Guest: Drew (29vD33N1CtxCmqQRPOHJ)
Narrator: Bella (EXAVITQu4vr4xnSDxMaL)

🔧 Development

Project Structure

elevenlabs-podcast-mcp/
├── server.py              # Main MCP server with all tools
├── requirements.txt       # Python dependencies
├── .env.example          # Environment template
├── CLAUDE.md             # AI context documentation
├── README.md             # This file
└── ai-docs/              # Additional documentation

Adding Custom Tools

# Assumes `mcp` is the FastMCP instance created in server.py
@mcp.tool
async def your_custom_tool(param: str) -> dict:
    """Your tool description."""
    # Implementation
    return {"result": "success"}

Testing

# Inspect available tools
fastmcp inspect server.py

# Test individual tools interactively in the MCP Inspector
fastmcp dev server.py

📋 Requirements

Python 3.11+
ElevenLabs API key (v3 access required)
FastMCP framework
pydub (for audio processing)

⚠️ Important Notes

Always use the eleven_v3 model for Audio Tags support
Character limit: 3000 per request (longer content is batched automatically)
Professional Voice Clones (PVCs) are not yet fully optimized for v3
Recommended: Use Instant Voice Clones (IVC) or designed voices

🐛 Troubleshooting

Rate Limiting

The server includes automatic retry with exponential backoff.
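For readers unfamiliar with the pattern, exponential backoff can be sketched as below. This is a generic illustration, not the server's code: the function name, attempt count, and delay values are assumptions. A real implementation would catch only rate-limit errors rather than every exception.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call `call()` and retry on failure, doubling the wait each time."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # illustration only; narrow this in real code
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(base_delay * 2 ** attempt, max_delay)
            # Up to 10% random jitter keeps clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))


# Usage: a stand-in request that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


print(retry_with_backoff(flaky_request, sleep=lambda seconds: None))
```

The `sleep` parameter exists only so the waiting can be replaced in tests or demos, as in the usage above.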

Long Content

Use generate_long_podcast for content longer than 3000 characters.

Audio Tags Not Working

Ensure you're using the eleven_v3 model, not eleven_turbo_v2_5.

📄 License

MIT

💬 Support

For issues or questions, please open a GitHub issue.