elevenlabs/elevenlabs-mcp

4.6

ai_chatbot communication entertainment_and_media

elevenlabs-mcp is hosted online, so all tools can be tested directly either in the Inspector tab or in the Online Client.

Official ElevenLabs Model Context Protocol (MCP) server for interaction with Text to Speech and audio processing APIs.

Tools

Functions exposed to the LLM to take actions

text_to_speech

Convert text to speech with a given voice. Saves output file to directory (default: $HOME/Desktop).

Only one of voice_id or voice_name can be provided. If neither is provided, the default voice will be used.

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

Args:
    text (str): The text to convert to speech.
    voice_id (str, optional): The ID of the voice to use.
    voice_name (str, optional): The name of the voice to use.
    model_id (str, optional): The model ID to use for speech synthesis. Options include:
        - eleven_multilingual_v2: High quality multilingual model (29 languages)
        - eleven_flash_v2_5: Fastest model with ultra-low latency (32 languages)
        - eleven_turbo_v2_5: Balanced quality and speed (32 languages)
        - eleven_flash_v2: Fast English-only model
        - eleven_turbo_v2: Balanced English-only model
        - eleven_monolingual_v1: Legacy English model
        Defaults to eleven_multilingual_v2 or environment variable ELEVENLABS_MODEL_ID.
    stability (float, optional): Stability of the generated audio. Determines how stable the voice is and the randomness between each generation. Lower values introduce broader emotional range for the voice. Higher values can result in a monotonous voice with limited emotion. Range is 0 to 1.
    similarity_boost (float, optional): Similarity boost of the generated audio. Determines how closely the AI should adhere to the original voice when attempting to replicate it. Range is 0 to 1.
    style (float, optional): Style of the generated audio. Determines the style exaggeration of the voice. This setting attempts to amplify the style of the original speaker. It does consume additional computational resources and might increase latency if set to anything other than 0. Range is 0 to 1.
    use_speaker_boost (bool, optional): Use speaker boost of the generated audio. This setting boosts the similarity to the original speaker. Using this setting requires a slightly higher computational load, which in turn increases latency.
    speed (float, optional): Speed of the generated speech. Range is 0.7 to 1.2, with 1.0 being the default. Lower values create slower, more deliberate speech, while higher values produce faster-paced speech. Extreme values can impact the quality of the generated speech.
    output_directory (str, optional): Directory where files should be saved (only used when saving files).
        Defaults to $HOME/Desktop if not provided.
    language: ISO 639-1 language code for the voice.
    output_format (str, optional): Output format of the generated audio. Formatted as codec_sample_rate_bitrate, so an MP3 with a 22.05 kHz sample rate at 32 kbps is represented as mp3_22050_32. MP3 with 192 kbps bitrate requires you to be subscribed to Creator tier or above. PCM with 44.1 kHz sample rate requires you to be subscribed to Pro tier or above. Note that the μ-law format (sometimes written mu-law, often approximated as u-law) is commonly used for Twilio audio inputs.
        Defaults to "mp3_44100_128". Must be one of:
        mp3_22050_32
        mp3_44100_32
        mp3_44100_64
        mp3_44100_96
        mp3_44100_128
        mp3_44100_192
        pcm_8000
        pcm_16000
        pcm_22050
        pcm_24000
        pcm_44100
        ulaw_8000
        alaw_8000
        opus_48000_32
        opus_48000_64
        opus_48000_96
        opus_48000_128
        opus_48000_192

Returns:
    Text content with file path or MCP resource with audio data, depending on output mode.
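The constraints described above (voice_id and voice_name being mutually exclusive, plus the numeric ranges) can be checked before the tool is ever called. The following is a minimal sketch, not part of the server: build_tts_arguments and RANGES are hypothetical names, and the voice name "Narrator" is made up for illustration.

```python
from typing import Any, Dict, Optional

# Numeric ranges copied from the argument descriptions above.
RANGES = {
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.7, 1.2),
}


def build_tts_arguments(
    text: str,
    voice_id: Optional[str] = None,
    voice_name: Optional[str] = None,
    **settings: Any,
) -> Dict[str, Any]:
    """Return a text_to_speech argument dict, or raise ValueError."""
    if voice_id is not None and voice_name is not None:
        raise ValueError("provide voice_id or voice_name, not both")
    for name, (low, high) in RANGES.items():
        value = settings.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    # Drop unset options so the server falls back to its own defaults.
    arguments = {k: v for k, v in settings.items() if v is not None}
    arguments["text"] = text
    if voice_id is not None:
        arguments["voice_id"] = voice_id
    if voice_name is not None:
        arguments["voice_name"] = voice_name
    return arguments


# "Narrator" is a made-up voice name used only for illustration.
print(build_tts_arguments("Hello there", voice_name="Narrator", speed=1.1))
```

Validating locally is a design choice that avoids spending an API call on a request the server would likely reject.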

speech_to_text

Transcribe speech from an audio file. When save_transcript_to_file=True, saves the output file to a directory (default: $HOME/Desktop). When return_transcript_to_client_directly=True, always returns the text directly, regardless of output mode.

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

Args:
    file_path: Path to the audio file to transcribe
    language_code: ISO 639-3 language code for transcription. If not provided, the language will be detected automatically.
    diarize: Whether to diarize the audio file. If True, which speaker is currently speaking will be annotated in the transcription.
    save_transcript_to_file: Whether to save the transcript to a file.
    return_transcript_to_client_directly: Whether to return the transcript to the client directly.
    output_directory: Directory where files should be saved (only used when saving files).
        Defaults to $HOME/Desktop if not provided.

Returns:
    TextContent containing the transcription or MCP resource with transcript data.

text_to_sound_effects

Convert a text description of a sound effect into audio of a given duration. Saves output file to directory (default: $HOME/Desktop).

Duration must be between 0.5 and 5 seconds.

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

Args:
    text: Text description of the sound effect
    duration_seconds: Duration of the sound effect in seconds
    output_directory: Directory where files should be saved (only used when saving files).
        Defaults to $HOME/Desktop if not provided.
    loop: Whether to loop the sound effect. Defaults to False.
    output_format (str, optional): Output format of the generated audio. Formatted as codec_sample_rate_bitrate, so an MP3 with a 22.05 kHz sample rate at 32 kbps is represented as mp3_22050_32. MP3 with 192 kbps bitrate requires you to be subscribed to Creator tier or above. PCM with 44.1 kHz sample rate requires you to be subscribed to Pro tier or above. Note that the μ-law format (sometimes written mu-law, often approximated as u-law) is commonly used for Twilio audio inputs.
        Defaults to "mp3_44100_128". Must be one of:
        mp3_22050_32
        mp3_44100_32
        mp3_44100_64
        mp3_44100_96
        mp3_44100_128
        mp3_44100_192
        pcm_8000
        pcm_16000
        pcm_22050
        pcm_24000
        pcm_44100
        ulaw_8000
        alaw_8000
        opus_48000_32
        opus_48000_64
        opus_48000_96
        opus_48000_128
        opus_48000_192
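The output_format identifiers listed above follow the codec_sample_rate_bitrate pattern, with the bitrate part absent for the PCM, μ-law and A-law entries. A small sketch of how a client might split such an identifier is shown here; AudioFormat and parse_output_format are hypothetical helper names, not part of the server.

```python
from typing import NamedTuple, Optional


class AudioFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None when the identifier carries no bitrate


def parse_output_format(value: str) -> AudioFormat:
    """Split an identifier such as 'mp3_22050_32' into its parts."""
    parts = value.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected output_format: {value!r}")
    codec = parts[0]
    sample_rate_hz = int(parts[1])
    bitrate_kbps = int(parts[2]) if len(parts) == 3 else None
    return AudioFormat(codec, sample_rate_hz, bitrate_kbps)


print(parse_output_format("mp3_22050_32"))
print(parse_output_format("pcm_16000"))
```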

search_voices

Search for existing voices, that is, voices that have already been added to the user's ElevenLabs voice library.
Searches in name, description, labels and category.

Args:
    search: Search term to filter voices by. Searches in name, description, labels and category.
    sort: Which field to sort by. `created_at_unix` might not be available for older voices.
    sort_direction: Sort order, either ascending or descending.

Returns:
    List of voices that match the search criteria.

list_models

List all available models

get_voice

Get details of a specific voice

voice_clone

Create an instant voice clone of a voice using provided audio files.

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

isolate_audio

Isolate audio from a file. Saves output file to directory (default: $HOME/Desktop).

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

check_subscription

Check the current subscription status. Could be used to measure the usage of the API.

create_agent

Create a conversational AI agent with custom configuration.

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

Args:
    name: Name of the agent
    first_message: First message the agent will say, e.g. "Hi, how can I help you today?"
    system_prompt: System prompt for the agent
    voice_id: ID of the voice to use for the agent
    language: ISO 639-1 language code for the agent
    llm: LLM to use for the agent
    temperature: Temperature for the agent. The lower the temperature, the more deterministic the agent's responses will be. Range is 0 to 1.
    max_tokens: Maximum number of tokens to generate.
    asr_quality: Quality of the ASR. `high` or `low`.
    model_id: ID of the ElevenLabs model to use for the agent.
    optimize_streaming_latency: Optimize streaming latency. Range is 0 to 4.
    stability: Stability for the agent. Range is 0 to 1.
    similarity_boost: Similarity boost for the agent. Range is 0 to 1.
    turn_timeout: Timeout for the agent to respond in seconds. Defaults to 7 seconds.
    max_duration_seconds: Maximum duration of a conversation in seconds. Defaults to 600 seconds (10 minutes).
    record_voice: Whether to record the agent's voice.
    retention_days: Number of days to retain the agent's data.
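As an illustration of the ranges and defaults listed above, a client could model a subset of the create_agent arguments as a dataclass. This is only a sketch: AgentConfig and to_arguments are hypothetical names, only a few of the arguments are covered, and the sample values are invented.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class AgentConfig:
    name: str
    first_message: str
    system_prompt: str
    temperature: Optional[float] = None               # range 0 to 1
    optimize_streaming_latency: Optional[int] = None  # range 0 to 4
    turn_timeout: int = 7                             # documented default (seconds)
    max_duration_seconds: int = 600                   # documented default (10 minutes)

    def __post_init__(self) -> None:
        if self.temperature is not None and not 0 <= self.temperature <= 1:
            raise ValueError("temperature must be between 0 and 1")
        latency = self.optimize_streaming_latency
        if latency is not None and not 0 <= latency <= 4:
            raise ValueError("optimize_streaming_latency must be between 0 and 4")

    def to_arguments(self) -> Dict[str, Any]:
        """Return only the fields that were set."""
        return {k: v for k, v in asdict(self).items() if v is not None}


config = AgentConfig(
    name="Support bot",  # invented sample values
    first_message="Hi, how can I help you today?",
    system_prompt="You are a helpful assistant.",
    temperature=0.3,
)
print(config.to_arguments())
```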

add_knowledge_base_to_agent

Add a knowledge base to the ElevenLabs workspace. Allowed types are epub, pdf, docx, txt, html.

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

Args:
    agent_id: ID of the agent to add the knowledge base to.
    knowledge_base_name: Name of the knowledge base.
    url: URL of the knowledge base.
    input_file_path: Path to the file to add to the knowledge base.
    text: Text to add to the knowledge base.

list_agents

List all available conversational AI agents

get_agent

Get details about a specific conversational AI agent

get_conversation

Gets conversation with transcript. Returns: conversation details and full transcript. Use when: analyzing completed agent conversations.

Args:
    conversation_id: The unique identifier of the conversation to retrieve, you can get the ids from the list_conversations tool.

list_conversations

Lists agent conversations. Returns: conversation list with metadata. Use when: asked about conversation history.

Args:
    agent_id (str, optional): Filter conversations by specific agent ID
    cursor (str, optional): Pagination cursor for retrieving next page of results
    call_start_before_unix (int, optional): Filter conversations that started before this Unix timestamp
    call_start_after_unix (int, optional): Filter conversations that started after this Unix timestamp
    page_size (int, optional): Number of conversations to return per page (1-100, defaults to 30)
    max_length (int, optional): Maximum character length of the response text (defaults to 10000)
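The cursor and page_size arguments above suggest the usual cursor-pagination loop. The sketch below assumes a hypothetical fetch_page callable that takes the argument dict and returns a list of items plus the next cursor (or None on the last page); how the cursor is actually extracted from the tool's text response is not specified here, so the demo uses an in-memory stand-in.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]


def iter_conversations(
    fetch_page: Callable[[Dict[str, Any]], Page],
    agent_id: Optional[str] = None,
    page_size: int = 30,
) -> Iterator[str]:
    """Yield items page by page until no cursor is returned."""
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    cursor: Optional[str] = None
    while True:
        arguments: Dict[str, Any] = {"page_size": page_size}
        if agent_id is not None:
            arguments["agent_id"] = agent_id
        if cursor is not None:
            arguments["cursor"] = cursor
        items, cursor = fetch_page(arguments)
        yield from items
        if cursor is None:
            break


def fake_fetch_page(arguments: Dict[str, Any]) -> Page:
    """In-memory stand-in for a real list_conversations call."""
    data = ["conv_1", "conv_2", "conv_3"]
    start = int(arguments.get("cursor", 0))
    end = start + arguments["page_size"]
    next_cursor = str(end) if end < len(data) else None
    return data[start:end], next_cursor


print(list(iter_conversations(fake_fetch_page, page_size=2)))
# prints ['conv_1', 'conv_2', 'conv_3']
```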

speech_to_speech

Transform audio from one voice to another using provided audio files. Saves output file to directory (default: $HOME/Desktop).

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

text_to_voice

Create voice previews from a text prompt. Creates three previews with slight variations. Saves output file to directory (default: $HOME/Desktop).

If no text is provided, the tool will auto-generate text.

Voice preview files are saved as: voice_design_(generated_voice_id)_(timestamp).mp3

Example file name: voice_design_Ya2J5uIa5Pq14DNPsbC1_20250403_164949.mp3

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.
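Since create_voice_from_preview needs the generated voice ID, it can be useful to recover it from a saved preview file name. The sketch below assumes the timestamp has the form YYYYMMDD_HHMMSS, which is inferred from the single example file name above; parse_preview_filename is a hypothetical helper.

```python
import re
from datetime import datetime
from typing import Tuple

# Timestamp layout (YYYYMMDD_HHMMSS) is inferred from the example file name.
PREVIEW_PATTERN = re.compile(
    r"voice_design_(?P<voice_id>.+)_(?P<stamp>\d{8}_\d{6})\.mp3"
)


def parse_preview_filename(filename: str) -> Tuple[str, datetime]:
    """Return (generated_voice_id, creation time) for a preview file name."""
    match = PREVIEW_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a voice preview file name: {filename!r}")
    created = datetime.strptime(match.group("stamp"), "%Y%m%d_%H%M%S")
    return match.group("voice_id"), created


voice_id, created = parse_preview_filename(
    "voice_design_Ya2J5uIa5Pq14DNPsbC1_20250403_164949.mp3"
)
print(voice_id, created.isoformat())
```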

create_voice_from_preview

Add a generated voice to the voice library. Uses the voice ID from the text_to_voice tool.

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

make_outbound_call

Make an outbound call using an ElevenLabs agent. Automatically detects provider type (Twilio or SIP trunk) and uses the appropriate API.

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.

Args:
    agent_id: The ID of the agent that will handle the call
    agent_phone_number_id: The ID of the phone number to use for the call
    to_number: The phone number to call (E.164 format: +1xxxxxxxxxx)

Returns:
    TextContent containing information about the call
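Because to_number must be in E.164 format, a caller may want to validate it before placing a billable call. A minimal sketch follows; it uses the common E.164 shape of a plus sign followed by up to 15 digits with a non-zero leading digit. build_call_arguments is a hypothetical helper, and the IDs and phone number in the demo are placeholders.

```python
import re
from typing import Dict

# "+" followed by 2 to 15 digits, the first of which is not zero.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def build_call_arguments(
    agent_id: str, agent_phone_number_id: str, to_number: str
) -> Dict[str, str]:
    """Return make_outbound_call arguments after checking the number format."""
    if E164_PATTERN.fullmatch(to_number) is None:
        raise ValueError(f"to_number is not in E.164 format: {to_number!r}")
    return {
        "agent_id": agent_id,
        "agent_phone_number_id": agent_phone_number_id,
        "to_number": to_number,
    }


# Placeholder IDs and number, for illustration only.
print(build_call_arguments("agent_123", "phone_456", "+15551234567"))
```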

search_voice_library

Search for a voice across the entire ElevenLabs voice library.

Args:
    page: Page number to return (0-indexed)
    page_size: Number of voices to return per page (1-100)
    search: Search term to filter voices by

Returns:
    TextContent containing information about the shared voices

list_phone_numbers

List all phone numbers associated with the ElevenLabs account

play_audio

Play an audio file. Supports WAV and MP3 formats.

compose_music

Convert a prompt to music and save the output audio file to a given directory. The directory is optional; if not provided, the output file is saved to $HOME/Desktop.

Args:
    prompt: Prompt to convert to music. Must provide either prompt or composition_plan.
    output_directory: Directory to save the output audio file
    composition_plan: Composition plan to use for the music. Must provide either prompt or composition_plan.
    music_length_ms: Length of the generated music in milliseconds. Cannot be used if composition_plan is provided.

⚠️ COST WARNING: This tool makes an API call to ElevenLabs which may incur costs. Only use when explicitly requested by the user.
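The argument rules for compose_music (a prompt or a composition_plan is required, and music_length_ms cannot accompany a composition_plan) can be expressed as a small pre-flight check. This is a sketch with a hypothetical helper name; the plan is treated as an opaque dict because its structure is not described here, and whether prompt and composition_plan may be combined is not stated, so the sketch does not reject that case.

```python
from typing import Any, Dict, Optional


def build_compose_arguments(
    prompt: Optional[str] = None,
    composition_plan: Optional[Dict[str, Any]] = None,
    music_length_ms: Optional[int] = None,
    output_directory: Optional[str] = None,
) -> Dict[str, Any]:
    """Return compose_music arguments, or raise ValueError on a rule violation."""
    if prompt is None and composition_plan is None:
        raise ValueError("provide prompt or composition_plan")
    if composition_plan is not None and music_length_ms is not None:
        raise ValueError("music_length_ms cannot be used with composition_plan")
    candidates = {
        "prompt": prompt,
        "composition_plan": composition_plan,
        "music_length_ms": music_length_ms,
        "output_directory": output_directory,
    }
    # Leave out unset values so server-side defaults apply.
    return {k: v for k, v in candidates.items() if v is not None}


print(build_compose_arguments(prompt="slow ambient pads", music_length_ms=30000))
```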

create_composition_plan

Create a composition plan for music generation. Usage of this endpoint does not cost any credits but is subject to rate limiting depending on your tier. Composition plans can be used when generating music with the compose_music tool.

Args:
    prompt: Prompt to create a composition plan for
    music_length_ms: The length of the composition plan to generate in milliseconds. Must be between 10000ms and 300000ms. Optional - if not provided, the model will choose a length based on the prompt.
    source_composition_plan: An optional composition plan to use as a source for the new composition plan
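Since music_length_ms is expressed in milliseconds and bounded to 10000 through 300000, converting from seconds and checking the bounds up front may save a rate-limited request. A brief sketch, with build_plan_arguments as a hypothetical helper name:

```python
from typing import Any, Dict, Optional

MIN_LENGTH_MS = 10_000   # lower bound stated above
MAX_LENGTH_MS = 300_000  # upper bound stated above


def build_plan_arguments(
    prompt: str, length_seconds: Optional[float] = None
) -> Dict[str, Any]:
    """Return create_composition_plan arguments with the length in milliseconds."""
    arguments: Dict[str, Any] = {"prompt": prompt}
    if length_seconds is not None:
        music_length_ms = round(length_seconds * 1000)
        if not MIN_LENGTH_MS <= music_length_ms <= MAX_LENGTH_MS:
            raise ValueError(
                f"music_length_ms must be between {MIN_LENGTH_MS} and {MAX_LENGTH_MS}"
            )
        arguments["music_length_ms"] = music_length_ms
    # With no length given, the model picks one based on the prompt.
    return arguments


print(build_plan_arguments("calm piano", length_seconds=45))
```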

Prompts

Interactive templates invoked by user choice

No prompts

Resources

Contextual data attached and managed by the client

No resources

Related MCP Servers

biomcp

4.7

by genomoncology

BioMCP is an open-source toolkit designed to enhance AI assistants with specialized biomedical knowledge by connecting them to authoritative biomedical data sources.

research_and_data

claude-task-master

4.7

by eyaltoledano

Task Master is a task management system for AI-driven development with Claude, designed to work seamlessly with Cursor AI.

ai_chatbot

mindsdb

4.6

by mindsdb

MindsDB is an open-source server that enables seamless interaction with large-scale federated data using the Model Context Protocol (MCP).

databases

tavily-mcp

4.6

by tavily-ai

The Tavily MCP server is a Model Context Protocol server that integrates with AI systems to provide real-time web search and data extraction capabilities.

browser_automation

paelladoc

4.5

by jlcases

PAELLADOC is an AI-First Development framework implementing the Model Context Protocol (MCP) to enhance LLM interactions with external tools and context.

ai_chatbot

mcp-hfspace

4.5

by evalstate

mcp-hfspace MCP Server connects to Hugging Face Spaces with minimal setup, providing Image Generation capabilities to Claude Desktop.

ai_chatbot

notion-mcp-server

4.4

by awkoy

Notion MCP Server is a Model Context Protocol server implementation that enables AI assistants to interact with Notion's API, providing tools for reading, creating, and modifying Notion content through natural language interactions.

ai_chatbot

pg-mcp-server

4.4

by stuzero

unknown

databases

f2c-mcp

4.4

by f2c-ai

A Model Context Protocol server for Figma Design to Code using F2C.

developer_tools

whois-mcp

4.3

by bharathvaj-ganesan

The Whois MCP server allows AI agents to perform WHOIS lookups to retrieve domain details.

research_and_data

runno MCP

4.3

by taybenlor

`@runno/mcp` is a Model Context Protocol server that provides a secure code execution environment for AI assistants.

ai_chatbot

mcp-crawl4ai-rag

4.2

by coleam00

Crawl4AI RAG MCP Server is a powerful implementation of the Model Context Protocol (MCP) integrated with Crawl4AI and Supabase, providing AI agents and AI coding assistants with advanced web crawling and RAG capabilities.

browser_automation

just-prompt

4.2

by disler

Just Prompt is a lightweight MCP server providing a unified interface to various LLM providers.

ai_chatbot

perplexity-mcp

4.2

by jsonallen

A Model Context Protocol (MCP) server that provides web search functionality using Perplexity AI's API, compatible with the Anthropic Claude desktop client.

research_and_data

jinni

4.2

by smat-dev

Jinni is a tool designed to efficiently provide Large Language Models (LLMs) with the context of your projects by consolidating relevant project files.

developer_tools

mcp-server-gemini

4.2

by aliargun

Gemini MCP Server is a Model Context Protocol server implementation that allows Claude Desktop to interact with Google's Gemini AI models.

ai_chatbot

bazi-mcp

4.1

by cantian-ai

Unlock precise Bazi insights with the Bazi MCP, the first AI-powered Bazi calculator.

ai_chatbot

backlog-mcp-server

4.1

by nulab

A Model Context Protocol (MCP) server for interacting with the Backlog API, providing tools for managing projects, issues, wiki pages, and more through AI agents.

developer_tools

rapidocr-mcp

4.1

by z4none

RapidOCR MCP Server is a Model Context Protocol server that provides an easy-to-use OCR interface.

ai_chatbot

mcp-perplexity

4.1

by daniel-lxs

The Perplexity MCP Server provides a Python-based interface to the Perplexity API, offering tools for querying responses, maintaining chat history, and managing conversations.

communication

vertex-ai-mcp-server

4.0

by shariqriazz

This project implements a Model Context Protocol (MCP) server that provides a comprehensive suite of tools for interacting with Google Cloud's Vertex AI Gemini models, focusing on coding assistance and general query answering.

ai_chatbot

github-mcp-server

4.0

by github

The GitHub MCP Server is a Model Context Protocol server that integrates with GitHub APIs for automation and interaction.

developer_tools

drawio-mcp-server

4.0

by lgazo

The Draw.io MCP server is a Model Context Protocol implementation that integrates Draw.io's diagramming capabilities with AI agentic systems.

ai_chatbot

mcp-tavily

3.9

by RamXX

Tavily MCP Server is a Model Context Protocol server that provides AI-powered web search capabilities using Tavily's search API.

research_and_data

ableton-mcp

3.8

by ahujasid

AbletonMCP connects Ableton Live to Claude AI through the Model Context Protocol (MCP), allowing Claude to directly interact with and control Ableton Live.

ai_chatbot