MiniMax-MCP

MiniMax-AI/MiniMax-MCP

MiniMax-MCP is hosted online, so all tools can be tested directly either in the Inspector tab or in the Online Client.

Official MiniMax Model Context Protocol (MCP) server for interaction with Text to Speech and video/image generation APIs.

Tools

Functions exposed to the LLM to take actions

text_to_audio

Convert text to audio with a given voice and save the output audio file to a given directory. The directory is optional; if it is not provided, the output file is saved to $HOME/Desktop. The voice id is optional; if it is not provided, the default voice is used.

COST WARNING: This tool makes an API call to MiniMax which may incur costs. Only use when explicitly requested by the user.

Args:
    text (str): The text to convert to speech.
    voice_id (str, optional): The id of the voice to use. For example, "male-qn-qingse"/"audiobook_female_1"/"cute_boy"/"Charming_Lady"...
    model (str, optional): The model to use.
    speed (float, optional): Speed of the generated audio. Controls the speed of the generated speech. Values range from 0.5 to 2.0, with 1.0 being the default speed. 
    vol (float, optional): Volume of the generated audio. Controls the volume of the generated speech. Values range from 0 to 10, with 1 being the default volume.
    pitch (int, optional): Pitch of the generated audio. Controls the pitch of the generated speech. Values range from -12 to 12, with 0 being the default pitch.
    emotion (str, optional): Emotion of the generated audio. Controls the emotion of the generated speech. Values range ["happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"], with "happy" being the default emotion.
    sample_rate (int, optional): Sample rate of the generated audio. Controls the sample rate of the generated speech. Values range [8000,16000,22050,24000,32000,44100] with 32000 being the default sample rate.
    bitrate (int, optional): Bitrate of the generated audio. Controls the bitrate of the generated speech. Values range [32000,64000,128000,256000] with 128000 being the default bitrate.
    channel (int, optional): Channel of the generated audio. Controls the channel of the generated speech. Values range [1, 2] with 1 being the default channel.
    format (str, optional): Format of the generated audio. Controls the format of the generated speech. Values range ["pcm", "mp3","flac"] with "mp3" being the default format.
    language_boost (str, optional): Language boost of the generated audio. Controls the language boost of the generated speech. Values range ['Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'auto'] with "auto" being the default language boost.
    output_directory (str, optional): The directory to save the audio to. Defaults to $HOME/Desktop.

Returns:
    Text content with the path to the output file and name of the voice used.
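The documented ranges can be checked on the client side before the (billable) call is made. The following is a minimal sketch, not the server's own validation: the helper names `build_text_to_audio_args` and `tools_call_request` are hypothetical, and the request envelope assumes the standard MCP `tools/call` JSON-RPC method.

```python
import json

# Allowed values copied from the text_to_audio argument list above.
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100}
BITRATES = {32000, 64000, 128000, 256000}
FORMATS = {"pcm", "mp3", "flac"}


def build_text_to_audio_args(text, speed=1.0, vol=1.0, pitch=0, emotion="happy",
                             sample_rate=32000, bitrate=128000, channel=1,
                             format="mp3", **extra):
    """Hypothetical helper: check values against the documented ranges."""
    checks = [
        (0.5 <= speed <= 2.0, "speed must be between 0.5 and 2.0"),
        (0 <= vol <= 10, "vol must be between 0 and 10"),
        (isinstance(pitch, int) and -12 <= pitch <= 12,
         "pitch must be an integer between -12 and 12"),
        (emotion in EMOTIONS, "unsupported emotion"),
        (sample_rate in SAMPLE_RATES, "unsupported sample_rate"),
        (bitrate in BITRATES, "unsupported bitrate"),
        (channel in (1, 2), "channel must be 1 or 2"),
        (format in FORMATS, "unsupported format"),
    ]
    for ok, message in checks:
        if not ok:
            raise ValueError(message)
    args = {"text": text, "speed": speed, "vol": vol, "pitch": pitch,
            "emotion": emotion, "sample_rate": sample_rate, "bitrate": bitrate,
            "channel": channel, "format": format}
    # Pass-through for voice_id, model, language_boost, output_directory.
    args.update(extra)
    return args


def tools_call_request(name, arguments, request_id=1):
    """Wrap tool arguments in an MCP `tools/call` JSON-RPC 2.0 request."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/call",
            "params": {"name": name, "arguments": arguments}}


request = tools_call_request(
    "text_to_audio",
    build_text_to_audio_args("Hello there", speed=1.2, voice_id="Charming_Lady"),
)
print(json.dumps(request, indent=2))
```

Most MCP client libraries build this envelope themselves, so in practice only the argument dictionary is usually needed.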

list_voices

List all voices available.

Args:
    voice_type (str, optional): The type of voices to list. Values range ["all", "system", "voice_cloning"], with "all" being the default.
Returns:
    Text content with the list of voices.

voice_clone

Clone a voice using provided audio files. The new voice will be charged upon first use.

COST WARNING: This tool makes an API call to MiniMax which may incur costs. Only use when explicitly requested by the user.

Args:
    voice_id (str): The id of the voice to use.
    file (str): The path to the audio file to clone or a URL to the audio file.
    text (str, optional): The text to use for the demo audio.
    is_url (bool, optional): Whether the file is a URL. Defaults to False.
    output_directory (str): The directory to save the demo audio to.
Returns:
    Text content with the voice id of the cloned voice.
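Because `file` accepts either a local path or a URL, a caller has to set `is_url` to match. One way to derive it is sketched below; the helper `build_voice_clone_args` is hypothetical, and treating only `http` and `https` schemes as URLs is an assumption rather than documented server behavior.

```python
from urllib.parse import urlparse


def build_voice_clone_args(voice_id, file, text=None, output_directory=None):
    """Hypothetical helper: derive is_url from the file string."""
    # Assumption: only http(s) sources count as URLs; everything else,
    # including Windows drive paths such as C:\audio\a.wav, is local.
    is_url = urlparse(file).scheme.lower() in ("http", "https")
    args = {"voice_id": voice_id, "file": file, "is_url": is_url}
    if text is not None:
        args["text"] = text
    if output_directory is not None:
        args["output_directory"] = output_directory
    return args


print(build_voice_clone_args("my_voice", "https://example.com/sample.mp3"))
print(build_voice_clone_args("my_voice", "/home/me/sample.wav", text="Hello"))
```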

play_audio

Play an audio file. Supports WAV and MP3 formats. Video is not supported.

Args:
    input_file_path (str): The path to the audio file to play.
    is_url (bool, optional): Whether the audio file is a URL.
Returns:
    Text content with the path to the audio file.

generate_video

Generate a video from a prompt.

COST WARNING: This tool makes an API call to MiniMax which may incur costs. Only use when explicitly requested by the user.

Args:
    model (str, optional): The model to use. Values range ["T2V-01", "T2V-01-Director", "I2V-01", "I2V-01-Director", "I2V-01-live", "MiniMax-Hailuo-02"]. "Director" models support inserting instructions for camera movement control. "I2V" models are for image to video. "T2V" models are for text to video. "MiniMax-Hailuo-02" is the latest model, with the best results, ultra-clear quality and precise response.
    prompt (str): The prompt to generate the video from. When using a Director model, the prompt supports 15 camera movement instructions (enumerated values):
        -Truck: [Truck left], [Truck right]
        -Pan: [Pan left], [Pan right]
        -Push: [Push in], [Pull out]
        -Pedestal: [Pedestal up], [Pedestal down]
        -Tilt: [Tilt up], [Tilt down]
        -Zoom: [Zoom in], [Zoom out]
        -Shake: [Shake]
        -Follow: [Tracking shot]
        -Static: [Static shot]
    first_frame_image (str): The first frame image. Requires an "I2V" series model.
    duration (int, optional): The duration of the video. Requires the "MiniMax-Hailuo-02" model. Allowed values are 6 and 10.
    resolution (str, optional): The resolution of the video. Requires the "MiniMax-Hailuo-02" model. Values range ["768P", "1080P"].
    output_directory (str): The directory to save the video to.
    async_mode (bool, optional): Whether to use async mode. Defaults to False. If True, the video generation task is submitted asynchronously and the response returns a task_id. Use the `query_video_generation` tool to check the status of the task and get the result.
Returns:
    Text content with the path to the output video file.
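The model-specific constraints listed above (Director instructions, I2V-only `first_frame_image`, Hailuo-02-only `duration` and `resolution`) can be checked before a request is sent. This is a sketch under stated assumptions: `director_prompt` and `build_generate_video_args` are hypothetical helpers, and placing the bracketed instructions in front of the description is an illustrative choice, since the tool description does not say where in the prompt they belong.

```python
MODELS = ["T2V-01", "T2V-01-Director", "I2V-01", "I2V-01-Director",
          "I2V-01-live", "MiniMax-Hailuo-02"]

# The 15 camera movement instructions listed for Director models.
CAMERA_MOVES = {
    "Truck left", "Truck right", "Pan left", "Pan right", "Push in",
    "Pull out", "Pedestal up", "Pedestal down", "Tilt up", "Tilt down",
    "Zoom in", "Zoom out", "Shake", "Tracking shot", "Static shot",
}


def director_prompt(description, moves):
    """Prefix a description with bracketed camera instructions (assumed placement)."""
    for move in moves:
        if move not in CAMERA_MOVES:
            raise ValueError(f"unknown camera movement: {move}")
    return "".join(f"[{move}]" for move in moves) + " " + description


def build_generate_video_args(prompt, model=None, first_frame_image=None,
                              duration=None, resolution=None,
                              output_directory=None, async_mode=False):
    """Hypothetical helper: enforce the documented model constraints."""
    if model is not None and model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if first_frame_image is not None and not (model or "").startswith("I2V"):
        raise ValueError("first_frame_image requires an I2V series model")
    if (duration is not None or resolution is not None) and model != "MiniMax-Hailuo-02":
        raise ValueError("duration and resolution require MiniMax-Hailuo-02")
    if duration is not None and duration not in (6, 10):
        raise ValueError("duration must be 6 or 10")
    if resolution is not None and resolution not in ("768P", "1080P"):
        raise ValueError("resolution must be 768P or 1080P")
    candidate = {"prompt": prompt, "model": model,
                 "first_frame_image": first_frame_image, "duration": duration,
                 "resolution": resolution, "output_directory": output_directory,
                 "async_mode": async_mode}
    # Drop unset optional values so the server applies its own defaults.
    return {key: value for key, value in candidate.items() if value is not None}


prompt = director_prompt("A cat walks along a wall", ["Pan left", "Zoom in"])
print(build_generate_video_args(prompt, model="T2V-01-Director", async_mode=True))
```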

query_video_generation

Query the status of a video generation task.

Args:
    task_id (str): The task ID to query. This is the task_id returned by the `generate_video` tool when `async_mode` is True.
    output_directory (str): The directory to save the video to.
Returns:
    Text content with the status of the task.
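With `async_mode` enabled, a client typically polls this tool until the task finishes. The loop below is a sketch only: `call_tool` stands in for whatever function your MCP client uses to invoke a tool and return its text content, and because the exact status wording is not documented here, completion is decided by a caller-supplied `is_finished` predicate. The status strings in the stub are invented for the demonstration.

```python
import time


def wait_for_video(call_tool, task_id, output_directory, is_finished,
                   interval_s=10.0, max_attempts=60, sleep=time.sleep):
    """Poll query_video_generation until is_finished(text) returns True."""
    arguments = {"task_id": task_id, "output_directory": output_directory}
    for _ in range(max_attempts):
        text = call_tool("query_video_generation", arguments)
        if is_finished(text):
            return text
        sleep(interval_s)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} polls")


# Stub client for illustration; these replies are made up, not real server output.
_replies = iter(["processing", "processing", "done: /tmp/video.mp4"])


def fake_call_tool(name, arguments):
    return next(_replies)


result = wait_for_video(fake_call_tool, "task-123", "/tmp",
                        is_finished=lambda text: text.startswith("done"),
                        sleep=lambda seconds: None)
print(result)
```

Injecting `sleep` keeps the loop testable without real delays; the default interval and attempt count are arbitrary and should be tuned to actual generation times.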

text_to_image

Generate an image from a prompt.

COST WARNING: This tool makes an API call to MiniMax which may incur costs. Only use when explicitly requested by the user.

Args:
    model (str, optional): The model to use. Values range ["image-01"], with "image-01" being the default.
    prompt (str): The prompt to generate the image from.
    aspect_ratio (str, optional): The aspect ratio of the image. Values range ["1:1", "16:9","4:3", "3:2", "2:3", "3:4", "9:16", "21:9"], with "1:1" being the default.
    n (int, optional): The number of images to generate. Values range [1, 9], with 1 being the default.
    prompt_optimizer (bool, optional): Whether to optimize the prompt. Values range [True, False], with True being the default.
    output_directory (str): The directory to save the image to.
Returns:
    Text content with the path to the output image file.

music_generation

Create a music generation task using AI models. Generate music from prompt and lyrics.

COST WARNING: This tool makes an API call to MiniMax which may incur costs. Only use when explicitly requested by the user.

Args:
    prompt (str): Music creation inspiration describing style, mood, scene, etc.
        Example: "Pop music, sad, suitable for rainy nights". Character range: [10, 300]
    lyrics (str): Song lyrics for music generation.
        Use newline (\n) to separate each line of lyrics. Supports lyric structure tags [Intro][Verse][Chorus][Bridge][Outro] 
        to enhance musicality. Character range: [10, 600] (each Chinese character, punctuation, and letter counts as 1 character)
    stream (bool, optional): Whether to enable streaming mode. Defaults to False
    sample_rate (int, optional): Sample rate of generated music. Values: [16000, 24000, 32000, 44100]
    bitrate (int, optional): Bitrate of generated music. Values: [32000, 64000, 128000, 256000]
    format (str, optional): Format of generated music. Values: ["mp3", "wav", "pcm"]. Defaults to "mp3"
    output_directory (str, optional): Directory to save the generated music file
    
Note: Currently supports generating music up to 1 minute in length.

Returns:
    Text content with the path to the generated music file or generation status.
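The lyric format described above (newline-separated lines, optional structure tags, and character limits) lends itself to a small builder. This is an illustrative sketch: `build_lyrics` and `build_music_args` are hypothetical helpers, putting each tag on its own line is an assumption, and counting newlines toward the 600-character limit is a conservative guess.

```python
STRUCTURE_TAGS = ("Intro", "Verse", "Chorus", "Bridge", "Outro")


def build_lyrics(sections):
    """Join (tag, lines) pairs into a newline-separated lyrics string."""
    parts = []
    for tag, lines in sections:
        if tag not in STRUCTURE_TAGS:
            raise ValueError(f"unsupported structure tag: {tag}")
        parts.append(f"[{tag}]")  # assumption: the tag sits on its own line
        parts.extend(lines)
    return "\n".join(parts)


def build_music_args(prompt, lyrics, **optional):
    """Hypothetical helper: enforce the documented character ranges."""
    if not 10 <= len(prompt) <= 300:
        raise ValueError("prompt must be 10 to 300 characters")
    # Assumption: newlines count toward the limit, which errs on the safe side.
    if not 10 <= len(lyrics) <= 600:
        raise ValueError("lyrics must be 10 to 600 characters")
    return {"prompt": prompt, "lyrics": lyrics, **optional}


lyrics = build_lyrics([
    ("Verse", ["Rain on the window", "Streetlights glow"]),
    ("Chorus", ["Stay with me tonight"]),
])
print(build_music_args("Pop music, sad, suitable for rainy nights", lyrics,
                       format="mp3"))
```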

voice_design

Generate a voice based on description prompts.

COST WARNING: This tool makes an API call to MiniMax which may incur costs. Only use when explicitly requested by the user.

Args:
    prompt (str): The prompt to generate the voice from.
    preview_text (str): The text to preview the voice.
    voice_id (str, optional): The id of the voice to use. For example, "male-qn-qingse"/"audiobook_female_1"/"cute_boy"/"Charming_Lady"...
    output_directory (str, optional): The directory to save the voice to.
Returns:
    Text content with the path to the output voice file.

Prompts

Interactive templates invoked by user choice

No prompts

Resources

Contextual data attached and managed by the client

No resources