mcp-fish-audio-server

mcp-fish-audio-server

3.4

If you are the rightful owner of mcp-fish-audio-server and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.

Fish Audio MCP Server is a Model Context Protocol server that integrates Fish Audio's Text-to-Speech API with LLMs like Claude for natural language-driven speech synthesis.

Fish Audio MCP Server

An MCP (Model Context Protocol) server that provides seamless integration between Fish Audio's Text-to-Speech API and LLMs like Claude, enabling natural language-driven speech synthesis.

What is Fish Audio?

Fish Audio is a cutting-edge Text-to-Speech platform that offers:

  • ๐ŸŒŠ State-of-the-art voice synthesis with natural-sounding output
  • ๐ŸŽฏ Voice cloning capabilities to create custom voice models
  • ๐ŸŒ Multilingual support including English, Japanese, Chinese, and more
  • โšก Low-latency streaming for real-time applications
  • ๐ŸŽจ Fine-grained control over speech prosody and emotions

This MCP server brings Fish Audio's powerful capabilities directly to your LLM workflows.

Features

  • ๐ŸŽ™๏ธ High-Quality TTS: Leverage Fish Audio's state-of-the-art TTS models
  • ๐ŸŒŠ Streaming Support: Real-time audio streaming for low-latency applications
  • ๐ŸŽจ Multiple Voices: Support for custom voice models via reference IDs
  • ๐ŸŽฏ Smart Voice Selection: Select voices by ID, name, or tags
  • ๐Ÿ“š Voice Library Management: Configure and manage multiple voice references
  • ๐Ÿ”ง Flexible Configuration: Environment variable-based configuration
  • ๐Ÿ“ฆ Multiple Audio Formats: Support for MP3, WAV, PCM, and Opus
  • ๐Ÿš€ Easy Integration: Simple setup with any MCP-compatible client

Quick Start

Installation

You can run this MCP server directly using npx:

npx @alanse/fish-audio-mcp-server

Or install it globally:

npm install -g @alanse/fish-audio-mcp-server

Configuration

  1. Get your Fish Audio API key from Fish Audio

  2. Set up environment variables:

export FISH_API_KEY=your_fish_audio_api_key_here
  1. Add to your MCP settings configuration:
Single Voice Mode (Simple)
{
  "mcpServers": {
    "fish-audio": {
      "command": "npx",
      "args": ["-y", "@alanse/fish-audio-mcp-server"],
      "env": {
        "FISH_API_KEY": "your_fish_audio_api_key_here",
        "FISH_MODEL_ID": "speech-1.6",
        "FISH_REFERENCE_ID": "your_voice_reference_id_here",
        "FISH_OUTPUT_FORMAT": "mp3",
        "FISH_STREAMING": "false",
        "FISH_LATENCY": "balanced",
        "FISH_MP3_BITRATE": "128",
        "FISH_AUTO_PLAY": "false",
        "AUDIO_OUTPUT_DIR": "~/.fish-audio-mcp/audio_output"
      }
    }
  }
}
Multiple Voice Mode (Advanced)
{
  "mcpServers": {
    "fish-audio": {
      "command": "npx",
      "args": ["-y", "@alanse/fish-audio-mcp-server"],
      "env": {
        "FISH_API_KEY": "your_fish_audio_api_key_here",
        "FISH_MODEL_ID": "speech-1.6",
        "FISH_REFERENCES": "[{'reference_id':'id1','name':'Alice','tags':['female','english']},{'reference_id':'id2','name':'Bob','tags':['male','japanese']},{'reference_id':'id3','name':'Carol','tags':['female','japanese','anime']}]",
        "FISH_DEFAULT_REFERENCE": "id1",
        "FISH_OUTPUT_FORMAT": "mp3",
        "FISH_STREAMING": "false",
        "FISH_LATENCY": "balanced",
        "FISH_MP3_BITRATE": "128",
        "FISH_AUTO_PLAY": "false",
        "AUDIO_OUTPUT_DIR": "~/.fish-audio-mcp/audio_output"
      }
    }
  }
}

Environment Variables

VariableDescriptionDefaultRequired
FISH_API_KEYYour Fish Audio API key-Yes
FISH_MODEL_IDTTS model to use (s1, speech-1.5, speech-1.6)s1Optional
FISH_REFERENCE_IDDefault voice reference ID (single reference mode)-Optional
FISH_REFERENCESMultiple voice references (see below)-Optional
FISH_DEFAULT_REFERENCEDefault reference ID when using multiple references-Optional
FISH_OUTPUT_FORMATDefault audio format (mp3, wav, pcm, opus)mp3Optional
FISH_STREAMINGEnable streaming mode (HTTP/WebSocket)falseOptional
FISH_LATENCYLatency mode (normal, balanced)balancedOptional
FISH_MP3_BITRATEMP3 bitrate (64, 128, 192)128Optional
FISH_AUTO_PLAYAuto-play audio and enable real-time playbackfalseOptional
AUDIO_OUTPUT_DIRDirectory for audio file output~/.fish-audio-mcp/audio_outputOptional

Configuring Multiple Voice References

You can configure multiple voice references in two ways:

JSON Array Format (Recommended)

Use the FISH_REFERENCES environment variable with a JSON array:

FISH_REFERENCES='[
  {"reference_id":"id1","name":"Alice","tags":["female","english"]},
  {"reference_id":"id2","name":"Bob","tags":["male","japanese"]},
  {"reference_id":"id3","name":"Carol","tags":["female","japanese","anime"]}
]'
FISH_DEFAULT_REFERENCE="id1"
Individual Format (Backward Compatibility)

Use numbered environment variables:

FISH_REFERENCE_1_ID=id1
FISH_REFERENCE_1_NAME=Alice
FISH_REFERENCE_1_TAGS=female,english

FISH_REFERENCE_2_ID=id2
FISH_REFERENCE_2_NAME=Bob
FISH_REFERENCE_2_TAGS=male,japanese

Usage

Once configured, the Fish Audio MCP server provides two tools to LLMs.

Tool 1: fish_audio_tts

Generates speech from text using Fish Audio's TTS API.

Parameters
  • text (required): Text to convert to speech (max 10,000 characters)
  • reference_id (optional): Voice model reference ID
  • reference_name (optional): Select voice by name
  • reference_tag (optional): Select voice by tag
  • streaming (optional): Enable streaming mode
  • format (optional): Output format (mp3, wav, pcm, opus)
  • mp3_bitrate (optional): MP3 bitrate (64, 128, 192)
  • normalize (optional): Enable text normalization (default: true)
  • latency (optional): Latency mode (normal, balanced)
  • output_path (optional): Custom output file path
  • auto_play (optional): Automatically play the generated audio
  • websocket_streaming (optional): Use WebSocket streaming instead of HTTP
  • realtime_play (optional): Play audio in real-time during WebSocket streaming

Voice Selection Priority: reference_id > reference_name > reference_tag > default

Tool 2: fish_audio_list_references

Lists all configured voice references.

Parameters

No parameters required.

Returns
  • List of configured voice references with their IDs, names, and tags
  • Default reference ID

Examples

Basic Text-to-Speech
User: "Generate speech saying 'Hello, world! Welcome to Fish Audio TTS.'"

Claude: I'll generate speech for that text using Fish Audio TTS.

[Uses fish_audio_tts tool with text parameter]

Result: Audio file saved to ./audio_output/tts_2025-01-03T10-30-00.mp3
Using Custom Voice by ID
User: "Generate speech with voice model xyz123 saying 'This is a custom voice test'"

Claude: I'll generate speech using the specified voice model.

[Uses fish_audio_tts tool with text and reference_id parameters]

Result: Audio generated with custom voice model xyz123
Using Voice by Name
User: "Use Alice's voice to say 'Hello from Alice'"

Claude: I'll generate speech using Alice's voice.

[Uses fish_audio_tts tool with reference_name: "Alice"]

Result: Audio generated with Alice's voice
Using Voice by Tag
User: "Generate Japanese speech saying 'ใ“ใ‚“ใซใกใฏ' with an anime voice"

Claude: I'll generate Japanese speech with an anime-style voice.

[Uses fish_audio_tts tool with reference_tag: "anime"]

Result: Audio generated with anime voice style
List Available Voices
User: "What voices are available?"

Claude: I'll list all configured voice references.

[Uses fish_audio_list_references tool]

Result:
- Alice (id: id1) - Tags: female, english [Default]
- Bob (id: id2) - Tags: male, japanese
- Carol (id: id3) - Tags: female, japanese, anime
HTTP Streaming Mode
User: "Generate a long speech in streaming mode about the benefits of AI"

Claude: I'll generate the speech in streaming mode for faster response.

[Uses fish_audio_tts tool with streaming: true]

Result: Streaming audio saved to ./audio_output/tts_2025-01-03T10-35-00.mp3
WebSocket Real-time Streaming
User: "Stream and play in real-time: 'Welcome to the future of AI'"

Claude: I'll stream the speech via WebSocket and play it in real-time.

[Uses fish_audio_tts tool with websocket_streaming: true, realtime_play: true]

Result: Audio streamed and played in real-time via WebSocket

Development

Local Development

  1. Clone the repository:
git clone https://github.com/da-okazaki/mcp-fish-audio-server.git
cd mcp-fish-audio-server
  1. Install dependencies:
npm install
  1. Create .env file:
cp .env.example .env
# Edit .env with your API key
  1. Build the project:
npm run build
  1. Run in development mode:
npm run dev

Testing

Run the test suite:

npm test

Project Structure

mcp-fish-audio-server/
โ”œโ”€โ”€ src/
โ”‚   โ”œโ”€โ”€ index.ts          # MCP server entry point
โ”‚   โ”œโ”€โ”€ tools/
โ”‚   โ”‚   โ””โ”€โ”€ tts.ts        # TTS tool implementation
โ”‚   โ”œโ”€โ”€ services/
โ”‚   โ”‚   โ””โ”€โ”€ fishAudio.ts  # Fish Audio API client
โ”‚   โ”œโ”€โ”€ types/
โ”‚   โ”‚   โ””โ”€โ”€ index.ts      # TypeScript definitions
โ”‚   โ””โ”€โ”€ utils/
โ”‚       โ””โ”€โ”€ config.ts     # Configuration management
โ”œโ”€โ”€ tests/                # Test files
โ”œโ”€โ”€ audio_output/         # Default audio output directory
โ”œโ”€โ”€ package.json
โ”œโ”€โ”€ tsconfig.json
โ””โ”€โ”€ README.md

API Documentation

Fish Audio Service

The service provides two main methods:

  1. generateSpeech: Standard TTS generation

    • Returns audio buffer
    • Suitable for short texts
    • Lower memory usage
  2. generateSpeechStream: Streaming TTS generation

    • Returns audio stream
    • Suitable for long texts
    • Real-time processing

Error Handling

The server handles various error scenarios:

  • INVALID_API_KEY: Invalid or missing API key
  • NETWORK_ERROR: Connection issues with Fish Audio API
  • INVALID_PARAMS: Invalid request parameters
  • QUOTA_EXCEEDED: API rate limit exceeded
  • SERVER_ERROR: Fish Audio server errors

Troubleshooting

Common Issues

  1. "FISH_API_KEY environment variable is required"

    • Ensure you've set the FISH_API_KEY environment variable
    • Check that the API key is valid
  2. "Network error: Unable to reach Fish Audio API"

    • Check your internet connection
    • Verify Fish Audio API is accessible
    • Check for proxy/firewall issues
  3. "Text length exceeds maximum limit"

    • Split long texts into smaller chunks
    • Maximum supported length is 10,000 characters
  4. Audio files not appearing

    • Check the AUDIO_OUTPUT_DIR path exists
    • Ensure write permissions for the directory

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the file for details.

Acknowledgments

  • Fish Audio for providing the excellent TTS API
  • Anthropic for creating the Model Context Protocol
  • The MCP community for inspiration and examples

Support

For issues, questions, or contributions, please visit the GitHub repository.

Changelog

See for a detailed list of changes.