mcp-server-funasr

MCPServer is a Python-based server that utilizes Alibaba's FunASR library to provide advanced speech processing services through the FastMCP framework.

MCPServer is a server application that exposes Alibaba's FunASR library through the FastMCP framework, providing speech processing as a set of tools for handling audio files efficiently and at scale. It supports audio file validation, asynchronous speech transcription, and voice activity detection (VAD), which makes it useful for developers and researchers working with audio data. The server is extensible: ASR and VAD models from FunASR's model zoo can be loaded and switched dynamically, so its behaviour can be tailored to specific project requirements. Transcription results include segment-level and word-level timestamps for fine-grained speech analysis. Because transcription runs asynchronously, long audio files are processed without blocking, making the server suitable for both real-time and batch workloads.
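The sketch below illustrates how such a server can be wired together. It is an assumption-based outline rather than the project's actual source: it uses the public FastMCP and FunASR Python APIs, and the model names (`paraformer-zh`, `fsmn-vad`) and tool signatures are placeholders for whatever mcp-server-funasr actually registers.

```python
# Illustrative sketch only -- not mcp-server-funasr's real implementation.
# Assumes the `fastmcp` and `funasr` packages are installed; model names
# come from FunASR's public model zoo.
from fastmcp import FastMCP
from funasr import AutoModel

mcp = FastMCP("mcp-server-funasr")

# Load default models once at startup. AutoModel downloads weights from
# the FunASR model zoo on first use.
asr_model = AutoModel(model="paraformer-zh")
vad_model = AutoModel(model="fsmn-vad")

@mcp.tool()
def transcribe(audio_path: str) -> str:
    """Transcribe a local audio file and return plain text."""
    results = asr_model.generate(input=audio_path)
    # generate() returns a list of result dicts; "text" holds the transcript.
    return results[0]["text"] if results else ""

@mcp.tool()
def voice_activity_segments(audio_path: str) -> list[list[int]]:
    """Return [start_ms, end_ms] speech segments detected by the VAD model."""
    results = vad_model.generate(input=audio_path)
    # For fsmn-vad, "value" holds millisecond [start, end] pairs.
    return results[0]["value"] if results else []

if __name__ == "__main__":
    mcp.run()  # serve over stdio so MCP clients can connect
```

The actual server splits this functionality across the seven tools listed under Tools below and wraps transcription in the asynchronous task machinery described next.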

Features

  • Audio File Validation: Ensures audio files are valid and provides their properties.
  • Asynchronous Speech-to-Text Transcription: Non-blocking transcription for long audio files (a minimal sketch of this pattern follows the list).
  • Transcription Task Management: Manage tasks, query status, and retrieve results.
  • Voice Activity Detection (VAD): Identifies speech segments with precise timestamps.
  • Dynamic Model Configuration: Load and switch ASR and VAD models as needed.
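One plausible way to implement the non-blocking transcription and task-management features is sketched below. This is a hypothetical pattern, not the server's actual code: the task registry, helper names, and status strings are invented for illustration, and the blocking FunASR call is kept off the event loop with a thread-pool executor.

```python
# Hypothetical sketch of the asynchronous task pattern described above.
# FunASR's generate() call is blocking, so it is pushed onto a thread pool
# and tracked in an in-memory registry keyed by task id.
import asyncio
import uuid

from funasr import AutoModel

asr_model = AutoModel(model="paraformer-zh")
TASKS: dict[str, dict] = {}  # task_id -> {"status": ..., "result": ...}

async def start_transcription(audio_path: str) -> str:
    """Kick off a transcription job and return its task id immediately."""
    task_id = uuid.uuid4().hex
    TASKS[task_id] = {"status": "running", "result": None}

    async def _run() -> None:
        loop = asyncio.get_running_loop()
        try:
            # Run the blocking FunASR call off the event loop.
            results = await loop.run_in_executor(
                None, lambda: asr_model.generate(input=audio_path)
            )
            TASKS[task_id] = {"status": "done", "result": results[0]["text"]}
        except Exception as exc:
            TASKS[task_id] = {"status": "failed", "result": str(exc)}

    asyncio.create_task(_run())
    return task_id

def get_status(task_id: str) -> dict:
    """Query a task's status/result, mirroring the status and result tools."""
    return TASKS.get(task_id, {"status": "unknown", "result": None})
```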

Tools

  1. validate_audio_file

    Validates an audio file and provides its properties.

  2. start_speech_transcription

    Starts an asynchronous speech transcription task.

  3. get_transcription_task_status

    Queries the status of a transcription task.

  4. get_transcription_result

    Retrieves the result of a completed transcription task.

  5. load_asr_model

    Loads or reloads a specific ASR model.

  6. get_voice_activity_segments

    Detects speech segments using a VAD model.

  7. load_vad_model

    Loads or reloads a specific VAD model.
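For completeness, the sketch below shows how an MCP client could drive the tools listed above using the official `mcp` Python SDK. The launch command, the tool argument names (`audio_path`), and the shape of the returned values are assumptions; the server's own documentation is authoritative for the exact schemas.

```python
# Hypothetical client-side walk-through of the tools above, using the
# official MCP Python SDK. Tool argument names are assumptions.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Assumed launch command; adjust to however mcp-server-funasr is started.
    params = StdioServerParameters(command="python", args=["-m", "mcp_server_funasr"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # 1. Validate the audio file and inspect its properties.
            info = await session.call_tool(
                "validate_audio_file", {"audio_path": "meeting.wav"}
            )
            print(info)

            # 2. Start an asynchronous transcription task. The returned task id
            # (assumed field) would then be passed to get_transcription_task_status
            # and get_transcription_result to poll for completion and fetch the text.
            started = await session.call_tool(
                "start_speech_transcription", {"audio_path": "meeting.wav"}
            )
            print(started)

            # 6. Detect speech segments with the VAD model.
            segments = await session.call_tool(
                "get_voice_activity_segments", {"audio_path": "meeting.wav"}
            )
            print(segments)

asyncio.run(main())
```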