mcp-server-funasr
MCPServer is a Python-based server that utilizes Alibaba's FunASR library to provide advanced speech processing services through the FastMCP framework.
The server handles audio file validation, asynchronous speech transcription, and voice activity detection (VAD) for developers and researchers working with audio data. ASR and VAD models from FunASR's model zoo can be loaded and switched at runtime, so the server can be adapted to a project's requirements. Transcription results include segment-level and word-level timestamps. Because transcription runs asynchronously, long audio files are processed without blocking other operations, which suits batch processing of speech data.
Features
- Audio File Validation: Ensures audio files are valid and provides their properties.
- Asynchronous Speech-to-Text Transcription: Non-blocking transcription for long audio files.
- Transcription Task Management: Manage tasks, query status, and retrieve results.
- Voice Activity Detection (VAD): Identifies speech segments with precise timestamps.
- Dynamic Model Configuration: Load and switch ASR and VAD models as needed.
Tools
validate_audio_file
Validates an audio file and provides its properties.
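As a rough illustration of what validating an audio file and reporting its properties can involve, the sketch below checks a WAV file with Python's standard wave module. The function name validate_wav_file and the keys of the returned dictionary are hypothetical; the server's own tool may rely on a different library, accept more formats, and return different fields.

```python
import wave
from pathlib import Path


def validate_wav_file(path: str) -> dict:
    """Return basic properties of a WAV file, or a description of the problem.

    Hypothetical helper for illustration only; not the server's implementation.
    """
    file = Path(path)
    if not file.is_file():
        return {"valid": False, "error": "file not found"}
    try:
        with wave.open(str(file), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            return {
                "valid": True,
                "channels": wav.getnchannels(),
                "sample_rate": rate,
                "sample_width_bytes": wav.getsampwidth(),
                # Duration follows from frame count and sample rate.
                "duration_seconds": frames / rate if rate else 0.0,
            }
    except (wave.Error, EOFError) as exc:
        # wave.Error: not a readable WAV file; EOFError: empty or truncated file.
        return {"valid": False, "error": str(exc)}
```

Returning a dictionary with a valid flag instead of raising keeps the outcome easy to serialize as a tool response.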
start_speech_transcription
Starts an asynchronous speech transcription task.
get_transcription_task_status
Queries the status of a transcription task.
get_transcription_result
Retrieves the result of a completed transcription task.
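The start, status, and result tools above follow a common pattern for non-blocking work: submit a job, receive a task ID, poll until it finishes, then fetch the output. The sketch below shows that pattern with an in-memory registry backed by a thread pool. The class name TranscriptionTasks, its method names, and the status strings are assumptions made for illustration; the server's actual task storage and status values are not documented here.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class TranscriptionTasks:
    """In-memory task registry (hypothetical, for illustration only)."""

    def __init__(self, transcribe: Callable[[str], dict], max_workers: int = 1):
        # `transcribe` stands in for whatever function runs the ASR model.
        self._transcribe = transcribe
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures: Dict[str, Future] = {}

    def start(self, audio_path: str) -> str:
        """Queue a transcription and return immediately with a task ID."""
        task_id = uuid.uuid4().hex
        self._futures[task_id] = self._pool.submit(self._transcribe, audio_path)
        return task_id

    def status(self, task_id: str) -> str:
        """Report the task state without blocking."""
        future = self._futures.get(task_id)
        if future is None:
            return "not_found"
        if not future.done():
            return "processing"
        return "failed" if future.exception() else "completed"

    def result(self, task_id: str) -> dict:
        """Return the output of a finished task; re-raises if the task failed."""
        future = self._futures[task_id]
        if not future.done():
            raise RuntimeError("task is still processing")
        return future.result()
```

A caller would typically invoke start once, poll status at an interval, and call result only after the status reports completion.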
load_asr_model
Loads or reloads a specific ASR model.
get_voice_activity_segments
Detects speech segments using a VAD model.
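FunASR's VAD models describe speech as a list of [start, end] pairs measured in milliseconds. Assuming the tool passes segments through in that shape (the server's actual response format may differ), a small helper such as the hypothetical segments_to_seconds below converts them into seconds and adds a duration for each segment.

```python
from typing import Dict, List, Sequence


def segments_to_seconds(segments_ms: Sequence[Sequence[int]]) -> List[Dict[str, float]]:
    """Convert [[start_ms, end_ms], ...] pairs into dictionaries in seconds.

    Hypothetical helper; the input shape is an assumption about the VAD output.
    """
    converted = []
    for start_ms, end_ms in segments_ms:
        if end_ms < start_ms:
            raise ValueError(f"segment ends before it starts: {start_ms}, {end_ms}")
        converted.append(
            {
                "start": start_ms / 1000,
                "end": end_ms / 1000,
                "duration": (end_ms - start_ms) / 1000,
            }
        )
    return converted
```

For example, segments_to_seconds([[70, 2340]]) describes one speech segment that starts at 0.07 s and lasts 2.27 s.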
load_vad_model
Loads or reloads a specific VAD model.