mcp-server-whisper

MCP Server Whisper is a Model Context Protocol server designed for advanced audio transcription and processing using OpenAI's Whisper and GPT-4o models.

MCP Server Whisper provides a standardized way to process audio files through OpenAI's transcription and speech services. By implementing the Model Context Protocol, it lets AI assistants such as Claude call audio processing tools directly. The server offers file searching, parallel batch processing, format conversion, automatic compression, multi-model transcription, interactive audio chat, enhanced transcription, text-to-speech generation, file metadata, and caching. It supports a wide range of audio formats and includes tools for managing audio files as well as processing them.
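
As a rough illustration of the parallel batch processing mentioned above, the Python sketch below runs work over several files concurrently while capping how many run at once. It is not the server's code: `fake_transcribe`, `transcribe_batch`, and `max_concurrency` are hypothetical names, and the transcription step is simulated rather than sent to OpenAI.

```python
import asyncio


async def fake_transcribe(path: str) -> str:
    """Hypothetical stand-in for a real transcription request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"transcript of {path}"


async def transcribe_batch(paths: list[str], max_concurrency: int = 4) -> dict[str, str]:
    """Process files concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(path: str) -> tuple[str, str]:
        # The semaphore bounds how many transcriptions are in flight.
        async with semaphore:
            return path, await fake_transcribe(path)

    # gather() preserves input order, so results line up with paths.
    results = await asyncio.gather(*(worker(p) for p in paths))
    return dict(results)


if __name__ == "__main__":
    print(asyncio.run(transcribe_batch(["a.mp3", "b.wav", "c.m4a"])))
```

Capping concurrency with a semaphore is one common way to avoid hitting rate limits when many files are submitted together; the actual server may use a different strategy.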

Features

  • Advanced file searching with regex patterns and metadata filtering
  • Parallel batch processing for multiple audio files
  • Format conversion between supported audio types
  • Multi-model transcription with support for all OpenAI audio models
  • Text-to-speech generation with customizable voices and speed
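
To make the first feature more concrete, here is a minimal Python sketch of combining a regex on the file name with metadata filters. The `AudioFile` record, its field names, and the `filter_audio_files` function are assumptions made for illustration; the server's own parameters and behavior may differ.

```python
import re
from dataclasses import dataclass


@dataclass
class AudioFile:
    """Hypothetical metadata record; field names are illustrative."""
    name: str
    size_bytes: int
    duration_seconds: float


def filter_audio_files(files, pattern=None, max_size_bytes=None,
                       min_duration_seconds=None):
    """Keep files that pass every filter that was supplied."""
    regex = re.compile(pattern) if pattern else None
    matches = []
    for f in files:
        if regex and not regex.search(f.name):
            continue  # name does not match the pattern
        if max_size_bytes is not None and f.size_bytes > max_size_bytes:
            continue  # file is too large
        if (min_duration_seconds is not None
                and f.duration_seconds < min_duration_seconds):
            continue  # file is too short
        matches.append(f)
    # Sort smallest first; any other metadata field would work as a key.
    return sorted(matches, key=lambda f: f.size_bytes)


if __name__ == "__main__":
    files = [
        AudioFile("meeting_01.mp3", 5_000_000, 600.0),
        AudioFile("meeting_02.wav", 40_000_000, 1200.0),
        AudioFile("voice_memo.mp3", 500_000, 20.0),
    ]
    for f in filter_audio_files(files, pattern=r"^meeting_\d+",
                                max_size_bytes=25_000_000):
        print(f.name)
```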

Tools

  1. list_audio_files

    Lists audio files with comprehensive filtering and sorting options.

  2. get_latest_audio

    Gets the most recently modified audio file with model support info.

  3. convert_audio

    Converts audio files to supported formats (mp3 or wav).

  4. compress_audio

    Compresses audio files that exceed size limits.

  5. transcribe_audio

    Advanced transcription using OpenAI's models with custom prompts and timestamp support.

  6. chat_with_audio

    Interactive audio analysis using GPT-4o audio models.

  7. transcribe_with_enhancement

    Enhanced transcription with specialized templates.

  8. create_claudecast

    Generates text-to-speech audio using OpenAI's TTS API.
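
An MCP client invokes any of the tools above by sending a JSON-RPC 2.0 request with the method `tools/call`. The Python sketch below builds such a request for `transcribe_audio`. The envelope follows the Model Context Protocol, but the argument name `input_file_path` and the file path are assumptions; the real argument names come from the input schema the server reports in response to `tools/list`.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request, indent=2)


if __name__ == "__main__":
    # "input_file_path" is a hypothetical argument name used for illustration.
    print(build_tool_call(
        1,
        "transcribe_audio",
        {"input_file_path": "/path/to/recording.mp3"},
    ))
```

In practice an MCP client library or a host application such as Claude Desktop constructs and sends these messages, so requests rarely need to be written by hand.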