mcp-server-whisper
MCP Server Whisper is a Model Context Protocol server designed for advanced audio transcription and processing using OpenAI's Whisper and GPT-4o models.
It provides a standardized way to process audio files through OpenAI's latest transcription and speech services. By implementing the Model Context Protocol, it lets AI assistants like Claude interact directly with audio processing capabilities: advanced file searching, parallel batch processing, format conversion, automatic compression, multi-model transcription, interactive audio chat, enhanced transcription, text-to-speech generation, comprehensive file metadata, and high-performance caching. The server supports a wide range of audio formats and exposes tools for managing and processing audio files efficiently.
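As with other MCP servers, these tools are invoked by an MCP client over stdio. Below is a minimal sketch of driving the server from the official MCP Python SDK; the launcher command, package entry point, and tool arguments are assumptions rather than confirmed values for this project.

```python
# Minimal sketch: calling this server's tools from the official MCP Python SDK.
# The command, package name, and tool argument names are illustrative
# assumptions, not confirmed values from this project.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    server = StdioServerParameters(
        command="uvx",                     # assumed launcher
        args=["mcp-server-whisper"],       # assumed package entry point
        env={"OPENAI_API_KEY": "sk-..."},  # the server needs OpenAI credentials
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the tools the server exposes and their input schemas.
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            # Hypothetical call; check the tool schema returned above for real arguments.
            result = await session.call_tool("get_latest_audio", {})
            print(result)


asyncio.run(main())
```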
Features
- Advanced file searching with regex patterns and metadata filtering
- Parallel batch processing for multiple audio files (see the sketch after this list)
- Format conversion between supported audio types
- Multi-model transcription with support for all OpenAI audio models
- Text-to-speech generation with customizable voices and speed
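Parallel batch processing can be pictured as bounded concurrent transcription requests. The following is a rough sketch of that pattern with asyncio and the OpenAI SDK; the file names, concurrency limit, and model choice are illustrative, not this server's actual implementation.

```python
# Rough sketch of parallel batch transcription with asyncio and the OpenAI SDK.
# This illustrates the pattern only; it is not this server's actual code.
import asyncio
from pathlib import Path

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment
semaphore = asyncio.Semaphore(4)  # illustrative concurrency limit


async def transcribe(path: Path) -> str:
    async with semaphore:
        with path.open("rb") as audio:
            result = await client.audio.transcriptions.create(
                model="whisper-1",
                file=audio,
            )
        return result.text


async def transcribe_batch(paths: list[Path]) -> list[str]:
    # Run all transcriptions concurrently, bounded by the semaphore.
    return await asyncio.gather(*(transcribe(p) for p in paths))


texts = asyncio.run(transcribe_batch([Path("a.mp3"), Path("b.mp3")]))
```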
Tools
list_audio_files
Lists audio files with comprehensive filtering and sorting options.
get_latest_audio
Gets the most recently modified audio file, along with information about which models support it.
convert_audio
Converts audio files to supported formats (mp3 or wav).
compress_audio
Compresses audio files that exceed size limits.
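Conversion and compression of this kind are typically handled with ffmpeg, for example via pydub. The sketch below shows what convert_audio and compress_audio might roughly do; the size threshold and bitrate are assumptions, not the documented behavior of these tools.

```python
# Illustrative sketch of format conversion and size-based compression with
# pydub (requires ffmpeg). The threshold and bitrate are assumptions.
from pathlib import Path

from pydub import AudioSegment

MAX_BYTES = 25 * 1024 * 1024  # OpenAI's per-file upload limit is about 25 MB


def convert_audio(src: Path, fmt: str = "mp3") -> Path:
    dst = src.with_suffix(f".{fmt}")
    AudioSegment.from_file(str(src)).export(str(dst), format=fmt)
    return dst


def compress_audio(src: Path) -> Path:
    if src.stat().st_size <= MAX_BYTES:
        return src  # already small enough
    dst = src.with_name(f"{src.stem}_compressed.mp3")
    # Re-encode at a lower bitrate to shrink the file.
    AudioSegment.from_file(str(src)).export(str(dst), format="mp3", bitrate="64k")
    return dst
```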
transcribe_audio
Advanced transcription using OpenAI's models with custom prompts and timestamp support.
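This maps onto OpenAI's transcription endpoint. A minimal sketch of such a call with a custom prompt and word-level timestamps follows; the model choice and file name are illustrative.

```python
# Minimal sketch of an OpenAI transcription call with a prompt and word-level
# timestamps. The tool's actual parameters may differ.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("meeting.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio,
        prompt="Acme Corp quarterly planning call",  # biases spelling of names/terms
        response_format="verbose_json",              # required for timestamps
        timestamp_granularities=["word"],
    )

print(transcript.text)
```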
chat_with_audio
Interactive audio analysis using GPT-4o audio models.
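GPT-4o audio models accept base64-encoded audio alongside a text prompt. A rough sketch of that request shape is below; the model name and question are illustrative.

```python
# Rough sketch of asking a question about an audio file with a GPT-4o audio model.
import base64

from openai import OpenAI

client = OpenAI()

with open("interview.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text"],
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize the key points of this recording."},
                {"type": "input_audio", "input_audio": {"data": audio_b64, "format": "wav"}},
            ],
        }
    ],
)

print(completion.choices[0].message.content)
```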
transcribe_with_enhancement
Enhanced transcription with specialized templates.
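One plausible way such templates could work (purely an assumption, not this server's documented behavior) is to expand a named template into an instruction that accompanies the audio in the same request shape as chat_with_audio.

```python
# Hypothetical illustration: template names and wording are assumptions.
ENHANCEMENT_TEMPLATES = {
    "detailed": "Transcribe this audio verbatim, keeping filler words and noting speaker changes.",
    "professional": "Transcribe this audio as clean, well-punctuated prose suitable for minutes.",
}

prompt = ENHANCEMENT_TEMPLATES["professional"]
# The prompt would then be sent alongside the base64 audio,
# exactly as in the chat_with_audio sketch above.
```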
create_claudecast
Generates text-to-speech audio using OpenAI's TTS API.
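This corresponds to OpenAI's speech endpoint. A minimal sketch follows; the model, voice, speed, and output path are illustrative choices.

```python
# Minimal sketch of text-to-speech with the OpenAI SDK.
from openai import OpenAI

client = OpenAI()

with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="alloy",
    speed=1.0,
    input="Welcome to this week's claudecast.",
) as response:
    # Stream the generated audio straight to disk.
    response.stream_to_file("claudecast.mp3")
```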