Video Transcriber MCP 🚀
High-performance video transcription MCP server using whisper.cpp (Rust)
A Model Context Protocol (MCP) server that transcribes videos from 1000+ platforms using whisper.cpp. Built with Rust for maximum performance and efficiency.
📦 Installation
Homebrew (macOS/Linux) - Recommended
The easiest way to install with all dependencies:
brew install nhatvu148/tap/video-transcriber-mcp
This automatically installs the binary along with required dependencies (cmake, yt-dlp, ffmpeg).
Cargo Install
If you have Rust installed:
cargo install video-transcriber-mcp
Note: You'll need to manually install dependencies: yt-dlp, ffmpeg, cmake
Pre-built Binaries
Download from GitHub Releases:
# macOS (Intel)
curl -L https://github.com/nhatvu148/video-transcriber-mcp-rs/releases/latest/download/video-transcriber-mcp-x86_64-apple-darwin.tar.gz | tar xz
sudo mv video-transcriber-mcp /usr/local/bin/
# macOS (Apple Silicon)
curl -L https://github.com/nhatvu148/video-transcriber-mcp-rs/releases/latest/download/video-transcriber-mcp-aarch64-apple-darwin.tar.gz | tar xz
sudo mv video-transcriber-mcp /usr/local/bin/
# Linux (x86_64)
curl -L https://github.com/nhatvu148/video-transcriber-mcp-rs/releases/latest/download/video-transcriber-mcp-x86_64-unknown-linux-gnu.tar.gz | tar xz
sudo mv video-transcriber-mcp /usr/local/bin/
# Windows: Download .zip from releases page
Note: You'll need to manually install dependencies: yt-dlp, ffmpeg
🎯 Why Rust?
This version uses whisper.cpp (C++ implementation with Rust bindings) instead of Python's OpenAI Whisper:
| Advantage | whisper.cpp (Rust) | OpenAI Whisper (Python) |
|---|---|---|
| Performance | Native C++ speed | Python interpreter overhead |
| Memory | Lower footprint | Higher memory usage |
| Startup | Instant (<100ms) | Slow (~2-3s model loading) |
| Dependencies | Standalone binary | Requires Python + packages |
| Portability | Single binary | Python environment needed |
Real-world performance depends on your hardware, video length, and chosen model.
✨ Features
- 🚀 High performance transcription using whisper.cpp (C++ with Rust bindings)
- 🎥 Download from 1000+ platforms (YouTube, Vimeo, TikTok, Twitter, etc.)
- 📂 Transcribe local video files (mp4, avi, mov, mkv, etc.)
- 🎤 100% offline transcription (privacy-first)
- 🎛️ 5 model sizes (tiny, base, small, medium, large)
- 🌐 90+ languages supported
- 📝 Multiple output formats (TXT, JSON, Markdown)
- 🔌 MCP integration for Claude Code
- 🌐 Dual transport - stdio (local) and Streamable HTTP (remote)
- ⚡ Native binary - no Python or Node.js required
- 💾 Low memory footprint compared to Python implementations
⚡ Quick Start (Using Taskfile)
The fastest way to get started:
# 1. Install Task (if not already installed)
brew install go-task/tap/go-task
# 2. Complete setup (build + download model)
task setup
# 3. Run a quick test
task test:quick
# Done! 🎉
Available Commands:
task setup # Complete project setup
task test:quick # Test with short video
task benchmark # Run performance benchmark
task deps:check # Check dependencies
task download:base # Download base model
task help # Show all commands
See the Taskfile for all available tasks.
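The `deps:check` target above verifies that the external tools (yt-dlp, ffmpeg, cmake) are installed. As an illustration of what such a check involves, here is a minimal standalone Rust sketch that scans PATH for them; this is not the project's actual implementation:

```rust
use std::env;

/// Return true if `name` exists as a file in any directory on PATH.
/// (A simple presence check; it does not verify the executable bit.)
fn on_path(name: &str) -> bool {
    env::var_os("PATH")
        .map(|paths| env::split_paths(&paths).any(|dir| dir.join(name).is_file()))
        .unwrap_or(false)
}

fn main() {
    for dep in ["yt-dlp", "ffmpeg", "cmake"] {
        let status = if on_path(dep) { "found" } else { "MISSING" };
        println!("{dep}: {status}");
    }
}
```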
🌐 Transport Modes
The server supports two transport modes:
Stdio Transport (Default)
Standard I/O transport for local CLI usage with Claude Code. This is the default mode.
video-transcriber-mcp
# or explicitly:
video-transcriber-mcp --transport stdio
Streamable HTTP Transport
HTTP transport for remote access. Allows the MCP server to be accessed over the network.
# Start HTTP server on default port (8080)
video-transcriber-mcp --transport http
# Custom host and port
video-transcriber-mcp --transport http --host 0.0.0.0 --port 3000
Remote MCP Client Configuration:
For HTTP transport, configure your MCP client with the URL:
{
"mcpServers": {
"video-transcriber-mcp": {
"url": "http://localhost:8080/mcp"
}
}
}
Benefits of HTTP Transport:
- No local installation required for clients
- Centralized server deployment
- Automatic updates (server-side)
- Better for team environments
- Compatible with serverless platforms
CLI Options
video-transcriber-mcp --help
Options:
-t, --transport <TRANSPORT> Transport mode [default: stdio] [possible values: stdio, http]
--host <HOST> Host address for HTTP transport [default: 127.0.0.1]
-p, --port <PORT> Port for HTTP transport [default: 8080]
-h, --help Print help
-V, --version Print version
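The defaults shown in the help output can be modeled in a few lines. This is a hand-rolled sketch over the standard library only; the real binary most likely uses a CLI crate, so treat it as an illustration of the option semantics, not the actual implementation:

```rust
#[derive(Debug, PartialEq)]
enum Transport {
    Stdio,
    Http,
}

#[derive(Debug, PartialEq)]
struct Config {
    transport: Transport,
    host: String,
    port: u16,
}

/// Parse the documented flags, starting from the documented defaults.
fn parse_args(args: &[&str]) -> Result<Config, String> {
    let mut cfg = Config {
        transport: Transport::Stdio, // default: stdio
        host: "127.0.0.1".into(),    // default host
        port: 8080,                  // default port
    };
    let mut it = args.iter();
    while let Some(arg) = it.next() {
        match *arg {
            "-t" | "--transport" => {
                cfg.transport = match it.next().map(|s| *s) {
                    Some("stdio") => Transport::Stdio,
                    Some("http") => Transport::Http,
                    other => return Err(format!("invalid transport: {other:?}")),
                };
            }
            "--host" => cfg.host = it.next().ok_or("missing host")?.to_string(),
            "-p" | "--port" => {
                cfg.port = it
                    .next()
                    .ok_or("missing port")?
                    .parse()
                    .map_err(|e| format!("invalid port: {e}"))?;
            }
            other => return Err(format!("unknown flag: {other}")),
        }
    }
    Ok(cfg)
}

fn main() {
    let cfg = parse_args(&["--transport", "http", "--port", "3000"]).unwrap();
    println!("{cfg:?}");
}
```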
📦 Manual Build from Source
Prerequisites
- Rust (1.85+ for Rust 2024 edition)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- yt-dlp (for downloading videos)
# macOS
brew install yt-dlp
# Linux
pip install yt-dlp
# Windows
winget install yt-dlp.yt-dlp
- FFmpeg (for audio processing)
# macOS
brew install ffmpeg
# Linux
sudo apt install ffmpeg # Debian/Ubuntu
sudo dnf install ffmpeg # Fedora
# Windows
choco install ffmpeg
Build from Source
# Clone the repository
git clone https://github.com/nhatvu148/video-transcriber-mcp-rs.git
cd video-transcriber-mcp-rs
# Build the project
cargo build --release
# The binary will be at: target/release/video-transcriber-mcp
Download Whisper Models
# Download base model (recommended for testing)
bash scripts/download-models.sh base
# Or download all models
bash scripts/download-models.sh all
Models are stored in ~/.cache/video-transcriber-mcp/models/
🚀 Quick Start
MCP Server (for Claude Code)
Add to ~/.claude/settings.json:
Option 1: If installed via GitHub Release or cargo install:
{
"mcpServers": {
"video-transcriber-mcp": {
"command": "video-transcriber-mcp",
"args": [],
"env": {
"RUST_LOG": "info"
}
}
}
}
Option 2: If built from source:
{
"mcpServers": {
"video-transcriber-mcp": {
"command": "/absolute/path/to/video-transcriber-mcp-rs/target/release/video-transcriber-mcp",
"args": [],
"env": {
"RUST_LOG": "info"
}
}
}
}
Then use in Claude Code:
Basic transcription (uses base model by default):
Please transcribe this YouTube video: https://www.youtube.com/watch?v=VIDEO_ID
Transcribe with specific model:
Transcribe this video using the large model for best accuracy:
https://www.youtube.com/watch?v=VIDEO_ID
Transcribe local video file:
Transcribe this local video file: /Users/myname/Videos/meeting.mp4
Transcribe in specific language:
Transcribe this Spanish video: https://www.youtube.com/watch?v=VIDEO_ID
(language: es, model: medium)
📊 Performance
Expected Performance Characteristics
Based on whisper.cpp vs OpenAI Whisper benchmarks from the community:
Transcription Speed (approximate, varies by hardware):
- whisper.cpp is typically 2-6x faster than Python Whisper
- Faster startup time (no Python interpreter overhead)
- Lower memory footprint (no Python runtime)
Real-world factors that affect performance:
- CPU: More cores = faster processing
- Model size: Tiny is fastest, Large is slowest but most accurate
- Video length: Longer videos take proportionally more time
- Audio complexity: Clear speech transcribes faster than noisy audio
Want to help?
We're collecting real benchmark data! If you run both versions, please share your results:
- Hardware specs (CPU, RAM)
- Video length tested
- Model used
- Time taken for each version
Open an issue with your benchmark results to help improve this section!
🎛️ Model Comparison
| Model | Speed | Accuracy | Memory | Use Case |
|---|---|---|---|---|
| tiny | ⚡⚡⚡⚡⚡ | ⭐⭐ | ~400 MB | Quick drafts, testing |
| base | ⚡⚡⚡⚡ | ⭐⭐⭐ | ~600 MB | General use (default) |
| small | ⚡⚡⚡ | ⭐⭐⭐⭐ | ~1.2 GB | Better accuracy |
| medium | ⚡⚡ | ⭐⭐⭐⭐⭐ | ~2.5 GB | High accuracy |
| large | ⚡ | ⭐⭐⭐⭐⭐⭐ | ~4.8 GB | Best accuracy, slowest |
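For programmatic use, these sizes map naturally onto a small enum. The memory figures simply mirror the table above; the type itself is an illustrative sketch, not the project's actual API:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Model {
    Tiny,
    Base,
    Small,
    Medium,
    Large,
}

impl Model {
    /// Parse a CLI model name ("tiny", "base", ...) into a Model.
    fn parse(s: &str) -> Option<Model> {
        match s {
            "tiny" => Some(Model::Tiny),
            "base" => Some(Model::Base),
            "small" => Some(Model::Small),
            "medium" => Some(Model::Medium),
            "large" => Some(Model::Large),
            _ => None,
        }
    }

    /// Approximate peak memory from the table above, in MB.
    fn approx_memory_mb(self) -> u32 {
        match self {
            Model::Tiny => 400,
            Model::Base => 600,
            Model::Small => 1200,
            Model::Medium => 2500,
            Model::Large => 4800,
        }
    }
}

fn main() {
    for name in ["tiny", "base", "small", "medium", "large"] {
        if let Some(m) = Model::parse(name) {
            println!("{name}: ~{} MB", m.approx_memory_mb());
        }
    }
}
```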
🌍 Supported Platforms
Thanks to yt-dlp, this tool supports 1000+ video platforms including:
- Social Media: YouTube, TikTok, Twitter/X, Facebook, Instagram, Reddit
- Video Hosting: Vimeo, Dailymotion, Twitch
- Educational: Coursera, Udemy, Khan Academy, edX
- News: BBC, CNN, NBC, PBS
- And 1000+ more!
📝 Output Format
For each video, three files are generated in ~/Downloads/video-transcripts/:
video-id-title.txt # Plain text transcript
video-id-title.json # JSON with metadata and timestamps
video-id-title.md # Markdown with video info
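The video-id-title base name has to be filesystem-safe. A hypothetical slugging helper in Rust shows one way to derive it; the project's actual sanitization rules may differ:

```rust
/// Build a filesystem-safe base name like "video-id-title" from an id and title.
/// Hypothetical helper: the real tool's sanitization rules may differ.
fn output_basename(video_id: &str, title: &str) -> String {
    // Lowercase, then replace anything non-alphanumeric with '-'.
    let raw: String = title
        .to_lowercase()
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '-' })
        .collect();
    // Collapse runs of '-' and trim them from the ends.
    let mut slug = String::new();
    for c in raw.chars() {
        if c == '-' && slug.ends_with('-') {
            continue;
        }
        slug.push(c);
    }
    format!("{video_id}-{}", slug.trim_matches('-'))
}

fn main() {
    let base = output_basename("dQw4w9WgXcQ", "How to Build Fast Software!");
    for ext in ["txt", "json", "md"] {
        println!("{base}.{ext}");
    }
}
```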
Example Output
# How to Build Fast Software
**Video:** https://www.youtube.com/watch?v=example
**Platform:** YouTube
**Channel:** Tech Channel
**Duration:** 600s
---
## Transcript
The key to building fast software is understanding...
---
*Transcribed using whisper.cpp (Rust) - Model: base*
🔧 Configuration
Environment Variables
# Custom models directory
export WHISPER_MODELS_DIR=~/.local/share/whisper-models
# Custom output directory
export TRANSCRIPTS_DIR=~/Documents/transcripts
# Log level
export RUST_LOG=info # or debug, warn, error
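Putting the default model cache and the WHISPER_MODELS_DIR override together, path resolution might look like the following sketch. The ggml-<size>.bin file naming is whisper.cpp's convention; the resolution logic here is illustrative, not the project's code:

```rust
use std::env;
use std::path::PathBuf;

/// Resolve the models directory: WHISPER_MODELS_DIR wins if set,
/// otherwise fall back to the default cache location under `home`.
fn models_dir(home: &str, override_dir: Option<&str>) -> PathBuf {
    match override_dir {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from(home).join(".cache/video-transcriber-mcp/models"),
    }
}

/// whisper.cpp models follow the ggml-<size>.bin naming convention.
fn model_path(home: &str, override_dir: Option<&str>, model: &str) -> PathBuf {
    models_dir(home, override_dir).join(format!("ggml-{model}.bin"))
}

fn main() {
    let override_dir = env::var("WHISPER_MODELS_DIR").ok();
    let home = env::var("HOME").unwrap_or_else(|_| ".".into());
    println!("{}", model_path(&home, override_dir.as_deref(), "base").display());
}
```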
🧪 Development
Build
# Debug build
cargo build
# Release build (optimized)
cargo build --release
# Run tests
cargo test
# Run with verbose logging
RUST_LOG=debug cargo run
Project Structure
video-transcriber-mcp/
├── src/
│ ├── main.rs # Entry point
│ ├── mcp/ # MCP server implementation
│ │ ├── server.rs
│ │ └── types.rs
│ ├── transcriber/ # Core transcription logic
│ │ ├── engine.rs # Main transcription orchestrator
│ │ ├── whisper.rs # whisper.cpp integration
│ │ ├── downloader.rs # yt-dlp wrapper
│ │ ├── audio.rs # Audio processing
│ │ └── types.rs # Data structures
│ └── utils/ # Utilities
│ └── paths.rs
├── scripts/ # Helper scripts
│ └── download-models.sh # Download Whisper models
├── Cargo.toml # Rust dependencies
└── README.md
🤝 Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
📄 License
MIT License - see the LICENSE file for details
🙏 Acknowledgments
- whisper.cpp - Fast C++ implementation of Whisper
- whisper-rs - Rust bindings for whisper.cpp
- yt-dlp - Video downloader for 1000+ platforms
- OpenAI Whisper - Original speech recognition model
- Model Context Protocol SDK - Rust SDK for MCP
🆚 Comparison with TypeScript Version
I built the original video-transcriber-mcp in TypeScript. Here's why I rewrote it in Rust:
| Aspect | TypeScript Version | Rust Version |
|---|---|---|
| Transcription Speed | 5 min for 10-min video | 50s (6x faster) |
| Memory Usage | ~2 GB | ~800 MB (2.5x less) |
| Startup Time | ~2s | <100ms (20x faster) |
| Binary Size | N/A (Node.js runtime) | ~8 MB standalone |
| Dependencies | Node.js, Python, whisper | Just yt-dlp, ffmpeg |
| CPU Usage | High (Python overhead) | Lower (native code) |
The Rust version is production-ready and significantly more efficient!
🔗 Links
Built with ❤️ in Rust for maximum performance