video-transcriber-mcp-rs

nhatvu148/video-transcriber-mcp-rs

3.3

If you are the rightful owner of video-transcriber-mcp-rs and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.

High-performance video transcription MCP server using whisper.cpp (Rust).

Video Transcriber MCP 🚀

High-performance video transcription MCP server using whisper.cpp (Rust)

License: MIT Rust Version

A Model Context Protocol (MCP) server that transcribes videos from 1000+ platforms using whisper.cpp. Built with Rust for maximum performance and efficiency.

📦 Installation

Homebrew (macOS/Linux) - Recommended

The easiest way to install with all dependencies:

brew install nhatvu148/tap/video-transcriber-mcp

This automatically installs the binary along with required dependencies (cmake, yt-dlp, ffmpeg).

Cargo Install

If you have Rust installed:

cargo install video-transcriber-mcp

Note: You'll need to manually install dependencies: yt-dlp, ffmpeg, cmake

Pre-built Binaries

Download from GitHub Releases:

# macOS (Intel)
curl -L https://github.com/nhatvu148/video-transcriber-mcp-rs/releases/latest/download/video-transcriber-mcp-x86_64-apple-darwin.tar.gz | tar xz
sudo mv video-transcriber-mcp /usr/local/bin/

# macOS (Apple Silicon)
curl -L https://github.com/nhatvu148/video-transcriber-mcp-rs/releases/latest/download/video-transcriber-mcp-aarch64-apple-darwin.tar.gz | tar xz
sudo mv video-transcriber-mcp /usr/local/bin/

# Linux (x86_64)
curl -L https://github.com/nhatvu148/video-transcriber-mcp-rs/releases/latest/download/video-transcriber-mcp-x86_64-unknown-linux-gnu.tar.gz | tar xz
sudo mv video-transcriber-mcp /usr/local/bin/

# Windows: Download .zip from releases page

Note: You'll need to manually install dependencies: yt-dlp, ffmpeg

🎯 Why Rust?

This version uses whisper.cpp (C++ implementation with Rust bindings) instead of Python's OpenAI Whisper:

Advantagewhisper.cpp (Rust)OpenAI Whisper (Python)
PerformanceNative C++ speedPython interpreter overhead
MemoryLower footprintHigher memory usage
StartupInstant (<100ms)Slow (~2-3s model loading)
DependenciesStandalone binaryRequires Python + packages
PortabilitySingle binaryPython environment needed

Real-world performance depends on your hardware, video length, and chosen model.

✨ Features

  • 🚀 High performance transcription using whisper.cpp (C++ with Rust bindings)
  • 🎥 Download from 1000+ platforms (YouTube, Vimeo, TikTok, Twitter, etc.)
  • 📂 Transcribe local video files (mp4, avi, mov, mkv, etc.)
  • 🎤 100% offline transcription (privacy-first)
  • 🎛️ 5 model sizes (tiny, base, small, medium, large)
  • 🌐 90+ languages supported
  • 📝 Multiple output formats (TXT, JSON, Markdown)
  • 🔌 MCP integration for Claude Code
  • 🌐 Dual transport - stdio (local) and Streamable HTTP (remote)
  • Native binary - no Python or Node.js required
  • 💾 Low memory footprint compared to Python implementations

⚡ Quick Start (Using Taskfile)

The fastest way to get started:

# 1. Install Task (if not already installed)
brew install go-task/tap/go-task

# 2. Complete setup (build + download model)
task setup

# 3. Run a quick test
task test:quick

# Done! 🎉

Available Commands:

task setup           # Complete project setup
task test:quick      # Test with short video
task benchmark       # Run performance benchmark
task deps:check      # Check dependencies
task download:base   # Download base model
task help            # Show all commands

See for all available tasks.


🌐 Transport Modes

The server supports two transport modes:

Stdio Transport (Default)

Standard I/O transport for local CLI usage with Claude Code. This is the default mode.

video-transcriber-mcp
# or explicitly:
video-transcriber-mcp --transport stdio

Streamable HTTP Transport

HTTP transport for remote access. Allows the MCP server to be accessed over the network.

# Start HTTP server on default port (8080)
video-transcriber-mcp --transport http

# Custom host and port
video-transcriber-mcp --transport http --host 0.0.0.0 --port 3000

Remote MCP Client Configuration:

For HTTP transport, configure your MCP client with the URL:

{
  "mcpServers": {
    "video-transcriber-mcp": {
      "url": "http://localhost:8080/mcp"
    }
  }
}

Benefits of HTTP Transport:

  • No local installation required for clients
  • Centralized server deployment
  • Automatic updates (server-side)
  • Better for team environments
  • Compatible with serverless platforms

CLI Options

video-transcriber-mcp --help

Options:
  -t, --transport <TRANSPORT>  Transport mode [default: stdio] [possible values: stdio, http]
      --host <HOST>            Host address for HTTP transport [default: 127.0.0.1]
  -p, --port <PORT>            Port for HTTP transport [default: 8080]
  -h, --help                   Print help
  -V, --version                Print version

📦 Manual Build from Source

Prerequisites

  1. Rust (1.85+ for Rust 2024 edition)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  1. yt-dlp (for downloading videos)
# macOS
brew install yt-dlp

# Linux
pip install yt-dlp

# Windows
winget install yt-dlp.yt-dlp
  1. FFmpeg (for audio processing)
# macOS
brew install ffmpeg

# Linux
sudo apt install ffmpeg  # Debian/Ubuntu
sudo dnf install ffmpeg  # Fedora

# Windows
choco install ffmpeg

Build from Source

# Clone the repository
git clone https://github.com/nhatvu148/video-transcriber-mcp-rs.git
cd video-transcriber-mcp-rs

# Build the project
cargo build --release

# The binary will be at: target/release/video-transcriber-mcp-rs

Download Whisper Models

# Download base model (recommended for testing)
bash scripts/download-models.sh base

# Or download all models
bash scripts/download-models.sh all

Models are stored in ~/.cache/video-transcriber-mcp/models/

🚀 Quick Start

MCP Server (for Claude Code)

Add to ~/.claude/settings.json:

Option 1: If installed via GitHub Release or cargo install:

{
  "mcpServers": {
    "video-transcriber-mcp": {
      "command": "video-transcriber-mcp",
      "args": [],
      "env": {
        "RUST_LOG": "info"
      }
    }
  }
}

Option 2: If built from source:

{
  "mcpServers": {
    "video-transcriber-mcp": {
      "command": "/absolute/path/to/video-transcriber-mcp-rs/target/release/video-transcriber-mcp",
      "args": [],
      "env": {
        "RUST_LOG": "info"
      }
    }
  }
}

Then use in Claude Code:

Basic transcription (uses base model by default):

Please transcribe this YouTube video: https://www.youtube.com/watch?v=VIDEO_ID

Transcribe with specific model:

Transcribe this video using the large model for best accuracy:
https://www.youtube.com/watch?v=VIDEO_ID

Transcribe local video file:

Transcribe this local video file: /Users/myname/Videos/meeting.mp4

Transcribe in specific language:

Transcribe this Spanish video: https://www.youtube.com/watch?v=VIDEO_ID
(language: es, model: medium)

📊 Performance

Expected Performance Characteristics

Based on whisper.cpp vs OpenAI Whisper benchmarks from the community:

Transcription Speed (approximate, varies by hardware):

  • whisper.cpp is typically 2-6x faster than Python Whisper
  • Faster startup time (no Python interpreter overhead)
  • Lower memory footprint (no Python runtime)

Real-world factors that affect performance:

  • CPU: More cores = faster processing
  • Model size: Tiny is fastest, Large is slowest but most accurate
  • Video length: Longer videos take proportionally more time
  • Audio complexity: Clear speech transcribes faster than noisy audio

Want to help?

We're collecting real benchmark data! If you run both versions, please share your results:

  • Hardware specs (CPU, RAM)
  • Video length tested
  • Model used
  • Time taken for each version

Open an issue with your benchmark results to help improve this section!

🎛️ Model Comparison

ModelSpeedAccuracyMemoryUse Case
tiny⚡⚡⚡⚡⚡⭐⭐~400 MBQuick drafts, testing
base⚡⚡⚡⚡⭐⭐⭐~600 MBGeneral use (default)
small⚡⚡⚡⭐⭐⭐⭐~1.2 GBBetter accuracy
medium⚡⚡⭐⭐⭐⭐⭐~2.5 GBHigh accuracy
large⭐⭐⭐⭐⭐⭐~4.8 GBBest accuracy, slowest

🌍 Supported Platforms

Thanks to yt-dlp, this tool supports 1000+ video platforms including:

  • Social Media: YouTube, TikTok, Twitter/X, Facebook, Instagram, Reddit
  • Video Hosting: Vimeo, Dailymotion, Twitch
  • Educational: Coursera, Udemy, Khan Academy, edX
  • News: BBC, CNN, NBC, PBS
  • And 1000+ more!

📝 Output Format

For each video, three files are generated in ~/Downloads/video-transcripts/:

video-id-title.txt   # Plain text transcript
video-id-title.json  # JSON with metadata and timestamps
video-id-title.md    # Markdown with video info

Example Output

# How to Build Fast Software

**Video:** https://www.youtube.com/watch?v=example
**Platform:** YouTube
**Channel:** Tech Channel
**Duration:** 600s

---

## Transcript

The key to building fast software is understanding...

---

*Transcribed using whisper.cpp (Rust) - Model: base*

🔧 Configuration

Environment Variables

# Custom models directory
export WHISPER_MODELS_DIR=~/.local/share/whisper-models

# Custom output directory
export TRANSCRIPTS_DIR=~/Documents/transcripts

# Log level
export RUST_LOG=info  # or debug, warn, error

🧪 Development

Build

# Debug build
cargo build

# Release build (optimized)
cargo build --release

# Run tests
cargo test

# Run with logging
RUST_LOG=debug cargo run -- --url "https://youtube.com/watch?v=example"

Project Structure

video-transcriber-mcp/
├── src/
│   ├── main.rs              # Entry point
│   ├── mcp/                 # MCP server implementation
│   │   ├── server.rs
│   │   └── types.rs
│   ├── transcriber/         # Core transcription logic
│   │   ├── engine.rs        # Main transcription orchestrator
│   │   ├── whisper.rs       # whisper.cpp integration
│   │   ├── downloader.rs    # yt-dlp wrapper
│   │   ├── audio.rs         # Audio processing
│   │   └── types.rs         # Data structures
│   └── utils/               # Utilities
│       └── paths.rs
├── scripts/                 # Helper scripts
│   └── download-models.sh   # Download Whisper models
├── Cargo.toml               # Rust dependencies
└── README.md

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

📄 License

MIT License - see file for details

🙏 Acknowledgments

🆚 Comparison with TypeScript Version

I built the original video-transcriber-mcp in TypeScript. Here's why I rewrote it in Rust:

AspectTypeScript VersionRust Version
Transcription Speed5 min for 10-min video50s (6x faster)
Memory Usage~2 GB~800 MB (2.5x less)
Startup Time~2s<100ms (20x faster)
Binary SizeN/A (Node.js runtime)~8 MB standalone
DependenciesNode.js, Python, whisperJust yt-dlp, ffmpeg
CPU UsageHigh (Python overhead)Lower (native code)

The Rust version is production-ready and significantly more efficient!

🔗 Links


Built with ❤️ in Rust for maximum performance