youtube-transcribe-mcp by Takuma-AI - MCP Server

YouTube Transcribe MCP Server

Enables Claude to download and transcribe YouTube videos directly into Takuma OS's knowledge base.

Features

Download & Transcribe: Automatically downloads YouTube videos and transcribes them using OpenAI's Whisper
Knowledge Integration: Saves transcripts directly to knowledge/_inbox/ for processing
Configurable Models: Choose any Whisper model from tiny to large (or turbo) to trade speed against accuracy
Metadata Preservation: Captures channel, upload date, and URL alongside transcripts
Timestamp Support: Includes VTT files for precise timestamp reference
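
The download-then-transcribe flow described above can be sketched as two command-line calls. This is a minimal illustration that assumes the server shells out to the yt-dlp and whisper CLIs; the function names build_commands and run_pipeline and the audio.mp3 file name are hypothetical, and the server's actual implementation may differ.

```python
import subprocess
from pathlib import Path


def build_commands(url: str, out_dir: Path, model: str = "base") -> list[list[str]]:
    """Return the yt-dlp and whisper command lines for one video (hypothetical helper)."""
    download = [
        "yt-dlp",
        "--extract-audio",              # keep only the audio track
        "--audio-format", "mp3",        # convert with ffmpeg
        "--output", str(out_dir / "audio.%(ext)s"),
        url,
    ]
    transcribe = [
        "whisper", str(out_dir / "audio.mp3"),
        "--model", model,
        "--output_format", "vtt",       # whisper names the result after the input (audio.vtt)
        "--output_dir", str(out_dir),
    ]
    return [download, transcribe]


def run_pipeline(url: str, out_dir: Path, model: str = "base") -> None:
    """Run both steps in order, stopping at the first failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(url, out_dir, model):
        subprocess.run(cmd, check=True)
```

Separating command construction from execution keeps the first function easy to inspect without touching the network.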

Prerequisites

The following tools must be installed on your system:

yt-dlp: pip install yt-dlp
whisper: pip install openai-whisper
ffmpeg: brew install ffmpeg (macOS) or apt-get install ffmpeg (Linux)
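
To confirm the three tools are on your PATH before using the server, a quick check along these lines may help. It is a sketch, not the server's own check_youtube_dependencies tool; the helper name find_missing_tools is hypothetical.

```python
import shutil

# Executable names taken from the prerequisites list above.
REQUIRED_TOOLS = ("yt-dlp", "whisper", "ffmpeg")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All dependencies found.")
```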

Installation

Already installed! The server is connected to Claude Code at:

/Users/kate/Documents/Manual Library/Projects/takuma-os/tools/mcp/youtube-transcribe

Usage

Basic Transcription

Ask Claude to transcribe any YouTube video:

"Transcribe this video: https://youtube.com/watch?v=..."

With Options

Specify model or keep audio:

"Transcribe this video using the large model and keep the audio file"

Extract Playlist Videos

"Get all video URLs from this playlist: [playlist URL]"
"Get playlist videos with titles and metadata"
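
Playlist extraction of this kind can be done with yt-dlp's flat-playlist mode, which lists entries without downloading them. The sketch below assumes that approach; the helper names and the fallback URL construction are hypothetical, and the server may gather this data differently.

```python
import json


def playlist_command(playlist_url: str) -> list[str]:
    """yt-dlp invocation that dumps playlist entries as one JSON document."""
    return ["yt-dlp", "--flat-playlist", "--dump-single-json", playlist_url]


def parse_playlist(json_text: str, include_metadata: bool = False) -> list[dict]:
    """Turn yt-dlp's JSON output into a list of video records."""
    data = json.loads(json_text)
    videos = []
    for entry in data.get("entries") or []:
        if not entry:
            continue  # yt-dlp can emit null for unavailable videos
        video_id = entry.get("id")
        url = entry.get("url") or (
            f"https://www.youtube.com/watch?v={video_id}" if video_id else None
        )
        if url is None:
            continue
        record = {"url": url}
        if include_metadata:
            record["title"] = entry.get("title")
            record["duration"] = entry.get("duration")
        videos.append(record)
    return videos
```

Feed the standard output of playlist_command (run with subprocess.run and capture_output=True) into parse_playlist.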

Batch Transcribe

"Transcribe all videos from this playlist"
"Transcribe these 5 videos: [URLs]"

Check Dependencies

"Check if YouTube transcription dependencies are installed"

Configure Settings

"Configure YouTube transcribe to use the turbo model"

Available Tools

transcribe_video - Main transcription tool
- url: YouTube video URL (required)
- whisper_model: Model size (tiny/base/small/medium/large/turbo)
- keep_audio: Whether to save the audio file
- save_path: Custom save location
get_playlist_videos - Extract video URLs from a playlist
- playlist_url: YouTube playlist URL (required)
- include_metadata: Include titles and durations (optional)
batch_transcribe_videos - Transcribe multiple videos in sequence
- video_urls: List of YouTube video URLs (required)
- whisper_model: Model to use for all videos
- save_path: Base path for all transcripts
check_youtube_dependencies - Verify all dependencies are installed
configure_youtube_transcribe - Update default settings
get_youtube_config - View current configuration
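
Claude normally issues these calls for you, but it can help to see what a request looks like on the wire. MCP uses JSON-RPC 2.0 with a tools/call method; the sketch below builds such a request for transcribe_video using the parameters listed above. The helper name tool_call_request and the chosen argument values are illustrative only.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


request = tool_call_request(
    1,
    "transcribe_video",
    {
        "url": "https://youtube.com/watch?v=...",  # placeholder, as in the usage examples
        "whisper_model": "turbo",
        "keep_audio": True,
    },
)
print(json.dumps(request, indent=2))
```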

Whisper Models

tiny: 39M params, fastest, basic accuracy
base: 74M params, good balance (default)
small: 244M params, better accuracy
medium: 769M params, high accuracy
large: 1550M params, best accuracy
turbo: 809M params, fast and accurate
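
If a model runs out of memory, one simple strategy is to step down to the next smaller one. The sketch below orders the models by the parameter counts in the table above, using parameter count as a rough proxy for memory use; next_smaller_model is a hypothetical helper, not part of the server.

```python
# Parameter counts in millions, copied from the table above.
WHISPER_MODELS = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
    "turbo": 809,
}


def next_smaller_model(name: str):
    """Return the model with the next lower parameter count, or None for the smallest."""
    ordered = sorted(WHISPER_MODELS, key=WHISPER_MODELS.get)
    index = ordered.index(name)  # raises ValueError for unknown model names
    return ordered[index - 1] if index > 0 else None
```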

Output Structure

Transcripts are saved to:

knowledge/_inbox/[video-title]/
├── transcript.md      # Formatted transcript
├── timestamps.vtt     # Timestamp file
└── audio.mp3          # (optional) Audio file
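
Given this layout, downstream processing can discover finished transcripts by scanning the inbox. A minimal sketch, assuming the file names shown in the tree above (find_transcripts is a hypothetical helper):

```python
from pathlib import Path


def find_transcripts(inbox="knowledge/_inbox"):
    """List every video folder in the inbox that contains a transcript.md."""
    results = []
    for folder in sorted(p for p in Path(inbox).iterdir() if p.is_dir()):
        transcript = folder / "transcript.md"
        if not transcript.is_file():
            continue  # not a finished transcription folder
        vtt = folder / "timestamps.vtt"
        audio = folder / "audio.mp3"
        results.append({
            "title": folder.name,
            "transcript": transcript,
            "timestamps": vtt if vtt.is_file() else None,
            "audio": audio if audio.is_file() else None,  # present only if keep_audio was set
        })
    return results
```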

Configuration

Default settings in config.json:

{
  "whisper_model": "base",
  "keep_audio": false,
  "audio_format": "mp3",
  "default_save_path": "knowledge/_inbox"
}
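
A loader for this file might merge whatever is on disk over the defaults shown above, so that a partial config.json still works. This is a sketch under that assumption; load_config is a hypothetical name and the server's own loading logic is not documented here.

```python
import json
from pathlib import Path

# Defaults copied from the config.json example above.
DEFAULTS = {
    "whisper_model": "base",
    "keep_audio": False,
    "audio_format": "mp3",
    "default_save_path": "knowledge/_inbox",
}


def load_config(path="config.json"):
    """Return the defaults, overridden by any keys present in the config file."""
    config = dict(DEFAULTS)
    config_file = Path(path)
    if config_file.is_file():
        config.update(json.loads(config_file.read_text(encoding="utf-8")))
    return config
```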

Integration with Takuma OS

This MCP server extends Takuma OS's knowledge capture capabilities by:

Automatically organizing video content in the inbox for processing
Preserving metadata for context and attribution
Creating markdown-formatted transcripts ready for knowledge synthesis
Supporting the OS's philosophy of capturing authentic content from external sources

Troubleshooting

Missing dependencies: Ask Claude to run the check_youtube_dependencies tool to see what's missing
Transcription fails: Check that ffmpeg is installed and on your PATH
Download fails: Verify the YouTube URL is valid and accessible
Out of memory: Use a smaller Whisper model (tiny or base)

Privacy Note

Audio files are processed locally using Whisper
No data is sent to external transcription services
Videos are downloaded with yt-dlp; make sure your use of downloaded content complies with YouTube's terms of service