antigravity-youtube-rag

mufradhossain/antigravity-youtube-rag

3.1

If you are the rightful owner of antigravity-youtube-rag and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.

The Model Context Protocol (MCP) server is designed to retrieve YouTube video transcripts, index them in a vector database, and enable semantic search over individual videos or across an entire indexed collection.

Tools
4
Resources
0
Prompts
0

🎥 YouTube Transcript RAG System

100% AI-Generated | Built entirely by Antigravity (Google's Gemini 3 Pro IDE) with zero human-written code

An MCP (Model Context Protocol) server that retrieves YouTube video transcripts, indexes them in a vector database (ChromaDB), and enables semantic search over individual videos or across your entire indexed collection.

Docker Python MCP AI Generated


✨ Features

  • 🎯 Fetch & Index: Automatically download YouTube transcripts, chunk them by time, and store embeddings
  • 🔍 Semantic Search: Query specific videos or your entire collection using natural language
  • 💾 Persistence: Data saved to disk and survives container restarts
  • 🐳 Dockerized: Easy deployment with Docker Compose
  • 🤖 MCP Integration: Works seamlessly with Claude Desktop and other MCP clients
  • ⚡ Local Embeddings: Uses all-MiniLM-L6-v2 for fast, offline semantic search

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose installed
  • Python 3.11+ (for model setup)

1️⃣ Download the Embedding Model

Before first run, download the model locally:

python setup_model.py

This downloads the all-MiniLM-L6-v2 model (~100MB) to the models/ directory.

2️⃣ Start the Server

docker-compose up --build

Wait for the log message:

Starting YouTube Transcript RAG MCP Server...

3️⃣ Connect to Claude Desktop

  1. Locate your Claude Desktop config file:

    • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
    • Windows: %APPDATA%\Claude\claude_desktop_config.json
  2. Add this configuration:

{
  "mcpServers": {
    "youtube-rag": {
      "command": "docker",
      "args": [
        "exec",
        "-i",
        "youtube-rag-mcp",
        "python",
        "stdio_launcher.py"
      ]
    }
  }
}
  1. Restart Claude Desktop

🛠️ Available MCP Tools

fetch_and_index_transcript

Retrieves and indexes a YouTube video's transcript.

Input:

  • youtube_url (string): Full YouTube video URL

Output: JSON with status and chunk count

Example:

Please fetch and index this video: https://www.youtube.com/watch?v=T-D1KVIuvjA

query_video

Semantic search within a specific video's transcript.

Input:

  • youtube_url (string): Target video URL
  • question (string): Search query
  • top_k (int, default=3): Number of results to return

Output: JSON with relevant transcript chunks and timestamps

Example:

Query the video https://www.youtube.com/watch?v=T-D1KVIuvjA with the question "What is Python used for?"

query_all_videos

Semantic search across all indexed video transcripts.

Input:

  • question (string): Search query
  • top_k (int, default=5): Number of results to return

Output: JSON with relevant results from any indexed video

Example:

Search all videos for "machine learning concepts"

list_indexed_videos

Show all videos currently in the database.

Input: None

Output: JSON list of videos with metadata

Example:

List all indexed videos

🔧 Configuration

Environment variables in .env:

VariableDescriptionDefault
CHROMA_PERSIST_DIRECTORYPath to ChromaDB data/app/chroma-data
EMBEDDING_MODELSentence Transformer model path/app/models/all-MiniLM-L6-v2
LOG_LEVELLogging levelINFO
CHUNK_TARGET_DURATIONTarget chunk duration (seconds)150
CHUNK_OVERLAPChunk overlap (seconds)30
DEFAULT_TOP_KDefault number of results3

🔍 Database Inspection

Use the included wrapper script to inspect your database:

# Show statistics
python inspect_docker.py stats

# List all indexed videos
python inspect_docker.py list

# Inspect a specific video
python inspect_docker.py inspect VIDEO_ID --full

📁 Project Structure

youtube-transcript-rag/
├── mcp-server/              # Main application directory
│   ├── services/            # Core services (ChromaDB, embeddings, chunking)
│   ├── tools/               # MCP tool implementations
│   ├── utils/               # Utility functions
│   ├── server.py            # FastMCP server
│   ├── stdio_launcher.py    # Claude Desktop integration
│   ├── db_inspector.py      # Database inspection tool
│   └── Dockerfile           # Container definition
├── docker-compose.yml       # Docker Compose configuration
├── setup_model.py           # Model download script
├── inspect_docker.py        # Database inspection wrapper
└── .env                     # Environment variables

🎬 Demo Workflow

See for a complete demonstration flow including:

  • Indexing multiple videos
  • Querying specific videos
  • Cross-video semantic search
  • Persistence verification

🐛 Troubleshooting

No transcript available

Some videos do not have captions/transcripts. The tool will return an error in this case.

Docker build is slow

The first build downloads the embedding model (~100MB) and dependencies. Subsequent builds use Docker's cache.

Container not found

Make sure the container is running:

docker ps | grep youtube-rag-mcp

If not running, start it:

docker-compose up -d

🤖 About This Project

This entire project was generated by Antigravity, Google's new AI-powered IDE running Gemini 3 Pro.

The Challenge: Create a production-ready MCP server for YouTube transcript RAG

The Result: A fully functional, dockerized, production-ready application with:

  • ✅ Clean architecture with proper separation of concerns
  • ✅ Comprehensive error handling
  • ✅ Docker containerization
  • ✅ MCP integration
  • ✅ Database inspection tools
  • ✅ Complete documentation

Lines of human-written code: 0

This demonstrates the power of AI-assisted development and the capabilities of modern AI coding assistants.


📝 License

MIT License - Feel free to use this project as a template or learning resource!


🙏 Acknowledgments


Made with ✨ by Antigravity (Gemini 3 Pro)