
Gemini Video Understanding MCP Server

License: MIT | Python 3.10+ | MCP | Gemini

Give your AI coding assistant the power to understand videos!

An MCP (Model Context Protocol) server that enables Claude Code, Cursor, and other AI coding assistants to analyze and understand video content using Google's Gemini AI. Process security footage, lecture recordings, tutorials, and more - directly from your terminal.

Why This Exists

AI coding assistants like Claude Code and Cursor are incredibly powerful, but they can't natively understand video content. This MCP server bridges that gap by:

  • Using Gemini's industry-leading video understanding (up to 6 hours of video!)
  • Providing a standardized MCP interface that works with any MCP-compatible client
  • Offering smart time estimation so you know what you're getting into before processing
  • Supporting segment analysis for efficient processing of long videos

Features

| Feature | Description |
| --- | --- |
| Long Video Support | Analyze videos up to 6 hours using Gemini 2.5's 2M token context |
| Smart Estimation | Get accurate time/cost estimates before processing |
| Segment Analysis | Analyze specific time ranges for faster results |
| Multiple Modes | Summary, detailed analysis, transcript, or timeline |
| Q&A Capability | Ask specific questions about video content |
| User Prompts | Confirms before processing long videos |

Use Cases

Security & Surveillance

"Analyze this security footage and tell me if anyone approaches the car between 2am and 4am"
"What time does the person in the dark hoodie appear in this footage?"
"Summarize all activity in this 8-hour security recording"

Education & Learning

"Transcribe this 2-hour lecture on machine learning"
"Create a timeline of topics covered in this computer science class"
"What does the professor say about recursion? Include timestamps"

Code Tutorials & Demos

"What VS Code extensions does the instructor install in this tutorial?"
"At what timestamp does the presenter start explaining the API integration?"
"Summarize the debugging techniques shown in this video"

Meeting Recordings

"What action items were discussed in this meeting?"
"Summarize the key decisions made in this product review"
"Who presented the sales figures and what were the highlights?"

Content Analysis

"What products are shown in this unboxing video?"
"Describe the UI/UX changes demonstrated in this app walkthrough"
"What error messages appear in this bug report screen recording?"

Installation

Prerequisites

  1. Python 3.10+
  2. Google AI Studio API Key (free): https://aistudio.google.com/apikey
  3. ffmpeg (optional, for segment analysis):
    # macOS
    brew install ffmpeg
    
    # Ubuntu/Debian
    sudo apt install ffmpeg
    
    # Windows
    choco install ffmpeg
    

Quick Install

# Clone the repository
git clone https://github.com/yourusername/gemini-video-mcp-server.git
cd gemini-video-mcp-server

# Create virtual environment and install
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .

Using uv (recommended)

git clone https://github.com/yourusername/gemini-video-mcp-server.git
cd gemini-video-mcp-server
uv venv && source .venv/bin/activate
uv pip install -e .

Configuration

For Claude Code

Add to your Claude Code MCP settings (~/.claude/claude_desktop_config.json or via settings):

{
  "mcpServers": {
    "gemini-video": {
      "command": "python",
      "args": ["/absolute/path/to/gemini-video-mcp-server/server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

For Cursor

Add to your Cursor MCP configuration (Settings > MCP Servers):

{
  "mcpServers": {
    "gemini-video": {
      "command": "python",
      "args": ["/absolute/path/to/gemini-video-mcp-server/server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

For Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "gemini-video": {
      "command": "python",
      "args": ["/absolute/path/to/gemini-video-mcp-server/server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Running with uv (all platforms)

{
  "mcpServers": {
    "gemini-video": {
      "command": "uv",
      "args": ["run", "--directory", "/absolute/path/to/gemini-video-mcp-server", "python", "server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Available Tools

estimate_video_analysis

Always call this first! Get time and resource estimates before processing.

Parameters:
- video_path (required): Path to the video file
- known_duration_seconds (optional): Video duration if known

Returns: File size, duration, upload time, processing time, token estimate, recommendations
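
Under the hood this is a heuristic based on file size and duration. A minimal sketch of the idea, assuming roughly 20 Mbps upload, processing time of about 10% of video length, and ~66 tokens per sampled frame at 1 fps (the server's actual constants may differ):

import os

def rough_estimate(video_path: str, duration_seconds: float,
                   upload_mbps: float = 20.0) -> dict:
    """Back-of-the-envelope estimate; all constants here are assumptions."""
    size_bytes = os.path.getsize(video_path)
    upload_seconds = (size_bytes * 8) / (upload_mbps * 1_000_000)
    processing_seconds = duration_seconds * 0.1   # ~10% of video length
    token_estimate = int(duration_seconds * 66)   # 1 fps x ~66 tokens/frame
    return {
        "file_size_mb": round(size_bytes / 1_000_000, 1),
        "upload_minutes": round(upload_seconds / 60, 1),
        "processing_minutes": round(processing_seconds / 60, 1),
        "estimated_tokens": token_estimate,
    }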

analyze_video

Full video analysis with multiple modes.

Parameters:
- video_path (required): Path to the video file
- mode: "summary" | "detailed" | "transcript" | "timeline" (default: "summary")
- custom_prompt (optional): Custom analysis prompt
- confirm_long_video: Set True for videos over 30 minutes

Modes:
- summary: Quick 2-3 paragraph overview
- detailed: Comprehensive scene-by-scene analysis
- transcript: Extract and transcribe speech
- timeline: Timestamped list of events
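
Each mode presumably maps to a different prompt template before the video is sent to Gemini. A hedged sketch of that mapping (the prompt wording is illustrative, not the server's actual text):

MODE_PROMPTS = {
    "summary": "Give a concise 2-3 paragraph overview of this video.",
    "detailed": "Provide a comprehensive scene-by-scene analysis of this video.",
    "transcript": "Transcribe all spoken content, noting speakers where possible.",
    "timeline": "List the key events in this video as a timestamped timeline.",
}

def build_prompt(mode: str = "summary", custom_prompt: str | None = None) -> str:
    # A custom_prompt overrides the built-in mode templates.
    return custom_prompt or MODE_PROMPTS[mode]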

analyze_video_segment

Analyze a specific time range (requires ffmpeg).

Parameters:
- video_path (required): Path to the video file
- start_time (required): Start time ("HH:MM:SS", "MM:SS", or seconds)
- end_time (required): End time (same formats)
- prompt (optional): What to analyze
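
The flexible time formats and the ffmpeg requirement suggest the segment is cut locally before upload. A rough sketch of how that could work (the helper names and exact ffmpeg invocation are illustrative):

import subprocess

def to_seconds(value) -> float:
    """Accept "HH:MM:SS", "MM:SS", or a plain number of seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def cut_segment(video_path: str, start, end, out_path: str = "segment.mp4") -> str:
    # Stream-copy the requested range so no re-encoding is needed.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path,
         "-ss", str(to_seconds(start)), "-to", str(to_seconds(end)),
         "-c", "copy", out_path],
        check=True,
    )
    return out_path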

ask_video_question

Ask specific questions about video content.

Parameters:
- video_path (required): Path to the video file
- question (required): Your question
- provide_timestamps: Include timestamps in answer (default: true)
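
The timestamp flag most likely just augments the question before it is sent alongside the video. A tiny illustrative sketch:

def build_question_prompt(question: str, provide_timestamps: bool = True) -> str:
    # Ask Gemini to anchor its answer to specific moments in the video.
    if provide_timestamps:
        return f"{question}\n\nInclude MM:SS timestamps for the relevant moments."
    return question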

list_supported_formats

List all supported video formats and limits.

Example Workflow

You: Analyze this security camera footage: /path/to/footage.mp4

Claude: Let me first estimate the analysis time...

[Calls estimate_video_analysis]

This video is 2 hours and 15 minutes long. Full analysis will take approximately 25 minutes.

Would you like to:
1. Analyze the entire video
2. Analyze specific time ranges (faster)
3. Get a quick summary only

You: Just analyze from 2:00:00 to 2:30:00

Claude: [Calls analyze_video_segment with start_time="2:00:00", end_time="2:30:00"]

Here's what I found in that 30-minute segment...

Processing Time Estimates

| Video Length | Upload Time | Processing | Total |
| --- | --- | --- | --- |
| 5 minutes | ~30s | ~1 min | ~1.5 min |
| 30 minutes | ~2 min | ~3 min | ~5 min |
| 1 hour | ~5 min | ~6 min | ~11 min |
| 3 hours | ~15 min | ~18 min | ~33 min |
| 6 hours | ~30 min | ~36 min | ~66 min |

Actual times vary based on file size and network speed.

Supported Formats

  • Video: MP4 (recommended), MPEG, MOV, AVI, FLV, WebM, WMV, 3GP, MPG
  • Max Duration: 6 hours (Gemini 2.5 Pro)
  • Max File Size: 2GB per file
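
A quick pre-flight check against these limits can save a long failed upload. A minimal sketch based on the formats and the 2GB cap listed above (extension list and error wording are illustrative):

import os

SUPPORTED_EXTENSIONS = {".mp4", ".mpeg", ".mpg", ".mov", ".avi",
                        ".flv", ".webm", ".wmv", ".3gp"}
MAX_FILE_BYTES = 2 * 1024**3  # 2GB per file

def check_video(video_path: str) -> None:
    ext = os.path.splitext(video_path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported video format: {ext}")
    if os.path.getsize(video_path) > MAX_FILE_BYTES:
        raise ValueError("File exceeds the 2GB per-file limit")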

Troubleshooting

"GEMINI_API_KEY environment variable is not set"

Ensure the API key is in your MCP server configuration's env block.

"ffmpeg not found"

Install ffmpeg for segment analysis. Full video analysis works without it.

"Video processing failed"

  • Check the video file isn't corrupted
  • Ensure format is supported
  • Try a smaller segment first

Slow processing

  • Use estimate_video_analysis to set expectations
  • Use analyze_video_segment for specific sections
  • Check your internet upload speed

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
GEMINI_API_KEY=your-key python test_server.py

# Run the server directly
GEMINI_API_KEY=your-key python server.py

How It Works

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Claude Code    │     │   MCP Server    │     │   Gemini API    │
│  Cursor/etc     │────▶│  (this repo)    │────▶│  (Google)       │
└─────────────────┘     └─────────────────┘     └─────────────────┘
        │                       │                       │
        │   "Analyze video"     │   Upload & Process    │
        │──────────────────────▶│──────────────────────▶│
        │                       │                       │
        │   Text description    │   Video understanding │
        │◀──────────────────────│◀──────────────────────│
  1. Your AI assistant receives a request about a video
  2. It calls this MCP server with the video path
  3. The server uploads the video to Gemini API
  4. Gemini processes the video (1 frame/second, ~66 tokens/frame)
  5. The analysis is returned as text your assistant can understand
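
Steps 3-5 boil down to a round trip through the Gemini Files API. A minimal, hedged sketch using the google-generativeai package (model name, polling interval, and prompt are illustrative; the server's actual code may differ):

import time
import google.generativeai as genai

genai.configure(api_key="your-api-key-here")

# Step 3: upload the video to the Gemini Files API.
video_file = genai.upload_file(path="/path/to/footage.mp4")

# Gemini pre-processes the upload; poll until it is ready.
while video_file.state.name == "PROCESSING":
    time.sleep(10)
    video_file = genai.get_file(video_file.name)

# Steps 4-5: ask the model about the video and get plain text back.
model = genai.GenerativeModel("gemini-2.5-pro")
response = model.generate_content([video_file, "Summarize this video."])
print(response.text)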

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see the LICENSE file for details.

Acknowledgments

  • Google Gemini API for the video understanding capabilities
  • Model Context Protocol (MCP) for the client integration standard

Made with love to give AI coding assistants superpowers