
Gemini Video Understanding MCP Server

License: MIT | Python 3.10+ | MCP | Gemini

Give your AI coding assistant the power to understand videos!

An MCP (Model Context Protocol) server that enables Claude Code, Cursor, and other AI coding assistants to analyze and understand video content using Google's Gemini AI. Process security footage, lecture recordings, tutorials, and more - directly from your terminal.

Why This Exists

AI coding assistants like Claude Code and Cursor are incredibly powerful, but they can't natively understand video content. This MCP server bridges that gap by:

  • Using Gemini's industry-leading video understanding (up to 6 hours of video!)
  • Providing a standardized MCP interface that works with any MCP-compatible client
  • Offering smart time estimation so you know what you're getting into before processing
  • Supporting segment analysis for efficient processing of long videos

Features

| Feature | Description |
| --- | --- |
| Long Video Support | Analyze videos up to 6 hours using Gemini 2.5's 2M token context |
| Smart Estimation | Get accurate time/cost estimates before processing |
| Segment Analysis | Analyze specific time ranges for faster results |
| Multiple Modes | Summary, detailed analysis, transcript, or timeline |
| Q&A Capability | Ask specific questions about video content |
| User Prompts | Confirms before processing long videos |

Use Cases

Security & Surveillance

"Analyze this security footage and tell me if anyone approaches the car between 2am and 4am"
"What time does the person in the dark hoodie appear in this footage?"
"Summarize all activity in this 8-hour security recording"

Education & Learning

"Transcribe this 2-hour lecture on machine learning"
"Create a timeline of topics covered in this computer science class"
"What does the professor say about recursion? Include timestamps"

Code Tutorials & Demos

"What VS Code extensions does the instructor install in this tutorial?"
"At what timestamp does the presenter start explaining the API integration?"
"Summarize the debugging techniques shown in this video"

Meeting Recordings

"What action items were discussed in this meeting?"
"Summarize the key decisions made in this product review"
"Who presented the sales figures and what were the highlights?"

Content Analysis

"What products are shown in this unboxing video?"
"Describe the UI/UX changes demonstrated in this app walkthrough"
"What error messages appear in this bug report screen recording?"

Installation

Prerequisites

  1. Python 3.10+
  2. Google AI Studio API Key (free): https://aistudio.google.com/apikey
  3. ffmpeg (optional, for segment analysis):
    # macOS
    brew install ffmpeg
    
    # Ubuntu/Debian
    sudo apt install ffmpeg
    
    # Windows
    choco install ffmpeg
    

Quick Install

# Clone the repository
git clone https://github.com/yourusername/gemini-video-mcp-server.git
cd gemini-video-mcp-server

# Create virtual environment and install
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -e .

Using uv (recommended)

git clone https://github.com/yourusername/gemini-video-mcp-server.git
cd gemini-video-mcp-server
uv venv && source .venv/bin/activate
uv pip install -e .

Configuration

For Claude Code

Add to your Claude Code MCP settings (~/.claude/claude_desktop_config.json or via settings):

{
  "mcpServers": {
    "gemini-video": {
      "command": "python",
      "args": ["/absolute/path/to/gemini-video-mcp-server/server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

For Cursor

Add to your Cursor MCP configuration (Settings > MCP Servers):

{
  "mcpServers": {
    "gemini-video": {
      "command": "python",
      "args": ["/absolute/path/to/gemini-video-mcp-server/server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

For Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "gemini-video": {
      "command": "python",
      "args": ["/absolute/path/to/gemini-video-mcp-server/server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Running with uv (all platforms)

{
  "mcpServers": {
    "gemini-video": {
      "command": "uv",
      "args": ["run", "--directory", "/absolute/path/to/gemini-video-mcp-server", "python", "server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Available Tools

estimate_video_analysis

Always call this first! Get time and resource estimates before processing.

Parameters:
- video_path (required): Path to the video file
- known_duration_seconds (optional): Video duration if known

Returns: File size, duration, upload time, processing time, token estimate, recommendations
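
Under the hood this is a heuristic based on file size and duration. A minimal sketch of the idea, assuming roughly 20 Mbps upload, processing time of about 10% of video length, and ~66 tokens per sampled frame at 1 fps (the server's actual constants may differ):

import os

def rough_estimate(video_path: str, duration_seconds: float,
                   upload_mbps: float = 20.0) -> dict:
    """Back-of-the-envelope estimate; all constants here are assumptions."""
    size_bytes = os.path.getsize(video_path)
    upload_seconds = (size_bytes * 8) / (upload_mbps * 1_000_000)
    processing_seconds = duration_seconds * 0.1   # ~10% of video length
    token_estimate = int(duration_seconds * 66)   # 1 fps x ~66 tokens/frame
    return {
        "file_size_mb": round(size_bytes / 1_000_000, 1),
        "upload_minutes": round(upload_seconds / 60, 1),
        "processing_minutes": round(processing_seconds / 60, 1),
        "estimated_tokens": token_estimate,
    }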

analyze_video

Full video analysis with multiple modes.

Parameters:
- video_path (required): Path to the video file
- mode: "summary" | "detailed" | "transcript" | "timeline" (default: "summary")
- custom_prompt (optional): Custom analysis prompt
- confirm_long_video: Set True for videos over 30 minutes

Modes:
- summary: Quick 2-3 paragraph overview
- detailed: Comprehensive scene-by-scene analysis
- transcript: Extract and transcribe speech
- timeline: Timestamped list of events
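
Each mode presumably maps to a different prompt template before the video is sent to Gemini. A hedged sketch of that mapping (the prompt wording is illustrative, not the server's actual text):

MODE_PROMPTS = {
    "summary": "Give a concise 2-3 paragraph overview of this video.",
    "detailed": "Provide a comprehensive scene-by-scene analysis of this video.",
    "transcript": "Transcribe all spoken content, noting speakers where possible.",
    "timeline": "List the key events in this video as a timestamped timeline.",
}

def build_prompt(mode: str = "summary", custom_prompt: str | None = None) -> str:
    # A custom_prompt overrides the built-in mode templates.
    return custom_prompt or MODE_PROMPTS[mode]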

analyze_video_segment

Analyze a specific time range (requires ffmpeg).

Parameters:
- video_path (required): Path to the video file
- start_time (required): Start time ("HH:MM:SS", "MM:SS", or seconds)
- end_time (required): End time (same formats)
- prompt (optional): What to analyze
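
The flexible time formats and the ffmpeg requirement suggest the segment is cut locally before upload. A rough sketch of how that could work (the helper names and exact ffmpeg invocation are illustrative):

import subprocess

def to_seconds(value) -> float:
    """Accept "HH:MM:SS", "MM:SS", or a plain number of seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def cut_segment(video_path: str, start, end, out_path: str = "segment.mp4") -> str:
    # Stream-copy the requested range so no re-encoding is needed.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path,
         "-ss", str(to_seconds(start)), "-to", str(to_seconds(end)),
         "-c", "copy", out_path],
        check=True,
    )
    return out_path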

ask_video_question

Ask specific questions about video content.

Parameters:
- video_path (required): Path to the video file
- question (required): Your question
- provide_timestamps: Include timestamps in answer (default: true)
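
The timestamp flag most likely just augments the question before it is sent alongside the video. A tiny illustrative sketch:

def build_question_prompt(question: str, provide_timestamps: bool = True) -> str:
    # Ask Gemini to anchor its answer to specific moments in the video.
    if provide_timestamps:
        return f"{question}\n\nInclude MM:SS timestamps for the relevant moments."
    return question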

list_supported_formats

List all supported video formats and limits.

Example Workflow

You: Analyze this security camera footage: /path/to/footage.mp4

Claude: Let me first estimate the analysis time...

[Calls estimate_video_analysis]

This video is 2 hours and 15 minutes long. Full analysis will take approximately 25 minutes.

Would you like to:
1. Analyze the entire video
2. Analyze specific time ranges (faster)
3. Get a quick summary only

You: Just analyze from 2:00:00 to 2:30:00

Claude: [Calls analyze_video_segment with start_time="2:00:00", end_time="2:30:00"]

Here's what I found in that 30-minute segment...

Processing Time Estimates

| Video Length | Upload Time | Processing | Total |
| --- | --- | --- | --- |
| 5 minutes | ~30s | ~1 min | ~1.5 min |
| 30 minutes | ~2 min | ~3 min | ~5 min |
| 1 hour | ~5 min | ~6 min | ~11 min |
| 3 hours | ~15 min | ~18 min | ~33 min |
| 6 hours | ~30 min | ~36 min | ~66 min |

Actual times vary based on file size and network speed.

Supported Formats

  • Video: MP4 (recommended), MPEG, MOV, AVI, FLV, WebM, WMV, 3GP, MPG
  • Max Duration: 6 hours (Gemini 2.5 Pro)
  • Max File Size: 2GB per file
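
A quick pre-flight check against these limits can save a long failed upload. A minimal sketch based on the formats and the 2GB cap listed above (extension list and error wording are illustrative):

import os

SUPPORTED_EXTENSIONS = {".mp4", ".mpeg", ".mpg", ".mov", ".avi",
                        ".flv", ".webm", ".wmv", ".3gp"}
MAX_FILE_BYTES = 2 * 1024**3  # 2GB per file

def check_video(video_path: str) -> None:
    ext = os.path.splitext(video_path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported video format: {ext}")
    if os.path.getsize(video_path) > MAX_FILE_BYTES:
        raise ValueError("File exceeds the 2GB per-file limit")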

Troubleshooting

"GEMINI_API_KEY environment variable is not set"

Ensure the API key is in your MCP server configuration's env block.

"ffmpeg not found"

Install ffmpeg for segment analysis. Full video analysis works without it.

"Video processing failed"

  • Check the video file isn't corrupted
  • Ensure format is supported
  • Try a smaller segment first

Slow processing

  • Use estimate_video_analysis to set expectations
  • Use analyze_video_segment for specific sections
  • Check your internet upload speed

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
GEMINI_API_KEY=your-key python test_server.py

# Run the server directly
GEMINI_API_KEY=your-key python server.py

How It Works

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Claude Code    │     │   MCP Server    │     │   Gemini API    │
│  Cursor/etc     │────▶│  (this repo)    │────▶│  (Google)       │
└─────────────────┘     └─────────────────┘     └─────────────────┘
        │                       │                       │
        │   "Analyze video"     │   Upload & Process    │
        │──────────────────────▶│──────────────────────▶│
        │                       │                       │
        │   Text description    │   Video understanding │
        │◀──────────────────────│◀──────────────────────│
  1. Your AI assistant receives a request about a video
  2. It calls this MCP server with the video path
  3. The server uploads the video to Gemini API
  4. Gemini processes the video (1 frame/second, ~66 tokens/frame)
  5. The analysis is returned as text your assistant can understand
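
Steps 3-5 boil down to a round trip through the Gemini Files API. A minimal, hedged sketch using the google-generativeai package (model name, polling interval, and prompt are illustrative; the server's actual code may differ):

import time
import google.generativeai as genai

genai.configure(api_key="your-api-key-here")

# Step 3: upload the video to the Gemini Files API.
video_file = genai.upload_file(path="/path/to/footage.mp4")

# Gemini pre-processes the upload; poll until it is ready.
while video_file.state.name == "PROCESSING":
    time.sleep(10)
    video_file = genai.get_file(video_file.name)

# Steps 4-5: ask the model about the video and get plain text back.
model = genai.GenerativeModel("gemini-2.5-pro")
response = model.generate_content([video_file, "Summarize this video."])
print(response.text)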

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see the LICENSE file for details.

Acknowledgments

  • Google Gemini API for the video understanding capabilities
  • Model Context Protocol (MCP) for the client integration standard

Made with love to give AI coding assistants superpowers