adamanz/gemini-video-mcp-server
Gemini Video Understanding MCP Server
Give your AI coding assistant the power to understand videos!
An MCP (Model Context Protocol) server that enables Claude Code, Cursor, and other AI coding assistants to analyze and understand video content using Google's Gemini AI. Process security footage, lecture recordings, tutorials, and more - directly from your terminal.
Why This Exists
AI coding assistants like Claude Code and Cursor are incredibly powerful, but they can't natively understand video content. This MCP server bridges that gap by:
- Using Gemini's industry-leading video understanding (up to 6 hours of video!)
- Providing a standardized MCP interface that works with any MCP-compatible client
- Offering smart time estimation so you know what you're getting into before processing
- Supporting segment analysis for efficient processing of long videos
Features
| Feature | Description |
|---|---|
| Long Video Support | Analyze videos up to 6 hours using Gemini 2.5's 2M token context |
| Smart Estimation | Get accurate time/cost estimates before processing |
| Segment Analysis | Analyze specific time ranges for faster results |
| Multiple Modes | Summary, detailed analysis, transcript, or timeline |
| Q&A Capability | Ask specific questions about video content |
| User Prompts | Asks for confirmation before processing videos over 30 minutes |
Use Cases
Security & Surveillance
"Analyze this security footage and tell me if anyone approaches the car between 2am and 4am"
"What time does the person in the dark hoodie appear in this footage?"
"Summarize all activity in this 6-hour security recording"
Education & Learning
"Transcribe this 2-hour lecture on machine learning"
"Create a timeline of topics covered in this computer science class"
"What does the professor say about recursion? Include timestamps"
Code Tutorials & Demos
"What VS Code extensions does the instructor install in this tutorial?"
"At what timestamp does the presenter start explaining the API integration?"
"Summarize the debugging techniques shown in this video"
Meeting Recordings
"What action items were discussed in this meeting?"
"Summarize the key decisions made in this product review"
"Who presented the sales figures and what were the highlights?"
Content Analysis
"What products are shown in this unboxing video?"
"Describe the UI/UX changes demonstrated in this app walkthrough"
"What error messages appear in this bug report screen recording?"
Installation
Prerequisites
- Python 3.10+
- Google AI Studio API Key (free): https://aistudio.google.com/apikey
- ffmpeg (optional, for segment analysis):
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
# Windows
choco install ffmpeg
Quick Install
# Clone the repository
git clone https://github.com/adamanz/gemini-video-mcp-server.git
cd gemini-video-mcp-server
# Create virtual environment and install
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -e .
Using uv (recommended)
git clone https://github.com/adamanz/gemini-video-mcp-server.git
cd gemini-video-mcp-server
uv venv && source .venv/bin/activate
uv pip install -e .
Configuration
For Claude Code
Add to your Claude Code MCP settings (for example, a project-level .mcp.json file, or register the server with the claude mcp add command):
{
  "mcpServers": {
    "gemini-video": {
      "command": "python",
      "args": ["/absolute/path/to/gemini-video-mcp-server/server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
For Cursor
Add to your Cursor MCP configuration (Settings > MCP Servers):
{
  "mcpServers": {
    "gemini-video": {
      "command": "python",
      "args": ["/absolute/path/to/gemini-video-mcp-server/server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
For Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "gemini-video": {
      "command": "python",
      "args": ["/absolute/path/to/gemini-video-mcp-server/server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
Using with uv (all platforms)
{
  "mcpServers": {
    "gemini-video": {
      "command": "uv",
      "args": ["run", "--directory", "/absolute/path/to/gemini-video-mcp-server", "python", "server.py"],
      "env": {
        "GEMINI_API_KEY": "your-api-key-here"
      }
    }
  }
}
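If you prefer to generate the entry rather than edit paths by hand, a small script can build it. This is only a sketch: `build_mcp_config` is a hypothetical helper that is not part of this repo, and it simply mirrors the JSON shape shown above.

```python
import json
from pathlib import Path


def build_mcp_config(repo_dir: str, api_key: str, use_uv: bool = False) -> dict:
    """Return an mcpServers entry shaped like the JSON snippets in this README."""
    # MCP clients launch the server from their own working directory,
    # so the path is expanded and made absolute.
    repo = Path(repo_dir).expanduser().resolve()
    if use_uv:
        command = "uv"
        args = ["run", "--directory", str(repo), "python", "server.py"]
    else:
        command = "python"
        args = [str(repo / "server.py")]
    return {
        "mcpServers": {
            "gemini-video": {
                "command": command,
                "args": args,
                "env": {"GEMINI_API_KEY": api_key},
            }
        }
    }


if __name__ == "__main__":
    config = build_mcp_config("~/gemini-video-mcp-server", "your-api-key-here")
    print(json.dumps(config, indent=2))
```

Paste the printed JSON into the configuration file for your client, merging it with any servers already listed under mcpServers.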
Available Tools
estimate_video_analysis
Always call this first! Get time and resource estimates before processing.
Parameters:
- video_path (required): Path to the video file
- known_duration_seconds (optional): Video duration if known
Returns: File size, duration, upload time, processing time, token estimate, recommendations
analyze_video
Full video analysis with multiple modes.
Parameters:
- video_path (required): Path to the video file
- mode: "summary" | "detailed" | "transcript" | "timeline" (default: "summary")
- custom_prompt (optional): Custom analysis prompt
- confirm_long_video: Set True for videos over 30 minutes
Modes:
- summary: Quick 2-3 paragraph overview
- detailed: Comprehensive scene-by-scene analysis
- transcript: Extract and transcribe speech
- timeline: Timestamped list of events
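The interaction between mode and confirm_long_video can be pictured as a small pre-flight check. The sketch below is an illustration under assumptions, not the server's actual code: `check_request` is a hypothetical name, and the 30-minute threshold is taken from the parameter description above.

```python
from typing import Optional

# "videos over 30 minutes", per the confirm_long_video description
LONG_VIDEO_THRESHOLD_SECONDS = 30 * 60

MODES = ("summary", "detailed", "transcript", "timeline")


def check_request(duration_seconds: float, mode: str = "summary",
                  confirm_long_video: bool = False) -> Optional[str]:
    """Return a reason the request should not proceed, or None if it may."""
    if mode not in MODES:
        return f"Unknown mode {mode!r}; expected one of {', '.join(MODES)}"
    if duration_seconds > LONG_VIDEO_THRESHOLD_SECONDS and not confirm_long_video:
        minutes = duration_seconds / 60
        return (f"Video is {minutes:.0f} minutes long; "
                "re-run with confirm_long_video=True to proceed")
    return None
```

Under this reading, a 45-minute video is rejected until the caller passes confirm_long_video=True, while a 10-minute video goes straight through.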
analyze_video_segment
Analyze a specific time range (requires ffmpeg).
Parameters:
- video_path (required): Path to the video file
- start_time (required): Start time ("HH:MM:SS", "MM:SS", or seconds)
- end_time (required): End time (same formats)
- prompt (optional): What to analyze
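To make the accepted time formats concrete, here is one way the three forms could be normalized to seconds and turned into an ffmpeg cut. Both `parse_time` and `build_ffmpeg_cut` are hypothetical helpers written for this example; the server's real parsing and ffmpeg invocation may differ.

```python
def parse_time(value) -> float:
    """Convert "HH:MM:SS", "MM:SS", or plain seconds into seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    parts = str(value).strip().split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"Unrecognized time format: {value!r}")
    seconds = 0.0
    for part in parts:
        # Each colon-separated field is worth 60x the one to its right.
        seconds = seconds * 60 + float(part)
    return seconds


def build_ffmpeg_cut(video_path: str, start, end, out_path: str) -> list:
    """Build (but do not run) an ffmpeg command that copies one segment."""
    start_s, end_s = parse_time(start), parse_time(end)
    if end_s <= start_s:
        raise ValueError("end_time must be after start_time")
    return [
        "ffmpeg", "-y",
        "-ss", f"{start_s:.3f}",        # seek to the segment start
        "-i", video_path,
        "-t", f"{end_s - start_s:.3f}", # segment duration in seconds
        "-c", "copy",                   # stream copy, no re-encode
        out_path,
    ]
```

The resulting list can be passed to subprocess.run. Stream copy is fast but cuts on keyframes, so the segment boundaries may be off by a second or two.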
ask_video_question
Ask specific questions about video content.
Parameters:
- video_path (required): Path to the video file
- question (required): Your question
- provide_timestamps: Include timestamps in answer (default: true)
list_supported_formats
List all supported video formats and limits.
Example Workflow
You: Analyze this security camera footage: /path/to/footage.mp4
Claude: Let me first estimate the analysis time...
[Calls estimate_video_analysis]
This video is 2 hours and 15 minutes long. Full analysis will take approximately 25 minutes.
Would you like to:
1. Analyze the entire video
2. Analyze specific time ranges (faster)
3. Get a quick summary only
You: Just analyze from 1:30:00 to 2:00:00
Claude: [Calls analyze_video_segment with start_time="1:30:00", end_time="2:00:00"]
Here's what I found in that 30-minute segment...
Processing Time Estimates
| Video Length | Upload Time | Processing | Total |
|---|---|---|---|
| 5 minutes | ~30s | ~1 min | ~1.5 min |
| 30 minutes | ~2 min | ~3 min | ~5 min |
| 1 hour | ~5 min | ~6 min | ~11 min |
| 3 hours | ~15 min | ~18 min | ~33 min |
| 6 hours | ~30 min | ~36 min | ~66 min |
Actual times vary based on file size and network speed.
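The rows from 1 hour upward follow a roughly linear trend: about 5 seconds of upload and 6 seconds of processing per minute of video. The sketch below encodes only that trend. The constants are read off the table, `estimate_minutes` is a hypothetical helper, and the formula used by estimate_video_analysis is not documented here and may differ.

```python
# Rates inferred from the table: 1 hour of video -> ~5 min upload, ~6 min processing.
UPLOAD_SECONDS_PER_VIDEO_MINUTE = 5.0
PROCESSING_SECONDS_PER_VIDEO_MINUTE = 6.0


def estimate_minutes(video_minutes: float) -> dict:
    """Rough linear estimate of upload, processing, and total time in minutes."""
    upload = video_minutes * UPLOAD_SECONDS_PER_VIDEO_MINUTE / 60
    processing = video_minutes * PROCESSING_SECONDS_PER_VIDEO_MINUTE / 60
    return {
        "upload_min": upload,
        "processing_min": processing,
        "total_min": upload + processing,
    }


if __name__ == "__main__":
    for minutes in (60, 135, 180, 360):
        print(minutes, estimate_minutes(minutes))
```

For a 2 hour 15 minute video (135 minutes) this gives a total of about 25 minutes. For very short clips the table's figures are higher than the linear trend, presumably because of fixed overhead.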
Supported Formats
- Video: MP4 (recommended), MPEG, MOV, AVI, FLV, WebM, WMV, 3GP, MPG
- Max Duration: 6 hours (Gemini 2.5 Pro)
- Max File Size: 2GB per file
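These limits lend themselves to a quick local check before any upload. The following is a sketch under stated assumptions: `validate_video` is a hypothetical helper, the extension set is a guess at how the listed formats map to file suffixes, and "2GB" is interpreted as 2 GiB.

```python
from pathlib import Path
from typing import Optional

# Assumed suffixes for the formats listed above.
SUPPORTED_EXTENSIONS = {
    ".mp4", ".mpeg", ".mpg", ".mov", ".avi", ".flv", ".webm", ".wmv", ".3gp",
}
MAX_FILE_BYTES = 2 * 1024 ** 3        # "2GB per file", read here as 2 GiB
MAX_DURATION_SECONDS = 6 * 60 * 60    # 6 hours


def validate_video(path: str, size_bytes: int,
                   duration_seconds: Optional[float] = None) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"Unsupported format: {ext or '(no extension)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("File is larger than 2GB")
    if duration_seconds is not None and duration_seconds > MAX_DURATION_SECONDS:
        problems.append("Video is longer than 6 hours")
    return problems
```

In practice size_bytes would come from os.path.getsize(path). A file that fails only the duration check is a good candidate for analyze_video_segment.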
Troubleshooting
"GEMINI_API_KEY environment variable is not set"
Ensure the API key is in your MCP server configuration's env block.
"ffmpeg not found"
Install ffmpeg for segment analysis. Full video analysis works without it.
"Video processing failed"
- Check the video file isn't corrupted
- Ensure format is supported
- Try a smaller segment first
Slow processing
- Use estimate_video_analysis to set expectations
- Use analyze_video_segment for specific sections
- Check your internet upload speed
Development
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
GEMINI_API_KEY=your-key python test_server.py
# Run the server directly
GEMINI_API_KEY=your-key python server.py
How It Works
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Claude Code │ │ MCP Server │ │ Gemini API │
│ Cursor/etc │────▶│ (this repo) │────▶│ (Google) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
│ "Analyze video" │ Upload & Process │
│──────────────────────▶│──────────────────────▶│
│ │ │
│ Text description │ Video understanding │
│◀──────────────────────│◀──────────────────────│
- Your AI assistant receives a request about a video
- It calls this MCP server with the video path
- The server uploads the video to Gemini API
- Gemini processes the video (sampled at 1 frame/second, ~66 tokens/frame at low media resolution)
- The analysis is returned as text your assistant can understand
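The sampling figures in the steps above give a quick back-of-the-envelope token count. This sketch uses only the 1 frame/second and ~66 tokens/frame numbers quoted there. It ignores audio and prompt tokens, so treat the result as a lower bound; `estimate_video_tokens` is a hypothetical name.

```python
FRAMES_PER_SECOND = 1    # sampling rate quoted above
TOKENS_PER_FRAME = 66    # approximate per-frame cost quoted above


def estimate_video_tokens(duration_seconds: float) -> int:
    """Approximate visual tokens for a video of the given length."""
    return round(duration_seconds * FRAMES_PER_SECOND * TOKENS_PER_FRAME)


if __name__ == "__main__":
    for hours in (1, 3, 6):
        print(f"{hours} h -> {estimate_video_tokens(hours * 3600):,} tokens")
```

By this arithmetic one hour of video costs 237,600 tokens and six hours cost 1,425,600, which is how a 6-hour video fits inside the 2M token context mentioned under Features.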
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT License - see file for details.
Acknowledgments
- Built with FastMCP
- Powered by Google Gemini
- Follows MCP Specification
Made with love to give AI coding assistants superpowers