video-summary-mcp-server
🚀 fastMCP Example: Vision Tools
This server exposes several MCP tools, including:
- summarize_video(video_path: string) → summary: string
- summarize_rtsp_stream(rtsp_url: string, duration_sec: number, interval_sec: number) → summary: string
You can drop this project anywhere and run:
python server.py
and a Model Context Protocol–compatible client (ROS-MCP client, Claude Desktop, ChatGPT MCP) can use it immediately.
📌 Install
pip install -e .
Or install dependencies directly:
pip install fastmcp openai pillow opencv-python
🧠 What this server does
When a client calls:
{
  "tool": "summarize_video",
  "arguments": { "video_path": "demo.mp4" }
}
the fastMCP server:
- Loads the video
- Extracts frames every 2 seconds
- Encodes frames as JPEG
- Sends them to GPT-4.1 Vision
- Returns a natural language video summary
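For orientation, a minimal server.py in this style could look like the sketch below. It assumes the fastmcp package's FastMCP class, tool decorator, and run() method; run_summary_pipeline is a hypothetical placeholder for whatever summarizer.py actually exposes:

```python
# Minimal sketch of a fastMCP server exposing summarize_video.
# Assumes the fastmcp package; `run_summary_pipeline` is a hypothetical
# stand-in for the project's real summarizer entry point.
from fastmcp import FastMCP

mcp = FastMCP("video-summarizer")

@mcp.tool()
def summarize_video(video_path: str) -> str:
    """Summarize a local video file with GPT-4.1 Vision."""
    from summarizer import run_summary_pipeline  # hypothetical helper
    return run_summary_pipeline(video_path)

if __name__ == "__main__":
    mcp.run()  # defaults to stdio transport, which MCP clients expect
```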
Need to summarize a live camera or drone feed instead? Use summarize_rtsp_stream:
{
  "tool": "summarize_rtsp_stream",
  "arguments": {
    "rtsp_url": "rtsp://user:pass@camera-ip:554/stream",
    "duration_sec": 20,
    "interval_sec": 4
  }
}
The tool opens the RTSP stream with OpenCV, samples frames for the requested duration, and sends them to GPT-4.1 Vision for summarization.
This works on any computer with Python, no GPU required.
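The frame-sampling step could look roughly like the sketch below; the function and variable names are illustrative rather than the project's actual internals:

```python
# Illustrative RTSP frame sampling with OpenCV (not the project's exact code).
import time
import cv2

def sample_rtsp_frames(rtsp_url: str, duration_sec: int = 20, interval_sec: int = 4):
    """Grab roughly one frame every interval_sec seconds for duration_sec seconds."""
    cap = cv2.VideoCapture(rtsp_url)
    if not cap.isOpened():
        raise RuntimeError(f"Could not open RTSP stream: {rtsp_url}")
    frames = []
    start = time.time()
    next_grab = start
    try:
        while time.time() - start < duration_sec:
            ok, frame = cap.read()
            if not ok:
                break  # stream dropped or ended
            if time.time() >= next_grab:
                frames.append(frame)
                next_grab += interval_sec
    finally:
        cap.release()
    return frames
```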
🧪 How to run
As MCP Server
Start the MCP server:
python server.py
As Standalone CLI
Run directly from command line:
python main.py videos/demo.mp4 [style]
Available styles:
- short - Concise summary (default)
- timeline - Summary with timestamps
- detailed - Comprehensive summary
- technical - Technical bullet-point summary
Example:
python main.py videos/demo.mp4 timeline
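For reference, a CLI entry point along these lines could be structured as follows; summarize_file and the exact argument handling are assumptions, not the project's verified code:

```python
# Rough sketch of a main.py-style CLI (placeholder names, not the actual implementation).
import sys

VALID_STYLES = {"short", "timeline", "detailed", "technical"}

def main() -> None:
    if len(sys.argv) < 2:
        print("Usage: python main.py <video_path> [style]")
        sys.exit(1)
    video_path = sys.argv[1]
    style = sys.argv[2] if len(sys.argv) > 2 else "short"
    if style not in VALID_STYLES:
        sys.exit(f"Unknown style '{style}', expected one of: {', '.join(sorted(VALID_STYLES))}")
    from summarizer import summarize_file  # hypothetical summarizer.py entry point
    print(summarize_file(video_path, style=style))

if __name__ == "__main__":
    main()
```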
Once the server is running, connect any MCP client, for example:
✔ ChatGPT MCP (Custom tools)
Add a new connection:
command: python /path/to/server.py
✔ ROS-MCP client
ros_mcp connect video-summarizer python server.py
✔ Claude Desktop (Tools → Add Tool)
Set executable to:
python3 server.py
📁 Project Structure
video-summary-mcp-server/
│
├── server.py # MCP server (fastMCP wrapper)
├── main.py # CLI runner for standalone use
├── summarizer.py # Core summarization logic
├── frame_extractor.py # Video frame extraction
├── encoder.py # JPEG encoding
├── prompt.py # Prompt building for different styles
├── pyproject.toml # Project configuration
└── videos/ # Directory for video files
Module Overview
- server.py - MCP server exposing the summarize_video and summarize_rtsp_stream tools
- main.py - Standalone CLI for direct video summarization
- summarizer.py - Core module that orchestrates the summarization pipeline
- frame_extractor.py - Extracts keyframes from videos at regular intervals
- encoder.py - Compresses frames to JPEG for efficient API calls
- prompt.py - Builds prompts for different summary styles (short, timeline, detailed, technical)
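To show how these pieces typically fit together, here is a hedged sketch of the extract → encode → summarize pipeline. The function names and the exact request payload are assumptions based on the description above, not the project's verified code:

```python
# Hedged sketch of the frame -> JPEG -> GPT-4.1 Vision pipeline (names are illustrative).
import base64
import cv2
from openai import OpenAI

def extract_frames(video_path: str, interval_sec: float = 2.0):
    """Grab one frame every interval_sec seconds from a video file."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS metadata is missing
    step = max(1, int(fps * interval_sec))
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames

def encode_jpeg_b64(frame) -> str:
    """JPEG-encode a frame and return it as a base64 data URL."""
    ok, buf = cv2.imencode(".jpg", frame)
    return "data:image/jpeg;base64," + base64.b64encode(buf.tobytes()).decode()

def summarize_frames(frames, prompt: str = "Summarize what happens in this video.") -> str:
    """Send the sampled frames to GPT-4.1 and return its text summary."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    content = [{"type": "text", "text": prompt}] + [
        {"type": "image_url", "image_url": {"url": encode_jpeg_b64(f)}} for f in frames
    ]
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": content}],
    )
    return response.choices[0].message.content
```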
🔑 Environment Setup
Option 1: Using .env file (Recommended)
- Create a .env file in the project root (any text editor is fine).
- Add your OpenAI API key:
OPENAI_API_KEY=your-actual-api-key-here
The .env file is automatically loaded when you run the server.
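Under the hood this usually means calling python-dotenv at start-up, roughly like the snippet below (python-dotenv is an assumption here and is not listed in the install command above):

```python
# Sketch of .env loading at start-up (assumes the python-dotenv package).
import os
from dotenv import load_dotenv

load_dotenv()  # copies OPENAI_API_KEY from .env into the process environment
if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it.")
```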
Option 2: Environment Variable
Set it as an environment variable:
export OPENAI_API_KEY='your-api-key-here'
Get your API key from: https://platform.openai.com/api-keys