vision-mcp-server

stex2005/vision-mcp-server


video-summary-mcp-server

🚀 fastMCP Example: Vision Tools

This server exposes several MCP tools, including:

summarize_video(video_path: string) → summary: string
summarize_rtsp_stream(rtsp_url: string, duration_sec?: number, interval_sec?: number) → summary: string

You can drop this file anywhere and run:

python server.py

and a Model Context Protocol–compatible client (ROS-MCP client, Claude Desktop, ChatGPT MCP) can use it immediately.


📌 Install

pip install -e .

Or install dependencies directly:

pip install fastmcp openai pillow opencv-python

🧠 What this server does

When a client calls:

{
  "tool": "summarize_video",
  "arguments": { "video_path": "demo.mp4" }
}

the server:

  1. Loads the video
  2. Extracts frames every 2 seconds
  3. Encodes frames as JPEG
  4. Sends them to GPT-4.1 Vision
  5. Returns a natural language video summary
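The frame-sampling step can be sketched as plain arithmetic (the function name is illustrative; the project's actual logic lives in frame_extractor.py):

```python
def sample_frame_indices(total_frames: int, fps: float, interval_sec: float = 2.0) -> list[int]:
    """Return the indices of the frames to extract: one every interval_sec seconds."""
    step = max(1, round(fps * interval_sec))  # frames between consecutive samples
    return list(range(0, total_frames, step))

# A 10-second clip at 30 fps, sampled every 2 s -> indices 0, 60, 120, 180, 240
```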

Need to summarize a live camera or drone feed instead? Use summarize_rtsp_stream:

{
  "tool": "summarize_rtsp_stream",
  "arguments": {
    "rtsp_url": "rtsp://user:pass@camera-ip:554/stream",
    "duration_sec": 20,
    "interval_sec": 4
  }
}

The tool opens the RTSP stream with OpenCV, samples frames for the requested duration, and sends them to GPT-4.1 Vision for summarization.

This works on any computer with Python; no GPU is required.


🧪 How to run

As MCP Server

Start the MCP server:

python server.py

As Standalone CLI

Run directly from command line:

python main.py videos/demo.mp4 [style]

Available styles:

  • short - Concise summary (default)
  • timeline - Summary with timestamps
  • detailed - Comprehensive summary
  • technical - Technical bullet-point summary

Example:

python main.py videos/demo.mp4 timeline
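One plausible shape for the style handling in prompt.py (the prompt wording below is an assumption, not the project's actual text):

```python
# Hypothetical style-to-prompt mapping; the real prompts live in prompt.py.
STYLE_PROMPTS = {
    "short": "Summarize the video in 2-3 sentences.",
    "timeline": "Summarize the video as a timestamped timeline of events.",
    "detailed": "Write a comprehensive, paragraph-level summary of the video.",
    "technical": "Summarize the video as technical bullet points.",
}

def build_prompt(style: str = "short") -> str:
    # Unknown style names fall back to the default "short" style.
    return STYLE_PROMPTS.get(style, STYLE_PROMPTS["short"])
```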

Then use any MCP client, for example:

✔ ChatGPT MCP (Custom tools)

Add a new connection:

command: python /path/to/server.py

✔ ROS-MCP client

ros_mcp connect video-summarizer python server.py

✔ Claude Desktop (Tools → Add Tool)

Set executable to:

python3 server.py
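Claude Desktop can also be configured through its claude_desktop_config.json; an entry along these lines should work (the server name and path are placeholders):

```json
{
  "mcpServers": {
    "video-summarizer": {
      "command": "python3",
      "args": ["/path/to/server.py"]
    }
  }
}
```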

📁 Project Structure

video-summary-mcp-server/
│
├── server.py              # MCP server (fastMCP wrapper)
├── main.py                # CLI runner for standalone use
├── summarizer.py          # Core summarization logic
├── frame_extractor.py     # Video frame extraction
├── encoder.py             # JPEG encoding
├── prompt.py              # Prompt building for different styles
├── pyproject.toml         # Project configuration
└── videos/                # Directory for video files

Module Overview

  • server.py - MCP server exposing the summarize_video and summarize_rtsp_stream tools
  • main.py - Standalone CLI for direct video summarization
  • summarizer.py - Core module that orchestrates the summarization pipeline
  • frame_extractor.py - Extracts keyframes from videos at regular intervals
  • encoder.py - Compresses frames to JPEG for efficient API calls
  • prompt.py - Builds prompts for different summary styles (short, timeline, detailed, technical)
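The encoder's job — turning JPEG bytes into something a vision API request can carry — can be illustrated with the standard library alone (the helper name is ours; the data-URL form follows OpenAI's image_url convention):

```python
import base64

def jpeg_to_data_url(jpeg_bytes: bytes) -> str:
    """Wrap raw JPEG bytes as a base64 data URL for an image_url content part."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return f"data:image/jpeg;base64,{b64}"
```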

🔑 Environment Setup

Option 1: Using .env file (Recommended)

  1. Create a .env file in the project root (any text editor is fine).

  2. Add your OpenAI API key:

OPENAI_API_KEY=your-actual-api-key-here

The .env file is automatically loaded when you run the server.

Option 2: Environment Variable

Set it as an environment variable:

export OPENAI_API_KEY='your-api-key-here'

Get your API key from: https://platform.openai.com/api-keys