vision-mcp-server

stex2005/vision-mcp-server


video-summary-mcp-server

🚀 fastMCP Example: Vision Tools

This server exposes several MCP tools, including:

summarize_video(video_path: string) → summary: string
summarize_rtsp_stream(rtsp_url: string, duration_sec?: number, interval_sec?: number) → summary: string

You can drop this file anywhere and run:

python server.py

and a Model Context Protocol–compatible client (ROS-MCP client, Claude Desktop, ChatGPT MCP) can use it immediately.


📌 Install

pip install -e .

Or install dependencies directly:

pip install fastmcp openai pillow opencv-python

🧠 What this server does

When a client calls:

{
  "tool": "summarize_video",
  "arguments": { "video_path": "demo.mp4" }
}

the server:

  1. Loads the video
  2. Extracts frames every 2 seconds
  3. Encodes frames as JPEG
  4. Sends them to GPT-4.1 Vision
  5. Returns a natural language video summary
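The frame-sampling step can be sketched as plain arithmetic (the function name is illustrative; the project's actual logic lives in frame_extractor.py):

```python
def sample_frame_indices(total_frames: int, fps: float, interval_sec: float = 2.0) -> list[int]:
    """Return the indices of the frames to extract: one every interval_sec seconds."""
    step = max(1, round(fps * interval_sec))  # frames between consecutive samples
    return list(range(0, total_frames, step))

# A 10-second clip at 30 fps, sampled every 2 s -> indices 0, 60, 120, 180, 240
```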

Need to summarize a live camera or drone feed instead? Use summarize_rtsp_stream:

{
  "tool": "summarize_rtsp_stream",
  "arguments": {
    "rtsp_url": "rtsp://user:pass@camera-ip:554/stream",
    "duration_sec": 20,
    "interval_sec": 4
  }
}

The tool opens the RTSP stream with OpenCV, samples frames for the requested duration, and sends them to GPT-4.1 Vision for summarization.

This works on any computer with Python; no GPU is required.


🧪 How to run

As MCP Server

Start the MCP server:

python server.py

As Standalone CLI

Run directly from command line:

python main.py videos/demo.mp4 [style]

Available styles:

  • short - Concise summary (default)
  • timeline - Summary with timestamps
  • detailed - Comprehensive summary
  • technical - Technical bullet-point summary

Example:

python main.py videos/demo.mp4 timeline
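One plausible shape for the style handling in prompt.py (the prompt wording below is an assumption, not the project's actual text):

```python
# Hypothetical style-to-prompt mapping; the real prompts live in prompt.py.
STYLE_PROMPTS = {
    "short": "Summarize the video in 2-3 sentences.",
    "timeline": "Summarize the video as a timestamped timeline of events.",
    "detailed": "Write a comprehensive, paragraph-level summary of the video.",
    "technical": "Summarize the video as technical bullet points.",
}

def build_prompt(style: str = "short") -> str:
    # Unknown style names fall back to the default "short" style.
    return STYLE_PROMPTS.get(style, STYLE_PROMPTS["short"])
```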

Then use any MCP client, for example:

✔ ChatGPT MCP (Custom tools)

Add a new connection:

command: python /path/to/server.py

✔ ROS-MCP client

ros_mcp connect video-summarizer python server.py

✔ Claude Desktop (Tools → Add Tool)

Set executable to:

python3 server.py
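Claude Desktop can also be configured through its claude_desktop_config.json; an entry along these lines should work (the server name and path are placeholders):

```json
{
  "mcpServers": {
    "video-summarizer": {
      "command": "python3",
      "args": ["/path/to/server.py"]
    }
  }
}
```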

📁 Project Structure

video-summary-mcp-server/
│
├── server.py              # MCP server (fastMCP wrapper)
├── main.py                # CLI runner for standalone use
├── summarizer.py          # Core summarization logic
├── frame_extractor.py     # Video frame extraction
├── encoder.py             # JPEG encoding
├── prompt.py              # Prompt building for different styles
├── pyproject.toml         # Project configuration
└── videos/                # Directory for video files

Module Overview

  • server.py - MCP server exposing the summarize_video and summarize_rtsp_stream tools
  • main.py - Standalone CLI for direct video summarization
  • summarizer.py - Core module that orchestrates the summarization pipeline
  • frame_extractor.py - Extracts keyframes from videos at regular intervals
  • encoder.py - Compresses frames to JPEG for efficient API calls
  • prompt.py - Builds prompts for different summary styles (short, timeline, detailed, technical)
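The encoder's job — turning JPEG bytes into something a vision API request can carry — can be illustrated with the standard library alone (the helper name is ours; the data-URL form follows OpenAI's image_url convention):

```python
import base64

def jpeg_to_data_url(jpeg_bytes: bytes) -> str:
    """Wrap raw JPEG bytes as a base64 data URL for an image_url content part."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return f"data:image/jpeg;base64,{b64}"
```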

🔑 Environment Setup

Option 1: Using .env file (Recommended)

  1. Create a .env file in the project root (any text editor is fine).

  2. Add your OpenAI API key:

OPENAI_API_KEY=your-actual-api-key-here

The .env file is automatically loaded when you run the server.

Option 2: Environment Variable

Set it as an environment variable:

export OPENAI_API_KEY='your-api-key-here'

Get your API key from: https://platform.openai.com/api-keys