qwen-video-mcp-server by adamanz - MCP Server

Qwen Video Understanding MCP Server

An MCP (Model Context Protocol) server that enables Claude and other AI agents to analyze videos and images using Qwen3-VL deployed on Modal.

Highlights

Hours-long video support with full recall
Timestamp grounding - second-level precision
256K context (expandable to 1M)
32-language OCR support
Self-hosted on Modal serverless GPUs (no cost while idle)

Features

Video Analysis: Analyze videos via URL with custom prompts
Image Analysis: Analyze images via URL
Video Summarization: Generate brief, standard, or detailed summaries
Text Extraction: Extract on-screen text and transcribe speech
Video Q&A: Ask specific questions about video content
Frame Comparison: Analyze changes and progression in videos

Architecture

Claude/Agent → MCP Server → Modal API → Qwen3-VL (GPU)

The MCP server acts as a bridge between Claude and your Qwen3-VL model deployed on Modal's serverless GPU infrastructure.
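
To make the bridging step concrete, here is a minimal sketch of how tool arguments might be packaged as an HTTP request for the Modal endpoint. The endpoint URL, the payload keys, and the helper name `build_modal_request` are assumptions for illustration; the actual server and backend may use a different schema.

```python
import json
import urllib.request


def build_modal_request(endpoint: str, video_url: str, question: str,
                        max_frames: int = 16) -> urllib.request.Request:
    """Package tool arguments as a JSON POST for the Modal endpoint.

    The payload keys mirror the analyze_video tool arguments; the real
    backend may expect different field names (assumption).
    """
    body = json.dumps(
        {"video_url": video_url, "question": question, "max_frames": max_frames}
    ).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint URL, for illustration only.
req = build_modal_request(
    "https://example--qwen-video-understanding-analyze-video.modal.run",
    "https://example.com/video.mp4",
    "What happens in this video?",
)
# Sending it would look like: urllib.request.urlopen(req, timeout=300)
```

The request is only constructed here, not sent, so the sketch can be inspected without a deployed model.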

Prerequisites

Modal Account: Sign up at modal.com
Deployed Qwen Model: Deploy the video understanding model to Modal (see below)
Python 3.10+

Quick Start

1. Deploy the Model to Modal (if not already done)

cd ~/qwen-video-modal
modal deploy qwen_video.py

2. Install the MCP Server

cd ~/qwen-video-mcp-server
pip install -e .

Or with uv:

uv pip install -e .

3. Configure Environment

cp .env.example .env
# Edit .env with your Modal workspace name

4. Add to Claude Desktop

Add the following to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS), replacing the directory path and the MODAL_WORKSPACE value with your own:

{
  "mcpServers": {
    "qwen-video": {
      "command": "uv",
      "args": [
        "--directory",
        "/Users/adamanz/qwen-video-mcp-server",
        "run",
        "server.py"
      ],
      "env": {
        "MODAL_WORKSPACE": "adam-31541",
        "MODAL_APP": "qwen-video-understanding"
      }
    }
  }
}

5. Restart Claude Desktop

The qwen-video tools should now be available.

Available Tools

`analyze_video`

Analyze a video with a custom prompt.

analyze_video(
  video_url="https://example.com/video.mp4",
  question="What happens in this video?",
  max_frames=16
)

`analyze_image`

Analyze an image with a custom prompt.

analyze_image(
  image_url="https://example.com/image.jpg",
  question="Describe this image"
)

`summarize_video`

Generate a video summary in different styles.

summarize_video(
  video_url="https://example.com/video.mp4",
  style="detailed"  # brief, standard, or detailed
)
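
One plausible way to implement the three styles is to map each one to a different prompt before calling the model. The sketch below assumes exactly that; the prompt wording, the `standard` default, and the names `SUMMARY_PROMPTS` and `summary_prompt` are hypothetical and not taken from the server's source.

```python
# Hypothetical prompt templates, one per documented style.
SUMMARY_PROMPTS = {
    "brief": "Summarize this video in one or two sentences.",
    "standard": "Summarize this video in a short paragraph covering the main events.",
    "detailed": "Give a detailed summary of this video, describing key scenes in order.",
}


def summary_prompt(style: str = "standard") -> str:
    """Return the prompt for a summary style, rejecting unknown styles."""
    if style not in SUMMARY_PROMPTS:
        allowed = ", ".join(sorted(SUMMARY_PROMPTS))
        raise ValueError(f"Unknown style {style!r}; expected one of: {allowed}")
    return SUMMARY_PROMPTS[style]
```

Rejecting unknown styles early avoids spending GPU time on a request that was mistyped.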

`extract_video_text`

Extract text and transcribe speech from a video.

extract_video_text(
  video_url="https://example.com/presentation.mp4"
)

`video_qa`

Ask specific questions about a video.

video_qa(
  video_url="https://example.com/video.mp4",
  question="How many people appear in this video?"
)

`compare_video_frames`

Analyze changes throughout a video.

compare_video_frames(
  video_url="https://example.com/timelapse.mp4",
  comparison_prompt="How does the scene change?"
)

`check_endpoint_status`

Check the Modal endpoint configuration.

`list_capabilities`

List all server capabilities and supported formats.

Configuration

Environment Variable	Description	Default
`MODAL_WORKSPACE`	Your Modal workspace/username	`adam-31541`
`MODAL_APP`	Name of the Modal app	`qwen-video-understanding`
`QWEN_IMAGE_ENDPOINT`	Override image endpoint URL	Auto-generated
`QWEN_VIDEO_ENDPOINT`	Override video endpoint URL	Auto-generated
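
The table suggests that an explicit endpoint override takes precedence over the auto-generated URL. A minimal sketch of that resolution order follows. The URL pattern is based on Modal's usual `<workspace>--<app>-<function>.modal.run` convention, and the `analyze-<kind>` function label is an assumption; check your deployment for the real endpoint names.

```python
import os


def resolve_endpoint(kind: str, env=None) -> str:
    """Return the endpoint URL for kind ('image' or 'video').

    An explicit QWEN_<KIND>_ENDPOINT override wins; otherwise the URL is
    derived from MODAL_WORKSPACE and MODAL_APP. The 'analyze-<kind>'
    suffix is a hypothetical function label.
    """
    env = os.environ if env is None else env
    override = env.get(f"QWEN_{kind.upper()}_ENDPOINT")
    if override:
        return override
    workspace = env.get("MODAL_WORKSPACE", "adam-31541")
    app = env.get("MODAL_APP", "qwen-video-understanding")
    return f"https://{workspace}--{app}-analyze-{kind}.modal.run"
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real environment.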

Supported Formats

Video: mp4, webm, mov, avi, mkv

Image: jpg, jpeg, png, gif, webp, bmp
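
A client can check a URL against these lists before sending a request. The sketch below classifies a URL by the file extension in its path; the helper name `media_kind` is hypothetical, and the server itself may detect formats differently (for example, from the response content type).

```python
import os
from urllib.parse import urlparse

VIDEO_EXTENSIONS = {"mp4", "webm", "mov", "avi", "mkv"}
IMAGE_EXTENSIONS = {"jpg", "jpeg", "png", "gif", "webp", "bmp"}


def media_kind(url: str):
    """Return 'video', 'image', or None based on the URL path's extension."""
    path = urlparse(url).path  # drops any query string or fragment
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext in VIDEO_EXTENSIONS:
        return "video"
    if ext in IMAGE_EXTENSIONS:
        return "image"
    return None
```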

Limitations

Videos must be accessible via a public URL
A maximum of 64 frames is extracted per video
Recommended video length: under 10 minutes for best results
The first request may incur a cold-start delay (Modal serverless)
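
These limits can be enforced on the client side before a request is made. The following is a rough sketch, assuming a simple clamp on the frame count and a purely syntactic URL check; the helper names are hypothetical, and a syntactic check cannot prove that a URL is actually reachable.

```python
from urllib.parse import urlparse

MAX_FRAMES = 64  # upper bound stated in the limitations above


def clamp_frames(requested: int) -> int:
    """Keep the requested frame count within 1..MAX_FRAMES."""
    return max(1, min(requested, MAX_FRAMES))


def looks_public(url: str) -> bool:
    """Cheap syntactic check: http(s) scheme and a non-local host."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return parsed.scheme in ("http", "https") and host not in ("", "localhost", "127.0.0.1")
```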

Cost

The Modal backend uses A100-40GB GPUs:

~$3.30/hour while processing
Scales to zero when idle (no cost)
Only charged for actual processing time
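
Because billing is per unit of processing time, the cost of a single request can be estimated from its duration. As a worked example under the approximate rate quoted above, a request that keeps the GPU busy for 90 seconds costs roughly 90 / 3600 × 3.30 ≈ $0.08. The helper below is an illustrative sketch; actual Modal pricing may differ and can change.

```python
HOURLY_RATE_USD = 3.30  # approximate A100-40GB rate quoted above


def estimate_cost(processing_seconds: float,
                  hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Estimate the cost in USD of a request, given its GPU time in seconds."""
    return round(processing_seconds / 3600 * hourly_rate, 4)
```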

Troubleshooting

"Request timed out"

Video may be too large
Try a shorter video or reduce max_frames

"HTTP error 502/503"

Modal container is starting up (cold start)
Wait a few seconds and retry
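
The wait-and-retry advice can be automated. Below is a minimal sketch of retrying on 502/503 responses with exponential backoff. It assumes the request is made with `urllib`, which raises `urllib.error.HTTPError` for these status codes; the function name, retry count, and delays are illustrative choices, not part of the server.

```python
import time
import urllib.error


def call_with_retry(send, retries: int = 3, base_delay: float = 2.0,
                    sleep=time.sleep):
    """Call send() and retry on HTTP 502/503 with exponential backoff.

    send is any zero-argument callable that performs the request and
    raises urllib.error.HTTPError on an HTTP error status.
    """
    for attempt in range(retries + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Re-raise other errors, and give up after the last attempt.
            if err.code not in (502, 503) or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ...
```

With the defaults, a cold start gets up to three retries spread over about 14 seconds before the error is surfaced to the caller.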

"Video URL not accessible"

Ensure the URL is publicly accessible
Check for authentication requirements

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

License

MIT