Vision MCP Server
MCP server that adds visual understanding capabilities to text-only coding models. Supports OpenAI GPT-4V and compatible vision APIs.
Features
- Tools: analyze_image, extract_ui_components, generate_layout_description, analyze_visual_hierarchy
- Resources: vision-model-info, analysis-templates
- Prompts: code-generation-prompt
- File Support: Local files (relative/absolute paths), file:// URLs, http(s) URLs, data URIs
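Each tool takes a small JSON argument object. As a rough TypeScript sketch of those input shapes (inferred from the usage examples later in this README; exact field names and optionality are assumptions, not the server's published schema):

```typescript
// Hypothetical input shapes, inferred from the examples in this README.
interface AnalyzeImageInput {
  image_url: string;                // local path, file://, http(s) URL, or data URI
  analysis_type?: "general" | "ui"; // other values may exist
}

interface ExtractUiComponentsInput {
  image_url: string;
  component_types?: string[];       // e.g. ["button", "input", "card"]
}

interface GenerateLayoutDescriptionInput {
  image_url: string;
  framework?: "react" | "vue" | "html" | "tailwind";
}

interface AnalyzeVisualHierarchyInput {
  image_url: string;
}
```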
Installation & Setup
Requirements
- Node.js >= 18
Environment Configuration
# Required: OpenAI API Key (or compatible provider)
OPENAI_API_KEY=your_api_key_here
# Optional: Custom API endpoint (defaults to OpenAI)
API_BASE_URL=https://api.openai.com/v1
# Optional: Vision model (defaults to gpt-4o)
VISION_MODEL=gpt-4o
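A minimal sketch of how a Node.js server typically resolves these variables, using the documented defaults; the actual handling inside vision-mcp may differ:

```typescript
// Sketch only: read configuration from the environment with the documented
// defaults. OPENAI_API_KEY is required; the other two are optional.
const config = {
  apiKey: process.env.OPENAI_API_KEY,
  baseUrl: process.env.API_BASE_URL ?? "https://api.openai.com/v1",
  model: process.env.VISION_MODEL ?? "gpt-4o",
};

if (!config.apiKey) {
  throw new Error("OPENAI_API_KEY must be set");
}
```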
Provider Examples
OpenAI (Default):
OPENAI_API_KEY=sk-...
# API_BASE_URL=https://api.openai.com/v1 (default)
# VISION_MODEL=gpt-4o (default)
Chutes AI:
OPENAI_API_KEY=cpk_...
API_BASE_URL=https://llm.chutes.ai/v1
VISION_MODEL=zai-org/GLM-4.5V
Other Compatible Providers:
OPENAI_API_KEY=your_key
API_BASE_URL=https://your-provider.com/v1
VISION_MODEL=your-vision-model
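All of these providers expose an OpenAI-compatible chat completions API, which is why only API_BASE_URL and VISION_MODEL change between them. For illustration, a hedged sketch of such a vision request using the openai npm package (this is not vision-mcp's internal request code):

```typescript
import OpenAI from "openai";

// Any OpenAI-compatible provider can be reached by changing only baseURL and model.
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: process.env.API_BASE_URL ?? "https://api.openai.com/v1",
});

const response = await client.chat.completions.create({
  model: process.env.VISION_MODEL ?? "gpt-4o",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Describe the UI in this screenshot." },
        { type: "image_url", image_url: { url: "data:image/png;base64,..." } },
      ],
    },
  ],
});

console.log(response.choices[0].message.content);
```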
Running the Server
Install dependencies and build the server:
npm install
npm run build
The built server speaks MCP over stdio and is normally launched by your MCP client (see the integration example below), which runs node against ./bin/vision-mcp.
MCP Client Integration
Roo Code (VS Code Extension)
{
"vision-mcp": {
"command": "node",
"args": ["./bin/vision-mcp"],
"type": "stdio",
"cwd": "/path/to/vision-mcp",
"env": {
"OPENAI_API_KEY": "your_api_key_here",
"API_BASE_URL": "https://api.openai.com/v1",
"VISION_MODEL": "gpt-4o"
},
"alwaysAllow": ["analyze_image", "extract_ui_components"]
}
}
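Any MCP client can launch the server the same way. As a minimal programmatic sketch, assuming the official @modelcontextprotocol/sdk TypeScript client and run from the vision-mcp directory (the client name "vision-smoke-test" is illustrative):

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the built server over stdio, the same way an editor integration would.
const transport = new StdioClientTransport({
  command: "node",
  args: ["./bin/vision-mcp"],
  env: { OPENAI_API_KEY: process.env.OPENAI_API_KEY ?? "" },
});

const client = new Client({ name: "vision-smoke-test", version: "1.0.0" });
await client.connect(transport);

// List the tools the server advertises (analyze_image, extract_ui_components, ...).
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));
```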
Usage Examples
Analyze UI Screenshot
{
"image_url": "./screenshot.png",
"analysis_type": "ui"
}
Extract Components from Design
{
"image_url": "https://example.com/design.jpg",
"component_types": ["button", "input", "card"]
}
Generate Layout Description
{
"image_url": "file:///C:/designs/layout.png",
"framework": "react"
}
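Programmatically, each of these argument objects is passed verbatim to the corresponding tool. A hedged sketch using the MCP TypeScript SDK's callTool, reusing the stdio launch shown in the integration section:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "vision-example", version: "1.0.0" });
await client.connect(
  new StdioClientTransport({
    command: "node",
    args: ["./bin/vision-mcp"],
    env: { OPENAI_API_KEY: process.env.OPENAI_API_KEY ?? "" },
  })
);

// The "Analyze UI Screenshot" example above, sent as a tools/call request.
const result = await client.callTool({
  name: "analyze_image",
  arguments: { image_url: "./screenshot.png", analysis_type: "ui" },
});
console.log(result.content);
```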
Recommended Workflow
For the best results when analyzing UI/UX designs, follow this step-by-step workflow:
Step 1: Initial Analysis
Tool: analyze_image with analysis_type: "general"
Purpose: Get a comprehensive overview of the image to understand its type, components, and general structure
{
"image_url": "./design.png",
"analysis_type": "general"
}
Step 2: Visual Hierarchy Analysis
Tool: analyze_visual_hierarchy
Purpose: Understand the design system foundations - colors, typography, spacing, and visual weight
{
"image_url": "./design.png"
}
Step 3: Component Extraction
Tool: extract_ui_components
Purpose: Identify specific UI components with detailed specifications for implementation
{
"image_url": "./design.png",
"component_types": ["button", "input", "card"] // Optional: filter specific components
}
Step 4: Layout Generation
Tool: generate_layout_description with your target framework
Purpose: Generate framework-specific implementation structure
{
"image_url": "./design.png",
"framework": "react" // Options: react, vue, html, tailwind
}
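Taken together, the four steps map onto four sequential tool calls. A hedged sketch (the helper name analyzeDesign is illustrative, and client is an MCP client already connected to vision-mcp as shown in the integration section above):

```typescript
import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Sketch of the recommended workflow as four sequential tool calls.
async function analyzeDesign(client: Client, image_url: string) {
  const overview = await client.callTool({
    name: "analyze_image",
    arguments: { image_url, analysis_type: "general" },
  });
  const hierarchy = await client.callTool({
    name: "analyze_visual_hierarchy",
    arguments: { image_url },
  });
  const components = await client.callTool({
    name: "extract_ui_components",
    arguments: { image_url, component_types: ["button", "input", "card"] },
  });
  const layout = await client.callTool({
    name: "generate_layout_description",
    arguments: { image_url, framework: "react" },
  });
  return { overview, hierarchy, components, layout };
}
```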
Workflow Benefits
- Progressive Understanding: Each step builds on the previous analysis
- Comprehensive Coverage: From general overview to specific implementation details
- Framework-Ready: Final step provides targeted implementation guidance
- Flexible: Can skip steps if you already have specific requirements
Quick Start Options
- For quick analysis: use only analyze_image with an appropriate analysis_type
- For component extraction: skip straight to extract_ui_components if you only need components
- For layout implementation: use generate_layout_description directly if you already have a design
Supported Image Formats
- Local files: ./image.jpg, C:/path/image.png, /path/to/image.jpg
- File URLs: file:///C:/path/image.png
- Web URLs: https://example.com/image.jpg
- Data URIs: data:image/jpeg;base64,...
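Web URLs and data URIs can be forwarded to the vision API as-is, while local paths and file:// URLs have to be read and encoded first. A rough sketch of that normalization (vision-mcp's actual handling, including MIME-type detection, may differ):

```typescript
import { readFile } from "node:fs/promises";
import { fileURLToPath } from "node:url";

// Sketch: turn any accepted image reference into something an
// OpenAI-compatible vision endpoint can consume directly.
async function toImageUrl(input: string): Promise<string> {
  if (
    input.startsWith("http://") ||
    input.startsWith("https://") ||
    input.startsWith("data:")
  ) {
    return input; // web URLs and data URIs pass through unchanged
  }
  const path = input.startsWith("file://") ? fileURLToPath(input) : input;
  const bytes = await readFile(path); // relative or absolute local path
  return `data:image/png;base64,${bytes.toString("base64")}`; // assumes PNG for brevity
}
```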
License
MIT