MCP UI Screenshot Analyzer
An MCP (Model Context Protocol) server that integrates with GitHub Copilot to provide AI-powered UI analysis using local models only (zero API costs, privacy-first design).
Features
- UI Screenshot Analysis: Semantic understanding of UI layouts and structure
- Color Palette Extraction: Extract dominant colors using OpenCV k-means
- Text Extraction: OCR capabilities via Gemma 3 vision
- Bug Detection: Identify layout issues and accessibility problems
- Depth Levels: Configurable analysis depth (quick/standard/deep)
- Smart Caching: Performance optimization for repeated analyses
Current Status
Week 1 MVP - COMPLETE ✓
- ✅ Gemma 3 12B vision integration via Ollama
- ✅ OpenCV color extraction
- ✅ Result caching (1-hour TTL)
- ✅ All 6 MCP tools implemented (4 fully functional)
- ✅ Error handling and validation
- ⏳ YOLOv8 component detection (Week 2)
- ⏳ Code generation (Week 3)
Quick Start
Prerequisites
- macOS or Linux
- Python 3.10+
- 8GB+ RAM (16GB recommended)
- Ollama installed
Installation
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# 2. Pull Gemma 3 12B model
ollama pull gemma3:12b
# 3. Activate virtual environment (already created)
source venv/bin/activate
# 4. Run the MCP server
python server.py
Verify Installation
# Check Ollama is running
pgrep -x "ollama"
# Check models are available
ollama list
# Test server initialization
python -c "from server import mcp, gemma_analyzer; print('Server ready!')"
MCP Tools
1. analyze_ui_screenshot
Main analysis tool with configurable depth levels.
Parameters:
- image_path (str): Absolute path to the screenshot
- depth (str): "quick" | "standard" | "deep"
Depth Levels:
- quick: Gemma 3 only (2-4s) - Basic description
- standard: Gemma 3 + colors (5-8s) - Detailed analysis
- deep: Full pipeline (12-18s) - Comprehensive analysis
Example:
# Via GitHub Copilot Chat:
Analyze the UI screenshot at /Users/me/screenshot.png
# Programmatic:
result = analyze_ui_screenshot("/path/to/screenshot.png", depth="standard")
2. extract_color_palette
Extract dominant colors using OpenCV k-means clustering.
Parameters:
- image_path (str): Absolute path to the image
- n_colors (int): Number of colors (2-10, default: 5)
Returns: Color palette with hex codes, RGB values, and percentages
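The extraction step presumably follows the standard OpenCV k-means recipe; a minimal sketch of that technique (the extract_palette function name and the exact return shape are illustrative, not the server's actual implementation):
import cv2
import numpy as np

def extract_palette(image_path: str, n_colors: int = 5):
    # Load the image and flatten it into a list of BGR pixels
    img = cv2.imread(image_path)
    pixels = img.reshape(-1, 3).astype(np.float32)

    # k-means clustering: each cluster center is one dominant color
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, n_colors, None, criteria, 10,
                                    cv2.KMEANS_RANDOM_CENTERS)

    # Share of pixels assigned to each cluster -> color percentage
    counts = np.bincount(labels.flatten(), minlength=n_colors)
    palette = []
    for center, count in zip(centers, counts):
        b, g, r = center.astype(int)
        palette.append({
            "hex": f"#{r:02x}{g:02x}{b:02x}",
            "rgb": [int(r), int(g), int(b)],
            "percentage": round(100 * count / counts.sum(), 1),
        })
    return sorted(palette, key=lambda c: c["percentage"], reverse=True)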
3. extract_ui_text
Extract text from UI using Gemma 3 OCR capabilities.
Parameters:
- image_path (str): Absolute path to the screenshot
Returns: List of extracted text elements
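Under the hood this amounts to a vision prompt against the local model; a minimal sketch using the ollama Python client (the prompt wording and line-splitting of the reply are illustrative assumptions, not the server's exact logic):
import ollama

def extract_ui_text(image_path: str) -> list[str]:
    # Ask the local Gemma 3 vision model to transcribe visible UI text
    response = ollama.chat(
        model="gemma3:12b",
        messages=[{
            "role": "user",
            "content": "List every piece of visible text in this UI screenshot, one item per line.",
            "images": [image_path],
        }],
    )
    # One text element per non-empty line of the model's reply
    return [line.strip() for line in response["message"]["content"].splitlines() if line.strip()]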
4. detect_ui_bugs
Detect layout issues and accessibility problems.
Parameters:
- image_path (str): Absolute path to the screenshot
Returns: List of issues with severity and suggestions
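The exact result schema isn't documented in this README; a hypothetical example of what one returned issue might look like (all field names are illustrative assumptions):
# Illustrative only; not the server's guaranteed schema
issues = [
    {
        "issue": "Low contrast between button label and background",
        "severity": "medium",
        "suggestion": "Increase the text/background contrast ratio to at least 4.5:1",
    },
]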
5. detect_ui_components
Coming in Week 2 - YOLOv8 integration
6. generate_component_code
Coming in Week 3 - Code generation
GitHub Copilot Integration
Configuration
Create .vscode/mcp-settings.json:
{
"mcpServers": {
"ui-analyzer": {
"command": "python",
"args": ["/Users/manhhaycode/Developer/image-analysis/server.py"],
"env": {}
}
}
}
Usage in Copilot Chat
Analyze the UI screenshot at /path/to/screenshot.png
Extract colors from /path/to/design.png
Detect bugs in ~/Desktop/app-screenshot.png
Performance
| Operation | Time (CPU) | Time (GPU) | Cached |
|---|---|---|---|
| Quick analysis | 2-4s | 1-2s | <1s |
| Standard analysis | 5-8s | 2-3s | <1s |
| Deep analysis | 12-18s | 4-6s | <1s |
| Color extraction | <1s | <0.5s | <0.1s |
Cache: Results cached for 1 hour, automatic invalidation
Project Structure
image-analysis/
├── server.py # MCP server entry point
├── config.yaml # Configuration
├── analyzers/
│ ├── gemma_analyzer.py # Ollama Gemma 3 integration
│ ├── color_extractor.py # OpenCV color extraction
│ └── __init__.py
├── orchestrator/
│ ├── cache.py # Result caching
│ └── __init__.py
├── tests/
│ ├── fixtures/ # Sample screenshots
│ └── __init__.py
├── utils/
│ └── __init__.py
└── venv/ # Virtual environment
Configuration
Edit config.yaml to customize:
vision:
model: "gemma3:12b" # Primary model
fallback: "gemma3:2b" # Low RAM fallback
performance:
enable_caching: true
cache_ttl_seconds: 3600 # 1 hour
color_extraction:
default_n_colors: 5
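The server presumably reads these values at startup with a standard YAML load; a minimal sketch, assuming PyYAML is installed (the load_config helper is illustrative, not the project's actual API):
import yaml

def load_config(path: str = "config.yaml") -> dict:
    # Parse the YAML config used by the analyzers and cache
    with open(path) as f:
        return yaml.safe_load(f)

config = load_config()
model = config["vision"]["model"]                        # e.g. "gemma3:12b"
cache_ttl = config["performance"]["cache_ttl_seconds"]   # 3600 by default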
Troubleshooting
Server fails to start:
# Verify Ollama is running
pgrep -x "ollama" || ollama serve &
# Check Gemma 3 model
ollama list | grep gemma3:12b
# Reinstall dependencies
pip install -r requirements.txt
Out of memory:
# Use quantized model (6.6GB vs 9GB)
ollama pull gemma3:12b-q4
# Edit config.yaml:
vision:
model: "gemma3:12b-q4"
Image not found errors:
- Always use absolute paths
- Verify the file exists: ls -la /path/to/image.png
- Check file permissions
Development
Running Tests
# (Tests to be implemented)
python -m pytest tests/
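Since tests are still to be written, here is a hypothetical starting point for tests/test_color_extractor.py; the imported module and function names are assumptions based on the project layout, not existing code:
import numpy as np
import cv2

# Hypothetical import; adjust to the real analyzer API once implemented
from analyzers.color_extractor import extract_color_palette

def test_palette_of_solid_red_image(tmp_path):
    # A solid red 64x64 image makes the dominant color unambiguous
    image_path = tmp_path / "red.png"
    img = np.zeros((64, 64, 3), dtype=np.uint8)
    img[:, :] = (0, 0, 255)  # OpenCV stores pixels as BGR
    cv2.imwrite(str(image_path), img)

    palette = extract_color_palette(str(image_path), n_colors=2)

    assert palette, "expected at least one color"
    r, g, b = palette[0]["rgb"]
    assert r > 240 and g < 15 and b < 15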
Adding New Tools
- Implement the analyzer in analyzers/
- Add the MCP tool decorator in server.py (see the sketch below)
- Integrate caching
- Update documentation
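As a concrete illustration of the first two steps, assuming server.py exposes a FastMCP instance named mcp (which the verification command above suggests), registering a new tool might look like this; the analyzer module and function are hypothetical:
# In server.py, alongside the existing tools
from analyzers.my_new_analyzer import run_analysis  # hypothetical analyzer (step 1)

@mcp.tool()  # step 2: mcp is the FastMCP instance server.py already defines
def my_new_tool(image_path: str) -> dict:
    """One-line description that the MCP client surfaces to Copilot."""
    # Step 3 would wrap this call with the shared result cache
    return run_analysis(image_path)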
Roadmap
- Week 1 (COMPLETE): MVP with Gemma 3 + color extraction + caching ✓
- Week 2: YOLOv8 component detection
- Week 3-4: Code generation, comprehensive testing, documentation
Performance Optimization
The system includes several optimizations:
- Smart Caching: MD5-based image hashing with 1-hour TTL (see the sketch after this list)
- Depth Levels: User-controlled trade-off between speed and detail
- Lazy Loading: Components loaded only when needed
- Error Recovery: Graceful degradation if optional features fail
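A minimal sketch of what the Smart Caching bullet describes, keying results on an MD5 hash of the image bytes and expiring entries after the configured TTL; the function names are illustrative, not the actual orchestrator/cache.py API:
import hashlib
import time

CACHE_TTL_SECONDS = 3600  # 1 hour, matching cache_ttl_seconds in config.yaml
_cache: dict[str, tuple[float, dict]] = {}

def _image_key(image_path: str, depth: str) -> str:
    # Hash the file contents so a re-saved (changed) screenshot misses the cache
    with open(image_path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return f"{digest}:{depth}"

def get_cached(image_path: str, depth: str):
    key = _image_key(image_path, depth)
    entry = _cache.get(key)
    if entry and time.time() - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]
    return None  # missing or expired

def put_cached(image_path: str, depth: str, result: dict) -> None:
    _cache[_image_key(image_path, depth)] = (time.time(), result)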
Hardware Requirements
- Minimum: 8GB RAM, CPU only (using quantized model)
- Recommended: 16GB RAM, any GPU
- Optimal: 32GB RAM, GPU with 8GB+ VRAM
License
MIT License
Contributing
Contributions welcome! Please open issues or PRs on GitHub.
Support
For issues or questions:
- Check the project documentation for details
- Review troubleshooting section above
- Open a GitHub issue