MCP UI Screenshot Analyzer
An MCP (Model Context Protocol) server that integrates with GitHub Copilot to provide AI-powered UI analysis using local models only (zero API costs, privacy-first design).
Features
- UI Screenshot Analysis: Semantic understanding of UI layouts and structure
- Color Palette Extraction: Extract dominant colors using OpenCV k-means
- Text Extraction: OCR capabilities via Gemma 3 vision
- Bug Detection: Identify layout issues and accessibility problems
- Depth Levels: Configurable analysis depth (quick/standard/deep)
- Smart Caching: Performance optimization for repeated analyses
Current Status
Week 1 MVP - COMPLETE ✓
- ✅ Gemma 3 12B vision integration via Ollama
- ✅ OpenCV color extraction
- ✅ Result caching (1-hour TTL)
- ✅ All 6 MCP tools implemented (4 fully functional)
- ✅ Error handling and validation
- ⏳ YOLOv8 component detection (Week 2)
- ⏳ Code generation (Week 3)
Quick Start
Prerequisites
- macOS or Linux
- Python 3.10+
- 8GB+ RAM (16GB recommended)
- Ollama installed
Installation
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# 2. Pull Gemma 3 12B model
ollama pull gemma3:12b
# 3. Activate virtual environment (already created)
source venv/bin/activate
# 4. Run the MCP server
python server.py
Verify Installation
# Check Ollama is running
pgrep -x "ollama"
# Check models are available
ollama list
# Test server initialization
python -c "from server import mcp, gemma_analyzer; print('Server ready!')"
MCP Tools
1. analyze_ui_screenshot
Main analysis tool with configurable depth levels.
Parameters:
- image_path (str): Absolute path to the screenshot
- depth (str): "quick" | "standard" | "deep"
Depth Levels:
- quick: Gemma 3 only (2-4s) - Basic description
- standard: Gemma 3 + colors (5-8s) - Detailed analysis
- deep: Full pipeline (12-18s) - Comprehensive analysis
Example:
# Via GitHub Copilot Chat:
Analyze the UI screenshot at /Users/me/screenshot.png
# Programmatic:
result = analyze_ui_screenshot("/path/to/screenshot.png", depth="standard")
2. extract_color_palette
Extract dominant colors using OpenCV k-means clustering.
Parameters:
- image_path (str): Absolute path to the image
- n_colors (int): Number of colors (2-10, default: 5)
Returns: Color palette with hex codes, RGB values, and percentages
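The extraction step presumably follows the standard OpenCV k-means recipe; a minimal sketch of that technique (the extract_palette function name and the exact return shape are illustrative, not the server's actual implementation):
import cv2
import numpy as np

def extract_palette(image_path: str, n_colors: int = 5):
    # Load the image and flatten it into a list of BGR pixels
    img = cv2.imread(image_path)
    pixels = img.reshape(-1, 3).astype(np.float32)

    # k-means clustering: each cluster center is one dominant color
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, n_colors, None, criteria, 10,
                                    cv2.KMEANS_RANDOM_CENTERS)

    # Share of pixels assigned to each cluster -> color percentage
    counts = np.bincount(labels.flatten(), minlength=n_colors)
    palette = []
    for center, count in zip(centers, counts):
        b, g, r = center.astype(int)
        palette.append({
            "hex": f"#{r:02x}{g:02x}{b:02x}",
            "rgb": [int(r), int(g), int(b)],
            "percentage": round(100 * count / counts.sum(), 1),
        })
    return sorted(palette, key=lambda c: c["percentage"], reverse=True)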
3. extract_ui_text
Extract text from UI using Gemma 3 OCR capabilities.
Parameters:
- image_path (str): Absolute path to the screenshot
Returns: List of extracted text elements
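Under the hood this amounts to a vision prompt against the local model; a minimal sketch using the ollama Python client (the prompt wording and line-splitting of the reply are illustrative assumptions, not the server's exact logic):
import ollama

def extract_ui_text(image_path: str) -> list[str]:
    # Ask the local Gemma 3 vision model to transcribe visible UI text
    response = ollama.chat(
        model="gemma3:12b",
        messages=[{
            "role": "user",
            "content": "List every piece of visible text in this UI screenshot, one item per line.",
            "images": [image_path],
        }],
    )
    # One text element per non-empty line of the model's reply
    return [line.strip() for line in response["message"]["content"].splitlines() if line.strip()]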
4. detect_ui_bugs
Detect layout issues and accessibility problems.
Parameters:
- image_path (str): Absolute path to the screenshot
Returns: List of issues with severity and suggestions
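The exact result schema isn't documented in this README; a hypothetical example of what one returned issue might look like (all field names are illustrative assumptions):
# Illustrative only; not the server's guaranteed schema
issues = [
    {
        "issue": "Low contrast between button label and background",
        "severity": "medium",
        "suggestion": "Increase the text/background contrast ratio to at least 4.5:1",
    },
]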
5. detect_ui_components
Coming in Week 2 - YOLOv8 integration
6. generate_component_code
Coming in Week 3 - Code generation
GitHub Copilot Integration
Configuration
Create .vscode/mcp-settings.json:
{
"mcpServers": {
"ui-analyzer": {
"command": "python",
"args": ["/Users/manhhaycode/Developer/image-analysis/server.py"],
"env": {}
}
}
}
Usage in Copilot Chat
Analyze the UI screenshot at /path/to/screenshot.png
Extract colors from /path/to/design.png
Detect bugs in ~/Desktop/app-screenshot.png
Performance
| Operation | Time (CPU) | Time (GPU) | Cached |
|---|---|---|---|
| Quick analysis | 2-4s | 1-2s | <1s |
| Standard analysis | 5-8s | 2-3s | <1s |
| Deep analysis | 12-18s | 4-6s | <1s |
| Color extraction | <1s | <0.5s | <0.1s |
Cache: Results cached for 1 hour, automatic invalidation
Project Structure
image-analysis/
├── server.py # MCP server entry point
├── config.yaml # Configuration
├── analyzers/
│ ├── gemma_analyzer.py # Ollama Gemma 3 integration
│ ├── color_extractor.py # OpenCV color extraction
│ └── __init__.py
├── orchestrator/
│ ├── cache.py # Result caching
│ └── __init__.py
├── tests/
│ ├── fixtures/ # Sample screenshots
│ └── __init__.py
├── utils/
│ └── __init__.py
└── venv/ # Virtual environment
Configuration
Edit config.yaml to customize:
vision:
model: "gemma3:12b" # Primary model
fallback: "gemma3:2b" # Low RAM fallback
performance:
enable_caching: true
cache_ttl_seconds: 3600 # 1 hour
color_extraction:
default_n_colors: 5
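The server presumably reads these values at startup with a standard YAML load; a minimal sketch, assuming PyYAML is installed (the load_config helper is illustrative, not the project's actual API):
import yaml

def load_config(path: str = "config.yaml") -> dict:
    # Parse the YAML config used by the analyzers and cache
    with open(path) as f:
        return yaml.safe_load(f)

config = load_config()
model = config["vision"]["model"]                        # e.g. "gemma3:12b"
cache_ttl = config["performance"]["cache_ttl_seconds"]   # 3600 by default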
Troubleshooting
Server fails to start:
# Verify Ollama is running
pgrep -x "ollama" || ollama serve &
# Check Gemma 3 model
ollama list | grep gemma3:12b
# Reinstall dependencies
pip install -r requirements.txt
Out of memory:
# Use quantized model (6.6GB vs 9GB)
ollama pull gemma3:12b-q4
# Edit config.yaml:
vision:
model: "gemma3:12b-q4"
Image not found errors:
- Always use absolute paths
- Verify the file exists: ls -la /path/to/image.png
- Check file permissions
Development
Running Tests
# (Tests to be implemented)
python -m pytest tests/
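Since tests are still to be written, here is a hypothetical starting point for tests/test_color_extractor.py; the imported module and function names are assumptions based on the project layout, not existing code:
import numpy as np
import cv2

# Hypothetical import; adjust to the real analyzer API once implemented
from analyzers.color_extractor import extract_color_palette

def test_palette_of_solid_red_image(tmp_path):
    # A solid red 64x64 image makes the dominant color unambiguous
    image_path = tmp_path / "red.png"
    img = np.zeros((64, 64, 3), dtype=np.uint8)
    img[:, :] = (0, 0, 255)  # OpenCV stores pixels as BGR
    cv2.imwrite(str(image_path), img)

    palette = extract_color_palette(str(image_path), n_colors=2)

    assert palette, "expected at least one color"
    r, g, b = palette[0]["rgb"]
    assert r > 240 and g < 15 and b < 15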
Adding New Tools
- Implement the analyzer in analyzers/
- Add the MCP tool decorator in server.py (see the sketch below)
- Integrate caching
- Update documentation
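As a concrete illustration of the first two steps, assuming server.py exposes a FastMCP instance named mcp (which the verification command above suggests), registering a new tool might look like this; the analyzer module and function are hypothetical:
# In server.py, alongside the existing tools
from analyzers.my_new_analyzer import run_analysis  # hypothetical analyzer (step 1)

@mcp.tool()  # step 2: mcp is the FastMCP instance server.py already defines
def my_new_tool(image_path: str) -> dict:
    """One-line description that the MCP client surfaces to Copilot."""
    # Step 3 would wrap this call with the shared result cache
    return run_analysis(image_path)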
Roadmap
- Week 1 (COMPLETE): MVP with Gemma 3 + color extraction + caching ✓
- Week 2: YOLOv8 component detection
- Week 3-4: Code generation, comprehensive testing, documentation
Performance Optimization
The system includes several optimizations:
- Smart Caching: MD5-based image hashing with 1-hour TTL (see the sketch after this list)
- Depth Levels: User-controlled trade-off between speed and detail
- Lazy Loading: Components loaded only when needed
- Error Recovery: Graceful degradation if optional features fail
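A minimal sketch of what the Smart Caching bullet describes, keying results on an MD5 hash of the image bytes and expiring entries after the configured TTL; the function names are illustrative, not the actual orchestrator/cache.py API:
import hashlib
import time

CACHE_TTL_SECONDS = 3600  # 1 hour, matching cache_ttl_seconds in config.yaml
_cache: dict[str, tuple[float, dict]] = {}

def _image_key(image_path: str, depth: str) -> str:
    # Hash the file contents so a re-saved (changed) screenshot misses the cache
    with open(image_path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return f"{digest}:{depth}"

def get_cached(image_path: str, depth: str):
    key = _image_key(image_path, depth)
    entry = _cache.get(key)
    if entry and time.time() - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]
    return None  # missing or expired

def put_cached(image_path: str, depth: str, result: dict) -> None:
    _cache[_image_key(image_path, depth)] = (time.time(), result)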
Hardware Requirements
- Minimum: 8GB RAM, CPU only (using quantized model)
- Recommended: 16GB RAM, any GPU
- Optimal: 32GB RAM, GPU with 8GB+ VRAM
License
MIT License
Contributing
Contributions welcome! Please open issues or PRs on GitHub.
Support
For issues or questions:
- Check the project documentation for details
- Review troubleshooting section above
- Open a GitHub issue