Digital-Defiance/mcp-screenshot
The MCP Screenshot Server is an enterprise-grade server that provides screenshot capture capabilities for AI agents through the Model Context Protocol (MCP).
📸 MCP ACS Screenshot Server
Give AI agents visual superpowers to see, analyze, and document your applications like senior UX designers.
This enterprise-grade MCP server transforms AI from code-only assistants into visual experts capable of UI analysis, accessibility auditing, documentation generation, and responsive design testing.
🔗 Repository
This package is now maintained in its own repository: https://github.com/Digital-Defiance/mcp-screenshot
This repository is part of the AI Capabilities Suite on GitHub.
🤔 Why Do AI Agents Need Visual Capabilities?
AI agents today are powerful but visually blind:
- ❌ Can read HTML/CSS but can't see actual layouts
- ❌ Can suggest UI improvements without seeing the real user experience
- ❌ Can't detect accessibility issues like poor contrast or spacing
- ❌ Can't create visual documentation or bug reports
- ❌ Can't analyze responsive design across different screen sizes
Result: You're stuck manually creating screenshots, documentation, and visual analysis that AI should handle.
🎯 Revolutionary Use Cases
📚 "AI, create professional documentation"
You: "Document the checkout process with screenshots"
AI: *Captures each step, analyzes UI flow*
AI: "Generated complete user guide with 6 annotated screenshots, highlighting key interactions and error states..."
🔍 "AI, audit this page for accessibility"
You: "Check if this form meets WCAG guidelines"
AI: *Captures form, analyzes visual elements*
AI: "Found 4 accessibility issues: contrast ratio 2.1:1 (needs 4.5:1), missing focus indicators, touch targets too small..."
🐛 "AI, create a detailed bug report"
You: "The mobile navigation is broken"
AI: *Captures different viewport sizes*
AI: "Created bug report with before/after screenshots, exact breakpoint where it fails, and CSS fix suggestions..."
🎨 "AI, compare these design variations"
You: "Which hero section design works better?"
AI: *Captures both versions, analyzes visual hierarchy*
AI: "Version B has 28% better visual flow—CTA more prominent, text hierarchy clearer, better use of whitespace..."
📱 "AI, test responsive design"
You: "How does this look on different screen sizes?"
AI: *Captures multiple viewport sizes*
AI: "Layout breaks at 768px—sidebar overlaps content. Here's the media query fix with visual proof..."
✨ What This Changes
Before: AI worked blind, relying on code descriptions
- ❌ "The button looks wrong" → AI guesses the issue
- ❌ "Create documentation" → AI writes generic text
- ❌ "Check accessibility" → AI only reviews code
- ❌ "Test responsive design" → AI can't see actual breakpoints
After: AI sees and analyzes your actual user interface
- ✅ Visual debugging - AI identifies exact pixel-level issues
- ✅ Smart documentation - AI creates guides with real screenshots and annotations
- ✅ Accessibility audits - AI measures actual contrast ratios and spacing
- ✅ Responsive testing - AI captures and compares different screen sizes
- ✅ Design analysis - AI evaluates visual hierarchy and user experience
- ✅ Professional reports - AI creates detailed visual evidence for bugs and improvements
🚀 Features
- Multi-format Support: PNG, JPEG, WebP, BMP with configurable quality
- Flexible Capture: Full screen, specific windows, or custom regions
- Privacy Protection: PII masking with OCR-based detection for emails, phone numbers, and credit cards
- Security Controls: Path validation, rate limiting, audit logging, and configurable policies
- Cross-platform: Linux (X11/Wayland), macOS, Windows with native APIs
- Multi-monitor Support: Capture from specific displays in multi-monitor setups
- Enterprise Security: Window exclusion, audit logging, rate limiting
- AI-Optimized: Structured responses perfect for AI agent workflows
Installation
NPM Installation
npm install @ai-capabilities-suite/mcp-screenshot
System Requirements
Linux:
- X11: imagemagick package (provides the import command)
- Wayland: grim package
# Ubuntu/Debian
sudo apt-get install imagemagick grim
# Fedora
sudo dnf install ImageMagick grim
# Arch
sudo pacman -S imagemagick grim
macOS:
- Built-in screencapture command (no additional dependencies)
- Screen Recording permission required (System Preferences > Security & Privacy > Privacy > Screen Recording)
Windows:
- No additional dependencies required
MCP Configuration
Add to your MCP settings file (e.g., ~/.kiro/settings/mcp.json or .kiro/settings/mcp.json):
{
"mcpServers": {
"screenshot": {
"command": "node",
"args": ["/path/to/mcp-screenshot/dist/cli.js"],
"env": {
"SCREENSHOT_ALLOWED_DIRS": "/home/user/screenshots,/tmp",
"SCREENSHOT_MAX_CAPTURES_PER_MIN": "60",
"SCREENSHOT_ENABLE_AUDIT_LOG": "true"
}
}
}
}
🛠️ 5 Professional MCP Tools
The server exposes 5 MCP tools, purpose-built for AI agents to capture, analyze, and work with visual information:
1. screenshot_capture_full
Capture full screen or specific display.
Parameters:
- display (string, optional): Display ID to capture (defaults to primary display)
- format (string, optional): Image format: png, jpeg, webp, or bmp (default: png)
- quality (number, optional): Compression quality 1-100 for lossy formats (default: 90)
- savePath (string, optional): File path to save screenshot (returns base64 if not provided)
- enablePIIMasking (boolean, optional): Enable PII detection and masking (default: false)
Example:
{
"name": "screenshot_capture_full",
"arguments": {
"format": "png",
"savePath": "/home/user/screenshots/desktop.png",
"enablePIIMasking": true
}
}
Response:
{
"status": "success",
"filePath": "/home/user/screenshots/desktop.png",
"metadata": {
"width": 1920,
"height": 1080,
"format": "png",
"fileSize": 245678,
"timestamp": "2024-12-01T10:30:00.000Z",
"display": {
"id": "0",
"name": "Primary Display",
"resolution": { "width": 1920, "height": 1080 },
"position": { "x": 0, "y": 0 },
"isPrimary": true
},
"piiMasking": {
"emailsRedacted": 2,
"phonesRedacted": 1,
"creditCardsRedacted": 0,
"customPatternsRedacted": 0
}
}
}
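The piiMasking counters in the response above suggest per-category pattern matching on OCR text. As a rough illustration only (the server's actual detection rules are not documented here), the sketch below counts email addresses and Luhn-valid card numbers in a string. The names countPii and passesLuhn, and both regular expressions, are assumptions made for this example; phone numbers are left out because their formats vary too much for a short sketch.

```typescript
// Hypothetical counters, named after the piiMasking fields in the response above.
interface PiiCounts {
  emailsRedacted: number;
  creditCardsRedacted: number;
}

// Luhn checksum: double every second digit from the right, sum, check mod 10.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return digits.length > 0 && sum % 10 === 0;
}

// Count likely emails and card numbers in OCR text (illustrative patterns only).
function countPii(text: string): PiiCounts {
  const emails = text.match(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g) ?? [];
  const cardCandidates = text.match(/\b(?:\d[ -]?){13,19}\b/g) ?? [];
  const cards = cardCandidates
    .map((candidate) => candidate.replace(/[ -]/g, ""))
    .filter(passesLuhn);
  return { emailsRedacted: emails.length, creditCardsRedacted: cards.length };
}
```

The Luhn check filters out digit runs such as order numbers that merely look like card numbers, which keeps false positives down.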
2. screenshot_capture_window
Capture specific application window by ID or title pattern.
Parameters:
- windowId (string, optional): Window identifier (use windowId or windowTitle)
- windowTitle (string, optional): Window title pattern to match (use windowId or windowTitle)
- includeFrame (boolean, optional): Include window frame and title bar (default: false)
- format (string, optional): Image format (default: png)
- quality (number, optional): Compression quality 1-100 (default: 90)
- savePath (string, optional): File path to save screenshot
Example:
{
"name": "screenshot_capture_window",
"arguments": {
"windowTitle": "Chrome",
"includeFrame": false,
"format": "jpeg",
"quality": 85
}
}
Response:
{
"status": "success",
"data": "/9j/4AAQSkZJRgABAQ...",
"mimeType": "image/jpeg",
"metadata": {
"width": 1280,
"height": 720,
"format": "jpeg",
"fileSize": 89234,
"timestamp": "2024-12-01T10:31:00.000Z",
"window": {
"id": "12345",
"title": "Google Chrome",
"processName": "chrome",
"pid": 5678,
"bounds": { "x": 100, "y": 100, "width": 1280, "height": 720 }
}
}
}
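Capture responses come in two shapes: a filePath when savePath was given, or base64 data plus a mimeType otherwise. A client could tell them apart along these lines. The type names and the inlineImageBytes helper are hypothetical, inferred from the example responses rather than taken from the package's exported types.

```typescript
// Hypothetical result types, inferred from the example responses in this README.
interface CaptureMetadata {
  width: number;
  height: number;
  format: string;
  fileSize: number;
  timestamp: string;
}

interface SavedCapture {
  status: "success";
  filePath: string;
  metadata: CaptureMetadata;
}

interface InlineCapture {
  status: "success";
  data: string; // base64-encoded image bytes
  mimeType: string;
  metadata: CaptureMetadata;
}

type CaptureResult = SavedCapture | InlineCapture;

// Return decoded image bytes for inline results, or null when the image
// was written to disk and only a filePath is available.
function inlineImageBytes(result: CaptureResult): Buffer | null {
  if ("data" in result) {
    return Buffer.from(result.data, "base64");
  }
  return null;
}
```

Narrowing on the presence of data keeps the caller from guessing which field to read.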
3. screenshot_capture_region
Capture specific rectangular region of the screen.
Parameters:
- x (number, required): X coordinate of top-left corner
- y (number, required): Y coordinate of top-left corner
- width (number, required): Width of region in pixels
- height (number, required): Height of region in pixels
- format (string, optional): Image format (default: png)
- quality (number, optional): Compression quality 1-100 (default: 90)
- savePath (string, optional): File path to save screenshot
Example:
{
"name": "screenshot_capture_region",
"arguments": {
"x": 100,
"y": 100,
"width": 800,
"height": 600,
"format": "png"
}
}
Response:
{
"status": "success",
"data": "iVBORw0KGgoAAAANSUhEUgAA...",
"mimeType": "image/png",
"metadata": {
"width": 800,
"height": 600,
"format": "png",
"fileSize": 123456,
"timestamp": "2024-12-01T10:32:00.000Z",
"region": {
"x": 100,
"y": 100,
"width": 800,
"height": 600
}
}
}
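Checking a region before sending the request avoids a round trip that would end in INVALID_REGION. The sketch below applies the rule stated in the error table (non-negative coordinates, positive dimensions). The validateRegion helper is hypothetical, and the server may apply further checks, such as display bounds, that are not modeled here.

```typescript
interface Region {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Return a list of problems; an empty list means the region looks valid.
function validateRegion(region: Region): string[] {
  const problems: string[] = [];
  for (const key of ["x", "y"] as const) {
    if (!Number.isFinite(region[key]) || region[key] < 0) {
      problems.push(`${key} must be a non-negative number`);
    }
  }
  for (const key of ["width", "height"] as const) {
    if (!Number.isFinite(region[key]) || region[key] <= 0) {
      problems.push(`${key} must be a positive number`);
    }
  }
  return problems;
}
```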
4. screenshot_list_displays
List all connected displays with resolution and position information.
Parameters: None
Example:
{
"name": "screenshot_list_displays",
"arguments": {}
}
Response:
{
"status": "success",
"displays": [
{
"id": "0",
"name": "Primary Display",
"resolution": { "width": 1920, "height": 1080 },
"position": { "x": 0, "y": 0 },
"isPrimary": true
},
{
"id": "1",
"name": "Secondary Display",
"resolution": { "width": 1920, "height": 1080 },
"position": { "x": 1920, "y": 0 },
"isPrimary": false
}
]
}
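The position and resolution fields are enough to work out which display contains a given point, which helps when choosing a display ID for screenshot_capture_full. This is a minimal sketch that assumes all displays share one virtual-desktop coordinate space, as the example response suggests; displayAt is a hypothetical helper, not part of the package.

```typescript
// Shape of one entry in the "displays" array shown above.
interface DisplayInfo {
  id: string;
  name: string;
  resolution: { width: number; height: number };
  position: { x: number; y: number };
  isPrimary: boolean;
}

// Find the display whose rectangle contains the point (x, y), if any.
function displayAt(displays: DisplayInfo[], x: number, y: number): DisplayInfo | undefined {
  return displays.find(
    (d) =>
      x >= d.position.x &&
      x < d.position.x + d.resolution.width &&
      y >= d.position.y &&
      y < d.position.y + d.resolution.height
  );
}
```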
5. screenshot_list_windows
List all visible windows with title, process, and position information.
Parameters: None
Example:
{
"name": "screenshot_list_windows",
"arguments": {}
}
Response:
{
"status": "success",
"windows": [
{
"id": "12345",
"title": "Google Chrome",
"processName": "chrome",
"pid": 5678,
"bounds": { "x": 100, "y": 100, "width": 1280, "height": 720 },
"isMinimized": false
},
{
"id": "67890",
"title": "Terminal",
"processName": "gnome-terminal",
"pid": 9012,
"bounds": { "x": 200, "y": 200, "width": 800, "height": 600 },
"isMinimized": false
}
]
}
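A typical next step is to pick one window from this list and pass its id to screenshot_capture_window. The sketch below selects the first visible window whose title contains a given string. The pickWindow helper and its case-insensitive matching are assumptions for illustration; the server's own windowTitle matching may behave differently.

```typescript
// Shape of one entry in the "windows" array shown above.
interface WindowInfo {
  id: string;
  title: string;
  processName: string;
  pid: number;
  bounds: { x: number; y: number; width: number; height: number };
  isMinimized: boolean;
}

// Return the first non-minimized window whose title contains titlePart.
function pickWindow(windows: WindowInfo[], titlePart: string): WindowInfo | undefined {
  const needle = titlePart.toLowerCase();
  return windows.find((w) => !w.isMinimized && w.title.toLowerCase().includes(needle));
}
```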
Security Configuration
The server enforces security policies to control screenshot operations. Configure them via environment variables or a security policy file.
Environment Variables
- SCREENSHOT_ALLOWED_DIRS: Comma-separated list of allowed directories for saving screenshots
- SCREENSHOT_MAX_CAPTURES_PER_MIN: Maximum captures per minute (default: 60)
- SCREENSHOT_ENABLE_AUDIT_LOG: Enable audit logging (default: true)
- SCREENSHOT_BLOCKED_WINDOWS: Comma-separated list of window title patterns to exclude
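To make the mapping concrete, here is one way these variables could be turned into a policy object with the same field names as the security policy file. This is a sketch under assumptions: policyFromEnv and splitList are hypothetical helpers, and the server's real parsing (for example, how it treats invalid numbers) may differ.

```typescript
// Hypothetical shape mirroring the security-policy.json fields in this README.
interface SecurityPolicy {
  allowedDirectories: string[];
  blockedWindowPatterns: string[];
  maxCapturesPerMinute: number;
  enableAuditLog: boolean;
}

// Split a comma-separated value, dropping empty entries and stray whitespace.
function splitList(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

// Build a policy from an environment-like map (pass process.env in Node).
function policyFromEnv(env: Record<string, string | undefined>): SecurityPolicy {
  const parsedLimit = Number.parseInt(env.SCREENSHOT_MAX_CAPTURES_PER_MIN ?? "", 10);
  return {
    allowedDirectories: splitList(env.SCREENSHOT_ALLOWED_DIRS),
    blockedWindowPatterns: splitList(env.SCREENSHOT_BLOCKED_WINDOWS),
    // Fall back to the documented default of 60 when unset or invalid.
    maxCapturesPerMinute: Number.isInteger(parsedLimit) && parsedLimit > 0 ? parsedLimit : 60,
    // Documented default is true; in this sketch only an explicit "false" disables it.
    enableAuditLog: (env.SCREENSHOT_ENABLE_AUDIT_LOG ?? "true").toLowerCase() !== "false",
  };
}
```

Taking the environment as a parameter rather than reading process.env directly keeps the function easy to test.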
Security Policy File
Create a security-policy.json file:
{
"allowedDirectories": ["/home/user/screenshots", "/tmp/screenshots"],
"blockedWindowPatterns": [
".*Password.*",
".*1Password.*",
".*LastPass.*",
".*Bitwarden.*",
".*Authentication.*"
],
"maxCapturesPerMinute": 60,
"enableAuditLog": true
}
Load the policy when starting the server:
import { MCPScreenshotServer } from "@ai-capabilities-suite/mcp-screenshot";
import * as fs from "fs";
const policy = JSON.parse(fs.readFileSync("security-policy.json", "utf-8"));
const server = new MCPScreenshotServer(policy);
await server.start();
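The blockedWindowPatterns entries read like regular expressions. A check of that kind could look like the sketch below, which assumes JavaScript regex syntax and case-sensitive matching. isWindowBlocked is a hypothetical helper, not the server's implementation.

```typescript
// Return true when the window title matches any blocked pattern.
// Assumption: patterns are JavaScript regular expressions, matched case-sensitively.
// An invalid pattern makes the RegExp constructor throw a SyntaxError.
function isWindowBlocked(title: string, blockedPatterns: string[]): boolean {
  return blockedPatterns.some((pattern) => new RegExp(pattern).test(title));
}
```

With the sample policy above, a title such as "1Password 8" would be blocked by ".*Password.*", while "Terminal" would pass.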
Error Handling
All tools return structured error responses with error codes and remediation suggestions.
Error Codes
| Code | Description | Remediation |
|---|---|---|
| PERMISSION_DENIED | Insufficient permissions to capture | Grant Screen Recording permission (macOS) or check user permissions |
| INVALID_PATH | File path outside allowed directories | Use a path within configured allowed directories |
| WINDOW_NOT_FOUND | Specified window does not exist | Use screenshot_list_windows to find available windows |
| DISPLAY_NOT_FOUND | Specified display does not exist | Use screenshot_list_displays to find available displays |
| UNSUPPORTED_FORMAT | Requested format not supported | Use png, jpeg, webp, or bmp |
| CAPTURE_FAILED | Screenshot capture failed | Check permissions and try again |
| RATE_LIMIT_EXCEEDED | Too many captures in time window | Wait before making additional requests |
| INVALID_REGION | Invalid region coordinates or dimensions | Ensure coordinates are non-negative and dimensions are positive |
| OUT_OF_MEMORY | Insufficient memory for operation | Reduce capture size or close other applications |
| ENCODING_FAILED | Image encoding failed | Try different format or reduce quality |
| FILE_SYSTEM_ERROR | File system operation failed | Check permissions and disk space |
Error Response Format
{
"status": "error",
"error": {
"code": "WINDOW_NOT_FOUND",
"message": "Window with ID '12345' not found",
"details": {
"windowId": "12345"
},
"remediation": "Verify the window exists and is visible. Use screenshot_list_windows to see available windows."
}
}
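On the client side, the error format above maps naturally onto a typed guard plus a small formatter. The following is a sketch: the type and function names are hypothetical, and treating only RATE_LIMIT_EXCEEDED as retryable is a judgment call for this example rather than documented behavior.

```typescript
// Error codes listed in the Error Codes table.
type ScreenshotErrorCode =
  | "PERMISSION_DENIED"
  | "INVALID_PATH"
  | "WINDOW_NOT_FOUND"
  | "DISPLAY_NOT_FOUND"
  | "UNSUPPORTED_FORMAT"
  | "CAPTURE_FAILED"
  | "RATE_LIMIT_EXCEEDED"
  | "INVALID_REGION"
  | "OUT_OF_MEMORY"
  | "ENCODING_FAILED"
  | "FILE_SYSTEM_ERROR";

interface ToolErrorResponse {
  status: "error";
  error: {
    code: ScreenshotErrorCode;
    message: string;
    details?: Record<string, unknown>;
    remediation?: string;
  };
}

// Type guard: does this parsed response look like a structured tool error?
function isToolError(value: unknown): value is ToolErrorResponse {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { status?: unknown; error?: unknown };
  if (candidate.status !== "error") return false;
  const err = candidate.error;
  return typeof err === "object" && err !== null && typeof (err as { code?: unknown }).code === "string";
}

// Assumption: only rate limiting is worth an automatic retry after waiting.
function shouldRetry(response: ToolErrorResponse): boolean {
  return response.error.code === "RATE_LIMIT_EXCEEDED";
}

// One-line summary suitable for logs or for showing to the user.
function describeError(response: ToolErrorResponse): string {
  const { code, message, remediation } = response.error;
  return remediation ? `[${code}] ${message} (${remediation})` : `[${code}] ${message}`;
}
```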
Troubleshooting
Linux Issues
Problem: import: command not found or grim: command not found
Solution: Install required packages:
# X11
sudo apt-get install imagemagick
# Wayland
sudo apt-get install grim
Problem: Black screen or empty captures
Solution: Check display server environment variables:
echo $DISPLAY # Should show :0 or similar for X11
echo $WAYLAND_DISPLAY # Should show wayland-0 or similar for Wayland
macOS Issues
Problem: PERMISSION_DENIED error
Solution: Grant Screen Recording permission:
- Open System Preferences > Security & Privacy > Privacy
- Select "Screen Recording" from the list
- Add your terminal application or Node.js to the allowed list
- Restart the application
Problem: Retina display captures are double resolution
Solution: This is expected behavior. Retina displays have 2x pixel density. Use the width and height from metadata to determine actual dimensions.
Windows Issues
Problem: Capture fails with access denied
Solution: Run the application with administrator privileges or check Windows Defender settings.
Problem: Multi-monitor captures show wrong display
Solution: Use screenshot_list_displays to get correct display IDs and positions.
General Issues
Problem: RATE_LIMIT_EXCEEDED error
Solution: The server limits captures to prevent abuse. Wait 60 seconds or adjust maxCapturesPerMinute in security policy.
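A client can also throttle itself so it rarely hits the limit in the first place. The sketch below is a simple sliding-window limiter. It illustrates the general technique and is not a description of how the server counts captures; the class name and its parameters are hypothetical.

```typescript
// Client-side sliding-window limiter: allow at most maxPerWindow calls
// within any windowMs-long span.
class SlidingWindowLimiter {
  private readonly timestamps: number[] = [];
  private readonly maxPerWindow: number;
  private readonly windowMs: number;

  constructor(maxPerWindow: number, windowMs: number = 60_000) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
  }

  // Record a call at time `now` (milliseconds) if allowed; return whether it was.
  tryAcquire(now: number = Date.now()): boolean {
    // Drop timestamps that have aged out of the window.
    while (this.timestamps.length > 0 && now - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.maxPerWindow) {
      return false;
    }
    this.timestamps.push(now);
    return true;
  }
}
```

Calling tryAcquire() before each capture request and skipping or delaying the request when it returns false keeps the client under a chosen budget, for example new SlidingWindowLimiter(60).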
Problem: INVALID_PATH error when saving
Solution: Ensure the save path is within allowed directories configured in security policy.
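To see why a path might be rejected, it helps to reproduce the containment check locally. The sketch below resolves the save path and tests whether it falls inside one of the allowed directories. isPathAllowed is a hypothetical helper; the server's validation may be stricter, for example by resolving symlinks, which this sketch does not do.

```typescript
import * as path from "node:path";

// Return true when savePath resolves to a location strictly inside one of allowedDirs.
function isPathAllowed(savePath: string, allowedDirs: string[]): boolean {
  const target = path.resolve(savePath);
  return allowedDirs.some((dir) => {
    const relative = path.relative(path.resolve(dir), target);
    // An empty string means the target is the directory itself, not a file in it.
    if (relative === "") return false;
    // A leading ".." segment or an absolute result means the target is outside dir.
    const escapes =
      relative === ".." || relative.startsWith(".." + path.sep) || path.isAbsolute(relative);
    return !escapes;
  });
}
```

Resolving first means traversal attempts such as /tmp/../etc/passwd are judged by where they actually point.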
Problem: PII masking not working
Solution:
- Ensure tesseract.js is properly installed
- Check that the eng.traineddata language file is available
- PII masking requires OCR, which may be slow on large images
Problem: Large file sizes
Solution:
- Use JPEG format with lower quality (60-80) for smaller files
- Use WebP format for best compression
- Reduce capture region size if possible
Problem: Out of memory errors
Solution:
- Capture smaller regions instead of full screen
- Reduce quality settings
- Close other applications to free memory
- Use streaming for very large captures
Programmatic Usage
TypeScript/JavaScript
import { MCPScreenshotServer } from "@ai-capabilities-suite/mcp-screenshot";
// Create server with custom security policy
const server = new MCPScreenshotServer({
allowedDirectories: ["/home/user/screenshots"],
maxCapturesPerMinute: 30,
enableAuditLog: true,
blockedWindowPatterns: [".*Password.*"],
});
// Start server
await server.start();
// Server will handle MCP protocol requests via stdio
// Keep process running
process.on("SIGINT", async () => {
await server.stop();
process.exit(0);
});
Direct Capture Engine Usage
import { createCaptureEngine } from "@ai-capabilities-suite/mcp-screenshot";
// Create platform-specific capture engine
const engine = createCaptureEngine();
// Capture full screen
const fullScreen = await engine.captureScreen();
// List and capture windows
const windows = await engine.getWindows();
const window = windows.find((w) => w.title.includes("Chrome"));
if (window) {
const buffer = await engine.captureWindow(window.id, false);
}
// Capture region
const region = await engine.captureRegion(100, 100, 800, 600);
// List displays
const displays = await engine.getDisplays();
console.log(`Found ${displays.length} displays`);
Development
This package is part of the AI Capabilities Suite monorepo.
Build
npm run build
Test
# Run all tests
npm test
# Run specific test suites
npm test -- capture
npm test -- security
npm test -- property
# Run with coverage
npm test -- --coverage
Project Structure
packages/mcp-screenshot/
├── src/
│ ├── capture/ # Platform-specific capture engines
│ ├── processing/ # Image processing and encoding
│ ├── privacy/ # PII detection and masking
│ ├── security/ # Security policy enforcement
│ ├── storage/ # File operations
│ ├── tools/ # MCP tool implementations
│ ├── interfaces/ # TypeScript interfaces
│ ├── types/ # Type definitions
│ ├── errors/ # Error classes
│ ├── server.ts # MCP server implementation
│ └── cli.ts # CLI entry point
├── README.md
├── TESTING.md
└── package.json
Contributing
Contributions are welcome! Please ensure:
- All tests pass (npm test)
- Code follows TypeScript best practices
- New features include tests and documentation
- Security considerations are addressed
License
MIT
Support
For issues and questions:
- GitHub Issues: Create an issue
- Documentation: See TESTING.md for testing guide
- Security: Report security issues privately to security@example.com