screen-vision-mcp by TIMBOTGPT - MCP Server

Screen Vision MCP Server

A Model Context Protocol (MCP) server that provides comprehensive screen capture, OCR, and visual understanding capabilities for macOS.

Features

capture_fullscreen: Capture the entire screen
capture_window: Capture specific application windows
capture_region: Capture defined screen regions
extract_text_from_screen: OCR text extraction from screenshots
find_text_on_screen: Locate text on screen and return coordinates
get_window_list: List all open windows with details
get_screen_info: Get display and screen information
click_at_position: Automated clicking at specific coordinates
monitor_screen_region: Monitor regions for changes over time
Screenshot resource management and retrieval

Installation

Quick Install

npm install -g screen-vision-mcp

From Source

Clone the repository:

git clone https://github.com/TIMBOTGPT/screen-vision-mcp.git
cd screen-vision-mcp

Install dependencies:
```
npm install
```
Start the server to confirm it runs:
```
npm start
```

Usage with Claude Desktop

Add this server to your Claude Desktop MCP configuration (claude_desktop_config.json):

{
  "mcpServers": {
    "screen-vision": {
      "command": "npx",
      "args": ["-y", "screen-vision-mcp"],
      "description": "Screen capture and vision analysis"
    }
  }
}

Or, if you installed from source, point Claude Desktop at the local entry file:

{
  "mcpServers": {
    "screen-vision": {
      "command": "node",
      "args": ["/path/to/screen-vision-mcp/index.js"],
      "description": "Screen capture and vision analysis"
    }
  }
}
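
If claude_desktop_config.json already lists other servers, the new entry has to be merged into the existing mcpServers object rather than replacing the file. The sketch below shows one way to do that merge in memory. The helper name addMcpServer and the "other" server entry are hypothetical, and reading or writing the file is left out on purpose.

```javascript
// Hypothetical helper: returns a copy of a Claude Desktop config object
// with one MCP server entry added or replaced. It does not touch the disk.
function addMcpServer(config, name, entry) {
  const servers = { ...(config.mcpServers || {}) };
  servers[name] = entry;
  return { ...config, mcpServers: servers };
}

// Example input: a config that already contains another (made-up) server.
const existingConfig = {
  mcpServers: { other: { command: "node", args: ["other.js"] } },
};

const updatedConfig = addMcpServer(existingConfig, "screen-vision", {
  command: "npx",
  args: ["-y", "screen-vision-mcp"],
});

console.log(JSON.stringify(updatedConfig, null, 2));
```

The original object is left unchanged, so the result can be inspected before anything is written back to the config file.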

Available Tools

capture_fullscreen

Capture the entire screen.

Parameters:

save_path (optional): Custom save path for the screenshot

Example:

{
  "name": "capture_fullscreen",
  "arguments": {
    "save_path": "/path/to/save/screenshot.png"
  }
}

capture_window

Capture a specific application window.

Parameters:

app_name (required): Name of the application (e.g., "Safari", "Terminal")
save_path (optional): Custom save path

Example:

{
  "name": "capture_window",
  "arguments": {
    "app_name": "Safari",
    "save_path": "/path/to/save/window.png"
  }
}

capture_region

Capture a specific region of the screen.

Parameters:

x (required): X coordinate
y (required): Y coordinate
width (required): Width of region
height (required): Height of region
save_path (optional): Custom save path
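
Example:

A minimal sketch of building a capture_region call in the same shape as the examples above. The helper buildCaptureRegionCall and its validation rules are assumptions for illustration; the server may accept or reject different values, and this README does not state whether coordinates are in pixels or points.

```javascript
// Hypothetical helper: builds a capture_region tool call and rejects
// regions that cannot describe a visible rectangle.
function buildCaptureRegionCall(x, y, width, height, savePath) {
  for (const [key, value] of Object.entries({ x, y, width, height })) {
    if (!Number.isFinite(value)) {
      throw new TypeError(`${key} must be a finite number`);
    }
  }
  if (width <= 0 || height <= 0) {
    throw new RangeError("width and height must be positive");
  }
  const args = { x, y, width, height };
  if (savePath !== undefined) {
    args.save_path = savePath; // optional parameter, omitted when not given
  }
  return { name: "capture_region", arguments: args };
}

const regionCall = buildCaptureRegionCall(100, 200, 640, 480);
console.log(JSON.stringify(regionCall, null, 2));
```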

extract_text_from_screen

Capture screen and extract text using OCR.

Parameters:

region (optional): Specific region to capture
- x, y, width, height: Region coordinates
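
Example:

The sketch below builds an extract_text_from_screen call with and without a region. It assumes that region is passed as a nested object holding x, y, width, and height, which is how the parameter list above reads; the helper name buildExtractTextCall is hypothetical.

```javascript
// Hypothetical helper: builds an extract_text_from_screen tool call.
// With no region, the arguments object stays empty (whole-screen capture).
function buildExtractTextCall(region) {
  const args = {};
  if (region) {
    const { x, y, width, height } = region;
    args.region = { x, y, width, height }; // assumed nesting of the region fields
  }
  return { name: "extract_text_from_screen", arguments: args };
}

const fullScreenOcr = buildExtractTextCall();
const regionOcr = buildExtractTextCall({ x: 0, y: 0, width: 800, height: 200 });

console.log(JSON.stringify(fullScreenOcr));
console.log(JSON.stringify(regionOcr, null, 2));
```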

find_text_on_screen

Find text on screen and return its location.

Parameters:

text (required): Text to search for
case_sensitive (optional): Whether search should be case sensitive (default: false)
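
Example:

To make the case_sensitive flag concrete, the sketch below pairs an example call with a small local matcher. The matcher is not the server's code. It only illustrates the usual meaning of such a flag, and substring matching is an assumption.

```javascript
// Illustrative matcher, not the server's implementation. Shows the usual
// meaning of a case_sensitive flag that defaults to false.
// Assumption: the query matches when it appears anywhere in the candidate.
function textMatches(candidate, query, caseSensitive = false) {
  if (caseSensitive) {
    return candidate.includes(query);
  }
  return candidate.toLowerCase().includes(query.toLowerCase());
}

// Example tool call that asks for an exact-case match.
const findCall = {
  name: "find_text_on_screen",
  arguments: { text: "Submit", case_sensitive: true },
};

const looseMatch = textMatches("SUBMIT ORDER", "Submit");        // default: case ignored
const strictMatch = textMatches("SUBMIT ORDER", "Submit", true); // case must match

console.log(JSON.stringify(findCall, null, 2));
console.log({ looseMatch, strictMatch });
```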

get_window_list

Get list of all open windows with their positions.

get_screen_info

Get information about available screens/displays.

click_at_position

Click at a specific screen position.

Parameters:

x (required): X coordinate
y (required): Y coordinate
button (optional): Mouse button ('left', 'right', 'middle', default: 'left')
double_click (optional): Whether to double-click (default: false)
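
Example:

A sketch of building a click_at_position call with the documented defaults filled in. The helper buildClickCall is hypothetical; it only checks the button name against the three values listed above before producing the call.

```javascript
// Button names taken from the parameter list above.
const ALLOWED_BUTTONS = ["left", "right", "middle"];

// Hypothetical helper: builds a click_at_position tool call, applying the
// documented defaults (button "left", double_click false).
function buildClickCall(x, y, { button = "left", doubleClick = false } = {}) {
  if (!ALLOWED_BUTTONS.includes(button)) {
    throw new RangeError(`button must be one of: ${ALLOWED_BUTTONS.join(", ")}`);
  }
  return {
    name: "click_at_position",
    arguments: { x, y, button, double_click: doubleClick },
  };
}

const singleClick = buildClickCall(320, 240);
const rightDoubleClick = buildClickCall(320, 240, { button: "right", doubleClick: true });

console.log(JSON.stringify(singleClick, null, 2));
console.log(JSON.stringify(rightDoubleClick, null, 2));
```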

monitor_screen_region

Monitor a screen region for changes over time.

Parameters:

x, y, width, height (required): Region to monitor
duration_seconds (optional): How long to monitor (max 30 seconds, default: 5)
interval_ms (optional): Check interval in milliseconds (default: 1000)
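
Example:

The duration and interval together determine roughly how many checks a monitoring run performs. The sketch below estimates that count and applies the documented 30 second maximum. Whether the server clamps or rejects a longer duration is not stated here, so the clamping is an assumption, as are the helper name planMonitoring and the approxChecks field.

```javascript
const MAX_DURATION_SECONDS = 30; // documented maximum

// Hypothetical helper: applies the documented defaults, clamps the duration
// (an assumption about server behavior), and estimates the number of checks.
function planMonitoring({ durationSeconds = 5, intervalMs = 1000 } = {}) {
  if (durationSeconds <= 0 || intervalMs <= 0) {
    throw new RangeError("durationSeconds and intervalMs must be positive");
  }
  const clamped = Math.min(durationSeconds, MAX_DURATION_SECONDS);
  return {
    duration_seconds: clamped,
    interval_ms: intervalMs,
    approxChecks: Math.floor((clamped * 1000) / intervalMs),
  };
}

const plan = planMonitoring({ durationSeconds: 60, intervalMs: 500 });

// Only the documented parameters go into the tool call itself.
const monitorCall = {
  name: "monitor_screen_region",
  arguments: {
    x: 0,
    y: 0,
    width: 400,
    height: 300,
    duration_seconds: plan.duration_seconds,
    interval_ms: plan.interval_ms,
  },
};

console.log(plan);
console.log(JSON.stringify(monitorCall, null, 2));
```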

Requirements

macOS (uses native screencapture command)
Node.js 16+
Claude Desktop with MCP support
Screen Recording permission for captures, and Accessibility permission for click automation

Permissions

On first use, macOS may request permissions for:

Screen recording
Accessibility (for clicking automation)
File system access (for saving screenshots)

Grant these permissions in System Settings > Privacy & Security (System Preferences > Security & Privacy on macOS 12 and earlier).

Screenshots Storage

Screenshots are automatically saved to a screenshots/ directory within the server folder. You can:

Access saved screenshots as MCP resources through their resource URIs, for example from Claude Desktop
Specify custom save paths for individual captures

Development

# Install dependencies
npm install

# Start development server
npm run dev

# Run tests
npm test

Advanced Features

OCR Integration

The server includes hooks for integrating the macOS Vision framework. Full OCR requires additional setup with the native Vision APIs.

Automation

The clicking and monitoring features enable automation workflows when combined with other MCP servers.
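
One common workflow is to locate a label with find_text_on_screen and then click it. The sketch below turns a match into a click_at_position call aimed at the center of the matched text. The match shape { x, y, width, height } is hypothetical, since this README does not document the result format of find_text_on_screen, and clickCallForMatch is an illustrative helper name.

```javascript
// Hypothetical helper. Assumes a match is reported as a bounding box
// { x, y, width, height }; the real result format may differ.
function clickCallForMatch(match) {
  const centerX = Math.round(match.x + match.width / 2);
  const centerY = Math.round(match.y + match.height / 2);
  return { name: "click_at_position", arguments: { x: centerX, y: centerY } };
}

// Made-up match for illustration only.
const exampleMatch = { x: 100, y: 40, width: 81, height: 20 };
const followUpClick = clickCallForMatch(exampleMatch);

console.log(JSON.stringify(followUpClick, null, 2));
```

Rounding keeps the coordinates integral when the box has an odd width or height.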

Security

Screen captures require the macOS Screen Recording permission, which the user must grant explicitly
File system access is controlled by macOS permissions
No network access required for core functionality

License

MIT License - see LICENSE file for details

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

Support

For issues and questions, please use the GitHub Issues page.