TIMBOTGPT/screen-vision-mcp
Screen Vision MCP Server is a Model Context Protocol server designed for macOS, offering advanced screen capture, OCR, and visual understanding capabilities.
Screen Vision MCP Server
A Model Context Protocol (MCP) server that provides comprehensive screen capture, OCR, and visual understanding capabilities for macOS.
Features
- capture_fullscreen: Capture the entire screen
- capture_window: Capture specific application windows
- capture_region: Capture defined screen regions
- extract_text_from_screen: OCR text extraction from screenshots
- find_text_on_screen: Locate text on screen and return coordinates
- get_window_list: List all open windows with details
- get_screen_info: Get display and screen information
- click_at_position: Automated clicking at specific coordinates
- monitor_screen_region: Monitor regions for changes over time
- Screenshot resource management and retrieval
Installation
Quick Install
npm install -g screen-vision-mcp
From Source
- Clone the repository:
  git clone https://github.com/TIMBOTGPT/screen-vision-mcp.git
  cd screen-vision-mcp
- Install dependencies:
  npm install
- Test the server:
  npm start
Usage with Claude Desktop
Add this server to your Claude Desktop MCP configuration (claude_desktop_config.json):
{
  "mcpServers": {
    "screen-vision": {
      "command": "npx",
      "args": ["-y", "screen-vision-mcp"],
      "description": "Screen capture and vision analysis"
    }
  }
}
Or if installed locally:
{
  "mcpServers": {
    "screen-vision": {
      "command": "node",
      "args": ["/path/to/screen-vision-mcp/index.js"],
      "description": "Screen capture and vision analysis"
    }
  }
}
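If claude_desktop_config.json already lists other servers, the new entry has to be merged in rather than pasted over them. A minimal JavaScript sketch of that merge follows; `addScreenVisionServer` is a hypothetical helper written for illustration, not part of screen-vision-mcp, and the `other` server is made-up sample data.

```javascript
// Hypothetical helper: returns a copy of the config with a "screen-vision"
// entry added, leaving any existing servers untouched.
function addScreenVisionServer(config, entry) {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers || {}), "screen-vision": entry },
  };
}

// Made-up existing config with one unrelated server.
const existing = {
  mcpServers: { other: { command: "node", args: ["other.js"] } },
};

const updated = addScreenVisionServer(existing, {
  command: "npx",
  args: ["-y", "screen-vision-mcp"],
});

console.log(JSON.stringify(updated, null, 2));
```

The helper builds a new object instead of mutating its input, so the original config can still be written back unchanged if something goes wrong.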
Available Tools
capture_fullscreen
Capture the entire screen.
Parameters:
- save_path (optional): Custom save path for the screenshot
Example:
{
  "name": "capture_fullscreen",
  "arguments": {
    "save_path": "/path/to/save/screenshot.png"
  }
}
capture_window
Capture a specific application window.
Parameters:
- app_name (required): Name of the application (e.g., "Safari", "Terminal")
- save_path (optional): Custom save path
Example:
{
  "name": "capture_window",
  "arguments": {
    "app_name": "Safari",
    "save_path": "/path/to/save/window.png"
  }
}
capture_region
Capture a specific region of the screen.
Parameters:
- x (required): X coordinate
- y (required): Y coordinate
- width (required): Width of region
- height (required): Height of region
- save_path (optional): Custom save path
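No example call is given for capture_region, so here is a minimal JavaScript sketch that builds one in the same name/arguments shape used by the examples above. `captureRegionCall` is a hypothetical client-side helper. Its checks (finite numbers, positive width and height) are assumptions about sensible input, not documented server behavior.

```javascript
// Hypothetical helper: builds a capture_region tool call.
// The validation rules are assumptions, not the server's documented checks.
function captureRegionCall(x, y, width, height, savePath) {
  for (const [name, value] of Object.entries({ x, y, width, height })) {
    if (!Number.isFinite(value)) {
      throw new TypeError(`${name} must be a finite number`);
    }
  }
  if (width <= 0 || height <= 0) {
    throw new RangeError("width and height must be positive");
  }
  const args = { x, y, width, height };
  // save_path is optional, so it is only included when provided.
  if (savePath !== undefined) args.save_path = savePath;
  return { name: "capture_region", arguments: args };
}

console.log(JSON.stringify(captureRegionCall(0, 0, 800, 600), null, 2));
```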
extract_text_from_screen
Capture screen and extract text using OCR.
Parameters:
- region (optional): Specific region to capture
  - x, y, width, height: Region coordinates
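A minimal JavaScript sketch of an extract_text_from_screen call, with and without the optional region. The nested `region` object is inferred from the parameter list above and has not been checked against the server's schema. `extractTextCall` is a hypothetical helper.

```javascript
// Hypothetical helper: builds an extract_text_from_screen tool call.
// Assumption: region is passed as a nested object with x, y, width, height.
function extractTextCall(region) {
  const args = region ? { region: { ...region } } : {};
  return { name: "extract_text_from_screen", arguments: args };
}

// No region: presumably the whole screen is captured.
console.log(JSON.stringify(extractTextCall()));

// With a region: only that rectangle is captured before OCR.
console.log(JSON.stringify(extractTextCall({ x: 0, y: 0, width: 400, height: 200 })));
```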
find_text_on_screen
Find text on screen and return its location.
Parameters:
- text (required): Text to search for
- case_sensitive (optional): Whether search should be case sensitive (default: false)
get_window_list
Get list of all open windows with their positions.
get_screen_info
Get information about available screens/displays.
click_at_position
Click at a specific screen position.
Parameters:
- x (required): X coordinate
- y (required): Y coordinate
- button (optional): Mouse button ('left', 'right', 'middle', default: 'left')
- double_click (optional): Whether to double-click (default: false)
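A minimal JavaScript sketch of a click_at_position call that applies the documented defaults and rejects unknown button names before anything is sent. `clickCall` is a hypothetical client-side helper, and rejecting bad values up front is a design choice of this sketch, not documented server behavior.

```javascript
// Button names listed in the parameter description.
const BUTTONS = ["left", "right", "middle"];

// Hypothetical helper: builds a click_at_position tool call.
function clickCall(x, y, { button = "left", doubleClick = false } = {}) {
  if (!BUTTONS.includes(button)) {
    throw new RangeError(`button must be one of: ${BUTTONS.join(", ")}`);
  }
  return {
    name: "click_at_position",
    arguments: { x, y, button, double_click: doubleClick },
  };
}

console.log(JSON.stringify(clickCall(640, 360), null, 2));
console.log(JSON.stringify(clickCall(640, 360, { button: "right" }), null, 2));
```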
monitor_screen_region
Monitor a screen region for changes over time.
Parameters:
- x, y, width, height (required): Region to monitor
- duration_seconds (optional): How long to monitor (max 30 seconds, default: 5)
- interval_ms (optional): Check interval in milliseconds (default: 1000)
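The two timing parameters together determine roughly how many checks a monitoring run performs. The JavaScript sketch below works through that arithmetic with the documented defaults and the 30-second maximum. `monitorPlan` is a hypothetical helper. Whether the server clamps or rejects longer durations is not stated, so the clamping here is an assumption, and the check count is only an estimate.

```javascript
// Hypothetical helper: estimates how many checks a monitoring run performs.
// Assumption: durations above the documented 30 s maximum are clamped.
function monitorPlan({ duration_seconds = 5, interval_ms = 1000 } = {}) {
  const duration = Math.min(duration_seconds, 30);
  const checks = Math.floor((duration * 1000) / interval_ms);
  return { duration_seconds: duration, interval_ms, checks };
}

// Defaults: 5 s at one check per second.
console.log(monitorPlan());

// A 60 s request is clamped to 30 s; at 500 ms that is about 60 checks.
console.log(monitorPlan({ duration_seconds: 60, interval_ms: 500 }));
```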
Requirements
- macOS (uses native screencapture command)
- Node.js 16+
- Claude Desktop with MCP support
- Screen recording permission for captures, and Accessibility permission for click automation
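The platform and Node.js requirements can be checked from a script before installing. A minimal JavaScript sketch follows; `checkEnvironment` is a hypothetical helper, not something shipped with the server, and it does not check macOS permissions.

```javascript
// Hypothetical pre-install check for the platform and Node.js requirements.
// Arguments default to the current process so the function is easy to test.
function checkEnvironment(
  platform = process.platform,
  nodeVersion = process.versions.node
) {
  const problems = [];
  if (platform !== "darwin") {
    problems.push("macOS is required (the server uses the screencapture command)");
  }
  const major = Number.parseInt(nodeVersion.split(".")[0], 10);
  if (major < 16) {
    problems.push(`Node.js 16+ is required, found ${nodeVersion}`);
  }
  return problems;
}

const problems = checkEnvironment();
console.log(problems.length === 0 ? "Environment looks OK" : problems.join("\n"));
```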
Permissions
On first use, macOS may request permissions for:
- Screen recording
- Accessibility (for clicking automation)
- File system access (for saving screenshots)
Grant these permissions in System Settings > Privacy & Security (System Preferences > Security & Privacy on macOS 12 and earlier).
Screenshots Storage
Screenshots are automatically saved to a screenshots/ directory within the server folder. You can:
- Access screenshots via the resource URI system
- Specify custom save paths for individual captures
- View saved screenshots through Claude's resource system
Development
# Install dependencies
npm install
# Start development server
npm run dev
# Run tests
npm test
Advanced Features
OCR Integration
The server includes hooks for macOS Vision framework integration for advanced OCR capabilities. Full OCR requires additional setup with native macOS Vision APIs.
Automation
The clicking and monitoring features enable automation workflows when combined with other MCP servers.
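One such workflow is to locate a label with find_text_on_screen and then click its center with click_at_position. The JavaScript sketch below shows only the coordinate arithmetic. The `match` object is a hypothetical result shape (a bounding box with x, y, width, height), because the actual format returned by find_text_on_screen is not documented here.

```javascript
// Returns the center point of a bounding box, rounded to whole pixels.
function centerOf(box) {
  return {
    x: Math.round(box.x + box.width / 2),
    y: Math.round(box.y + box.height / 2),
  };
}

// Hypothetical find_text_on_screen result; the real shape may differ.
const match = { x: 100, y: 200, width: 80, height: 24 };

const target = centerOf(match);
const call = {
  name: "click_at_position",
  arguments: { ...target, button: "left" },
};

console.log(JSON.stringify(call, null, 2));
```

Clicking the center rather than the top-left corner gives some tolerance if the reported box is slightly off.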
Security
- All screen captures require the macOS Screen Recording permission, which the user must grant explicitly
- File system access is controlled by macOS permissions
- No network access required for core functionality
License
MIT License - see LICENSE file for details
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
Support
For issues and questions, please use the GitHub Issues page.