chrimage/shotty
Shotty is a Model Context Protocol (MCP) server designed for screenshot capture and window management in GNOME Wayland environments, specifically for multimodal LLMs.
- list_windows: Get all visible windows with IDs and titles.
- capture_screenshot: Capture screenshots and save to disk.
Shotty
⚠️ ACTIVE DEVELOPMENT - This project is currently under active development. Core technical challenges have been solved, but the implementation is still evolving. Expect breaking changes and incomplete features.
A Model Context Protocol (MCP) server that provides screenshot capture and window management tools for GNOME Wayland environments. Designed specifically for multimodal LLMs that need to "see" desktop content.
Project Status
- ✅ Core Technical Challenges Solved
- ✅ Window listing via GNOME extensions
- ✅ Window-specific screenshot capture
- ✅ Base64 image encoding for LLM consumption
- 🚧 In Active Development - APIs and features may change
Features
Current Capabilities
- Window Listing: Enumerate all visible windows on GNOME desktop
- Screenshot Capture: Full screen and window-specific screenshots
- Window State Management: Remember and restore active windows during capture
- XDG Portal Integration: Modern, secure screenshot capture with user permissions
- Window Activation: Focus specific windows before capture
- MCP Integration: FastMCP-based server for LLM integration
- Base64 Encoding: Images ready for multimodal LLM consumption
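The Base64 step can be illustrated with the standard library alone. This is a minimal sketch, not Shotty's actual code: the helper names `encode_image` and `decode_image` are hypothetical, and the PNG check assumes screenshots are stored as PNG files.

```python
import base64

# First eight bytes of every valid PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def encode_image(raw: bytes) -> str:
    """Return the image bytes as an ASCII Base64 string."""
    return base64.b64encode(raw).decode("ascii")


def decode_image(encoded: str) -> bytes:
    """Reverse encode_image and check that the payload looks like a PNG."""
    raw = base64.b64decode(encoded, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG image")
    return raw


if __name__ == "__main__":
    # Placeholder bytes stand in for a real screenshot.
    sample = PNG_SIGNATURE + b"placeholder pixel data"
    text = encode_image(sample)
    print(f"{len(sample)} bytes -> {len(text)} Base64 characters")
```

Base64 grows the payload by roughly one third, which is worth keeping in mind when sending large full-screen captures to a model.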
GNOME Wayland Support
- Primary: XDG Desktop Portal (modern, secure, requires user permission)
- Secondary: GNOME Shell D-Bus API via window-calls extension
- Fallback: Legacy tools (gnome-screenshot, ImageMagick)
- Optimized UX: Automatic window state restoration after captures
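The primary/secondary/fallback ordering above can be sketched as a simple chain that tries each backend in turn. This is an illustration under stated assumptions, not the server's implementation: `capture_with_fallback`, `CaptureError`, and the three stub backends are hypothetical names, and the stubs only simulate what a real portal, D-Bus, or `gnome-screenshot` call might do.

```python
from typing import Callable, Optional, Sequence


class CaptureError(RuntimeError):
    """Raised when every capture backend has failed."""


def capture_with_fallback(
    backends: Sequence[tuple[str, Callable[[], Optional[bytes]]]],
) -> tuple[str, bytes]:
    """Try each backend in order; return (name, image_bytes) from the first success."""
    failures = []
    for name, backend in backends:
        try:
            data = backend()
        except Exception as exc:  # a broken backend must not stop the chain
            failures.append(f"{name}: {exc}")
            continue
        if data:
            return name, data
        failures.append(f"{name}: returned no data")
    raise CaptureError("; ".join(failures) or "no backends configured")


# Hypothetical stand-ins for the three capture methods.
def portal_stub() -> bytes:
    raise PermissionError("user denied the portal request")


def extension_stub() -> Optional[bytes]:
    return None  # e.g. the extension is not installed


def legacy_stub() -> bytes:
    return b"\x89PNG placeholder"


if __name__ == "__main__":
    used, image = capture_with_fallback(
        [
            ("xdg-portal", portal_stub),
            ("gnome-shell-dbus", extension_stub),
            ("gnome-screenshot", legacy_stub),
        ]
    )
    print(f"captured {len(image)} bytes via {used}")
```

Collecting the failure reasons rather than discarding them makes it easier to report why the preferred method was skipped.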
Prerequisites
Required
- GNOME Shell on Wayland (GNOME 42+)
- Python 3.10+
- gnome-screenshot utility
Python Dependencies
- fastmcp>=1.2.0 (core MCP functionality)
- PyGObject>=3.42.0 (XDG Portal integration)
- pydbus>=0.6.0 (D-Bus communication)
System Packages (Ubuntu/Debian)
sudo apt install python3-gi python3-gi-cairo gir1.2-gtk-4.0 libgirepository1.0-dev
Recommended Extensions
- window-calls - Enables true window-specific capture and listing
Setup
Shotty is an MCP server designed to be launched by MCP clients, not run directly by users. You configure your MCP client (Claude Code, Gemini CLI, etc.) to automatically launch the server when needed.
No separate installation required! Your MCP client handles launching the server using uvx or local Python.
System Requirements
- GNOME Shell on Wayland (GNOME 42+)
- Python 3.10+
- System packages: python3-gi python3-gi-cairo gir1.2-gtk-4.0 libgirepository1.0-dev
Available Tools
Once configured with your MCP client, Shotty provides these tools:
- list_windows() - Get all visible windows with IDs and titles
- capture_screenshot(window_id=None, include_cursor=False) - Capture screenshots and save to disk
The MCP client automatically launches the server when you use these tools.
MCP Client Integration
Claude Desktop
Add Shotty to your Claude Desktop configuration file:
Configuration file location:
- Linux: ~/.config/Claude/claude_desktop_config.json
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
Add this to the mcpServers section:
{
  "mcpServers": {
    "shotty": {
      "command": "uvx",
      "args": [
        "--from",
        "https://github.com/chrimage/shotty.git",
        "shotty"
      ]
    }
  }
}
Then restart Claude Desktop and use the tools: list_windows() and capture_screenshot()
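If the configuration file already lists other MCP servers, the entry needs to be merged rather than pasted over the existing content. The following is a small sketch of one way to do that; `add_shotty` and `SHOTTY_ENTRY` are hypothetical names, and the script assumes the file is plain JSON with no comments.

```python
import json
from pathlib import Path

# Same entry as in the configuration shown above.
SHOTTY_ENTRY = {
    "command": "uvx",
    "args": ["--from", "https://github.com/chrimage/shotty.git", "shotty"],
}


def add_shotty(config_path: Path) -> dict:
    """Insert the shotty entry under mcpServers, keeping all other settings."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    # setdefault preserves any servers that are already configured.
    config.setdefault("mcpServers", {})["shotty"] = SHOTTY_ENTRY
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

On Linux this could be called as `add_shotty(Path.home() / ".config" / "Claude" / "claude_desktop_config.json")`; back up the file first, since the script rewrites it.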
Claude Code
To test Shotty with Claude Code, add it as an MCP server:
# Add the server to Claude Code using uvx
claude mcp add shotty -- uvx --from https://github.com/chrimage/shotty.git shotty
# Or using local path
claude mcp add shotty python /path/to/shotty/server.py
# Verify the server is added
claude mcp list
# Test in Claude Code
# Use the tools directly: list_windows() and capture_screenshot()
Gemini CLI
To use Shotty with Gemini CLI, add it to your settings.json configuration file:
Settings file locations:
- User settings: ~/.gemini/settings.json
- Project settings: .gemini/settings.json (in project root)
- System settings: /etc/gemini-cli/settings.json (Linux)
{
  "mcpServers": {
    "shotty": {
      "command": "uvx",
      "args": ["--from", "https://github.com/chrimage/shotty.git", "shotty"]
    }
  }
}
Or for local development:
{
  "mcpServers": {
    "shotty": {
      "command": "python",
      "args": ["/path/to/shotty/server.py"],
      "cwd": "/path/to/shotty"
    }
  }
}
Then restart Gemini CLI to load the MCP server and use the screenshot tools in your conversations.
Expected Behavior
- Window Listing: Returns JSON array of windows with IDs and titles
- Screenshot Capture: Returns images that display directly in Claude Code
- Window-Specific Capture: Focuses target window, then captures it
- Storage: Screenshots saved to the ~/Pictures/shotty/ directory
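The focus-then-capture behavior, combined with restoring the previously active window, maps naturally onto a context manager. The sketch below is an assumption about how such a flow could be structured, not the server's code: `focused` and `FakeDesktop` are hypothetical, and the fake desktop replaces real window-manager calls so the example runs anywhere.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def focused(
    window_id: str,
    get_active: Callable[[], Optional[str]],
    activate: Callable[[str], None],
) -> Iterator[None]:
    """Focus window_id inside the block, then restore the previous window."""
    previous = get_active()
    activate(window_id)
    try:
        yield
    finally:
        # Restore even if the capture raised, so the user's focus is not lost.
        if previous is not None and previous != window_id:
            activate(previous)


class FakeDesktop:
    """In-memory stand-in for the window manager (illustration only)."""

    def __init__(self, active: str) -> None:
        self.active = active
        self.history: list[str] = []

    def get_active(self) -> str:
        return self.active

    def activate(self, window_id: str) -> None:
        self.active = window_id
        self.history.append(window_id)


if __name__ == "__main__":
    desktop = FakeDesktop(active="editor")
    with focused("browser", desktop.get_active, desktop.activate):
        print("capturing while active window is", desktop.active)
    print("restored active window:", desktop.active)
```

Putting the restore step in `finally` means a failed capture still hands focus back to the window the user was working in.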
Permissions & First Run
- XDG Portal: First screenshot may show permission dialog - grant access for persistent permissions
- Extension Required: Window-specific features need the window-calls GNOME extension
- User Interaction: Some portal operations require active window/user interaction
Troubleshooting
- Ensure the window-calls GNOME extension is installed and enabled
- Check that gnome-screenshot is available in your PATH
- Verify Python dependencies are installed: pip install fastmcp PyGObject pydbus
- For "Permission denied" errors, try taking a screenshot manually first to grant portal permissions
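Several of the checks above can be automated. The script below is a rough diagnostic sketch, not part of Shotty: the function name `diagnose` is hypothetical, and it assumes the session type is exposed through the `XDG_SESSION_TYPE` environment variable. It does not check whether the window-calls extension is enabled.

```python
import importlib.util
import os
import shutil


def diagnose() -> dict[str, bool]:
    """Return a pass/fail map for the prerequisites listed in this README."""
    checks = {
        "wayland session": os.environ.get("XDG_SESSION_TYPE") == "wayland",
        "gnome-screenshot in PATH": shutil.which("gnome-screenshot") is not None,
    }
    # Import names differ from package names: PyGObject is imported as "gi".
    for module in ("fastmcp", "gi", "pydbus"):
        found = importlib.util.find_spec(module) is not None
        checks[f"python module {module}"] = found
    return checks


if __name__ == "__main__":
    for label, ok in diagnose().items():
        print(f"[{'ok' if ok else 'MISSING'}] {label}")
```

Run it with the same Python interpreter that the MCP client launches, since a different interpreter may see a different set of installed packages.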
Architecture
The server uses a dual-approach strategy for window detection, with layered fallbacks for screenshot capture:
- Primary: GNOME extension integration for accurate window data
- Fallback: Process-based detection for basic functionality
- Screenshot: Multiple capture methods with automatic fallback
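The extension-first path can be sketched by calling the extension over D-Bus with the `gdbus` command-line tool and degrading gracefully when it is unavailable. Everything specific here is an assumption to verify against the window-calls documentation: the object path and method name in `GDBUS_CMD`, the reply shape (a one-element tuple holding a JSON string), and the helper names. The sketch returns an empty list on failure rather than implementing process-based detection.

```python
import ast
import json
import subprocess

# Assumed D-Bus coordinates of the window-calls extension; check them against
# the extension's own documentation before relying on this.
GDBUS_CMD = [
    "gdbus", "call", "--session",
    "--dest", "org.gnome.Shell",
    "--object-path", "/org/gnome/Shell/Extensions/Windows",
    "--method", "org.gnome.Shell.Extensions.Windows.List",
]


def parse_gdbus_reply(reply: str) -> list[dict]:
    """Extract the JSON window list from a gdbus reply shaped like ('[...]',)."""
    (payload,) = ast.literal_eval(reply.strip())
    return json.loads(payload)


def list_windows_or_empty() -> list[dict]:
    """Return the window list, or [] when gdbus or the extension is unavailable."""
    try:
        result = subprocess.run(
            GDBUS_CMD, capture_output=True, text=True, check=True, timeout=5
        )
        return parse_gdbus_reply(result.stdout)
    except (OSError, subprocess.SubprocessError, ValueError, SyntaxError, TypeError):
        # A real fallback would switch to another detection method here.
        return []
```

Keeping the parsing separate from the subprocess call makes the reply handling testable without a running GNOME session.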
Testing
Standalone Testing
# Test window listing
python -c "from server import _list_windows_via_extension; print(_list_windows_via_extension())"
# Test full screen capture (creates Image object)
python -c "from server import _capture_full_screen; from fastmcp.utilities.types import Image; import base64; data=_capture_full_screen(); img=Image(data=base64.b64decode(data), format='image/png'); print(f'Created {len(img.data)} byte image')"
MCP Integration Testing
# Add to Claude Code using uvx (recommended)
claude mcp add shotty -- uvx --from https://github.com/chrimage/shotty.git shotty
# Or using local path (replace with your actual path)
claude mcp add shotty python /home/chris/code/mcp-servers/shotty/server.py
# In Claude Code, test with:
# list_windows()
# capture_screenshot()
# capture_screenshot(window_id="WINDOW_ID_FROM_LIST")
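To obtain the value for WINDOW_ID_FROM_LIST, the JSON returned by list_windows() has to be searched for the window of interest. The helper below is a hypothetical sketch: it assumes each entry is an object with `id` and `title` keys, which should be confirmed against the actual tool output.

```python
import json
from typing import Optional


def find_window_id(listing_json: str, title_fragment: str) -> Optional[str]:
    """Return the ID of the first window whose title contains title_fragment."""
    needle = title_fragment.casefold()
    for window in json.loads(listing_json):
        # Skip malformed entries instead of failing on a missing key.
        if "id" not in window:
            continue
        if needle in str(window.get("title", "")).casefold():
            return str(window["id"])
    return None


if __name__ == "__main__":
    # Sample listing with made-up IDs and titles.
    listing = '[{"id": 101, "title": "Terminal"}, {"id": 202, "title": "Mozilla Firefox"}]'
    print(find_window_id(listing, "firefox"))
```

The match is case-insensitive and returns the first hit, so a more specific fragment helps when several windows share part of a title.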
Known Limitations
- Wayland Security: Some window operations require GNOME extensions
- Extension Dependency: Best functionality requires window-calls extension
- Development Status: APIs may change without notice
Contributing
This project is in active development. Core technical challenges have been solved, but the implementation is rapidly evolving.
- 🔬 Research Phase: Understanding GNOME Wayland capabilities
- 🛠️ Implementation Phase: Building robust capture mechanisms
- 🧪 Testing Phase: Validating with multimodal LLMs
License
MIT License - See LICENSE file for details
Acknowledgments
- Built with FastMCP framework
- GNOME Shell extension ecosystem
- Model Context Protocol specification
⚠️ Remember: This project is under active development. Star and watch for updates!