sandraschi/pywinauto-mcp
PyWinAuto MCP is a modular, FastMCP 2.10+ compliant server for Windows UI automation using PyWinAuto, featuring a plugin-based architecture for extensibility and maintainability.
PyWinAuto MCP
Version 0.2.0 | 22 Comprehensive Automation Tools | Enterprise-Grade Windows UI Automation
A sophisticated, FastMCP 2.10+ compliant server for Windows UI automation using PyWinAuto. Features a comprehensive tool ecosystem, face recognition security, and professional DXT packaging with extensive prompt templates for conversational AI interaction.
🚀 Features
🏆 22 Comprehensive Automation Tools
PyWinAuto MCP groups its Windows automation tools into the following categories:
- 🔍 Window Management (6 tools): Find, activate, maximize, minimize, position, and close windows
- 🖱️ Mouse Control (7 tools): Click, move, scroll, drag-and-drop with precision coordinates
- ⌨️ Keyboard Input (3 tools): Type text, send key combinations, special shortcuts
- 🎯 UI Elements (6 tools): Click, type, inspect, verify text, get info, check states
- 📸 Visual Intelligence (3 tools): Screenshots, OCR text extraction, image recognition
- 🔒 Face Recognition (4 tools): Add faces, recognize, list known faces, webcam verification
🤖 Conversational AI Integration
- Extensive Prompt Templates: 100+ detailed prompts for natural language interaction
- Contextual Examples: Real-world usage scenarios for each tool
- Smart Defaults: Intelligent parameter handling and error recovery
- Desktop State Capture: Complete UI element discovery with visual annotations
🏗️ Enterprise Architecture
- Dual Interface Design: MCP tools + REST API with complete feature parity
- Security-First: Face recognition authentication and access controls
- Professional Packaging: Complete DXT distribution with all dependencies
- Plugin System: Extensible architecture for custom automation tools
Example: Finding a Window
Using MCP Tools (Claude/LLM)
# Claude can call this directly as an MCP tool
window = find_window(
    title="Untitled - Notepad",
    class_name="Notepad"
)
Using REST API
GET /api/v1/windows/find?title=Untitled%20-%20Notepad&class_name=Notepad
Response (both interfaces)
{
  "window_handle": 123456,
  "title": "Untitled - Notepad",
  "class_name": "Notepad",
  "process_id": 9876,
  "is_visible": true,
  "is_enabled": true,
  "rectangle": {
    "left": 100,
    "top": 100,
    "right": 800,
    "bottom": 600,
    "width": 700,
    "height": 500
  },
  "process_name": "notepad.exe"
}
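A client consuming this response may want to sanity-check the geometry before acting on it. The following is a minimal sketch, not part of PyWinAuto MCP: `parse_window_info` is a hypothetical helper that assumes only the field names shown in the sample response above.

```python
import json


def parse_window_info(payload: str) -> dict:
    """Parse a find-window response and check that its rectangle is self-consistent."""
    info = json.loads(payload)
    rect = info["rectangle"]
    # width and height should agree with the edge coordinates
    if rect["right"] - rect["left"] != rect["width"]:
        raise ValueError("rectangle width does not match left/right")
    if rect["bottom"] - rect["top"] != rect["height"]:
        raise ValueError("rectangle height does not match top/bottom")
    return info


# Trimmed copy of the sample response shown above
sample = json.dumps({
    "window_handle": 123456,
    "title": "Untitled - Notepad",
    "rectangle": {
        "left": 100, "top": 100, "right": 800, "bottom": 600,
        "width": 700, "height": 500,
    },
})

info = parse_window_info(sample)
rect = info["rectangle"]
# Midpoint of the window, e.g. as a target for a coordinate-based click
center = ((rect["left"] + rect["right"]) // 2, (rect["top"] + rect["bottom"]) // 2)
```

The midpoint computed at the end is the kind of value a coordinate-based mouse tool would need as input.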
Core Features
- Window Management: Find, activate, and manipulate windows
- UI Automation: Interact with controls, type text, click elements
- Element Inspection: Get detailed information about UI elements
- Screenshots: Capture window or element screenshots
- Robust Error Handling: Built-in retry mechanisms and timeouts
- MCP Integration: Seamless integration with the MCP ecosystem
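The retry-and-timeout behavior mentioned in the list above can be pictured with a small wrapper. This is an illustrative sketch only: `with_retries` is a hypothetical helper, not the project's actual mechanism, and its defaults simply mirror the `RETRY_ATTEMPTS` and `RETRY_DELAY` values from the configuration section.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 3, delay: float = 1.0) -> T:
    """Call `action` until it succeeds or `attempts` tries have been used."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # a real client would catch narrower error types
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)
    raise RuntimeError(f"action failed after {attempts} attempts") from last_error


# Simulated lookup that only succeeds on the third call
calls = {"count": 0}


def flaky_find() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise LookupError("window not found yet")
    return "Untitled - Notepad"


title = with_retries(flaky_find, attempts=3, delay=0.0)
```

Wrapping UI lookups this way is useful because a window often does not exist yet when the first call is made.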
Plugin System
- Modular Architecture: Extend functionality through plugins
- Built-in Plugins:
- OCR: Text extraction from images and windows
- Security: Application monitoring and access control
- Easy to Extend: Create custom plugins for specialized automation needs
🛠 Installation
- Prerequisites:
  - Windows 10/11
  - Python 3.10+
  - Microsoft UI Automation (UIA) support
- Install from source:
  # Clone the repository
  git clone https://github.com/sandraschi/pywinauto-mcp.git
  cd pywinauto-mcp
  # Create and activate a virtual environment
  python -m venv venv
  .\venv\Scripts\Activate.ps1
  # Install with all dependencies (including OCR and security plugins)
  pip install -e ".[all]"
  # Or install only core dependencies
  # pip install -e .
- Install Tesseract OCR (required for OCR plugin):
  - Download and install Tesseract from UB Mannheim
  - Add Tesseract to your system PATH
🚀 Quick Start
Option 1: DXT Package (Recommended)
- Install the DXT CLI:
  npm install -g @anthropic-ai/dxt
- Download the latest DXT package from the releases page
- Install the DXT package:
  dxt install dist/pywinauto-mcp-0.2.0.dxt
Package Features:
- 281KB comprehensive package with all dependencies
- 23 automation tools across 7 categories (including Desktop State Capture)
- 100+ prompt templates for conversational AI
- Face recognition security and webcam integration
- OCR and visual intelligence capabilities
- Complete desktop UI analysis with element discovery
- Start the server:
  dxt run pywinauto-mcp
Option 2: From Source
- Start the MCP server:
  uvicorn pywinauto_mcp.main:app --reload
- Example: Find and interact with Notepad
# Find Notepad window
$window = Invoke-RestMethod -Uri "http://localhost:8000/api/v1/windows/find" -Method Post -Body (@{title="Untitled - Notepad"; timeout=5} | ConvertTo-Json) -ContentType "application/json"
# Type some text
Invoke-RestMethod -Uri "http://localhost:8000/api/v1/element/type" -Method Post -Body (@{
    window_id = $window.window_handle
    control_id = "Edit"
    text = "Hello from PyWinAuto MCP!"
} | ConvertTo-Json) -ContentType "application/json"
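For clients written in Python, the same find request can be built with the standard library alone. This is a sketch under the assumptions of the PowerShell example above (same host, port, path, and JSON fields); `build_post` is a hypothetical helper name, and the request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # host and port taken from the example above


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the REST API."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


find_request = build_post("/windows/find", {"title": "Untitled - Notepad", "timeout": 5})

# To send it against a running server:
# with urllib.request.urlopen(find_request) as response:
#     window = json.load(response)
```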
🛠️ Tool Discovery & Help
PyWinAuto MCP includes a comprehensive help system to discover and understand all available tools:
Get Help Tool
# Get overview of all tools
help_info = get_help()
# Get tools by category
window_tools = get_help(category="windows")
# Get detailed tool information
click_details = get_help(tool_name="click_element")
Tool Categories
- System Tools (4): Health checks, clipboard, timing
- Window Management (6): Find, manipulate, and control windows
- UI Elements (6): Click, type, inspect, and interact with controls
- Mouse Control (7): Precise cursor movement and clicking
- Keyboard Input (3): Text input and key combinations
- Visual Intelligence (3): Screenshots, OCR, image recognition
- Face Recognition (4): Security authentication features
- Desktop State (1): Complete UI analysis and discovery
📊 Desktop State Capture - Revolutionary Deep UI Analysis
The get_desktop_state tool provides deep UI introspection. It goes beyond window enumeration to analyze the internal state of complex applications, including development environments.
🚀 Revolutionary Capabilities:
- Deep IDE Inspection: Analyzes open Cursor/VSCode instances, discovering file contents, linter errors, and development status
- Complete UI Analysis: Discovers all interactive and informative elements across the entire desktop
- Development Environment Awareness: Identifies open repositories, current files, error states, and project status
- Visual Annotations: Color-coded element boundaries on screenshots with intelligent highlighting
- OCR Enhancement: Extracts text from visual elements that standard APIs can't access
- Real-time Development Status: Monitors coding activity, error detection, and project health
Usage Examples:
# Basic UI discovery
state = get_desktop_state()
print(f"Found {state['element_count']} elements")
# Development environment analysis - discovers open IDEs and errors
dev_state = get_desktop_state(max_depth=15)
for element in dev_state['interactive_elements']:
    if 'cursor' in element['app'].lower() or 'vscode' in element['app'].lower():
        print(f"IDE Element: {element['name']} in {element['app']}")
# Fast IDE analysis (20-45 seconds)
state = get_desktop_state(max_depth=15, element_timeout=0.2)
# Quick scan of development environments
# With visual annotations - highlights development errors and status
state = get_desktop_state(use_vision=True, element_timeout=0.3)
# Includes base64-encoded annotated screenshot with error highlighting
# Complete analysis with OCR - reads error messages and code content
state = get_desktop_state(use_vision=True, use_ocr=True, max_depth=20, element_timeout=0.5)
# Extracts text from linter errors, terminal output, and code editors
# ⚠️ Note: Deep analysis may take 1-3 minutes for comprehensive results
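The IDE filtering loop above can be factored into a reusable function and exercised without a live desktop. This is a sketch only: `ide_elements` is a hypothetical helper, and `mock_state` is a hand-written stand-in that assumes the `interactive_elements`, `app`, and `name` keys used in the examples above.

```python
def ide_elements(state: dict, apps: tuple = ("cursor", "vscode")) -> list:
    """Return interactive elements whose app name contains one of `apps`."""
    return [
        element
        for element in state.get("interactive_elements", [])
        if any(name in element.get("app", "").lower() for name in apps)
    ]


# Hand-written stand-in for a get_desktop_state() result (hypothetical values)
mock_state = {
    "element_count": 3,
    "interactive_elements": [
        {"name": "Explorer", "app": "Cursor"},
        {"name": "Save", "app": "Notepad"},
        {"name": "Problems", "app": "VSCode"},
    ],
}

found = ide_elements(mock_state)
names = [element["name"] for element in found]
```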
🧩 Plugin System
PyWinAuto MCP uses a modular plugin system to extend its functionality. Plugins can be enabled/disabled via configuration.
Available Plugins
OCR Plugin
Extract text from windows and images using Tesseract OCR.
Features:
- Extract text from any window or image
- Find text positions within images
- Support for multiple languages
- Region-based text extraction
Example:
# Extract text from a window region
text = await mcp.extract_text(
    window_handle=window_handle,
    x=100, y=100, width=200, height=50,
    lang="eng"
)
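When a region is passed as `x`, `y`, `width`, and `height`, it can help to convert it to an absolute bounding box first. The sketch below assumes the region is relative to the window's top-left corner, which the source does not state. `region_to_screen_box` is a hypothetical helper, and the rectangle fields follow the find-window response format.

```python
def region_to_screen_box(rect: dict, x: int, y: int, width: int, height: int) -> tuple:
    """Convert a window-relative region to (left, top, right, bottom), clamped to the window."""
    left = min(max(rect["left"] + x, rect["left"]), rect["right"])
    top = min(max(rect["top"] + y, rect["top"]), rect["bottom"])
    right = min(left + width, rect["right"])
    bottom = min(top + height, rect["bottom"])
    return (left, top, right, bottom)


# Window rectangle from the find-window sample response
window_rect = {"left": 100, "top": 100, "right": 800, "bottom": 600}

inside_box = region_to_screen_box(window_rect, x=100, y=100, width=200, height=50)
# A region that overhangs the right edge is cut off at the window border
clamped_box = region_to_screen_box(window_rect, x=650, y=0, width=200, height=50)
```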
Security Plugin
Monitor applications and detect unauthorized access.
Features:
- Application whitelisting/blacklisting
- Unauthorized access detection
- Activity logging
- Configurable alerts
Creating Custom Plugins
- Create a new Python module in src/pywinauto_mcp/plugins/
- Create a class that inherits from PyWinAutoPlugin
- Implement the required methods
- Register your plugin using the @register_plugin decorator
Example Plugin:
from pywinauto_mcp.core.plugin import PyWinAutoPlugin, register_plugin

@register_plugin
class MyCustomPlugin(PyWinAutoPlugin):
    @classmethod
    def get_name(cls) -> str:
        return "my_plugin"

    def register_tools(self):
        @self.app.mcp.tool("my_tool")
        async def my_tool(param: str):
            return {"result": f"Processed: {param}"}
📚 API Documentation
Windows
- POST /api/v1/windows/find: Find a window by attributes
- GET /api/v1/windows: List all top-level windows
- GET /api/v1/windows/{handle}: Get window details
- POST /api/v1/windows/{handle}/activate: Activate a window
- POST /api/v1/windows/{handle}/close: Close a window
Elements
- POST /api/v1/element/click: Click an element
- POST /api/v1/element/type: Type text into an element
- POST /api/v1/element/get: Get element information
- POST /api/v1/element/screenshot: Take a screenshot of an element
Desktop State
- POST /api/v1/desktop_state/capture: Capture complete desktop state with UI elements
  - Optional: use_vision=true for annotated screenshots
  - Optional: use_ocr=true for text extraction from elements
  - Optional: max_depth=10 for UI tree traversal depth
🔧 Configuration
DXT Configuration
When using the DXT package, create a .env file in the DXT package directory (typically ~/.dxt/packages/pywinauto-mcp/.env on Linux/macOS or %USERPROFILE%\.dxt\packages\pywinauto-mcp\.env on Windows).
Local Development
For local development, create a .env file in the project root:
# Server Configuration
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=INFO
# PyWinAuto Settings
TIMEOUT=10.0
RETRY_ATTEMPTS=3
RETRY_DELAY=1.0
# Face Recognition Settings
FACE_RECOGNITION_TOLERANCE=0.6
FACE_RECOGNITION_MODEL=hog
# Security Settings
SECURITY_ALERT_EMAIL=alerts@example.com
SECURITY_WEBHOOK_URL=
# Screenshot Settings
SCREENSHOT_DIR=./screenshots
SCREENSHOT_FORMAT=png
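As an illustration of how such a file maps onto typed settings, the sketch below parses `.env`-style text with the standard library. It is an assumption-laden example, not the project's loader: `Settings`, `parse_env`, and `load_settings` are hypothetical names, and only a few of the keys above are modeled.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    """Subset of the settings above; defaults copied from the sample .env."""
    host: str = "0.0.0.0"
    port: int = 8000
    timeout: float = 10.0
    retry_attempts: int = 3
    retry_delay: float = 1.0


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(text: str) -> Settings:
    """Build typed settings, falling back to defaults for missing keys."""
    env = parse_env(text)
    return Settings(
        host=env.get("HOST", Settings.host),
        port=int(env.get("PORT", Settings.port)),
        timeout=float(env.get("TIMEOUT", Settings.timeout)),
        retry_attempts=int(env.get("RETRY_ATTEMPTS", Settings.retry_attempts)),
        retry_delay=float(env.get("RETRY_DELAY", Settings.retry_delay)),
    )


env_text = "# Server Configuration\nHOST=127.0.0.1\nPORT=9000\nSECURITY_WEBHOOK_URL=\n"
settings = load_settings(env_text)
```

Keys left empty, such as `SECURITY_WEBHOOK_URL=`, parse to an empty string, so a loader can treat them as "not configured".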
🔒 Security Features
Face Recognition
# Enroll a new face
curl -X POST "http://localhost:8000/face-recognition/enroll" \
-H "Content-Type: multipart/form-data" \
-F "name=John Doe" \
-F "image_file=@john.jpg"
# Verify face using webcam
curl -X POST "http://localhost:8000/face-recognition/verify/webcam?confidence_threshold=0.7"
# List known faces
curl "http://localhost:8000/face-recognition/faces"
Security Monitoring
# Monitor sensitive applications
curl -X POST "http://localhost:8000/security/monitor/apps/start" \
-H "Content-Type: application/json" \
-d '{"app_names": ["banking_app.exe"], "webcam_required": true}'
# Start intruder detection
curl -X POST "http://localhost:8000/security/monitor/intruder/start" \
-H "Content-Type: application/json" \
-d '{"sensitivity": 0.8, "alert_contacts": ["security@example.com"]}'
🤝 Contributing
We welcome contributions! PyWinAuto MCP has comprehensive contribution guidelines and a structured process:
📋 Getting Started
- 📖 Complete development workflow and guidelines
- 🐛 Structured bug reports and feature requests
- 🔄 CI/CD Pipeline: Automated testing and quality assurance
🛠️ Development Workflow
- Fork the repository
- Create a feature branch (git checkout -b feature/your-feature)
- Develop following our coding standards
- Test your changes thoroughly
- Submit a pull request with detailed description
📚 Documentation
- 📋 Comprehensive project assessment and roadmap
- 📝 Technical documentation and guides
- 🔄 Version history and release notes
🤝 Community Standards
- 📜 Community guidelines and expectations
- 🔒 Vulnerability reporting and security features
📄 License
This project is licensed under the MIT License; see the license file in the repository for details.