pywinauto-mcp

sandraschi/pywinauto-mcp

PyWinAuto MCP is a modular, FastMCP 2.10+ compliant server for Windows UI automation using PyWinAuto, featuring a plugin-based architecture for extensibility and maintainability.

PyWinAuto MCP - Portmanteau Edition

Version 0.3.1 | 8 Comprehensive Portmanteau Tools | FastMCP 2.13.1 | SOTA 2026 Compliant

A FastMCP 2.13.1 compliant server for Windows UI automation built on PyWinAuto. It provides 8 comprehensive portmanteau tools that consolidate 60+ operations, along with face recognition security and professional packaging.

🚀 What's New in v0.3.0 - Portmanteau Edition

Tool Consolidation

Previous versions had 60+ individual tools scattered across multiple files with duplicates. The Portmanteau Edition consolidates everything into 8 comprehensive tools:

| Tool | Operations | Description |
|---|---|---|
| automation_windows | 11 | Window management (list, find, maximize, minimize, etc.) |
| automation_elements | 14 | UI element interaction (click, hover, text, etc.) |
| automation_mouse | 9 | Mouse control (move, click, scroll, drag) |
| automation_keyboard | 4 | Keyboard input (type, press, hotkey) |
| automation_visual | 4 | Visual operations (screenshot, OCR, find image) |
| automation_face | 5 | Face recognition (add, recognize, list, delete) |
| automation_system | 7 | System utilities (health, help, clipboard, processes) |
| get_desktop_state | 1 | Comprehensive desktop UI element discovery |

Benefits

  • Reduced tool explosion: 60+ tools → 8 tools
  • No duplicates: Each operation defined once
  • Better discoverability: Related operations grouped together
  • FastMCP 2.13.1 compliant: Latest features and security fixes
  • SOTA 2026 Standard: 100% docstring compliance (Ruff D-rules) and industrial technical documentation

🏆 Features

🔍 Window Management (automation_windows)

# List all windows
automation_windows("list")

# Find window by title
automation_windows("find", title="Notepad", partial=True)

# Maximize, minimize, restore
automation_windows("maximize", handle=12345)
automation_windows("minimize", handle=12345)
automation_windows("restore", handle=12345)

# Position and size
automation_windows("position", handle=12345, x=100, y=100, width=800, height=600)

🎯 Element Interaction (automation_elements)

# Click elements
automation_elements("click", window_handle=12345, control_id="btnOK")
automation_elements("double_click", window_handle=12345, control_id="listItem")
automation_elements("right_click", window_handle=12345, x=100, y=200)

# Get/set text
automation_elements("text", window_handle=12345, control_id="Edit1")
automation_elements("set_text", window_handle=12345, control_id="Edit1", text="Hello!")

# Wait and verify
automation_elements("wait", window_handle=12345, control_id="loading", timeout=10.0)
automation_elements("verify_text", window_handle=12345, control_id="status", expected_text="Ready")

🖱️ Mouse Control (automation_mouse)

# Position and movement
automation_mouse("position")
automation_mouse("move", x=500, y=300)
automation_mouse("move_relative", x=10, y=-5)

# Clicking
automation_mouse("click", x=500, y=300)
automation_mouse("double_click", x=500, y=300)
automation_mouse("right_click")

# Scrolling and dragging
automation_mouse("scroll", amount=3)
automation_mouse("drag", x=100, y=100, target_x=500, target_y=300)

⌨️ Keyboard Input (automation_keyboard)

# Type text
automation_keyboard("type", text="Hello World!")

# Press keys
automation_keyboard("press", key="enter")
automation_keyboard("hotkey", keys=["ctrl", "c"])
automation_keyboard("hotkey", keys=["ctrl", "shift", "s"])

📸 Visual Intelligence (automation_visual)

# Screenshots
automation_visual("screenshot")
automation_visual("screenshot", window_handle=12345, return_base64=True)

# OCR text extraction
automation_visual("extract_text", image_path="screen.png")

# Find image on screen
automation_visual("find_image", template_path="button.png", threshold=0.8)

🔒 Face Recognition (automation_face)

# Add and recognize faces
automation_face("add", name="John Doe", image_path="john.jpg")
automation_face("recognize", image_path="unknown.jpg")

# List and manage
automation_face("list")
automation_face("delete", name="John Doe")

# Webcam capture
automation_face("capture", camera_index=0)

⚙️ System Utilities (automation_system)

# Health and help
automation_system("health")
automation_system("help")

# Wait operations
automation_system("wait", seconds=2.5)
automation_system("wait_for_window", title="Notepad", timeout=10.0)

# Clipboard
automation_system("clipboard_get")
automation_system("clipboard_set", text="Copied!")

# Process list
automation_system("process_list")
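
The wait and wait_for_window operations above take a timeout in seconds. A common way to build such an operation is a polling loop with a deadline. The sketch below illustrates that general technique only; wait_until and its parameters are hypothetical names and are not taken from the pywinauto-mcp source.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(
    probe: Callable[[], Optional[T]],
    timeout: float = 10.0,
    interval: float = 0.1,
) -> Optional[T]:
    """Call probe() repeatedly until it returns a non-None value or time runs out.

    Hypothetical helper: probe stands in for a lookup such as
    "find a window whose title contains 'Notepad'".
    """
    # time.monotonic() is not affected by system clock adjustments.
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None  # Timed out without a match.
        time.sleep(interval)
```

The probe is always called at least once, so a timeout of zero still performs a single check.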

📊 Desktop State Capture

# Basic UI discovery
get_desktop_state()

# With visual annotations
get_desktop_state(use_vision=True)

# With OCR text extraction
get_desktop_state(use_ocr=True)

# Full analysis
get_desktop_state(use_vision=True, use_ocr=True, max_depth=15)

🛠 Installation

Prerequisites

  • Windows 10/11
  • Python 3.10+
  • Microsoft UI Automation (UIA) support

Install from source

# Clone the repository
git clone https://github.com/sandraschi/pywinauto-mcp.git
cd pywinauto-mcp

# Create and activate a virtual environment
python -m venv venv
.\venv\Scripts\Activate.ps1

# Install core package
pip install -e .

# Install with face recognition
pip install -e ".[face]"

# Install with all dependencies (including dev tools)
pip install -e ".[all]"

Install Tesseract OCR (for OCR features)

Download and install Tesseract from UB Mannheim

🚀 Quick Start

Start the MCP Server

# Direct run
python -m pywinauto_mcp

# Or using the entry point
pywinauto-mcp

Claude Desktop Configuration

Add the following to your Claude Desktop configuration file, claude_desktop_config.json, replacing the cwd value with the path to your local clone:

{
  "mcpServers": {
    "pywinauto": {
      "command": "python",
      "args": ["-m", "pywinauto_mcp"],
      "cwd": "D:\\Dev\\repos\\pywinauto-mcp"
    }
  }
}
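
If the configuration file already lists other servers, the entry above has to be merged into the existing mcpServers object rather than pasted over the whole file. A minimal sketch, assuming the file is plain JSON with a top-level mcpServers key; add_server_entry is a hypothetical helper and not part of pywinauto-mcp:

```python
import json
from typing import Any


def add_server_entry(config_text: str, repo_dir: str) -> str:
    """Return config_text with a "pywinauto" server entry added or replaced.

    Hypothetical helper: config_text is the current content of
    claude_desktop_config.json (may be empty), repo_dir is the local clone path.
    """
    config: dict[str, Any] = json.loads(config_text) if config_text.strip() else {}
    # Keep any servers that are already configured.
    servers = config.setdefault("mcpServers", {})
    servers["pywinauto"] = {
        "command": "python",
        "args": ["-m", "pywinauto_mcp"],
        "cwd": repo_dir,
    }
    # json.dumps escapes backslashes in Windows paths automatically.
    return json.dumps(config, indent=2)
```

Writing the path through json.dumps avoids hand-escaping backslashes, which is why the cwd value in the example above appears with doubled backslashes.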

🔧 Configuration

Create a .env file in the project root:

# Server Configuration
HOST=0.0.0.0
PORT=8000
LOG_LEVEL=INFO

# PyWinAuto Settings
TIMEOUT=10.0
RETRY_ATTEMPTS=3
RETRY_DELAY=1.0

# Face Recognition Settings
FACE_RECOGNITION_TOLERANCE=0.6
FACE_RECOGNITION_MODEL=hog

# Screenshot Settings
SCREENSHOT_DIR=./screenshots
SCREENSHOT_FORMAT=png
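
As an illustration of how these variables might be consumed, the sketch below reads them into a typed settings object using only the standard library. It assumes the values are already present in the process environment (a .env file is not read by Python on its own and needs a loader). The Settings class, load_settings function, and the fallback defaults (copied from the example above) are assumptions, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed view of the variables shown in the .env example."""

    host: str
    port: int
    log_level: str
    timeout: float
    retry_attempts: int
    retry_delay: float
    face_tolerance: float
    face_model: str
    screenshot_dir: str
    screenshot_format: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from env, falling back to the example values."""
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        timeout=float(env.get("TIMEOUT", "10.0")),
        retry_attempts=int(env.get("RETRY_ATTEMPTS", "3")),
        retry_delay=float(env.get("RETRY_DELAY", "1.0")),
        face_tolerance=float(env.get("FACE_RECOGNITION_TOLERANCE", "0.6")),
        face_model=env.get("FACE_RECOGNITION_MODEL", "hog"),
        screenshot_dir=env.get("SCREENSHOT_DIR", "./screenshots"),
        screenshot_format=env.get("SCREENSHOT_FORMAT", "png"),
    )
```

Converting to int and float at load time means a malformed value such as PORT=abc fails at startup with a ValueError instead of later during a tool call.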

📚 Architecture

Portmanteau Pattern

The Portmanteau Edition follows FastMCP 2.13+ best practices, with one module per portmanteau tool:

pywinauto_mcp/
├── app.py                    # FastMCP app instance
├── main.py                   # Entry point
└── tools/
    ├── __init__.py           # Tool registration
    ├── portmanteau_windows.py    # Window management
    ├── portmanteau_elements.py   # UI elements
    ├── portmanteau_mouse.py      # Mouse control
    ├── portmanteau_keyboard.py   # Keyboard input
    ├── portmanteau_visual.py     # Visual/OCR
    ├── portmanteau_face.py       # Face recognition
    ├── portmanteau_system.py     # System utilities
    ├── desktop_state.py          # Desktop state (standalone)
    └── archived/                 # Legacy tools (preserved)

Why Portmanteau?

  1. Prevents tool explosion: Instead of 60+ tools, 8 comprehensive tools
  2. Better discoverability: Related operations grouped logically
  3. Reduced cognitive load: Fewer tools to remember
  4. Consistent interface: Each tool follows the same pattern
  5. Easier maintenance: Each tool's operations live in one module, so a change is made in one place
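
The pattern described in this list can be sketched as a single function that takes an operation name and dispatches to a handler. This is an illustration of the general idea only: the handler names, the _OPERATIONS table, and the response dictionaries are assumptions, and the real tools call into PyWinAuto and are registered with FastMCP.

```python
from typing import Any, Callable


# Hypothetical handlers; a real implementation would query Windows here.
def _list_windows(**_: Any) -> dict[str, Any]:
    return {"status": "success", "windows": []}


def _find_window(title: str, partial: bool = False, **_: Any) -> dict[str, Any]:
    return {"status": "success", "title": title, "partial": partial}


# One table per tool: each operation is defined exactly once.
_OPERATIONS: dict[str, Callable[..., dict[str, Any]]] = {
    "list": _list_windows,
    "find": _find_window,
}


def automation_windows_sketch(operation: str, **params: Any) -> dict[str, Any]:
    """Dispatch one named operation; unknown names return a structured error."""
    handler = _OPERATIONS.get(operation)
    if handler is None:
        return {
            "status": "error",
            "error": f"Unknown operation: {operation!r}",
            "valid_operations": sorted(_OPERATIONS),
        }
    return handler(**params)
```

Returning the list of valid operations in the error response lets a client recover from a misspelled operation name without consulting separate documentation.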

🤝 Contributing

See the contributing guide in the repository for development workflow and guidelines.

📄 License

This project is licensed under the MIT License - see the license file in the repository for details.

🙏 Acknowledgments