sushant097/Custom-MCP-server-to-paint-in-Python
This document summarizes a Model Context Protocol (MCP) server that lets an Agent (LLM) control a GUI drawing application that has no API of its own.
EAG v2 — Assignment 4 (Agent → MCP → GUI “Paint”)
Make an Agent (LLM) control a GUI drawing app that has no API by calling MCP tools. The Agent must open a paint-like app, draw a rectangle, then write text inside the rectangle — with no manual paint calls in the client code.
This repo contains a Mac-ready solution that uses Preview.app + Markup tools instead of MS Paint.
What’s in this repo
- server.py – MCP server exposing GUI tools:
  - open_paint() → opens Preview with a blank canvas, sizes the window, and shows the Markup toolbar
  - draw_rectangle(x1, y1, x2, y2) → draws a rectangle via Preview’s Tools → Annotate → Rectangle
  - add_text_in_paint(text) → inserts a Text box and types your text
  - All coordinates are window-relative (origin = top-left of Preview’s front window)
- talk2mcp.py – MCP client agent:
  - Starts the server via stdio and lists available tools
  - Uses Gemini 2.0 Flash (via google-genai) with a strict system prompt that forces the model to emit one line per step:
    - FUNCTION_CALL: <tool_name>|arg1|arg2|...
    - or FINAL_ANSWER: DONE
  - Parses that single line and calls the MCP tool. No manual GUI calls here; the Agent drives everything.
- requirements.txt – macOS dependencies (no Windows-only libs): mcp, google-genai, python-dotenv, pyautogui, pyobjc, pillow
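The one-line response format described above is simple enough to parse with plain string operations. The following is a minimal sketch of such a parser, not the code in talk2mcp.py: the function name parse_agent_line and the (kind, name, args) return shape are assumptions made for illustration.

```python
from typing import List, Tuple


def parse_agent_line(line: str) -> Tuple[str, str, List[str]]:
    """Parse one Agent response line into (kind, name, args).

    kind is "call" for FUNCTION_CALL lines and "final" for FINAL_ANSWER lines.
    Hypothetical helper: the real client may parse differently.
    """
    line = line.strip()
    if line.startswith("FUNCTION_CALL:"):
        payload = line[len("FUNCTION_CALL:"):].strip()
        # The tool name and its arguments are separated by "|".
        name, *args = [part.strip() for part in payload.split("|")]
        if not name:
            raise ValueError("FUNCTION_CALL line has no tool name")
        return "call", name, args
    if line.startswith("FINAL_ANSWER:"):
        answer = line[len("FINAL_ANSWER:"):].strip()
        return "final", answer, []
    raise ValueError(f"Unrecognized agent line: {line!r}")


print(parse_agent_line("FUNCTION_CALL: draw_rectangle|400|250|1100|650"))
print(parse_agent_line("FINAL_ANSWER: DONE"))
```

Arguments arrive as strings, so a client built this way would still need to convert coordinates to integers before calling draw_rectangle. Text that itself contains "|" would be split into several arguments by this sketch.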
What changed (vs the instructor’s Windows/MS Paint baseline)
- OS swap (Windows → macOS)
  - Replaced pywinauto/pywin32 with pyautogui + AppleScript to control Preview.app (works on Apple Silicon).
  - Dropped Windows-only packages; updated requirements.txt accordingly.
- Window-relative coordinates
  - The server reads the Preview window bounds and converts window-relative (x, y) to absolute screen coordinates.
  - This makes things resilient to multi-monitor setups and different resolutions.
- No manual paint calls in the client
  - talk2mcp.py now only forwards the Agent’s FUNCTION_CALL: lines to the server.
  - The Agent itself decides the sequence: open_paint → draw_rectangle → add_text_in_paint → FINAL_ANSWER.
- Robust tool activation
  - Preview Markup toolbar is toggled programmatically.
  - Tools are invoked via menu navigation (AppleScript), which is more reliable than keyboard shortcuts across setups.
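To illustrate the menu-navigation approach, here is a rough sketch of how a menu click such as Tools → Annotate → Rectangle could be expressed as AppleScript and run through osascript. The helper names build_menu_click_script and click_menu are hypothetical, and the actual server.py may build its scripts differently.

```python
import subprocess
from typing import Sequence


def _quote(label: str) -> str:
    """Quote a label as an AppleScript string literal."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_menu_click_script(app: str, menu_path: Sequence[str]) -> str:
    """Build AppleScript that clicks a nested menu item via System Events.

    menu_path is e.g. ["Tools", "Annotate", "Rectangle"].
    """
    if len(menu_path) < 2:
        raise ValueError("menu_path needs a menu bar item and at least one menu item")
    top, *rest = menu_path
    # AppleScript references read innermost-first, so wrap the target step by step.
    target = f"menu bar item {_quote(top)} of menu bar 1"
    parent = top
    for label in rest:
        target = f"menu item {_quote(label)} of menu {_quote(parent)} of {target}"
        parent = label
    return "\n".join([
        f"tell application {_quote(app)} to activate",
        'tell application "System Events"',
        f"    tell process {_quote(app)}",
        f"        click {target}",
        "    end tell",
        "end tell",
    ])


def click_menu(app: str, menu_path: Sequence[str]) -> None:
    """Run the script with osascript (macOS only; needs Accessibility permission)."""
    subprocess.run(["osascript", "-e", build_menu_click_script(app, menu_path)], check=True)


# Only print the script here; call click_menu(...) on a Mac to actually click.
print(build_menu_click_script("Preview", ["Tools", "Annotate", "Rectangle"]))
```

The menu labels must match the system language exactly, which is why localized systems need different strings.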
How it works (flow)
+--------------------+      MCP (stdio)       +------------------+   mac GUI
| talk2mcp.py        | <--------------------> | server.py        |   (Preview.app)
| (Agent client)     |                        | (MCP server)     |
+--------------------+                        +------------------+
| list_tools()                                     |
|------------------------------------------------->|
|                                                  |
| prompt LLM (system prompt lists tools)           |
|                                                  |
| LLM: "FUNCTION_CALL: open_paint"                 |
|------------------------------------------------->|  open Preview + Markup toolbar
| LLM: "FUNCTION_CALL: draw_rectangle|x1|y1|x2|y2" |
|------------------------------------------------->|  drag to draw rectangle
| LLM: "FUNCTION_CALL: add_text_in_paint|Hello"    |
|------------------------------------------------->|  place text box + type
| LLM: "FINAL_ANSWER: DONE"                        |
v
complete
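The flow above boils down to a short loop: ask the model for one line, forward it as a tool call, and stop on FINAL_ANSWER. The sketch below simulates that loop with stand-in callables so it runs without Gemini or a live MCP server. The names run_agent_loop, next_line and call_tool are assumptions for illustration; in a real client they would wrap the Gemini request and the MCP session's tool call.

```python
from typing import Callable, List


def run_agent_loop(
    next_line: Callable[[List[str]], str],
    call_tool: Callable[[str, List[str]], str],
    max_steps: int = 6,
) -> List[str]:
    """Forward Agent lines to tools until FINAL_ANSWER; return the step history."""
    history: List[str] = []
    for _ in range(max_steps):
        line = next_line(history).strip()
        if line.startswith("FINAL_ANSWER:"):
            history.append(line)
            return history
        if not line.startswith("FUNCTION_CALL:"):
            raise ValueError(f"Unexpected agent output: {line!r}")
        name, *args = line[len("FUNCTION_CALL:"):].strip().split("|")
        result = call_tool(name.strip(), [arg.strip() for arg in args])
        # Record the call and its result so the next prompt can include them.
        history.append(f"{line} -> {result}")
    raise RuntimeError("Agent did not finish within max_steps")


# Stand-ins for the LLM and the MCP server (hypothetical, for demonstration only).
scripted = iter([
    "FUNCTION_CALL: open_paint",
    "FUNCTION_CALL: draw_rectangle|400|250|1100|650",
    "FUNCTION_CALL: add_text_in_paint|Hello",
    "FINAL_ANSWER: DONE",
])
history = run_agent_loop(lambda past: next(scripted), lambda name, args: f"ok:{name}")
for step in history:
    print(step)
```

Capping the loop with max_steps keeps a misbehaving model from calling tools indefinitely.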
Setup (macOS)
- Create venv & install
  python -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
- API key
  Create a .env file in the project root:
  GEMINI_API_KEY=your_google_api_key_here
- Permissions (very important)
  - System Settings → Privacy & Security
    - Accessibility → allow your Terminal/iTerm/Python
    - Screen Recording → allow the same
  - Restart your terminal if changes don’t take effect.
- Run
Open two terminals (or tabs):
Terminal A – server
source venv/bin/activate
python server.py
Terminal B – client
source venv/bin/activate
python talk2mcp.py
You should see the Agent emit calls like:
FUNCTION_CALL: open_paint
FUNCTION_CALL: draw_rectangle|400|250|1100|650
FUNCTION_CALL: add_text_in_paint|What is the capital of Nepal?
FINAL_ANSWER: DONE
And in Preview you’ll see a rectangle and your text inside.
Tuning the Agent’s behavior
- In talk2mcp.py:
  - Set what you want written inside the rectangle:
    user_text = "Hello from the Agent!"
    query = f'Task: Draw a rectangle and write the following text inside it: "{user_text}".'
  - The system prompt already enforces the one-line response format and mentions the three tools.
- If the Agent draws off-canvas:
  - Adjust the rectangle coordinates you suggest (the Agent generally picks sensible values after a successful first run).
  - You can also add a hint into the query, e.g. “Use coordinates roughly in the center of the window.”
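One way to keep the text and the optional hint together is a small query builder. This is only a sketch: build_query and its hint parameter are hypothetical and not part of talk2mcp.py; the task sentence itself is the one shown above.

```python
from typing import Optional


def build_query(user_text: str, hint: Optional[str] = None) -> str:
    """Build the task query, optionally appending a coordinate hint."""
    query = f'Task: Draw a rectangle and write the following text inside it: "{user_text}".'
    if hint:
        # The hint is appended verbatim as an extra sentence for the model.
        query += f" {hint}"
    return query


print(build_query("Hello from the Agent!"))
print(build_query("Hello from the Agent!", "Use coordinates roughly in the center of the window."))
```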
Coordinates & multi-monitor notes
- All tool coords are window-relative in server.py.
  - (0,0) is the top-left of Preview’s front window.
  - If your rectangle isn’t visible, try smaller values (e.g., x1=200, y1=150, x2=1000, y2=600).
- The server sizes the window to a known rectangle (defaults: left=100, top=80, width=1400, height=900) to make this repeatable. You can change those inside open_paint().
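The conversion from window-relative to screen coordinates is an offset by the window's top-left corner. The sketch below shows the arithmetic using the default window rectangle quoted above. The function name window_to_screen and the bounds check are assumptions for illustration; server.py reads the real bounds from Preview at runtime and may handle out-of-range values differently.

```python
from typing import Tuple

# Default window rectangle quoted above: (left, top, width, height).
DEFAULT_BOUNDS: Tuple[int, int, int, int] = (100, 80, 1400, 900)


def window_to_screen(
    x: int, y: int, bounds: Tuple[int, int, int, int] = DEFAULT_BOUNDS
) -> Tuple[int, int]:
    """Convert a window-relative point to absolute screen coordinates."""
    left, top, width, height = bounds
    # Reject points outside the window so off-canvas requests fail loudly.
    if not (0 <= x <= width and 0 <= y <= height):
        raise ValueError(f"({x}, {y}) is outside the {width}x{height} window")
    return left + x, top + y


print(window_to_screen(200, 150))   # (300, 230)
print(window_to_screen(1000, 600))  # (1100, 680)
```

Because only the window origin is added, the same window-relative rectangle lands in the same place inside Preview regardless of which monitor the window sits on.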
Troubleshooting
- Preview doesn’t respond / clicks do nothing
  - Check Accessibility + Screen Recording permissions.
  - Ensure Preview is frontmost (the server calls activate_app("Preview"), but try clicking Preview once manually).
- “Text” tool not inserted
  - Menus may be localized. In server.py, the menu click is: Tools → Annotate → Text
    If your system language differs, change "Text" to your local menu label.
- Gemini errors (auth/model)
  - Confirm .env has a valid GEMINI_API_KEY.
  - Internet required for google-genai.
- Agent keeps saying DONE without drawing
  - The system prompt forces tool calls, but if it still happens, increase the loop iterations in talk2mcp.py or include a stronger hint in query like “First call open_paint, then draw_rectangle, then add_text_in_paint.”
Quick FAQ
Q: Can I still use MS Paint if I switch to Windows later?
A: Yes. Swap server.py to the Windows version that uses pywinauto and keep talk2mcp.py unchanged (tool names stay the same).
Q: Can the Agent choose coordinates itself?
A: Yes; the prompt encourages it. If it picks poorly, add a coordinate hint or provide a “safe default rectangle” in the prompt.
Q: How do I verify the Agent actually called the tools?
A: Both talk2mcp.py and server.py print logs for each tool call — you’ll see the FUNCTION_CALL: lines and the server’s responses in the consoles.