browser-use-mcp

fu-tran-tpv-clv/browser-use-mcp

The MCP Browser Use Server is designed to convert natural-language manual test steps into browser automation actions, providing precise selectors and element indices.

MCP Browser Use Server

MCP Browser Use Server converts natural-language manual test steps into fully-qualified browser automation actions (returning XPath/CSS selectors and element indices). The server follows the Model Context Protocol 2024-11-05 and exposes a single /message endpoint (HTTP POST or SSE) ready for integration with GitHub Copilot, Cursor and other AI-powered IDEs.

Table of Contents

  1. Features
  2. High-Level Architecture
  3. Requirements
  4. Quick Start
  5. Environment Variables
  6. Usage
  7. Project Structure
  8. Contribution Guidelines
  9. License

Features

Natural Language → Browser Automation – returns a model_actions array containing every step with precise selectors.
MCP-compliant API – /message supports JSON-RPC 2.0 over HTTP or SSE; root / is a health check.
Real-time Streaming – watch task progress live via Server-Sent Events.
Session Management – automatic creation, monitoring, and clean-up of sessions.
Multi-LLM Support – Mistral by default; switch to OpenAI, Claude, or Gemini via environment variables.
Developer-friendly Makefile – install, run, lint, test, CI in one place.
Ant Design Prompt – built-in advanced prompt for React + AntD apps (browser_agent.py).

High-Level Architecture

graph TD;
  IDE["IDE / Copilot"];
  MCP["MCP Server<br/>FastAPI + asyncio"];
  IDE -- "JSON-RPC 2.0<br/>HTTP & SSE" --> MCP;
  MCP --> BA["Browser Agent"];
  MCP --> SM["Session Manager"];
  BA --> BU["browser-use"];
  BU --> CH["Chrome"];
  SM --> RD["Redis*"];
  RD --> MT["Metrics"];

* Redis persistence is planned for a future release.

Requirements

  • Python 3.11+
  • Google Chrome / Chromium
  • macOS / Linux / Windows

Quick Start

# 1. Create virtual-env
make venv
source venv/bin/activate   # Windows: .\venv\Scripts\activate

# 2. Install dependencies
make install

# 3. Set API keys
export MISTRAL_API_KEY="sk-..."          # required
# optional: switch LLM provider
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-..."
export GOOGLE_API_KEY="AIza..."

# 4. Launch the server
make start              # fastest way
# or with banner & health-monitor
make start-enhanced

After startup the service listens on http://localhost:8002.

Cursor / GitHub Copilot MCP configuration

Add the following snippet to your IDE's MCP config file (.cursor/mcp.json, .copilot/mcp.json, etc.) so that Cursor or GitHub Copilot can route tools/call requests to this server:

{
  "browser-automation": {
    "url": "http://localhost:8002/message"
  }
}

Environment Variables

| Variable | Required | Description |
|---|---|---|
| LLM_PROVIDER | | LLM provider: openai, claude, gemini, mistral, etc. Default: openai. |
| LLM_API_URL | | Provider API base URL, e.g. https://api.openai.com/v1/ for openai, https://api.mistral.ai/v1/ for mistral, https://api.deepseek.com/v1 for deepseek, and so on. |
| MISTRAL_API_KEY | | Mistral model API key (default provider). |
| OPENAI_API_KEY | | GPT-4/3.5 key if you prefer OpenAI. |
| ANTHROPIC_API_KEY | | Claude key. |
| GOOGLE_API_KEY | | Gemini key. |
| LLM_API_KEY | | Key for any other LLM provider. |
| MODEL_ID | | Override the default model id. |
| CHROME_PATH | | Explicit Chrome path (auto-detected otherwise). |
| HOST / PORT | | Override host/port (default localhost:8002). |
| LOG_LEVEL | | DEBUG, INFO, WARNING, … |
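
To illustrate how the variables above might be consumed, here is a minimal sketch of an environment-driven settings loader. It is not the project's config.py: the names ServerSettings and load_settings are hypothetical, and the INFO fallback for LOG_LEVEL is an assumption (the table does not state a default). Only the localhost:8002 default comes from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerSettings:
    """Hypothetical settings container (illustration only)."""

    host: str
    port: int
    log_level: str
    model_id: Optional[str]
    chrome_path: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> ServerSettings:
    """Read the documented variables from `env` (os.environ when omitted)."""
    source = os.environ if env is None else env
    return ServerSettings(
        host=source.get("HOST", "localhost"),         # documented default
        port=int(source.get("PORT", "8002")),         # documented default
        log_level=source.get("LOG_LEVEL", "INFO").upper(),  # assumed default
        model_id=source.get("MODEL_ID"),              # None keeps the provider default
        chrome_path=source.get("CHROME_PATH"),        # None means auto-detect
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dict instead of os.environ keeps the loader easy to test without touching the real environment.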

Usage

1. Health Check

curl http://localhost:8002/
# → {"status":"ok","active_sessions":0}
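
The same check can be scripted. The sketch below assumes the response body has the shape shown above (a status field and an active_sessions counter); the function names parse_health and check_health are hypothetical.

```python
import json
from urllib.request import urlopen


def parse_health(body: str) -> int:
    """Return the active session count, or raise if the status is not "ok"."""
    data = json.loads(body)
    if data.get("status") != "ok":
        raise RuntimeError(f"unexpected health status: {data.get('status')!r}")
    return int(data.get("active_sessions", 0))


def check_health(base_url: str = "http://localhost:8002/", timeout: float = 5.0) -> int:
    """GET the root endpoint and parse its body; requires a running server."""
    with urlopen(base_url, timeout=timeout) as response:
        return parse_health(response.read().decode("utf-8"))
```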

2. Initialise an MCP session

curl -X POST http://localhost:8002/message \
     -H 'Content-Type: application/json' \
     -d '{
  "jsonrpc": "2.0",
  "id": "init-1",
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {"tools": {}},
    "clientInfo": {"name": "demo", "version": "1.0"}
  }
}' -i
# The response header "Mcp-Session-Id: <uuid>" contains the session id.
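
For clients that prefer Python over curl, the handshake can be sketched with the standard library alone. This mirrors the request above; build_initialize_request and open_session are hypothetical helper names, and the sketch assumes the server returns the session id in the Mcp-Session-Id response header as described.

```python
import json
from urllib.request import Request, urlopen

MCP_URL = "http://localhost:8002/message"


def build_initialize_request(url: str = MCP_URL) -> Request:
    """Build the JSON-RPC 2.0 initialize request shown in the curl example."""
    payload = {
        "jsonrpc": "2.0",
        "id": "init-1",
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",
            "capabilities": {"tools": {}},
            "clientInfo": {"name": "demo", "version": "1.0"},
        },
    }
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def open_session(url: str = MCP_URL, timeout: float = 30.0) -> str:
    """Send initialize and return the session id; requires a running server."""
    with urlopen(build_initialize_request(url), timeout=timeout) as response:
        # Header lookup on the response is case-insensitive.
        session_id = response.headers.get("Mcp-Session-Id")
    if not session_id:
        raise RuntimeError("server did not return an Mcp-Session-Id header")
    return session_id
```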

3. Call execute_task_steps_by_browser

curl -X POST http://localhost:8002/message \
     -H 'Content-Type: application/json' \
     -H "Mcp-Session-Id: <uuid>" \
     -d '{
  "jsonrpc": "2.0",
  "id": "task-1",
  "method": "tools/call",
  "params": {
    "name": "execute_task_steps_by_browser",
    "arguments": {
      "task_message": "<task-step>Go to example.com and click Login</task-step>"
    }
  }
}'

See for an end-to-end demonstration.
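
A Python equivalent of the tool call might look like the sketch below. The helper names are hypothetical, only the single task-step form shown above is used, and call_task assumes the reply is a plain JSON body (if the server answers with an event stream instead, the response needs SSE parsing).

```python
import json
from urllib.request import Request, urlopen

MCP_URL = "http://localhost:8002/message"


def build_task_request(session_id: str, step: str, url: str = MCP_URL) -> Request:
    """Build a tools/call request for execute_task_steps_by_browser."""
    payload = {
        "jsonrpc": "2.0",
        "id": "task-1",
        "method": "tools/call",
        "params": {
            "name": "execute_task_steps_by_browser",
            "arguments": {"task_message": f"<task-step>{step}</task-step>"},
        },
    }
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Mcp-Session-Id": session_id,  # id obtained from the initialize step
        },
        method="POST",
    )


def call_task(session_id: str, step: str, timeout: float = 300.0) -> dict:
    """Send the request and decode a JSON reply (assumption: non-streamed body)."""
    with urlopen(build_task_request(session_id, step), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```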

4. Real-time SSE Stream

curl -N -H "Accept: text/event-stream" http://localhost:8002/message
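
To consume the stream programmatically, a client has to group the text/event-stream lines into events. The sketch below is a simplified parser that handles only data fields and comments (it ignores event, id, and retry); the function names are hypothetical, and the payload format the server emits is not assumed.

```python
from typing import Iterable, Iterator
from urllib.request import Request, urlopen


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the joined data payload of each event in an SSE line stream."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if any.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event not terminated by a blank line is discarded.


def stream_messages(url: str = "http://localhost:8002/message") -> Iterator[str]:
    """Open the stream like the curl command above; requires a running server."""
    request = Request(url, headers={"Accept": "text/event-stream"})
    with urlopen(request) as response:
        # The response object iterates line by line as bytes.
        yield from iter_sse_data(raw.decode("utf-8") for raw in response)
```

Keeping the parser separate from the network call means it can be exercised with a plain list of strings.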

Project Structure

├── browser_agent.py                 # Browser agent & AntD prompt
├── config.py                        # Environment-driven configuration loader
├── streamable-browser-use-mcp-server.py  # FastAPI server (single endpoint)
├── start_server.py                  # Advanced launcher (banner, health-check)
├── example_usage.py                 # API demo
├── Makefile                         # Developer commands
└── requirements.txt                 # Dependencies

Contribution Guidelines

Coding Rules

  • PEP 8, full type hints, f-strings, public docstrings.
  • Prefer async/await for I/O-bound code.
  • Run make format lint type-check before pushing.

Conventional Commits

<type>(<scope>): <short summary>

<body>
  • type: feat, fix, docs, refactor, test, chore …
  • scope: module or file name.
  • Example: feat(agent): support AntD Select component.
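
The header format above can be checked mechanically, for example in a local commit hook. This is a minimal sketch: the function name is hypothetical, and the allowed types are limited to the six listed above even though the list is open-ended.

```python
import re

# Types named in the guidelines; the original list ends with "…", so extend as needed.
COMMIT_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore")

# <type>(<scope>): <short summary>
HEADER_RE = re.compile(r"^(?P<type>[a-z]+)\((?P<scope>[^()\s]+)\): (?P<summary>\S.*)$")


def check_commit_header(header: str) -> bool:
    """Return True if the first commit line matches <type>(<scope>): <summary>."""
    match = HEADER_RE.match(header)
    return bool(match) and match.group("type") in COMMIT_TYPES


if __name__ == "__main__":
    print(check_commit_header("feat(agent): support AntD Select component."))
```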

Workflow

  1. Fork → create feature branch from main.
  2. Ensure make validate passes (config, lint, type, tests).
  3. Open Pull Request with detailed description.
  4. Maintainers will review & merge.

GitHub Copilot & Cursor

Copilot/Cursor work best when the commit scope is clear. Add a short comment at the top of new files explaining the intent so Copilot has context. Keep prompts and tests near the code they exercise.

License

Released under the MIT License. See for full text.