mlamoure/selenium-mcp-server
Selenium MCP Server
A Model Context Protocol (MCP) server that provides browser automation capabilities via Selenium Grid. AI agents can drive web browsers step-by-step using MCP tools instead of writing Selenium scripts directly.
Architecture
Built with FastMCP 2.0 using Streamable HTTP transport for self-hosted deployment. Deploy once on your infrastructure and connect from any MCP client remotely.
- Streamable HTTP - Native HTTP transport with streaming support
- Selenium Grid Backend - Separates browser instances from the MCP server for independent scaling
- Centralized Endpoint - Single server serves multiple AI agents/clients
Features
- 35+ MCP Tools for complete browser automation
- Session Management with automatic timeout and cleanup
- Element Registry for stable element references across tool calls
- Domain Guardrails to restrict navigation to allowed domains
- Structured Errors with helpful suggestions for AI recovery
- Docker Support with Selenium Grid included
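The domain guardrail can be pictured as an allow-list check run before each navigation. The sketch below is only an illustration: the function name `is_url_allowed` is hypothetical, and the rule that subdomains of an allowed domain also pass is an assumption, not confirmed server behavior.

```python
from urllib.parse import urlparse


def is_url_allowed(url: str, allowed_domains: str) -> bool:
    """Return True if the URL's host passes a comma-separated allow-list.

    Hypothetical helper: an empty allow-list means every domain is allowed,
    mirroring the "(all allowed)" default of ALLOWED_DOMAINS.
    """
    allowed = [d.strip().lower() for d in allowed_domains.split(",") if d.strip()]
    if not allowed:
        return True
    host = (urlparse(url).hostname or "").lower()
    # Accept exact matches and subdomains (assumed rule, not confirmed).
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Matching on the parsed hostname rather than on the raw URL string avoids accepting look-alike hosts such as `evil-example.com` when only `example.com` is allowed.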
Quick Start
Deploy from Container Registry (Recommended)
Deploy directly without cloning the repository. Create a docker-compose.yml:
services:
  selenium-mcp:
    image: ghcr.io/mlamoure/selenium-mcp-server:latest
    ports:
      - "8000:8000"
    environment:
      - SELENIUM_MCP_SELENIUM_GRID_URL=http://selenium-hub:4444
    depends_on:
      selenium-hub:
        condition: service_healthy
    restart: unless-stopped
  selenium-hub:
    image: selenium/hub:4.25
    ports:
      - "4444:4444"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4444/status"]
      interval: 10s
      timeout: 5s
      retries: 5
  chrome-node:
    image: selenium/node-chrome:4.25
    shm_size: 2g
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_MAX_SESSIONS=4
    depends_on:
      selenium-hub:
        condition: service_healthy
docker-compose up -d
# MCP server: http://localhost:8000/mcp/
# Selenium Grid UI: http://localhost:4444
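The `/status` endpoint used by the healthcheck also returns a JSON body. A minimal sketch for reading its readiness flag from Python, assuming the usual Selenium Grid 4 shape `{"value": {"ready": ...}}`. Both function names are illustrative and not part of this project.

```python
import json
import urllib.request


def grid_is_ready(status_payload: dict) -> bool:
    """Interpret a Selenium Grid /status body; a missing field counts as not ready."""
    return bool(status_payload.get("value", {}).get("ready", False))


def fetch_grid_status(grid_url: str = "http://localhost:4444", timeout: float = 5.0) -> dict:
    """Fetch and decode the /status document from the Grid."""
    url = f"{grid_url.rstrip('/')}/status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `grid_is_ready(fetch_grid_status())` after `docker-compose up -d` is one way to confirm that a browser node has registered before creating sessions.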
Using Docker Compose (From Source)
git clone https://github.com/mlamoure/selenium-mcp-server.git
cd selenium-mcp-server
docker-compose up -d
# MCP server: http://localhost:8000/mcp/
# Selenium Grid UI: http://localhost:4444
Manual Installation
# Requires Python 3.12+ and a running Selenium Grid
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Set Grid URL and run
export SELENIUM_MCP_SELENIUM_GRID_URL=http://localhost:4444
python -m selenium_mcp
Tool Categories
Session Management
- create_session - Create a new browser session (with optional video recording)
- close_session - Close a browser session
- get_session_info - Get detailed session info including recording status
- list_sessions - List all active sessions
- ping - Health check
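Automatic timeout and cleanup can be understood as comparing two timestamps per session against the lifetime and idle limits. The following is a sketch under assumptions: `SessionTimes` and `expiry_reason` are hypothetical names, and the defaults are the documented values of SESSION_MAX_LIFETIME_SECONDS (900) and SESSION_MAX_IDLE_SECONDS (300).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionTimes:
    created_at: float    # seconds, e.g. taken from time.monotonic()
    last_used_at: float  # refreshed on every tool call (assumed)


def expiry_reason(
    session: SessionTimes,
    now: float,
    max_lifetime: float = 900.0,
    max_idle: float = 300.0,
) -> Optional[str]:
    """Return why a session should be closed, or None if it is still valid."""
    if now - session.created_at >= max_lifetime:
        return "max_lifetime"
    if now - session.last_used_at >= max_idle:
        return "idle"
    return None
```

A periodic cleanup task could call this for each active session and close those with a non-None result.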
Navigation
- navigate - Go to a URL
- reload_page - Refresh the current page
- navigate_back / navigate_forward - Browser history
- get_page_info - Get current URL, title, and ready state
Observation
- get_dom - Get page HTML (with optional script/style stripping)
- get_visible_text - Get all visible text content
- query_elements - Find elements and get stable IDs
- get_screenshot - Capture page screenshot
- get_console_logs - Get browser console logs
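The idea behind stable element IDs is that a query result is stored server-side under an opaque ID, so later tool calls can refer to it without repeating the selector. The sketch below is purely illustrative: the class, the `el_N` ID format, and the stored fields are all assumptions and say nothing about how the server's Element Registry is actually implemented.

```python
import itertools
from typing import Dict, Tuple

# (strategy, selector, index within the query result)
Locator = Tuple[str, str, int]


class ElementRegistry:
    """Hypothetical per-session map from opaque element IDs to locators."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._by_id: Dict[str, Locator] = {}

    def register(self, strategy: str, selector: str, index: int) -> str:
        """Store a locator and hand back a new ID for it."""
        element_id = f"el_{next(self._counter)}"
        self._by_id[element_id] = (strategy, selector, index)
        return element_id

    def resolve(self, element_id: str) -> Locator:
        """Look up a previously issued ID; raises KeyError if unknown."""
        return self._by_id[element_id]
```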
Actions
- click_element / click_selector - Click elements
- type_text - Type into input fields
- clear_element - Clear input content
- set_checkbox_state - Check/uncheck checkboxes
- select_dropdown_option - Select from dropdowns
- scroll_to_element / scroll_by - Scroll the page
- hover_element - Mouse hover
- drag_drop - Drag and drop
- send_keys - Send keyboard keys
- upload_file - Upload files
Waits
- wait_for_selector - Wait for element (exists/visible/clickable/hidden)
- wait_for_url - Wait for URL to match pattern
- wait_for_ready_state - Wait for document ready state
- wait_for_text - Wait for text to appear
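Each wait tool amounts to polling a condition until it holds or a timeout elapses. A minimal sketch of that loop follows. The name `wait_until` and the 100 ms poll interval are assumptions, while the 10000 ms default matches DEFAULT_WAIT_TIMEOUT_MS.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout_ms: int = 10000,
    poll_ms: int = 100,
) -> bool:
    """Poll `condition` until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_ms / 1000)
```

The condition is checked once before the deadline test, so an already-true condition succeeds even with a zero timeout.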
JavaScript
- execute_script - Run synchronous JavaScript
- execute_async_script - Run asynchronous JavaScript
Configuration
Configure via environment variables (prefix: SELENIUM_MCP_):
| Variable | Description | Default |
|---|---|---|
| SELENIUM_GRID_URL | Selenium Grid endpoint | http://localhost:4444 |
| DEFAULT_BROWSER | Default browser type | chrome |
| MAX_CONCURRENT_SESSIONS | Maximum simultaneous sessions | 10 |
| SESSION_MAX_LIFETIME_SECONDS | Session auto-close timeout | 900 |
| SESSION_MAX_IDLE_SECONDS | Idle session timeout | 300 |
| RECORDING_DEFAULT | Default video recording state | true |
| RECORDING_FORCE | Force recording setting (ignore client) | false |
| ALLOWED_DOMAINS | Comma-separated allowed domains | (all allowed) |
| DOM_MAX_CHARS | Max chars for get_dom | 20000 |
| DEFAULT_WAIT_TIMEOUT_MS | Default wait timeout | 10000 |
| HOST | Server bind address | 0.0.0.0 |
| PORT | Server port | 8000 |
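The prefix means that each variable in the table is set in the environment as SELENIUM_MCP_ plus its name, for example SELENIUM_MCP_PORT. The lookup can be sketched as below. `read_setting` and the `DEFAULTS` subset are illustrative only and are not the server's actual configuration code.

```python
import os
from typing import Mapping

PREFIX = "SELENIUM_MCP_"

# A few defaults copied from the table above, kept as strings for simplicity.
DEFAULTS = {
    "SELENIUM_GRID_URL": "http://localhost:4444",
    "MAX_CONCURRENT_SESSIONS": "10",
    "RECORDING_DEFAULT": "true",
    "PORT": "8000",
}


def read_setting(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the prefixed environment value, or the documented default."""
    return env.get(PREFIX + name, DEFAULTS[name])
```

In this sketch an unprefixed variable such as `PORT=9000` is ignored. Only `SELENIUM_MCP_PORT=9000` would override the default.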
Video Recording
When using Selenium Grid with video recording sidecars, sessions can be recorded for debugging and review. Recording is controlled via the se:recordVideo Selenium capability.
Configuration
- RECORDING_DEFAULT - When true (default), sessions record video unless explicitly disabled
- RECORDING_FORCE - When true, ignores client preferences and uses RECORDING_DEFAULT for all sessions
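The interaction between the two settings and the client's `record_video` argument can be written as a small decision function. This is a sketch of the rules as described above, with a hypothetical function name. It is not taken from the server's source.

```python
from typing import Optional


def resolve_recording(
    client_pref: Optional[bool],
    recording_default: bool = True,
    recording_force: bool = False,
) -> bool:
    """Decide whether a new session records video.

    client_pref is the optional record_video argument (None if omitted).
    """
    if recording_force or client_pref is None:
        # Forced, or the client did not ask: the server default applies.
        return recording_default
    return client_pref
```

With the default settings, a client that passes `record_video: False` gets no recording, while setting RECORDING_FORCE to true would make the same request record anyway.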
Usage
# Create session with recording (uses server default if not specified)
result = await client.call_tool("create_session", {
    "browser": "chrome",
    "record_video": True  # Optional: explicitly enable/disable
})
# Response includes recording info
# {
#     "session_id": "sess_abc123",
#     "record_video": true,
#     "selenium_session_id": "abc123-def456-..."  # Grid session ID for video retrieval
# }
Video Retrieval
The selenium_session_id in the response is the Selenium Grid's internal session ID. Use this ID to locate recorded videos in your Grid's video storage. The exact retrieval method depends on your Grid configuration.
Example Usage
# Typical AI agent workflow.
# Assumes `client` is an already-connected MCP client and that tool
# results have been parsed into dicts.

# 1. Create a session (recording enabled by default)
result = await client.call_tool("create_session", {"browser": "chrome"})
session_id = result["session_id"]

# 2. Navigate to a page
await client.call_tool("navigate", {
    "session_id": session_id,
    "url": "https://example.com"
})

# 3. Find elements
elements = await client.call_tool("query_elements", {
    "session_id": session_id,
    "strategy": "css",
    "selector": "button.submit"
})

# 4. Click the first matching element
await client.call_tool("click_element", {
    "session_id": session_id,
    "element_id": elements["elements"][0]["element_id"]
})

# 5. Wait for navigation
await client.call_tool("wait_for_url", {
    "session_id": session_id,
    "pattern": "success"
})

# 6. Close session when done
await client.call_tool("close_session", {"session_id": session_id})
Error Handling
All tools return structured errors with:
- error_code - Machine-readable code (e.g., ELEMENT_NOT_FOUND)
- message - Human-readable description
- suggestion - Recovery hint for the AI
Example error:
{
  "success": false,
  "error": {
    "code": "ELEMENT_NOT_FOUND",
    "message": "Element not found: css=#missing",
    "suggestion": "Verify the selector is correct. Use wait_for_selector before querying if the element loads dynamically."
  }
}
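On the client side, a response shaped like the example above can be inspected before deciding how to recover. The sketch below follows the field names of the example JSON (`success`, `error.code`). The helper names and the choice of ELEMENT_NOT_FOUND as the only "wait and retry" code are assumptions made for illustration.

```python
from typing import Optional

# Codes for which waiting and retrying is assumed to be worthwhile.
RETRY_AFTER_WAIT = {"ELEMENT_NOT_FOUND"}


def extract_error(response: dict) -> Optional[dict]:
    """Return the error object of a failed response, or None on success.

    A response without a "success" field is treated as successful (assumption).
    """
    if response.get("success", True):
        return None
    return response.get("error") or {}


def should_wait_and_retry(response: dict) -> bool:
    """True if the response failed with a code listed in RETRY_AFTER_WAIT."""
    err = extract_error(response)
    return err is not None and err.get("code") in RETRY_AFTER_WAIT
```

An agent loop might call `wait_for_selector` and repeat the failed tool call when `should_wait_and_retry` returns True, and otherwise surface the `suggestion` text to the model.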
Development
# Install dev dependencies
pip install -r requirements-dev.txt
# Run tests
pytest
# Lint
ruff check src/
License
MIT