selenium-mcp-server

mlamoure/selenium-mcp-server

3.2

If you are the rightful owner of selenium-mcp-server and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.

The Selenium MCP Server is a Model Context Protocol server that enables browser automation through Selenium Grid, allowing AI agents to control web browsers without writing direct Selenium scripts.

Tools
5
Resources
0
Prompts
0

Selenium MCP Server

A Model Context Protocol (MCP) server that provides browser automation capabilities via Selenium Grid. AI agents can drive web browsers step-by-step using MCP tools instead of writing Selenium scripts directly.

Architecture

Built with FastMCP 2.0 using Streamable HTTP transport for self-hosted deployment. Deploy once on your infrastructure and connect from any MCP client remotely.

  • Streamable HTTP - Native HTTP transport with streaming support
  • Selenium Grid Backend - Separates browser instances from the MCP server for independent scaling
  • Centralized Endpoint - Single server serves multiple AI agents/clients

Features

  • 35+ MCP Tools for complete browser automation
  • Session Management with automatic timeout and cleanup
  • Element Registry for stable element references across tool calls
  • Domain Guardrails to restrict navigation to allowed domains
  • Structured Errors with helpful suggestions for AI recovery
  • Docker Support with Selenium Grid included

Quick Start

Deploy from Container Registry (Recommended)

Deploy directly without cloning the repository. Create a docker-compose.yml:

services:
  selenium-mcp:
    image: ghcr.io/mlamoure/selenium-mcp-server:latest
    ports:
      - "8000:8000"
    environment:
      - SELENIUM_MCP_SELENIUM_GRID_URL=http://selenium-hub:4444
    depends_on:
      selenium-hub:
        condition: service_healthy
    restart: unless-stopped

  selenium-hub:
    image: selenium/hub:4.25
    ports:
      - "4444:4444"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4444/status"]
      interval: 10s
      timeout: 5s
      retries: 5

  chrome-node:
    image: selenium/node-chrome:4.25
    shm_size: 2g
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_MAX_SESSIONS=4
    depends_on:
      selenium-hub:
        condition: service_healthy
docker-compose up -d

# MCP server: http://localhost:8000/mcp/
# Selenium Grid UI: http://localhost:4444

Using Docker Compose (From Source)

git clone https://github.com/mlamoure/selenium-mcp-server.git
cd selenium-mcp-server
docker-compose up -d

# MCP server: http://localhost:8000/mcp/
# Selenium Grid UI: http://localhost:4444

Manual Installation

# Requires Python 3.12+ and a running Selenium Grid
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Set Grid URL and run
export SELENIUM_MCP_SELENIUM_GRID_URL=http://localhost:4444
python -m selenium_mcp

Tool Categories

Session Management

  • create_session - Create a new browser session (with optional video recording)
  • close_session - Close a browser session
  • get_session_info - Get detailed session info including recording status
  • list_sessions - List all active sessions
  • ping - Health check

Navigation

  • navigate - Go to a URL
  • reload_page - Refresh the current page
  • navigate_back / navigate_forward - Browser history
  • get_page_info - Get current URL, title, and ready state

Observation

  • get_dom - Get page HTML (with optional script/style stripping)
  • get_visible_text - Get all visible text content
  • query_elements - Find elements and get stable IDs
  • get_screenshot - Capture page screenshot
  • get_console_logs - Get browser console logs

Actions

  • click_element / click_selector - Click elements
  • type_text - Type into input fields
  • clear_element - Clear input content
  • set_checkbox_state - Check/uncheck checkboxes
  • select_dropdown_option - Select from dropdowns
  • scroll_to_element / scroll_by - Scroll the page
  • hover_element - Mouse hover
  • drag_drop - Drag and drop
  • send_keys - Send keyboard keys
  • upload_file - Upload files

Waits

  • wait_for_selector - Wait for element (exists/visible/clickable/hidden)
  • wait_for_url - Wait for URL to match pattern
  • wait_for_ready_state - Wait for document ready state
  • wait_for_text - Wait for text to appear

JavaScript

  • execute_script - Run synchronous JavaScript
  • execute_async_script - Run asynchronous JavaScript

Configuration

Configure via environment variables (prefix: SELENIUM_MCP_):

VariableDescriptionDefault
SELENIUM_GRID_URLSelenium Grid endpointhttp://localhost:4444
DEFAULT_BROWSERDefault browser typechrome
MAX_CONCURRENT_SESSIONSMaximum simultaneous sessions10
SESSION_MAX_LIFETIME_SECONDSSession auto-close timeout900
SESSION_MAX_IDLE_SECONDSIdle session timeout300
RECORDING_DEFAULTDefault video recording statetrue
RECORDING_FORCEForce recording setting (ignore client)false
ALLOWED_DOMAINSComma-separated allowed domains(all allowed)
DOM_MAX_CHARSMax chars for get_dom20000
DEFAULT_WAIT_TIMEOUT_MSDefault wait timeout10000
HOSTServer bind address0.0.0.0
PORTServer port8000

Video Recording

When using Selenium Grid with video recording sidecars, sessions can be recorded for debugging and review. Recording is controlled via the se:recordVideo Selenium capability.

Configuration

  • RECORDING_DEFAULT - When true (default), sessions record video unless explicitly disabled
  • RECORDING_FORCE - When true, ignores client preferences and uses RECORDING_DEFAULT for all sessions

Usage

# Create session with recording (uses server default if not specified)
result = await client.call_tool("create_session", {
    "browser": "chrome",
    "record_video": True  # Optional: explicitly enable/disable
})

# Response includes recording info
# {
#   "session_id": "sess_abc123",
#   "record_video": true,
#   "selenium_session_id": "abc123-def456-..."  # Grid session ID for video retrieval
# }

Video Retrieval

The selenium_session_id in the response is the Selenium Grid's internal session ID. Use this ID to locate recorded videos in your Grid's video storage. The exact retrieval method depends on your Grid configuration.

Example Usage

# Typical AI agent workflow:

# 1. Create a session (recording enabled by default)
result = await client.call_tool("create_session", {"browser": "chrome"})
session_id = result["session_id"]

# 2. Navigate to a page
await client.call_tool("navigate", {
    "session_id": session_id,
    "url": "https://example.com"
})

# 3. Find elements
elements = await client.call_tool("query_elements", {
    "session_id": session_id,
    "strategy": "css",
    "selector": "button.submit"
})

# 4. Click an element
await client.call_tool("click_element", {
    "session_id": session_id,
    "element_id": elements["elements"][0]["element_id"]
})

# 5. Wait for navigation
await client.call_tool("wait_for_url", {
    "session_id": session_id,
    "pattern": "success"
})

# 6. Close session when done
await client.call_tool("close_session", {"session_id": session_id})

Error Handling

All tools return structured errors with:

  • error_code - Machine-readable code (e.g., ELEMENT_NOT_FOUND)
  • message - Human-readable description
  • suggestion - Recovery hint for the AI

Example error:

{
  "success": false,
  "error": {
    "code": "ELEMENT_NOT_FOUND",
    "message": "Element not found: css=#missing",
    "suggestion": "Verify the selector is correct. Use wait_for_selector before querying if the element loads dynamically."
  }
}

Development

# Install dev dependencies
pip install -r requirements-dev.txt

# Run tests
pytest

# Lint
ruff check src/

License

MIT