ruby-webdriver-mcp by jpalley - MCP Server

Selenium WebDriver MCP Server

A containerized Model Context Protocol (MCP) server for controlling Selenium WebDriver from Large Language Models (LLMs) and AI agents. Run browser automation through a simple HTTP API.

Features

MCP Server Implementation: Full MCP server using FastMCP with HTTP/SSE transport
Selenium WebDriver Integration: Control Chrome, Firefox, or Edge browsers remotely
Docker Ready: Containerized deployment with docker-compose
Comprehensive Browser Control: Navigate, find elements, interact, execute JavaScript, and more
Console Logs: Access browser console logs for debugging
Health Checks: Built-in health endpoint for monitoring
Environment Configuration: Fully configurable via environment variables

Quick Start

Docker Deployment (Recommended)

Run the MCP server with Docker:

# Quick start with docker-compose (includes Selenium)
docker-compose -f docker-compose.production.yml up

# Or build and run manually
docker build -t selenium-mcp-server .
docker run -p 4443:4443 \
  -e SELENIUM_URL=http://your-selenium:4444/wd/hub \
  selenium-mcp-server

Test the server:

# Health check
curl http://localhost:4443/health

# List available tools
curl -X POST http://localhost:4443/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
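
The same `tools/list` call can be made from Ruby. The sketch below mirrors the curl command using only the standard library. The helper names `build_tools_list_request` and `list_tools` are illustrative and not part of this gem, and the response body is returned unparsed because its exact shape depends on the server's transport.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper (not part of this gem): builds the same JSON-RPC
# request that the curl example sends to the /mcp endpoint.
def build_tools_list_request(base_url = "http://localhost:4443", id: 1)
  request = Net::HTTP::Post.new(URI.join(base_url, "/mcp"))
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(jsonrpc: "2.0", id: id, method: "tools/list")
  request
end

# Sends the request and returns the raw response body as a string.
# This needs a running server, so it is only defined here, not called.
def list_tools(base_url = "http://localhost:4443")
  uri = URI.join(base_url, "/mcp")
  request = build_tools_list_request(base_url)
  Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request).body }
end
```

With the server running, `puts list_tools` should print the JSON-RPC response that lists the registered tools.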

Environment Variables:

SELENIUM_URL - Selenium server URL (default: http://selenium:4444/wd/hub)
SELENIUM_BROWSER - Browser type: chrome, firefox, edge (default: chrome)
SELENIUM_HEADLESS - Run headless: true/false (default: true)
PORT - HTTP server port (default: 4443)
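
As an illustration of how these variables and their defaults fit together, here is a minimal sketch that resolves them into typed values. `Settings` and `load_settings` are hypothetical names, not the gem's API, and the rule that only the string "true" enables headless mode is an assumption.

```ruby
# Hypothetical helper (not part of this gem): resolves the documented
# environment variables to typed values, using the documented defaults.
Settings = Struct.new(:selenium_url, :browser, :headless, :port, keyword_init: true)

SUPPORTED_BROWSERS = %w[chrome firefox edge].freeze

def load_settings(env = ENV)
  browser = env.fetch("SELENIUM_BROWSER", "chrome").downcase
  unless SUPPORTED_BROWSERS.include?(browser)
    raise ArgumentError, "unsupported SELENIUM_BROWSER: #{browser}"
  end

  Settings.new(
    selenium_url: env.fetch("SELENIUM_URL", "http://selenium:4444/wd/hub"),
    browser: browser.to_sym,
    # Assumption: any value other than "true" (case-insensitive) disables headless mode.
    headless: env.fetch("SELENIUM_HEADLESS", "true").casecmp?("true"),
    port: Integer(env.fetch("PORT", "4443"))
  )
end
```

Passing a plain Hash instead of `ENV` makes the helper easy to exercise, for example `load_settings("PORT" => "8080").port`.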

See the repository's Docker deployment guide for details.

Development Setup

For local development without Docker:

# Install dependencies
bundle install

# Start Selenium (or use existing Selenium server)
docker run -d -p 4444:4444 selenium/standalone-chrome

# Start the MCP server
SELENIUM_URL=http://localhost:4444/wd/hub bundle exec rackup -p 4443

# Enable debug logging (see below for details)
DEBUG=true bundle exec rackup -p 4443

# Run tests
bundle exec rspec

Debug Logging:

Enable detailed MCP request/response logging with environment variables:

# Enable debug logging (shows all MCP requests, responses, and routing)
DEBUG=true bundle exec rackup

# Or set specific log level
LOG_LEVEL=DEBUG bundle exec rackup  # DEBUG, INFO, WARN, ERROR, FATAL

Debug output includes:

Server startup information (registered tools and resources)
Request routing (which paths go to MCP vs web app)
MCP JSON-RPC request/response details
Tool invocations and results
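
One way the two logging variables could map onto Ruby's standard `Logger` severities is sketched below. The precedence (DEBUG=true wins over LOG_LEVEL) and the INFO default are assumptions, and `resolve_log_level` is a hypothetical helper rather than the gem's own code.

```ruby
require "logger"

LOG_LEVELS = %w[DEBUG INFO WARN ERROR FATAL].freeze

# Hypothetical helper: maps the DEBUG and LOG_LEVEL variables to a Logger severity.
def resolve_log_level(env = ENV)
  # Assumption: DEBUG=true takes precedence over LOG_LEVEL.
  return Logger::DEBUG if env["DEBUG"].to_s.casecmp?("true")

  # Assumption: INFO is used when neither variable is set.
  name = env.fetch("LOG_LEVEL", "INFO").upcase
  raise ArgumentError, "unknown LOG_LEVEL: #{name}" unless LOG_LEVELS.include?(name)

  Logger.const_get(name)
end

logger = Logger.new($stdout)
logger.level = resolve_log_level
```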

Web Dashboard

A modern web interface is available at the root path for monitoring sessions and viewing screenshot history.

Access: Open http://localhost:4443/ in your web browser

Features:

View all active browser sessions in real-time
Session details: current URL, page title, last activity
Capture and view screenshot history (up to 10 per session)
Auto-refresh every 5 seconds
Responsive design with Tailwind CSS

Screenshot Management:

Click "Capture Screenshot" to save current browser state
View screenshot history in grid layout
Click screenshots to view full-size
Automatically limited to 10 most recent screenshots per session
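
The 10-screenshot limit behaves like a bounded history that discards the oldest entry once it is full. A minimal sketch of that idea follows; `ScreenshotHistory` is a hypothetical class for illustration, not the gem's storage code.

```ruby
# Hypothetical sketch: keeps only the most recent screenshots for one session.
class ScreenshotHistory
  MAX_SCREENSHOTS = 10

  def initialize(limit = MAX_SCREENSHOTS)
    @limit = limit
    @entries = []
  end

  # Appends an entry and drops the oldest ones beyond the limit.
  def add(entry)
    @entries << entry
    @entries.shift(@entries.size - @limit) if @entries.size > @limit
    self
  end

  # Returns entries newest first.
  def recent
    @entries.reverse
  end

  def size
    @entries.size
  end
end
```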

See the repository's web dashboard documentation for details.

Configuration

Configuration is done via environment variables (for Docker) or programmatically (for development):

SeleniumWebdriverMcp.configure do |config|
  # URL of the Selenium WebDriver server
  config.selenium_url = ENV.fetch("SELENIUM_URL", "http://selenium:4444/wd/hub")

  # Browser to use (:chrome, :firefox, :edge)
  config.browser = :chrome

  # Run browser in headless mode
  config.headless = true

  # Timeouts (in seconds)
  config.implicit_wait = 10
  config.page_load_timeout = 30
  config.script_timeout = 30

  # Default window size [width, height]
  config.window_size = [1400, 1400]

  # Additional browser capabilities (optional)
  config.capabilities = {
    "goog:chromeOptions" => {
      "args" => ["--disable-gpu"]
    }
  }
end

Architecture

┌──────────────────┐     ┌──────────────────┐
│   Web Browser    │     │   MCP Client     │
│   (Dashboard)    │     │   (Claude, LLM)  │
└────────┬─────────┘     └────────┬─────────┘
         │ HTTP                   │ HTTP
         └────────────┬───────────┘
                      ▼
       ┌──────────────────────────────────┐
       │   MCP Server (Port 4443)         │
       │   - / (Web Dashboard)            │
       │   - /mcp (JSON-RPC)              │
       │   - /sse (Server Events)         │
       │   - /api/* (Dashboard API)       │
       │   - /screenshots/* (Images)      │
       │   - /health                      │
       └──────────────┬───────────────────┘
                      │ WebDriver
                      ▼
       ┌────────────────────────┐
       │   Selenium (Port 4444) │
       │   - Chrome Browser     │
       └────────────────────────┘

Session Management

IMPORTANT: Every tool except create_session and list_sessions requires a session_id parameter. Create a session with create_session before calling any other tool.

Session Tools

create_session - Create a new browser session (returns a session_id)
- session_id (string, optional): Custom session ID (auto-generated if not provided)
destroy_session - Destroy a session and close the browser
- session_id (string, required): The session ID to destroy
list_sessions - List all active sessions

See the repository's session management documentation for details on handling multiple concurrent clients.
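
Because every session holds an open browser, a client will usually want to pair create_session with destroy_session even when a step fails. The sketch below shows that pattern with `ensure`. Both `with_session` and the `client.call_tool(name, arguments)` interface are assumptions made for illustration; substitute whatever MCP client you use.

```ruby
# Hypothetical sketch: `client` is any object that responds to
# call_tool(name, arguments) and returns the tool result as a Hash.
def with_session(client, session_id: nil)
  arguments = session_id ? { "session_id" => session_id } : {}
  id = client.call_tool("create_session", arguments).fetch("session_id")
  yield id
ensure
  # Destroy the session even if the block raised, so the browser is closed.
  # `id` is nil here if create_session itself failed.
  client.call_tool("destroy_session", { "session_id" => id }) if id
end
```

A caller would then write something like `with_session(client) { |id| client.call_tool("navigate", { "session_id" => id, "url" => "https://example.com" }) }`.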

MCP Tools

The following MCP tools are available (all require session_id):

Navigation

navigate - Navigate to a URL
- session_id (string, required): The browser session ID
- url (string, required): The URL to navigate to

Element Finding

find_element - Find an element on the page
- session_id (string, required): The browser session ID
- strategy (string, required): Locator strategy (id, name, class, css, xpath, link_text, tag_name)
- value (string, required): The value to search for
wait_for_element - Wait for an element to appear
- session_id (string, required): The browser session ID
- strategy (string, required): Locator strategy
- value (string, required): The value to search for
- timeout (integer, optional): Maximum time to wait in seconds (default: 10)
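
Conceptually, wait_for_element retries a lookup until it succeeds or the timeout elapses. The following is a generic polling sketch of that behaviour, not the gem's implementation; `wait_until` and its `interval` parameter are hypothetical.

```ruby
require "timeout"

# Hypothetical sketch of polling with a timeout.
# Calls the block until it returns a truthy value or the timeout elapses.
def wait_until(timeout: 10, interval: 0.25)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise Timeout::Error, "condition not met within #{timeout}s"
    end

    sleep interval
  end
end
```

Code that uses the Selenium Ruby bindings directly can rely on `Selenium::WebDriver::Wait`, which provides the same pattern.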

Element Interaction

click_element - Click on an element
- session_id (string, required): The browser session ID
- element_id (string, required): Element ID from find_element
send_keys - Type text into an element
- session_id (string, required): The browser session ID
- element_id (string, required): Element ID from find_element
- text (string, required): Text to type

Element Information

get_text - Get the text content of an element
- session_id (string, required): The browser session ID
- element_id (string, required): Element ID from find_element
get_attribute - Get an attribute value from an element
- session_id (string, required): The browser session ID
- element_id (string, required): Element ID from find_element
- attribute_name (string, required): Name of the attribute

JavaScript Execution

execute_script - Execute JavaScript in the browser
- session_id (string, required): The browser session ID
- script (string, required): JavaScript code to execute

Page Information

get_page_source - Get the HTML source of the current page
- session_id (string, required): The browser session ID
get_console_logs - Get browser console logs (Chrome/JS console output)
- session_id (string, required): The browser session ID

Screenshot Management

save_screenshot - Capture a screenshot and save it to history (max 10 per session)
- session_id (string, required): The browser session ID
- Returns: {saved: true, timestamp, url, screenshot_url, message, total_screenshots}
- The screenshot_url is a full URL (e.g., http://localhost:4443/screenshots/session_id/timestamp.png) that reflects the host of the incoming request

Note: The current screenshot of a session is also accessible via MCP resources or the HTTP endpoint (see Accessing Screenshots below).

Frame Switching

switch_frame - Switch to a frame or iframe
- session_id (string, required): The browser session ID
- frame_reference (integer or string): Frame index (integer), frame name (string), an element_id from find_element, 'default' (top-level document), or 'parent' (parent frame)

MCP Resources

The following MCP resources are available:

Non-Templated Resources (accessible via `resources/list`)

browser://current_url - The current URL of the browser
browser://page_title - The title of the current page
browser://session_status - WebDriver session status and information

Templated Resources (accessed by URI)

screenshot://{session_id} - PNG screenshot of the browser window for a session

Accessing Screenshots

Screenshots can be accessed in two ways:

1. Via MCP Resource Protocol:

{"method": "resources/read", "params": {"uri": "screenshot://session_abc123"}}

Returns base64-encoded PNG in a blob field with mimeType: "image/png"

2. Via Direct HTTP Endpoint:

curl http://localhost:4443/screenshots/session_abc123 > screenshot.png

Returns raw PNG binary data with Content-Type: image/png header
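
When the screenshot arrives through the MCP resource protocol, the client has to decode the base64 blob itself. Here is a minimal sketch, assuming the response follows the MCP `resources/read` result shape (a `contents` array whose entries carry `mimeType` and `blob`); `png_from_resource_response` is a hypothetical helper name.

```ruby
require "json"

# Hypothetical helper: extracts PNG bytes from a resources/read response.
# Assumes the MCP result shape: result.contents[] entries with a blob field.
def png_from_resource_response(json_text)
  response = JSON.parse(json_text)
  contents = response.fetch("result").fetch("contents")
  content = contents.find { |item| item["mimeType"] == "image/png" }
  raise ArgumentError, "no image/png content in response" unless content

  # "m0" is strict base64 decoding, equivalent to Base64.strict_decode64.
  content.fetch("blob").unpack1("m0")
end
```

The returned bytes can be written with `File.binwrite("screenshot.png", bytes)`, which yields the same file as the direct HTTP endpoint.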

Usage Example

Here's an example of how an LLM might use these tools via MCP:

Create a session:
```
{"tool": "create_session", "arguments": {}}
```
Returns: {"session_id": "session_abc123", ...}

Navigate to a website:

{"tool": "navigate", "arguments": {"session_id": "session_abc123", "url": "https://example.com"}}

Find a search input:

{"tool": "find_element", "arguments": {"session_id": "session_abc123", "strategy": "name", "value": "q"}}

Returns: {"element_id": "element_0", "tag_name": "input", ...}

Type into the search input:

{"tool": "send_keys", "arguments": {"session_id": "session_abc123", "element_id": "element_0", "text": "hello world"}}

Find and click the search button:

{"tool": "find_element", "arguments": {"session_id": "session_abc123", "strategy": "css", "value": "button[type=submit]"}}

Returns: {"element_id": "element_1", ...}

{"tool": "click_element", "arguments": {"session_id": "session_abc123", "element_id": "element_1"}}

Get console logs:

{"tool": "get_console_logs", "arguments": {"session_id": "session_abc123"}}

Take a screenshot (via resource):

{"method": "resources/read", "params": {"uri": "screenshot://session_abc123"}}

Or via HTTP: GET /screenshots/session_abc123

Clean up:

{"tool": "destroy_session", "arguments": {"session_id": "session_abc123"}}
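
The `{"tool": ..., "arguments": ...}` form above is a shorthand. On the wire, MCP tool invocations are JSON-RPC requests with the method `tools/call` and the tool name under `params.name`. The sketch below converts the shorthand into that envelope; `tool_call_request` is a hypothetical helper, and the request ids are arbitrary.

```ruby
require "json"

# Hypothetical helper: wraps the shorthand {"tool": ..., "arguments": ...}
# form in the JSON-RPC envelope that MCP defines for tools/call.
def tool_call_request(shorthand, id:)
  {
    "jsonrpc" => "2.0",
    "id" => id,
    "method" => "tools/call",
    "params" => {
      "name" => shorthand.fetch("tool"),
      "arguments" => shorthand.fetch("arguments", {})
    }
  }
end

steps = [
  { "tool" => "create_session", "arguments" => {} },
  { "tool" => "navigate",
    "arguments" => { "session_id" => "session_abc123", "url" => "https://example.com" } },
  { "tool" => "destroy_session", "arguments" => { "session_id" => "session_abc123" } }
]

requests = steps.each_with_index.map { |step, index| tool_call_request(step, id: index + 1) }
puts JSON.pretty_generate(requests.first)
```

Each resulting hash can be serialized with `JSON.generate` and POSTed to the /mcp endpoint.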

Development

After checking out the repo, run bundle install to install dependencies.

To run tests (when implemented):

bundle exec rspec

To run RuboCop:

bundle exec rubocop

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/brainpage/ruby-webdriver-mcp.

License

The gem is available as open source under the terms of the MIT License.

Security

This gem provides programmatic control of a web browser. When exposing this via MCP:

Only expose the MCP server to trusted LLM clients
Be cautious about which websites you allow navigation to
Consider implementing authentication for the MCP endpoints
Use in isolated/sandboxed environments when possible
Review and validate any JavaScript executed via execute_script
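
As one possible way to add authentication, a small Rack middleware can reject requests that lack a shared bearer token. This is a sketch under assumptions, not something the gem ships: `BearerTokenAuth` is a hypothetical class, and leaving /health unauthenticated is a design choice made here so that monitoring probes keep working.

```ruby
# Hypothetical Rack middleware (not part of this gem): rejects requests that
# lack the expected bearer token. /health stays open for monitoring probes.
class BearerTokenAuth
  OPEN_PATHS = ["/health"].freeze

  def initialize(app, token:)
    @app = app
    @expected = "Bearer #{token}"
  end

  def call(env)
    return @app.call(env) if OPEN_PATHS.include?(env["PATH_INFO"])
    return @app.call(env) if secure_equal?(env["HTTP_AUTHORIZATION"].to_s, @expected)

    [401, { "content-type" => "text/plain", "www-authenticate" => "Bearer" }, ["Unauthorized\n"]]
  end

  private

  # Compares every byte so the duration does not reveal where the strings differ.
  def secure_equal?(given, expected)
    return false unless given.bytesize == expected.bytesize

    given.bytes.zip(expected.bytes).reduce(0) { |diff, (a, b)| diff | (a ^ b) }.zero?
  end
end
```

In a `config.ru`, this could be enabled with a line such as `use BearerTokenAuth, token: ENV.fetch("MCP_AUTH_TOKEN")`, where `MCP_AUTH_TOKEN` is an assumed variable name.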

Credits

Built with:

FastMCP - Ruby implementation of the Model Context Protocol
Selenium WebDriver - Browser automation framework