jpalley/ruby-webdriver-mcp
Selenium WebDriver MCP Server
A containerized Model Context Protocol (MCP) server for controlling Selenium WebDriver from Large Language Models (LLMs) and AI agents. Run browser automation through a simple HTTP API.
Features
- MCP Server Implementation: Full MCP server using FastMCP with HTTP/SSE transport
- Selenium WebDriver Integration: Control Chrome, Firefox, or Edge browsers remotely
- Docker Ready: Containerized deployment with docker-compose
- Comprehensive Browser Control: Navigate, find elements, interact, execute JavaScript, and more
- Console Logs: Access browser console logs for debugging
- Health Checks: Built-in health endpoint for monitoring
- Environment Configuration: Fully configurable via environment variables
Quick Start
Docker Deployment (Recommended)
Run the MCP server with Docker:
# Quick start with docker-compose (includes Selenium)
docker-compose -f docker-compose.production.yml up
# Or build and run manually
docker build -t selenium-mcp-server .
docker run -p 4443:4443 \
-e SELENIUM_URL=http://your-selenium:4444/wd/hub \
selenium-mcp-server
Test the server:
# Health check
curl http://localhost:4443/health
# List available tools
curl -X POST http://localhost:4443/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
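The same check can be scripted from Ruby. The sketch below is a minimal illustration, not part of the project: the helper names `jsonrpc_request` and `post_mcp` are hypothetical, and the base URL assumes the server is listening on the default port shown above.

```ruby
require "json"
require "net/http"
require "uri"

# Build a JSON-RPC 2.0 request hash like the one sent by the curl example.
# (Hypothetical helper, not part of the project.)
def jsonrpc_request(method, params = nil, id: 1)
  request = { "jsonrpc" => "2.0", "id" => id, "method" => method }
  request["params"] = params unless params.nil?
  request
end

# POST a JSON-RPC payload to the /mcp endpoint and return the raw response body.
# base_url is an assumption; point it at wherever the server is running.
def post_mcp(payload, base_url: "http://localhost:4443")
  uri = URI.join(base_url, "/mcp")
  response = Net::HTTP.post(uri, JSON.generate(payload), "Content-Type" => "application/json")
  response.body
end
```

With a server running, `puts post_mcp(jsonrpc_request("tools/list"))` would print whatever the server returns for the tool listing.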
Environment Variables:
- SELENIUM_URL - Selenium server URL (default: http://selenium:4444/wd/hub)
- SELENIUM_BROWSER - Browser type: chrome, firefox, edge (default: chrome)
- SELENIUM_HEADLESS - Run headless: true/false (default: true)
- PORT - HTTP server port (default: 4443)
See the repository's Docker deployment guide for details.
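To make the defaults concrete, here is a minimal sketch of reading these variables in Ruby. The method name `load_settings` and the shape of the returned hash are illustrative assumptions; the server's actual parsing may differ.

```ruby
# Read the documented environment variables, applying the documented defaults.
# The method name and hash keys are assumptions made for this example.
def load_settings(env = ENV)
  {
    selenium_url: env.fetch("SELENIUM_URL", "http://selenium:4444/wd/hub"),
    browser: env.fetch("SELENIUM_BROWSER", "chrome").to_sym,
    # Treat anything other than "true" (case-insensitive) as false.
    headless: env.fetch("SELENIUM_HEADLESS", "true").downcase == "true",
    # Integer() raises on non-numeric input instead of silently returning 0.
    port: Integer(env.fetch("PORT", "4443"))
  }
end
```

Passing a plain hash instead of `ENV` makes the function easy to exercise in tests.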
Development Setup
For local development without Docker:
# Install dependencies
bundle install
# Start Selenium (or use existing Selenium server)
docker run -d -p 4444:4444 selenium/standalone-chrome
# Start the MCP server
SELENIUM_URL=http://localhost:4444/wd/hub bundle exec rackup -p 4443
# Enable debug logging (see below for details)
DEBUG=true bundle exec rackup -p 4443
# Run tests
bundle exec rspec
Debug Logging:
Enable detailed MCP request/response logging with environment variables:
# Enable debug logging (shows all MCP requests, responses, and routing)
DEBUG=true bundle exec rackup
# Or set specific log level
LOG_LEVEL=DEBUG bundle exec rackup # DEBUG, INFO, WARN, ERROR, FATAL
Debug output includes:
- Server startup information (registered tools and resources)
- Request routing (which paths go to MCP vs web app)
- MCP JSON-RPC request/response details
- Tool invocations and results
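One way the two variables could be combined into a single logger level is sketched below. The precedence (DEBUG=true wins over LOG_LEVEL) and the INFO fallback are assumptions for illustration, not documented behavior, and `resolve_log_level` is a hypothetical name.

```ruby
require "logger"

# Map the level names listed above to Ruby's standard Logger constants.
LOG_LEVELS = {
  "DEBUG" => Logger::DEBUG,
  "INFO" => Logger::INFO,
  "WARN" => Logger::WARN,
  "ERROR" => Logger::ERROR,
  "FATAL" => Logger::FATAL
}.freeze

# Assumption: DEBUG=true takes precedence, and unknown or missing
# LOG_LEVEL values fall back to INFO.
def resolve_log_level(env = ENV)
  return Logger::DEBUG if env["DEBUG"] == "true"

  LOG_LEVELS.fetch(env.fetch("LOG_LEVEL", "INFO").upcase, Logger::INFO)
end
```

A logger could then be set up with `Logger.new($stdout, level: resolve_log_level)`.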
Web Dashboard
A modern web interface is available at the root path for monitoring sessions and viewing screenshot history.
Access: Open http://localhost:4443/ in your web browser
Features:
- View all active browser sessions in real-time
- Session details: current URL, page title, last activity
- Capture and view screenshot history (up to 10 per session)
- Auto-refresh every 5 seconds
- Responsive design with Tailwind CSS
Screenshot Management:
- Click "Capture Screenshot" to save current browser state
- View screenshot history in grid layout
- Click screenshots to view full-size
- Automatically limited to 10 most recent screenshots per session
See the repository's dashboard documentation for details.
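The 10-per-session limit amounts to a bounded history that discards the oldest entry first. The class below is a hypothetical sketch of that idea, not the project's implementation.

```ruby
# Keeps at most MAX_PER_SESSION screenshots per session, dropping the oldest.
# (Illustrative only; class and method names are assumptions.)
class ScreenshotHistory
  MAX_PER_SESSION = 10

  def initialize
    @by_session = Hash.new { |hash, key| hash[key] = [] }
  end

  # Append a screenshot record and trim the history; returns the new count.
  def add(session_id, screenshot)
    history = @by_session[session_id]
    history << screenshot
    history.shift while history.size > MAX_PER_SESSION
    history.size
  end

  # Return a copy so callers cannot mutate the stored history.
  def for_session(session_id)
    @by_session.fetch(session_id, []).dup
  end
end
```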
Configuration
Configuration is done via environment variables (for Docker) or programmatically (for development):
SeleniumWebdriverMcp.configure do |config|
# URL of the Selenium WebDriver server
config.selenium_url = ENV.fetch("SELENIUM_URL", "http://selenium:4444/wd/hub")
# Browser to use (:chrome, :firefox, :edge)
config.browser = :chrome
# Run browser in headless mode
config.headless = true
# Timeouts (in seconds)
config.implicit_wait = 10
config.page_load_timeout = 30
config.script_timeout = 30
# Default window size [width, height]
config.window_size = [1400, 1400]
# Additional browser capabilities (optional)
config.capabilities = {
"goog:chromeOptions" => {
"args" => ["--disable-gpu"]
}
}
end
Architecture
┌──────────────────┐ ┌──────────────────┐
│ Web Browser │ │ MCP Client │
│ (Dashboard) │ │ (Claude, LLM) │
└────────┬─────────┘ └────────┬─────────┘
│ HTTP │ HTTP
└────────────┬───────────┘
▼
┌──────────────────────────────────┐
│ MCP Server (Port 4443) │
│ - / (Web Dashboard) │
│ - /mcp (JSON-RPC) │
│ - /sse (Server Events) │
│ - /api/* (Dashboard API) │
│ - /screenshots/* (Images) │
│ - /health │
└──────────────┬───────────────────┘
│ WebDriver
▼
┌────────────────────────┐
│ Selenium (Port 4444) │
│ - Chrome Browser │
└────────────────────────┘
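The path layout in the diagram can be read as a small routing table. The function below is a sketch of that mapping only; the handler names are made up for illustration and do not correspond to classes in the project.

```ruby
# Map a request path to the component that would handle it, following the
# endpoints listed in the diagram. Handler symbols are hypothetical.
def route_for(path)
  case path
  when "/" then :dashboard
  when "/mcp" then :mcp_jsonrpc
  when "/sse" then :mcp_sse
  when "/health" then :health
  when %r{\A/api/} then :dashboard_api
  when %r{\A/screenshots/} then :screenshots
  else :not_found
  end
end
```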
Session Management
IMPORTANT: All browser tools require a session_id parameter. You must create a session with create_session before using them.
Session Tools
- create_session - Create a new browser session (returns a session_id)
  - session_id (string, optional): Custom session ID (auto-generated if not provided)
- destroy_session - Destroy a session and close the browser
  - session_id (string, required): The session ID to destroy
- list_sessions - List all active sessions
See the repository documentation for detailed information on handling multiple concurrent clients.
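The session lifecycle described above can be pictured as a small in-memory registry. This is a minimal sketch under stated assumptions: the class name, the generated ID format, and the stored metadata are all hypothetical, and a real implementation would also start and quit a browser.

```ruby
require "securerandom"

# Tracks sessions by ID. Illustrative only; no browser is started here.
class SessionRegistry
  def initialize
    @sessions = {}
  end

  # Use the caller's ID if given, otherwise generate one (format is an assumption).
  def create(session_id = nil)
    id = session_id || "session_#{SecureRandom.hex(4)}"
    raise ArgumentError, "session #{id} already exists" if @sessions.key?(id)

    @sessions[id] = { created_at: Time.now }
    id
  end

  # Returns true if a session was removed, false if the ID was unknown.
  def destroy(session_id)
    !@sessions.delete(session_id).nil?
  end

  def list
    @sessions.keys
  end
end
```

Keeping one entry per session ID is what lets several clients work side by side without sharing browser state.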
MCP Tools
The following MCP tools are available (all require session_id):
Navigation
- navigate - Navigate to a URL
  - session_id (string, required): The browser session ID
  - url (string, required): The URL to navigate to
Element Finding
- find_element - Find an element on the page
  - session_id (string, required): The browser session ID
  - strategy (string, required): Locator strategy (id, name, class, css, xpath, link_text, tag_name)
  - value (string, required): The value to search for
- wait_for_element - Wait for an element to appear
  - session_id (string, required): The browser session ID
  - strategy (string, required): Locator strategy
  - value (string, required): The value to search for
  - timeout (integer, optional): Maximum time to wait in seconds (default: 10)
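Conceptually, waiting for an element is a polling loop with a deadline. The helper below is a generic sketch of that pattern, not the server's code; `wait_until` and its `interval` parameter are assumptions. In the selenium-webdriver gem itself, `Selenium::WebDriver::Wait` plays a similar role.

```ruby
# Call the block repeatedly until it returns a truthy value or the timeout
# (in seconds) elapses. Returns the block's value, or nil on timeout.
def wait_until(timeout: 10, interval: 0.2)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep interval
  end
end
```

A caller would pass a block that attempts the lookup and returns nil while the element is absent.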
Element Interaction
- click_element - Click on an element
  - session_id (string, required): The browser session ID
  - element_id (string, required): Element ID from find_element
- send_keys - Type text into an element
  - session_id (string, required): The browser session ID
  - element_id (string, required): Element ID from find_element
  - text (string, required): Text to type
Element Information
- get_text - Get the text content of an element
  - session_id (string, required): The browser session ID
  - element_id (string, required): Element ID from find_element
- get_attribute - Get an attribute value from an element
  - session_id (string, required): The browser session ID
  - element_id (string, required): Element ID from find_element
  - attribute_name (string, required): Name of the attribute
JavaScript Execution
- execute_script - Execute JavaScript in the browser
  - session_id (string, required): The browser session ID
  - script (string, required): JavaScript code to execute
Page Information
- get_page_source - Get the HTML source of the current page
  - session_id (string, required): The browser session ID
- get_console_logs - Get browser console logs (Chrome/JS console output)
  - session_id (string, required): The browser session ID
Screenshot Management
- save_screenshot - Capture a screenshot and save it to history (max 10 per session)
  - session_id (string, required): The browser session ID
  - Returns: {saved: true, timestamp, url, screenshot_url, message, total_screenshots}
  - The screenshot_url is a full URL (e.g., http://localhost:4443/screenshots/session_id/timestamp.png) that reflects the host of the incoming request
Note: Current screenshots are also accessible via MCP Resources or HTTP endpoint (see below)
Frame Switching
- switch_frame - Switch to a frame or iframe
  - session_id (string, required): The browser session ID
  - frame_reference: Frame index (integer), frame name (string), element_id, 'default', or 'parent'
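Because frame_reference accepts several kinds of value, a handler has to decide which one it received. The sketch below shows one possible classification. It assumes element IDs look like `element_0`, as in the usage example, which may not hold in practice; the function name is hypothetical.

```ruby
# Classify a frame_reference into [kind, value].
# Assumption: element IDs match "element_<number>"; any other string is
# treated as a frame name.
def classify_frame_reference(ref)
  case ref
  when Integer then [:index, ref]
  when "default" then [:default, nil]
  when "parent" then [:parent, nil]
  when /\Aelement_\d+\z/ then [:element, ref]
  when String then [:name, ref]
  else raise ArgumentError, "unsupported frame_reference: #{ref.inspect}"
  end
end
```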
MCP Resources
The following MCP resources are available:
Non-Templated Resources (accessible via resources/list)
- browser://current_url - The current URL of the browser
- browser://page_title - The title of the current page
- browser://session_status - WebDriver session status and information
Templated Resources (accessed by URI)
screenshot://{session_id}- PNG screenshot of the browser window for a session
Accessing Screenshots
Screenshots can be accessed in two ways:
1. Via MCP Resource Protocol:
{"method": "resources/read", "params": {"uri": "screenshot://session_abc123"}}
Returns base64-encoded PNG in a blob field with mimeType: "image/png"
2. Via Direct HTTP Endpoint:
curl http://localhost:4443/screenshots/session_abc123 > screenshot.png
Returns raw PNG binary data with Content-Type: image/png header
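For the resource route, the client has to decode the base64 blob itself. The sketch below assumes the response follows the usual MCP `resources/read` envelope (`result.contents[]` with `mimeType` and `blob`); the function name `extract_png` is hypothetical.

```ruby
require "json"

# Pull the PNG bytes out of a resources/read JSON-RPC response.
# Assumes the standard MCP envelope: result -> contents -> [{mimeType, blob}].
def extract_png(response_json)
  response = JSON.parse(response_json)
  content = response.fetch("result").fetch("contents").first
  unless content && content["mimeType"] == "image/png"
    raise ArgumentError, "response does not contain a PNG resource"
  end

  # "m" is Ruby's built-in base64 directive, so no extra library is needed.
  content.fetch("blob").unpack1("m")
end
```

The returned string is binary and can be written out with `File.binwrite("screenshot.png", bytes)`.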
Usage Example
Here's an example of how an LLM might use these tools via MCP:
1. Create a session:
   {"tool": "create_session", "arguments": {}}
   Returns:
   {"session_id": "session_abc123", ...}
2. Navigate to a website:
   {"tool": "navigate", "arguments": {"session_id": "session_abc123", "url": "https://example.com"}}
3. Find a search input:
   {"tool": "find_element", "arguments": {"session_id": "session_abc123", "strategy": "name", "value": "q"}}
   Returns:
   {"element_id": "element_0", "tag_name": "input", ...}
4. Type into the search input:
   {"tool": "send_keys", "arguments": {"session_id": "session_abc123", "element_id": "element_0", "text": "hello world"}}
5. Find and click the search button:
   {"tool": "find_element", "arguments": {"session_id": "session_abc123", "strategy": "css", "value": "button[type=submit]"}}
   Returns:
   {"element_id": "element_1", ...}
   {"tool": "click_element", "arguments": {"session_id": "session_abc123", "element_id": "element_1"}}
6. Get console logs:
   {"tool": "get_console_logs", "arguments": {"session_id": "session_abc123"}}
7. Take a screenshot (via resource):
   {"method": "resources/read", "params": {"uri": "screenshot://session_abc123"}}
   Or via HTTP:
   GET /screenshots/session_abc123
8. Clean up:
   {"tool": "destroy_session", "arguments": {"session_id": "session_abc123"}}
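The tool/arguments pairs in this walkthrough are shorthand. Over the wire, MCP clients normally wrap each call in a JSON-RPC `tools/call` request. The helper below sketches that wrapping; `tool_call_request` is a hypothetical name, and the envelope follows the MCP specification rather than anything specific to this server.

```ruby
require "json"

# Wrap a tool name and its arguments in a JSON-RPC 2.0 tools/call request.
def tool_call_request(tool, arguments = {}, id: 1)
  {
    "jsonrpc" => "2.0",
    "id" => id,
    "method" => "tools/call",
    "params" => { "name" => tool, "arguments" => arguments }
  }
end

# Example: the navigate step from the walkthrough as a wire-level payload.
navigate = tool_call_request(
  "navigate",
  { "session_id" => "session_abc123", "url" => "https://example.com" },
  id: 2
)
puts JSON.pretty_generate(navigate)
```

The resulting JSON is what would be posted to the /mcp endpoint.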
Development
After checking out the repo, run bundle install to install dependencies.
To run tests (when implemented):
bundle exec rspec
To run RuboCop:
bundle exec rubocop
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/brainpage/ruby-webdriver-mcp.
License
The gem is available as open source under the terms of the MIT License.
Security
This gem provides programmatic control of a web browser. When exposing this via MCP:
- Only expose the MCP server to trusted LLM clients
- Be cautious about which websites you allow navigation to
- Consider implementing authentication for the MCP endpoints
- Use in isolated/sandboxed environments when possible
- Review and validate any JavaScript executed via execute_script
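As one example of restricting navigation, a wrapper around the navigate tool could check URLs against a host allowlist first. The function below is a sketch of such a check under stated assumptions: the name `navigation_allowed?` and the allowlist itself are hypothetical, and the project does not document a built-in equivalent.

```ruby
require "uri"

# Allow only http(s) URLs whose host is on the allowlist or is a subdomain
# of an allowlisted host. Anything unparsable is rejected.
def navigation_allowed?(url, allowed_hosts)
  uri = URI.parse(url)
  return false unless %w[http https].include?(uri.scheme)
  return false if uri.host.nil?

  host = uri.host.downcase
  allowed_hosts.any? { |allowed| host == allowed || host.end_with?(".#{allowed}") }
rescue URI::InvalidURIError
  false
end
```

Rejecting non-HTTP schemes also blocks `file:` and `javascript:` URLs, which are easy to overlook.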
Credits
Built with:
- FastMCP - Ruby implementation of the Model Context Protocol
- Selenium WebDriver - Browser automation framework