parruda/headless-browser-tool
The Headless Browser Tool is a utility for controlling headless browsers through a Model Context Protocol (MCP) server, using Capybara and Selenium for automation.
Headless Browser Tool
A headless browser control tool that provides an MCP (Model Context Protocol) server with tools to control a headless browser using Capybara and Selenium. Features multi-session support, session persistence, and both HTTP and stdio communication modes.
Features
- Headless Chrome browser automation - Full browser control via Selenium WebDriver
- MCP server with 40+ browser control tools - Comprehensive API for browser interactions
- Multi-session support - Isolated browser sessions for each client
- Session persistence - Sessions survive server restarts with cookies and state preservation
- Two server modes - HTTP server mode and stdio mode for different integration patterns
- Smart screenshot tools - With annotations, highlighting, and visual diff capabilities
- AI-assisted tools - Auto-narration and intelligent page analysis
- Comprehensive logging - Separate log files for stdio mode to avoid protocol interference
- Structured responses - All tools return rich, structured data instead of simple strings
- Smart element selectors - Tools returning multiple elements include selectors for each
Installation
Add this line to your application's Gemfile:
gem 'headless_browser_tool'
And then execute:
bundle install
Or install it yourself as:
gem install headless_browser_tool
Prerequisites
You need to have Chrome/Chromium browser installed on your system. The gem will use Chrome in headless mode by default.
Usage
Command Line Interface
The hbt command provides three main commands:
hbt start - Start HTTP Server Mode
Starts the MCP server as an HTTP server with SSE (Server-Sent Events) support:
hbt start [OPTIONS]
Options:
- `--port PORT` - Port for the MCP server (default: 4567)
- `--headless` / `--no-headless` - Run browser in headless mode (default: true)
- `--single-session` - Use single shared browser session instead of multi-session mode
- `--session-id SESSION_ID` - Enable session persistence for single session mode (requires `--single-session`)
- `--show-headers` - Show HTTP request headers for debugging session issues
Examples:
# Start with default settings (multi-session, headless, port 4567)
hbt start
# Start in non-headless mode for debugging
hbt start --no-headless
# Start in single session mode (legacy compatibility)
hbt start --single-session
# Start in single session mode with persistence
hbt start --single-session --session-id my-app-session
# Start with request header logging
hbt start --show-headers
hbt stdio - Start Stdio Server Mode
Starts the MCP server in stdio mode for direct integration with tools that spawn subprocesses:
hbt stdio [OPTIONS]
Options:
- `--headless` / `--no-headless` - Run browser in headless mode (default: true)
Notes:
- Always runs in single-session mode
- Logs to `.hbt/logs/PID.log` instead of stdout to avoid interfering with MCP protocol
- Ideal for editor integrations and tools that communicate via stdin/stdout
- Supports optional session persistence via `HBT_SESSION_ID` environment variable
Session Persistence in Stdio Mode:
You can enable session persistence by setting the HBT_SESSION_ID environment variable:
# First run - creates and saves session
HBT_SESSION_ID=my-editor-session hbt stdio
# Later run - restores previous session state
HBT_SESSION_ID=my-editor-session hbt stdio
When HBT_SESSION_ID is set:
- Session state is saved to `.hbt/sessions/{session_id}.json` on exit
- On startup, if the session file exists, it restores:
- Current URL
- Cookies
- localStorage
- sessionStorage
- Window size
This is useful for editor integrations that want to maintain browser state across multiple tool invocations.
Examples:
# Start in stdio mode (headless by default, no persistence)
hbt stdio
# Start with session persistence
HBT_SESSION_ID=vscode-session hbt stdio
# Start in stdio mode with visible browser
hbt stdio --no-headless
hbt version - Display Version
Shows the current version of HeadlessBrowserTool:
hbt version
Session Management
Multi-Session Mode (Default for HTTP Server)
In multi-session mode, each client connection gets its own isolated browser session with:
- Separate cookies and localStorage - Complete isolation between sessions
- Independent navigation history - Each session maintains its own browser state
- Session persistence - Sessions are saved to `.hbt/sessions/` and restored on restart
- Automatic cleanup - Idle sessions are closed after 30 minutes
- LRU eviction - When at capacity (10 sessions), least recently used sessions are closed
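The cleanup and eviction rules above can be illustrated with a small bookkeeping class. This is only a sketch under stated assumptions: `SessionRegistry`, `touch`, and `sweep_idle` are hypothetical names, not the gem's actual classes or methods, and only the limits (10 sessions, 30 minutes) come from the list above.

```ruby
# Hypothetical sketch of LRU + idle-timeout bookkeeping; not the gem's actual code.
class SessionRegistry
  MAX_SESSIONS = 10        # capacity stated in the README
  IDLE_TIMEOUT = 30 * 60   # 30 minutes, in seconds

  # The clock is injectable so the behaviour can be exercised without waiting.
  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @last_used = {}        # session_id => timestamp of last activity
  end

  # Record activity for a session. If a new session would exceed capacity,
  # the least recently used session is evicted first. Returns evicted IDs.
  def touch(session_id)
    evicted = []
    if !@last_used.key?(session_id) && @last_used.size >= MAX_SESSIONS
      lru_id, = @last_used.min_by { |_, ts| ts }
      @last_used.delete(lru_id)
      evicted << lru_id
    end
    @last_used[session_id] = @clock.call
    evicted
  end

  # Remove and return the sessions idle for longer than IDLE_TIMEOUT.
  def sweep_idle
    now = @clock.call
    idle = @last_used.select { |_, ts| now - ts > IDLE_TIMEOUT }.keys
    idle.each { |id| @last_used.delete(id) }
    idle
  end

  def active
    @last_used.keys
  end
end
```

A real implementation would also close the browser attached to each evicted session and persist its state; that part is omitted here.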
Session Identification in Multi-Session Mode:
For HTTP server mode, sessions require an X-Session-ID header:
# Connect with session ID "alice"
curl -H "X-Session-ID: alice" -H "Accept: text/event-stream" http://localhost:4567/
# Different session ID gets different browser
curl -H "X-Session-ID: bob" -H "Accept: text/event-stream" http://localhost:4567/
# Without X-Session-ID header, connection is rejected
curl -H "Accept: text/event-stream" http://localhost:4567/
# Returns: 400 Bad Request - X-Session-ID header is required
Session ID Requirements:
- Must be provided via `X-Session-ID` header
- Can only contain alphanumeric characters, underscores, and hyphens
- Maximum length: 64 characters
- Invalid formats are rejected with 400 error
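A client can check an ID against these rules before connecting. The following is a minimal sketch, assuming the rules are exactly as listed; `SESSION_ID_PATTERN` and `valid_session_id?` are hypothetical names, and the server's own check may differ in detail.

```ruby
# Hypothetical client-side validator mirroring the documented rules:
# letters, digits, underscores, and hyphens only, 1 to 64 characters.
SESSION_ID_PATTERN = /\A[A-Za-z0-9_-]{1,64}\z/

def valid_session_id?(id)
  id.is_a?(String) && SESSION_ID_PATTERN.match?(id)
end

valid_session_id?("alice")      # => true
valid_session_id?("bad id!")    # => false
```

The `\A` and `\z` anchors matter here: with `^` and `$`, an ID containing a newline could slip through.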
Single Session Mode
Use --single-session flag for legacy mode where all clients share one browser:
hbt start --single-session
Session Persistence in Single Session Mode:
You can enable session persistence with the --session-id flag:
# First run - creates and saves session
hbt start --single-session --session-id my-app
# Server restart - restores previous session
hbt start --single-session --session-id my-app
When --session-id is provided:
- Session state is saved to `.hbt/sessions/{session_id}.json` on shutdown
- On startup, if the session file exists, it restores browser state
- All clients share this single persistent session
- Compatible with stdio mode session files
This is useful for:
- Development servers that need to maintain login state
- Testing environments where you want consistent browser state
- Applications that don't need multi-user isolation
Note: The --session-id flag can only be used with --single-session. In multi-session mode, session IDs are provided by clients via headers.
Session Management Endpoints
View active sessions:
curl http://localhost:4567/sessions | jq
Response:
{
"active_sessions": ["alice", "bob"],
"session_count": 2,
"session_data": {
"alice": {
"created_at": "2024-01-20T10:00:00Z",
"last_activity": "2024-01-20T10:05:00Z",
"idle_time": 300.5
}
}
}
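As an illustration of consuming this response, the sketch below parses the sample body shown above and lists sessions that have been idle for longer than a threshold. It assumes `idle_time` is expressed in seconds, which the sample suggests but the README does not state; `idle_sessions` is a hypothetical helper.

```ruby
require "json"

# Sample body copied from the response shown above.
body = <<~JSON
  {
    "active_sessions": ["alice", "bob"],
    "session_count": 2,
    "session_data": {
      "alice": {
        "created_at": "2024-01-20T10:00:00Z",
        "last_activity": "2024-01-20T10:05:00Z",
        "idle_time": 300.5
      }
    }
  }
JSON

# Return the IDs of sessions whose idle_time exceeds the threshold.
# Assumption: idle_time is in seconds.
def idle_sessions(payload, threshold_seconds)
  payload.fetch("session_data", {})
         .select { |_, info| info.fetch("idle_time", 0) > threshold_seconds }
         .keys
end

payload = JSON.parse(body)
puts idle_sessions(payload, 120).inspect   # prints ["alice"]
```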
Close a specific session:
curl -X DELETE http://localhost:4567/sessions/alice
Directory Structure
HeadlessBrowserTool creates a .hbt/ directory with:
.hbt/
├── .gitignore # Contains "*" to ignore all contents
├── screenshots/ # Screenshot storage
├── sessions/ # Session persistence files
└── logs/ # Log files (stdio mode only)
    └── PID.log # Process-specific log file
MCP API
The server implements the Model Context Protocol (MCP) and responds to JSON-RPC requests.
Using with MCP Clients
For HTTP mode with proper MCP clients:
# Start server
hbt start
# MCP client should:
# 1. Connect with X-Session-ID header
# 2. Use SSE endpoint for streaming: http://localhost:4567/mcp/sse
# 3. Send commands via JSON-RPC
For stdio mode:
# MCP client spawns the process directly
hbt stdio
# Communication happens via stdin/stdout
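For stdio mode, the MCP stdio transport exchanges JSON-RPC messages as newline-delimited JSON. The helpers below are a rough sketch of that framing on the client side; `write_message` and `read_message` are hypothetical names, and the spawning code is left as a comment because it needs the `hbt` executable on the PATH.

```ruby
require "json"
require "stringio"

# Write one JSON-RPC message as a single newline-terminated line.
def write_message(io, message)
  io.write(JSON.generate(message) + "\n")
  io.flush
end

# Read one newline-delimited JSON-RPC message, or nil at end of stream.
def read_message(io)
  line = io.gets
  line && JSON.parse(line)
end

# Demonstrate the framing against an in-memory stream.
io = StringIO.new
write_message(io, { jsonrpc: "2.0", id: 1, method: "tools/list" })
io.rewind
message = read_message(io)

# With a real server, the same helpers would wrap the child's pipes, e.g.:
#   require "open3"
#   Open3.popen2("hbt", "stdio") do |stdin, stdout, _wait_thr|
#     write_message(stdin, ...)   # an MCP client sends "initialize" first
#     reply = read_message(stdout)
#   end
```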
Available Browser Tools
All tools are available through the MCP protocol. Here's a complete reference:
Navigation Tools
| Tool | Description | Parameters | Returns |
|---|---|---|---|
| visit | Navigate to a URL | url (required) | {url, current_url, title, status} |
| refresh | Reload the current page | None | {url, title, changed, status} |
| go_back | Navigate back in browser history | None | {navigation: {from, to, title, navigated}, status} |
| go_forward | Navigate forward in browser history | None | {navigation: {from, to, title, navigated}, status} |
Element Interaction Tools
| Tool | Description | Parameters | Returns |
|---|---|---|---|
| click | Click an element | selector (required) | {selector, element, navigation, status} |
| right_click | Right-click an element | selector (required) | {selector, element, status} |
| double_click | Double-click an element | selector (required) | {selector, element, status} |
| hover | Hover mouse over element | selector (required) | {selector, element, status} |
| drag | Drag element to target | source_selector, target_selector (required) | {source_selector, target_selector, source, target, status} |
Element Finding Tools
| Tool | Description | Parameters | Key Returns |
|---|---|---|---|
| find_element | Find single element | selector (required) | Element details with attributes |
| find_all | Find all matching elements | selector (required) | {elements: [{selector, tag_name, text, visible, attributes}]} |
| find_elements_containing_text | Find elements with text | text (required), exact_match, case_sensitive, visible_only | {elements: [{selector, xpath, tag, text, clickable}]} |
| get_text | Get element text | selector (required) | Text content string |
| get_attribute | Get element attribute | selector, attribute (required) | Attribute value |
| get_value | Get input value | selector (required) | Input value |
| is_visible | Check element visibility | selector (required) | Boolean |
| has_element | Check element exists | selector (required), wait | Boolean |
| has_text | Check text exists | text (required), wait | Boolean |
Form Interaction Tools
| Tool | Description | Parameters | Key Returns |
|---|---|---|---|
| fill_in | Fill input field | field, value (required) | {field, value, field_info, status} |
| select | Select dropdown option | value, dropdown_selector (required) | {selected_value, selected_text, options: [{selector, value, text}]} |
| check | Check checkbox | checkbox_selector (required) | {selector, was_checked, is_checked, element, status} |
| uncheck | Uncheck checkbox | checkbox_selector (required) | {selector, was_checked, is_checked, element, status} |
| choose | Select radio button | radio_button_selector (required) | {selector, radio, group: [{selector, value, checked}], status} |
| attach_file | Upload file | file_field_selector, file_path (required) | {field_selector, file_path, file_name, file_size, field, status} |
| click_button | Click button | button_text_or_selector (required) | {button, element, navigation, status} |
| click_link | Click link | link_text_or_selector (required) | {link, element, navigation, status} |
Page Information Tools
| Tool | Description | Returns |
|---|---|---|
| get_current_url | Get current URL | Full URL string |
| get_current_path | Get current path | Path without domain |
| get_page_title | Get page title | Title string |
| get_page_source | Get HTML source | Full HTML |
| get_page_context | Get page analysis | Structured page data |
Search Tools
| Tool | Description | Parameters |
|---|---|---|
| search_page | Search visible content | query (required), case_sensitive, regex, context_lines, highlight |
| search_source | Search HTML source | query (required), case_sensitive, regex, context_lines, show_line_numbers |
JavaScript Execution Tools
| Tool | Description | Parameters | Returns |
|---|---|---|---|
| execute_script | Run JavaScript | javascript_code (required) | {javascript_code, execution_time, timestamp, status} |
| evaluate_script | Run JS and return result | javascript_code (required) | Script return value |
Screenshot and Capture Tools
| Tool | Description | Parameters | Key Returns |
|---|---|---|---|
| screenshot | Take screenshot | filename, highlight_selectors, annotate, full_page | {file_path, filename, file_size, timestamp, url, title} |
| save_page | Save HTML to file | file_path (required) | {file_path, file_size, timestamp, url, title, status} |
Window Management Tools
| Tool | Description | Parameters | Key Returns |
|---|---|---|---|
| switch_to_window | Switch to window/tab | window_handle (required) | {window_handle, previous_window, current_url, title, total_windows} |
| open_new_window | Open new window/tab | None | {window_handle, total_windows, previous_windows, current_window} |
| close_window | Close window/tab | window_handle (required) | {closed_window, was_current, remaining_windows, current_window} |
| get_window_handles | Get all window handles | None | {current_window, windows: [{handle, index, is_current}], total_windows} |
| maximize_window | Maximize window | None | {size_before: {width, height}, size_after: {width, height}, status} |
| resize_window | Resize window | width, height (required) | {requested_size, size_before, size_after, status} |
Session Management Tools
| Tool | Description | Returns |
|---|---|---|
| get_session_info | Get session information | Session details |
Smart Tools (experimental)
| Tool | Description | Parameters |
|---|---|---|
| auto_narrate | Generate page description | focus_on |
| get_narration_history | Get narration history | None |
| visual_diff | Compare screenshots | before_path, after_path (required) |
Tool Response Structure
All tools now return structured data instead of simple strings. This makes it easier to:
- Extract specific information from responses
- Check operation success/failure
- Access element properties and metadata
- Navigate to specific elements using returned selectors
Example responses:
// visit tool response
{
"url": "https://example.com",
"current_url": "https://example.com/",
"title": "Example Domain",
"status": "success"
}
// find_all tool response with selectors
{
"selector": ".item",
"count": 3,
"elements": [
{
"index": 0,
"selector": ".item:nth-of-type(1)",
"tag_name": "div",
"text": "Item 1",
"visible": true,
"attributes": {"class": "item active"}
},
// ... more elements
]
}
// select tool response with option selectors
{
"dropdown_selector": "#country",
"selected_value": "US",
"selected_text": "United States",
"options": [
{
"selector": "#country option:nth-of-type(1)",
"value": "US",
"text": "United States",
"selected": true
},
// ... more options
],
"status": "selected"
}
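To show how the returned selectors might be used, the sketch below takes a `find_all`-style response and collects the selectors of visible elements, which could then be passed to a follow-up tool such as `click`. The first element is taken from the example above; the second element and the `visible_selectors` helper are hypothetical, added only for illustration.

```ruby
require "json"

# A find_all-style response. The second element is made up for this example.
response = JSON.parse(<<~JSON)
  {
    "selector": ".item",
    "count": 2,
    "elements": [
      {
        "index": 0,
        "selector": ".item:nth-of-type(1)",
        "tag_name": "div",
        "text": "Item 1",
        "visible": true,
        "attributes": {"class": "item active"}
      },
      {
        "index": 1,
        "selector": ".item:nth-of-type(2)",
        "tag_name": "div",
        "text": "Item 2",
        "visible": false,
        "attributes": {"class": "item"}
      }
    ]
  }
JSON

# Collect the per-element selectors of visible elements only.
def visible_selectors(find_all_response)
  find_all_response.fetch("elements", [])
                   .select { |el| el["visible"] }
                   .map { |el| el.fetch("selector") }
end

puts visible_selectors(response).inspect   # prints [".item:nth-of-type(1)"]
```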
Example Tool Calls
Here are examples using curl with the HTTP server:
# Navigate to a URL
curl -X POST http://localhost:4567/ \
-H "Content-Type: application/json" \
-H "X-Session-ID: alice" \
-d '{"jsonrpc": "2.0", "id": 1, "method": "tools/call",
"params": {"name": "visit", "arguments": {"url": "https://example.com"}}}'
# Take an annotated screenshot
curl -X POST http://localhost:4567/ \
-H "Content-Type: application/json" \
-H "X-Session-ID: alice" \
-d '{"jsonrpc": "2.0", "id": 2, "method": "tools/call",
"params": {"name": "screenshot",
"arguments": {"filename": "example",
"highlight_selectors": [".error", ".warning"],
"annotate": true,
"full_page": true}}}'
# Search page content with highlighting
curl -X POST http://localhost:4567/ \
-H "Content-Type: application/json" \
-H "X-Session-ID: alice" \
-d '{"jsonrpc": "2.0", "id": 3, "method": "tools/call",
"params": {"name": "search_page",
"arguments": {"query": "error|warning",
"regex": true,
"highlight": true}}}'
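The same calls can be made from Ruby with the standard library. This is a minimal sketch that mirrors the curl examples above (same URL, headers, and JSON-RPC shape); `build_tool_call` is a hypothetical helper, not part of the gem, and the request is only built here, not sent.

```ruby
require "json"
require "net/http"
require "uri"

# Build (but do not send) a JSON-RPC tools/call request equivalent to the
# curl examples. The base URL and port are the defaults from this README.
def build_tool_call(name, arguments, session_id:, id: 1, base_url: "http://localhost:4567/")
  uri = URI(base_url)
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request["X-Session-ID"] = session_id
  request.body = JSON.generate(
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: name, arguments: arguments }
  )
  [uri, request]
end

uri, request = build_tool_call("visit", { url: "https://example.com" }, session_id: "alice")

# To send it against a running server:
#   response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
#   puts response.body
```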
Environment Variables
- `HBT_SINGLE_SESSION=true` - Force single session mode in HTTP server
- `HBT_SHOW_HEADERS=true` - Enable request header logging in HTTP server
- `HBT_SESSION_ID=<session_name>` - Enable session persistence in stdio mode
Logging
- HTTP mode: Logs to stdout
- Stdio mode: Logs to `.hbt/logs/PID.log` to avoid interfering with MCP protocol
Tool calls are logged with format:
INFO -- HBT: CALL: ToolName [] {args} -> result
ERROR -- HBT: ERROR: ToolName [] {args} -> error_message
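Lines in this format can be picked apart with a regular expression when scanning a log file. The sketch below assumes the layout is exactly as shown; the README does not say what the square brackets contain, so that field is captured under the neutral name `tag`. `LOG_LINE`, `parse_log_line`, and the sample line are all hypothetical.

```ruby
# Hypothetical parser for the documented log layout:
#   LEVEL -- HBT: KIND: ToolName [tag] {args} -> result
LOG_LINE = /(?<level>INFO|ERROR) -- HBT: (?<kind>CALL|ERROR): (?<tool>\S+) \[(?<tag>[^\]]*)\] (?<args>.*?) -> (?<result>.*)\z/

# Return a Hash of the named fields, or nil if the line does not match.
def parse_log_line(line)
  match = LOG_LINE.match(line.chomp)
  match && match.named_captures
end

# Made-up sample line in the documented shape.
entry = parse_log_line('INFO -- HBT: CALL: Visit [] {"url"=>"https://example.com"} -> success')
puts entry["tool"]     # prints Visit
puts entry["result"]   # prints success
```

The split between arguments and result happens at the first ` -> `, so arguments that themselves contain that sequence (for example JavaScript passed to `execute_script`) would be parsed incorrectly.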
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt.
To install this gem onto your local machine, run bundle exec rake install.
Running Tests and Linting
# Run tests
rake test
# Run linter
rake rubocop
# Run linter with auto-fix
bundle exec rubocop -A
# Run both tests and linter (default task)
rake
Recent Improvements
Version 0.1.0
- Structured tool responses - All tools now return rich JSON objects instead of simple strings
- Element selectors in arrays - Tools returning multiple elements include unique selectors for each
- Session persistence - Both stdio and single-session HTTP modes support persistent sessions
- Strict session management - Multi-session mode requires X-Session-ID header (no auto-creation)
- Improved logging - Fixed stdio mode logging to properly write to `.hbt/logs/PID.log`
- DRY refactoring - Extracted common functionality into `SessionPersistence` and `DirectorySetup` modules
- Better error handling - Tools return structured error information
- Enhanced tool responses:
- Navigation tools return before/after URLs and navigation status
- Form tools return element state before/after interaction
- Window tools return comprehensive window state information
- Screenshot tool returns file metadata
- All element-finding tools return complete element information
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/parruda/headless_browser_tool.