parruda/headless-browser-tool
Headless Browser Tool
A headless browser control tool that provides an MCP (Model Context Protocol) server with tools to control a headless browser using Capybara and Selenium. Features multi-session support, session persistence, and both HTTP and stdio communication modes.
Features
- Headless Chrome browser automation - Full browser control via Selenium WebDriver
- MCP server with 40+ browser control tools - Comprehensive API for browser interactions
- Multi-session support - Isolated browser sessions for each client
- Session persistence - Sessions survive server restarts with cookies and state preservation
- Two server modes - HTTP server mode and stdio mode for different integration patterns
- Smart screenshot tools - With annotations, highlighting, and visual diff capabilities
- AI-assisted tools - Auto-narration and intelligent page analysis
- Comprehensive logging - Separate log files for stdio mode to avoid protocol interference
- Structured responses - All tools return rich, structured data instead of simple strings
- Smart element selectors - Tools returning multiple elements include selectors for each
Installation
Add this line to your application's Gemfile:
gem 'headless_browser_tool'
And then execute:
bundle install
Or install it yourself as:
gem install headless_browser_tool
Prerequisites
You need to have Chrome/Chromium browser installed on your system. The gem will use Chrome in headless mode by default.
Usage
Command Line Interface
The hbt command provides three main commands:
hbt start - Start HTTP Server Mode
Starts the MCP server as an HTTP server with SSE (Server-Sent Events) support:
hbt start [OPTIONS]
Options:
- --port PORT - Port for the MCP server (default: 4567)
- --headless/--no-headless - Run browser in headless mode (default: true)
- --single-session - Use single shared browser session instead of multi-session mode
- --session-id SESSION_ID - Enable session persistence for single session mode (requires --single-session)
- --show-headers - Show HTTP request headers for debugging session issues
Examples:
# Start with default settings (multi-session, headless, port 4567)
hbt start
# Start in non-headless mode for debugging
hbt start --no-headless
# Start in single session mode (legacy compatibility)
hbt start --single-session
# Start in single session mode with persistence
hbt start --single-session --session-id my-app-session
# Start with request header logging
hbt start --show-headers
hbt stdio - Start Stdio Server Mode
Starts the MCP server in stdio mode for direct integration with tools that spawn subprocesses:
hbt stdio [OPTIONS]
Options:
- --headless/--no-headless - Run browser in headless mode (default: true)
Notes:
- Always runs in single-session mode
- Logs to .hbt/logs/PID.log instead of stdout to avoid interfering with the MCP protocol
- Ideal for editor integrations and tools that communicate via stdin/stdout
- Supports optional session persistence via the HBT_SESSION_ID environment variable
Session Persistence in Stdio Mode:
You can enable session persistence by setting the HBT_SESSION_ID environment variable:
# First run - creates and saves session
HBT_SESSION_ID=my-editor-session hbt stdio
# Later run - restores previous session state
HBT_SESSION_ID=my-editor-session hbt stdio
When HBT_SESSION_ID is set:
- Session state is saved to .hbt/sessions/{session_id}.json on exit
- On startup, if the session file exists, it restores:
  - Current URL
  - Cookies
  - localStorage
  - sessionStorage
  - Window size
This is useful for editor integrations that want to maintain browser state across multiple tool invocations.
Examples:
# Start in stdio mode (headless by default, no persistence)
hbt stdio
# Start with session persistence
HBT_SESSION_ID=vscode-session hbt stdio
# Start in stdio mode with visible browser
hbt stdio --no-headless
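For tools that spawn the server themselves, the invocation above can be sketched in Ruby. This is a minimal illustration, not part of the gem: the helper names hbt_stdio_command and with_hbt_stdio are hypothetical, and only the documented hbt stdio command, the --no-headless flag, and the HBT_SESSION_ID variable are taken from this README.

```ruby
require "open3"

# Build the environment and argv for spawning `hbt stdio`.
# Hypothetical helper; only the command, flag, and env var come from the README.
def hbt_stdio_command(session_id: nil, headless: true)
  env = {}
  env["HBT_SESSION_ID"] = session_id if session_id
  argv = ["hbt", "stdio"]
  argv << "--no-headless" unless headless
  [env, argv]
end

# Spawn the server and yield its stdin/stdout pipes.
# Assumes the hbt executable is on PATH; popen3 closes the pipes and
# waits for the process when the block returns.
def with_hbt_stdio(**options)
  env, argv = hbt_stdio_command(**options)
  Open3.popen3(env, *argv) do |stdin, stdout, _stderr, _wait_thread|
    yield stdin, stdout
  end
end
```

Keeping the command construction separate from the spawn makes it possible to check the environment and arguments without starting a browser.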
hbt version - Display Version
Shows the current version of HeadlessBrowserTool:
hbt version
Session Management
Multi-Session Mode (Default for HTTP Server)
In multi-session mode, each client connection gets its own isolated browser session with:
- Separate cookies and localStorage - Complete isolation between sessions
- Independent navigation history - Each session maintains its own browser state
- Session persistence - Sessions are saved to .hbt/sessions/ and restored on restart
- Automatic cleanup - Idle sessions are closed after 30 minutes
- LRU eviction - When at capacity (10 sessions), least recently used sessions are closed
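The cleanup and eviction rules in this list can be illustrated with a small registry. This is a sketch of the general technique only, not the gem's implementation: the SessionRegistry class and its methods are hypothetical, and only the 10-session capacity and 30-minute idle timeout come from the text.

```ruby
# Hypothetical sketch of LRU eviction plus idle cleanup; not the gem's code.
class SessionRegistry
  MAX_SESSIONS = 10       # capacity stated in the README
  IDLE_TIMEOUT = 30 * 60  # 30 minutes, in seconds

  def initialize(max_sessions: MAX_SESSIONS, idle_timeout: IDLE_TIMEOUT)
    @max_sessions = max_sessions
    @idle_timeout = idle_timeout
    @last_activity = {} # session_id => Time of last use
  end

  # Record activity for a session. If a new session would exceed capacity,
  # the least recently used session is evicted first. Returns evicted IDs.
  def touch(session_id, now: Time.now)
    evicted = []
    if !@last_activity.key?(session_id) && @last_activity.size >= @max_sessions
      lru_id, = @last_activity.min_by { |_id, at| at }
      @last_activity.delete(lru_id)
      evicted << lru_id
    end
    @last_activity[session_id] = now
    evicted
  end

  # Remove and return the sessions idle for longer than the timeout.
  def sweep_idle(now: Time.now)
    idle = @last_activity.select { |_id, at| now - at > @idle_timeout }.keys
    idle.each { |id| @last_activity.delete(id) }
    idle
  end

  def active_sessions
    @last_activity.keys
  end
end
```

A real server would also close the browser behind each evicted ID; the sketch only tracks which IDs would be closed.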
Session Identification in Multi-Session Mode:
For HTTP server mode, sessions require an X-Session-ID header:
# Connect with session ID "alice"
curl -H "X-Session-ID: alice" -H "Accept: text/event-stream" http://localhost:4567/
# Different session ID gets different browser
curl -H "X-Session-ID: bob" -H "Accept: text/event-stream" http://localhost:4567/
# Without X-Session-ID header, connection is rejected
curl -H "Accept: text/event-stream" http://localhost:4567/
# Returns: 400 Bad Request - X-Session-ID header is required
Session ID Requirements:
- Must be provided via X-Session-ID header
- Can only contain alphanumeric characters, underscores, and hyphens
- Maximum length: 64 characters
- Invalid formats are rejected with 400 error
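A client can check an ID against these rules before connecting. The sketch below is an illustration derived from the stated requirements, not the server's own validation code; the constant and method names are hypothetical, and the minimum length of one character is an assumption.

```ruby
# Letters, digits, underscores, and hyphens; 1 to 64 characters.
# The lower bound of 1 is an assumption; the README only states the maximum.
SESSION_ID_PATTERN = /\A[A-Za-z0-9_-]{1,64}\z/

# Returns true when the ID satisfies the documented format rules.
def valid_session_id?(id)
  id.is_a?(String) && SESSION_ID_PATTERN.match?(id)
end
```

Using \A and \z rather than ^ and $ ensures that an ID with a trailing newline is rejected.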
Single Session Mode
Use the --single-session flag for legacy mode where all clients share one browser:
hbt start --single-session
Session Persistence in Single Session Mode:
You can enable session persistence with the --session-id flag:
# First run - creates and saves session
hbt start --single-session --session-id my-app
# Server restart - restores previous session
hbt start --single-session --session-id my-app
When --session-id is provided:
- Session state is saved to .hbt/sessions/{session_id}.json on shutdown
- On startup, if the session file exists, it restores browser state
- All clients share this single persistent session
- Compatible with stdio mode session files
This is useful for:
- Development servers that need to maintain login state
- Testing environments where you want consistent browser state
- Applications that don't need multi-user isolation
Note: The --session-id flag can only be used with --single-session. In multi-session mode, session IDs are provided by clients via headers.
Session Management Endpoints
View active sessions:
curl http://localhost:4567/sessions | jq
Response:
{
"active_sessions": ["alice", "bob"],
"session_count": 2,
"session_data": {
"alice": {
"created_at": "2024-01-20T10:00:00Z",
"last_activity": "2024-01-20T10:05:00Z",
"idle_time": 300.5
}
}
}
Close a specific session:
curl -X DELETE http://localhost:4567/sessions/alice
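The /sessions response shown above can be inspected from a script, for example to decide which sessions to close. The following is a minimal sketch: the idle_sessions helper is hypothetical, and it assumes idle_time is reported in seconds, which the sample values suggest but the README does not state.

```ruby
require "json"

# Return the IDs of sessions whose idle_time exceeds the threshold.
# Assumes idle_time is in seconds (not confirmed by the README).
def idle_sessions(sessions_json, threshold_seconds)
  data = JSON.parse(sessions_json)
  data.fetch("session_data", {})
      .select { |_id, info| info.fetch("idle_time", 0) > threshold_seconds }
      .keys
end
```

Each returned ID could then be passed to the DELETE endpoint to close that session.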
Directory Structure
HeadlessBrowserTool creates a .hbt/ directory with:
.hbt/
├── .gitignore       # Contains "*" to ignore all contents
├── screenshots/     # Screenshot storage
├── sessions/        # Session persistence files
└── logs/            # Log files (stdio mode only)
    └── PID.log      # Process-specific log file
MCP API
The server implements the Model Context Protocol (MCP) and responds to JSON-RPC requests.
Using with MCP Clients
For HTTP mode with proper MCP clients:
# Start server
hbt start
# MCP client should:
# 1. Connect with X-Session-ID header
# 2. Use SSE endpoint for streaming: http://localhost:4567/mcp/sse
# 3. Send commands via JSON-RPC
For stdio mode:
# MCP client spawns the process directly
hbt stdio
# Communication happens via stdin/stdout
Available Browser Tools
All tools are available through the MCP protocol. Here's a complete reference:
Navigation Tools
Tool | Description | Parameters | Returns |
---|---|---|---|
visit | Navigate to a URL | url (required) | {url, current_url, title, status} |
refresh | Reload the current page | None | {url, title, changed, status} |
go_back | Navigate back in browser history | None | {navigation: {from, to, title, navigated}, status} |
go_forward | Navigate forward in browser history | None | {navigation: {from, to, title, navigated}, status} |
Element Interaction Tools
Tool | Description | Parameters | Returns |
---|---|---|---|
click | Click an element | selector (required) | {selector, element, navigation, status} |
right_click | Right-click an element | selector (required) | {selector, element, status} |
double_click | Double-click an element | selector (required) | {selector, element, status} |
hover | Hover mouse over element | selector (required) | {selector, element, status} |
drag | Drag element to target | source_selector, target_selector (required) | {source_selector, target_selector, source, target, status} |
Element Finding Tools
Tool | Description | Parameters | Key Returns |
---|---|---|---|
find_element | Find single element | selector (required) | Element details with attributes |
find_all | Find all matching elements | selector (required) | {elements: [{selector, tag_name, text, visible, attributes}]} |
find_elements_containing_text | Find elements with text | text (required), exact_match, case_sensitive, visible_only | {elements: [{selector, xpath, tag, text, clickable}]} |
get_text | Get element text | selector (required) | Text content string |
get_attribute | Get element attribute | selector, attribute (required) | Attribute value |
get_value | Get input value | selector (required) | Input value |
is_visible | Check element visibility | selector (required) | Boolean |
has_element | Check element exists | selector (required), wait | Boolean |
has_text | Check text exists | text (required), wait | Boolean |
Form Interaction Tools
Tool | Description | Parameters | Key Returns |
---|---|---|---|
fill_in | Fill input field | field, value (required) | {field, value, field_info, status} |
select | Select dropdown option | value, dropdown_selector (required) | {selected_value, selected_text, options: [{selector, value, text}]} |
check | Check checkbox | checkbox_selector (required) | {selector, was_checked, is_checked, element, status} |
uncheck | Uncheck checkbox | checkbox_selector (required) | {selector, was_checked, is_checked, element, status} |
choose | Select radio button | radio_button_selector (required) | {selector, radio, group: [{selector, value, checked}], status} |
attach_file | Upload file | file_field_selector, file_path (required) | {field_selector, file_path, file_name, file_size, field, status} |
click_button | Click button | button_text_or_selector (required) | {button, element, navigation, status} |
click_link | Click link | link_text_or_selector (required) | {link, element, navigation, status} |
Page Information Tools
Tool | Description | Returns |
---|---|---|
get_current_url | Get current URL | Full URL string |
get_current_path | Get current path | Path without domain |
get_page_title | Get page title | Title string |
get_page_source | Get HTML source | Full HTML |
get_page_context | Get page analysis | Structured page data |
Search Tools
Tool | Description | Parameters |
---|---|---|
search_page | Search visible content | query (required), case_sensitive, regex, context_lines, highlight |
search_source | Search HTML source | query (required), case_sensitive, regex, context_lines, show_line_numbers |
JavaScript Execution Tools
Tool | Description | Parameters | Returns |
---|---|---|---|
execute_script | Run JavaScript | javascript_code (required) | {javascript_code, execution_time, timestamp, status} |
evaluate_script | Run JS and return result | javascript_code (required) | Script return value |
Screenshot and Capture Tools
Tool | Description | Parameters | Key Returns |
---|---|---|---|
screenshot | Take screenshot | filename, highlight_selectors, annotate, full_page | {file_path, filename, file_size, timestamp, url, title} |
save_page | Save HTML to file | file_path (required) | {file_path, file_size, timestamp, url, title, status} |
Window Management Tools
Tool | Description | Parameters | Key Returns |
---|---|---|---|
switch_to_window | Switch to window/tab | window_handle (required) | {window_handle, previous_window, current_url, title, total_windows} |
open_new_window | Open new window/tab | None | {window_handle, total_windows, previous_windows, current_window} |
close_window | Close window/tab | window_handle (required) | {closed_window, was_current, remaining_windows, current_window} |
get_window_handles | Get all window handles | None | {current_window, windows: [{handle, index, is_current}], total_windows} |
maximize_window | Maximize window | None | {size_before: {width, height}, size_after: {width, height}, status} |
resize_window | Resize window | width, height (required) | {requested_size, size_before, size_after, status} |
Session Management Tools
Tool | Description | Returns |
---|---|---|
get_session_info | Get session information | Session details |
Smart Tools (experimental)
Tool | Description | Parameters |
---|---|---|
auto_narrate | Generate page description | focus_on |
get_narration_history | Get narration history | None |
visual_diff | Compare screenshots | before_path, after_path (required) |
Tool Response Structure
All tools now return structured data instead of simple strings. This makes it easier to:
- Extract specific information from responses
- Check operation success/failure
- Access element properties and metadata
- Navigate to specific elements using returned selectors
Example responses:
// visit tool response
{
"url": "https://example.com",
"current_url": "https://example.com/",
"title": "Example Domain",
"status": "success"
}
// find_all tool response with selectors
{
"selector": ".item",
"count": 3,
"elements": [
{
"index": 0,
"selector": ".item:nth-of-type(1)",
"tag_name": "div",
"text": "Item 1",
"visible": true,
"attributes": {"class": "item active"}
},
// ... more elements
]
}
// select tool response with option selectors
{
"dropdown_selector": "#country",
"selected_value": "US",
"selected_text": "United States",
"options": [
{
"selector": "#country option:nth-of-type(1)",
"value": "US",
"text": "United States",
"selected": true
},
// ... more options
],
"status": "selected"
}
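Because each element carries its own selector, a response can feed directly into a follow-up call. The sketch below shows one way to pull selectors out of a find_all-style response that has already been parsed into a Ruby Hash; the visible_selectors helper is hypothetical and relies only on the fields shown in the example above.

```ruby
# Collect the selectors of visible elements from a parsed find_all response.
# Hypothetical helper; expects the "elements" array shown in the README example.
def visible_selectors(response)
  response.fetch("elements", [])
          .select { |element| element["visible"] }
          .map { |element| element.fetch("selector") }
end
```

Each returned selector could then be passed as the selector argument of a tool such as click.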
Example Tool Calls
Here are examples using curl with the HTTP server:
# Navigate to a URL
curl -X POST http://localhost:4567/ \
-H "Content-Type: application/json" \
-H "X-Session-ID: alice" \
-d '{"jsonrpc": "2.0", "id": 1, "method": "tools/call",
"params": {"name": "visit", "arguments": {"url": "https://example.com"}}}'
# Take an annotated screenshot
curl -X POST http://localhost:4567/ \
-H "Content-Type: application/json" \
-H "X-Session-ID: alice" \
-d '{"jsonrpc": "2.0", "id": 2, "method": "tools/call",
"params": {"name": "screenshot",
"arguments": {"filename": "example",
"highlight_selectors": [".error", ".warning"],
"annotate": true,
"full_page": true}}}'
# Search page content with highlighting
curl -X POST http://localhost:4567/ \
-H "Content-Type: application/json" \
-H "X-Session-ID: alice" \
-d '{"jsonrpc": "2.0", "id": 3, "method": "tools/call",
"params": {"name": "search_page",
"arguments": {"query": "error|warning",
"regex": true,
"highlight": true}}}'
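The same calls can be made from Ruby with the standard library. This is a minimal sketch mirroring the curl examples above: build_tool_call and call_tool are hypothetical helper names, and the default base URL is simply the one used in those examples.

```ruby
require "json"
require "net/http"
require "uri"

# Build (but do not send) a JSON-RPC tools/call request for the HTTP server.
# Hypothetical helper mirroring the curl examples; not part of the gem.
def build_tool_call(name, arguments, session_id:, id: 1, base_url: "http://localhost:4567/")
  uri = URI(base_url)
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request["X-Session-ID"] = session_id
  request.body = JSON.generate(
    jsonrpc: "2.0", id: id, method: "tools/call",
    params: { name: name, arguments: arguments }
  )
  [uri, request]
end

# Send the request and return the raw response body.
# Requires a running server (hbt start).
def call_tool(name, arguments, **options)
  uri, request = build_tool_call(name, arguments, **options)
  Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request).body }
end
```

For example, call_tool("visit", { url: "https://example.com" }, session_id: "alice") would send the first request shown above.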
Environment Variables
- HBT_SINGLE_SESSION=true - Force single session mode in HTTP server
- HBT_SHOW_HEADERS=true - Enable request header logging in HTTP server
- HBT_SESSION_ID=<session_name> - Enable session persistence in stdio mode
Logging
- HTTP mode: Logs to stdout
- Stdio mode: Logs to .hbt/logs/PID.log to avoid interfering with the MCP protocol
Tool calls are logged with format:
INFO -- HBT: CALL: ToolName [] {args} -> result
ERROR -- HBT: ERROR: ToolName [] {args} -> error_message
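Log lines in this shape can be parsed for analysis. The sketch below is based only on the two format lines above: the regular expression, the field names, and the VisitTool name in the usage note are assumptions, and real log lines may carry additional prefix text, which the unanchored pattern tolerates.

```ruby
# Pattern inferred from the documented format lines; field names are hypothetical.
# The contents of the square brackets are not documented, so they are kept as-is.
LOG_LINE = /(?<level>INFO|ERROR) -- HBT: (?<kind>CALL|ERROR): (?<tool>\S+) \[(?<context>[^\]]*)\] (?<args>\{.*?\}) -> (?<result>.*)\z/

# Returns a Hash of named fields, or nil when the line does not match.
def parse_log_line(line)
  match = LOG_LINE.match(line.chomp)
  match && match.named_captures
end
```

For a line such as INFO -- HBT: CALL: VisitTool [] {...} -> ok, the helper returns the level, tool name, raw arguments, and result as strings.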
Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt.
To install this gem onto your local machine, run bundle exec rake install.
Running Tests and Linting
# Run tests
rake test
# Run linter
rake rubocop
# Run linter with auto-fix
bundle exec rubocop -A
# Run both tests and linter (default task)
rake
Recent Improvements
Version 0.1.0
- Structured tool responses - All tools now return rich JSON objects instead of simple strings
- Element selectors in arrays - Tools returning multiple elements include unique selectors for each
- Session persistence - Both stdio and single-session HTTP modes support persistent sessions
- Strict session management - Multi-session mode requires X-Session-ID header (no auto-creation)
- Improved logging - Fixed stdio mode logging to properly write to .hbt/logs/PID.log
- DRY refactoring - Extracted common functionality into SessionPersistence and DirectorySetup modules
- Better error handling - Tools return structured error information
- Enhanced tool responses:
  - Navigation tools return before/after URLs and navigation status
  - Form tools return element state before/after interaction
  - Window tools return comprehensive window state information
  - Screenshot tool returns file metadata
  - All element-finding tools return complete element information
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/parruda/headless_browser_tool.