headless-browser-tool

Headless Browser Tool provides a Model Context Protocol (MCP) server for controlling a headless browser, using Capybara and Selenium for automation.

Tools

  1. visit - Navigate to a URL
  2. click - Click an element
  3. fill_in - Fill input field
  4. screenshot - Take screenshot
  5. execute_script - Run JavaScript

Headless Browser Tool

A headless browser control tool that provides an MCP (Model Context Protocol) server with tools to control a headless browser using Capybara and Selenium. Features multi-session support, session persistence, and both HTTP and stdio communication modes.

Features

  • Headless Chrome browser automation - Full browser control via Selenium WebDriver
  • MCP server with 40+ browser control tools - Comprehensive API for browser interactions
  • Multi-session support - Isolated browser sessions for each client
  • Session persistence - Sessions survive server restarts with cookies and state preservation
  • Two server modes - HTTP server mode and stdio mode for different integration patterns
  • Smart screenshot tools - With annotations, highlighting, and visual diff capabilities
  • AI-assisted tools - Auto-narration and intelligent page analysis
  • Comprehensive logging - Separate log files for stdio mode to avoid protocol interference
  • Structured responses - All tools return rich, structured data instead of simple strings
  • Smart element selectors - Tools returning multiple elements include selectors for each

Installation

Add this line to your application's Gemfile:

gem 'headless_browser_tool'

And then execute:

bundle install

Or install it yourself as:

gem install headless_browser_tool

Prerequisites

Chrome or Chromium must be installed on your system. The gem runs Chrome in headless mode by default.

Usage

Command Line Interface

The hbt command provides three subcommands:

hbt start - Start HTTP Server Mode

Starts the MCP server as an HTTP server with SSE (Server-Sent Events) support:

hbt start [OPTIONS]

Options:

  • --port PORT - Port for the MCP server (default: 4567)
  • --headless / --no-headless - Run browser in headless mode (default: true)
  • --single-session - Use single shared browser session instead of multi-session mode
  • --session-id SESSION_ID - Enable session persistence for single session mode (requires --single-session)
  • --show-headers - Show HTTP request headers for debugging session issues

Examples:

# Start with default settings (multi-session, headless, port 4567)
hbt start

# Start in non-headless mode for debugging
hbt start --no-headless

# Start in single session mode (legacy compatibility)
hbt start --single-session

# Start in single session mode with persistence
hbt start --single-session --session-id my-app-session

# Start with request header logging
hbt start --show-headers

hbt stdio - Start Stdio Server Mode

Starts the MCP server in stdio mode for direct integration with tools that spawn subprocesses:

hbt stdio [OPTIONS]

Options:

  • --headless / --no-headless - Run browser in headless mode (default: true)

Notes:

  • Always runs in single-session mode
  • Logs to .hbt/logs/PID.log instead of stdout to avoid interfering with MCP protocol
  • Ideal for editor integrations and tools that communicate via stdin/stdout
  • Supports optional session persistence via HBT_SESSION_ID environment variable

Session Persistence in Stdio Mode:

You can enable session persistence by setting the HBT_SESSION_ID environment variable:

# First run - creates and saves session
HBT_SESSION_ID=my-editor-session hbt stdio

# Later run - restores previous session state
HBT_SESSION_ID=my-editor-session hbt stdio

When HBT_SESSION_ID is set:

  • Session state is saved to .hbt/sessions/{session_id}.json on exit
  • On startup, if the session file exists, it restores:
    • Current URL
    • Cookies
    • localStorage
    • sessionStorage
    • Window size

This is useful for editor integrations that want to maintain browser state across multiple tool invocations.
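
As a rough illustration of the save/restore cycle described above, the sketch below writes a state hash to a {session_id}.json file and reads it back. The `SessionFile` module and the key names in the usage note are assumptions made for this example; the gem's actual file layout and internal API may differ.

```ruby
require "json"
require "fileutils"

# Hypothetical helper for illustration; not part of the gem's public API.
module SessionFile
  module_function

  # Path of the state file for a session inside the given directory.
  def path_for(session_id, base_dir: ".hbt/sessions")
    File.join(base_dir, "#{session_id}.json")
  end

  # Write the state hash as JSON, creating the directory if needed.
  def save(session_id, state, base_dir: ".hbt/sessions")
    FileUtils.mkdir_p(base_dir)
    File.write(path_for(session_id, base_dir: base_dir), JSON.pretty_generate(state))
  end

  # Return the saved state, or nil when no file exists (a fresh session).
  def restore(session_id, base_dir: ".hbt/sessions")
    file = path_for(session_id, base_dir: base_dir)
    return nil unless File.exist?(file)

    JSON.parse(File.read(file))
  end
end
```

A caller might save `{ "current_url" => "https://example.com/", "cookies" => [] }` on exit and call `SessionFile.restore` on the next start, treating a nil result as "start with a clean browser".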

Examples:

# Start in stdio mode (headless by default, no persistence)
hbt stdio

# Start with session persistence
HBT_SESSION_ID=vscode-session hbt stdio

# Start in stdio mode with visible browser
hbt stdio --no-headless

hbt version - Display Version

Shows the current version of HeadlessBrowserTool:

hbt version

Session Management

Multi-Session Mode (Default for HTTP Server)

In multi-session mode, each client connection gets its own isolated browser session with:

  • Separate cookies and localStorage - Complete isolation between sessions
  • Independent navigation history - Each session maintains its own browser state
  • Session persistence - Sessions are saved to .hbt/sessions/ and restored on restart
  • Automatic cleanup - Idle sessions are closed after 30 minutes
  • LRU eviction - When at capacity (10 sessions), least recently used sessions are closed
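
The cleanup and eviction rules above can be pictured with a small sketch. This is not the gem's implementation: `SessionPool` and its methods are hypothetical names, and the pool tracks only timestamps rather than real browser sessions. The defaults mirror the documented limits (10 sessions, 30 minutes idle).

```ruby
# Illustrative session pool: idle timeout plus least-recently-used eviction.
# Hypothetical class; it stores only last-activity timestamps.
class SessionPool
  def initialize(max_sessions: 10, idle_timeout: 30 * 60, clock: -> { Time.now })
    @max_sessions = max_sessions
    @idle_timeout = idle_timeout
    @clock = clock
    @last_used = {} # session_id => Time of last activity
  end

  # Record activity for a session. If a new session would exceed capacity,
  # the least recently used one is evicted first. Returns the evicted ids.
  def touch(session_id)
    evicted = []
    if !@last_used.key?(session_id) && @last_used.size >= @max_sessions
      lru_id, = @last_used.min_by { |_, at| at }
      @last_used.delete(lru_id)
      evicted << lru_id
    end
    @last_used[session_id] = @clock.call
    evicted
  end

  # Drop sessions idle longer than the timeout. Returns the removed ids.
  def sweep
    now = @clock.call
    expired = @last_used.select { |_, at| now - at > @idle_timeout }.keys
    expired.each { |id| @last_used.delete(id) }
    expired
  end

  # Ids of sessions currently tracked.
  def active
    @last_used.keys
  end
end
```

The injectable `clock` is a design choice for testability: passing a lambda that returns a fixed time makes eviction order and idle expiry deterministic.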

Session Identification in Multi-Session Mode:

For HTTP server mode, sessions require an X-Session-ID header:

# Connect with session ID "alice"
curl -H "X-Session-ID: alice" -H "Accept: text/event-stream" http://localhost:4567/

# Different session ID gets different browser
curl -H "X-Session-ID: bob" -H "Accept: text/event-stream" http://localhost:4567/

# Without X-Session-ID header, connection is rejected
curl -H "Accept: text/event-stream" http://localhost:4567/
# Returns: 400 Bad Request - X-Session-ID header is required

Session ID Requirements:

  • Must be provided via X-Session-ID header
  • Can only contain alphanumeric characters, underscores, and hyphens
  • Maximum length: 64 characters
  • Invalid formats are rejected with 400 error
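
A client can check an ID against these rules before connecting. The sketch below is a hypothetical validator, not code from the gem; it also assumes an empty string is invalid, which the rules above do not state explicitly.

```ruby
# Hypothetical client-side check mirroring the documented rules:
# letters, digits, underscores, and hyphens only, at most 64 characters.
# Assumption: an empty ID is treated as invalid.
SESSION_ID_PATTERN = /\A[A-Za-z0-9_-]{1,64}\z/

def valid_session_id?(value)
  value.is_a?(String) && SESSION_ID_PATTERN.match?(value)
end
```

For example, `valid_session_id?("my-app_01")` passes, while an ID containing a space or one longer than 64 characters does not.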

Single Session Mode

Use --single-session flag for legacy mode where all clients share one browser:

hbt start --single-session

Session Persistence in Single Session Mode:

You can enable session persistence with the --session-id flag:

# First run - creates and saves session
hbt start --single-session --session-id my-app

# Server restart - restores previous session
hbt start --single-session --session-id my-app

When --session-id is provided:

  • Session state is saved to .hbt/sessions/{session_id}.json on shutdown
  • On startup, if the session file exists, it restores browser state
  • All clients share this single persistent session
  • Compatible with stdio mode session files

This is useful for:

  • Development servers that need to maintain login state
  • Testing environments where you want consistent browser state
  • Applications that don't need multi-user isolation

Note: The --session-id flag can only be used with --single-session. In multi-session mode, session IDs are provided by clients via headers.

Session Management Endpoints

View active sessions:

curl http://localhost:4567/sessions | jq

Response:

{
  "active_sessions": ["alice", "bob"],
  "session_count": 2,
  "session_data": {
    "alice": {
      "created_at": "2024-01-20T10:00:00Z",
      "last_activity": "2024-01-20T10:05:00Z",
      "idle_time": 300.5
    }
  }
}
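
A monitoring script might use this response to spot idle sessions. The sketch below parses a payload shaped like the one above; the `idle_sessions` helper is hypothetical, and it assumes `idle_time` is reported in seconds, which the sample timestamps suggest but the source does not state.

```ruby
require "json"

# Return ids of sessions whose idle_time exceeds the threshold.
# Assumption: idle_time is measured in seconds.
def idle_sessions(sessions_json, threshold_seconds)
  data = JSON.parse(sessions_json)
  data.fetch("session_data", {})
      .select { |_, info| info["idle_time"].to_f > threshold_seconds }
      .keys
end

# Payload copied from the example response above.
SAMPLE_SESSIONS = <<~PAYLOAD
  {
    "active_sessions": ["alice", "bob"],
    "session_count": 2,
    "session_data": {
      "alice": {
        "created_at": "2024-01-20T10:00:00Z",
        "last_activity": "2024-01-20T10:05:00Z",
        "idle_time": 300.5
      }
    }
  }
PAYLOAD

stale = idle_sessions(SAMPLE_SESSIONS, 300)
```

Each id in `stale` could then be closed with the DELETE endpoint.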

Close a specific session:

curl -X DELETE http://localhost:4567/sessions/alice

Directory Structure

HeadlessBrowserTool creates a .hbt/ directory with:

.hbt/
├── .gitignore      # Contains "*" to ignore all contents
├── screenshots/    # Screenshot storage
├── sessions/       # Session persistence files
└── logs/          # Log files (stdio mode only)
    └── PID.log    # Process-specific log file

MCP API

The server implements the Model Context Protocol (MCP) and responds to JSON-RPC requests.

Using with MCP Clients

For HTTP mode with proper MCP clients:

# Start server
hbt start

# MCP client should:
# 1. Connect with X-Session-ID header
# 2. Use SSE endpoint for streaming: http://localhost:4567/mcp/sse
# 3. Send commands via JSON-RPC

For stdio mode:

# MCP client spawns the process directly
hbt stdio
# Communication happens via stdin/stdout
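
For a sense of what travels over stdin/stdout: the MCP stdio transport sends each JSON-RPC message as a single newline-terminated line. The helpers below are a minimal sketch of that framing with hypothetical names. They do not spawn `hbt`, and a real client would first complete the MCP `initialize` handshake before calling tools.

```ruby
require "json"

# Encode one JSON-RPC message as a single line terminated by a newline.
# JSON.generate escapes newlines inside strings, so the line stays intact.
def encode_message(message)
  JSON.generate(message) + "\n"
end

# Decode one line read from the server's stdout.
def decode_message(line)
  JSON.parse(line)
end

# A tools/list request; the id value is arbitrary.
list_request = { "jsonrpc" => "2.0", "id" => 1, "method" => "tools/list" }
framed = encode_message(list_request)
```

In a real integration, `framed` would be written to the stdin of the spawned `hbt stdio` process, and each line read back from its stdout passed to `decode_message`.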

Available Browser Tools

All tools are available through the MCP protocol. Here's a complete reference:

Navigation Tools

| Tool | Description | Parameters | Returns |
| --- | --- | --- | --- |
| visit | Navigate to a URL | url (required) | {url, current_url, title, status} |
| refresh | Reload the current page | None | {url, title, changed, status} |
| go_back | Navigate back in browser history | None | {navigation: {from, to, title, navigated}, status} |
| go_forward | Navigate forward in browser history | None | {navigation: {from, to, title, navigated}, status} |

Element Interaction Tools

| Tool | Description | Parameters | Returns |
| --- | --- | --- | --- |
| click | Click an element | selector (required) | {selector, element, navigation, status} |
| right_click | Right-click an element | selector (required) | {selector, element, status} |
| double_click | Double-click an element | selector (required) | {selector, element, status} |
| hover | Hover mouse over element | selector (required) | {selector, element, status} |
| drag | Drag element to target | source_selector, target_selector (required) | {source_selector, target_selector, source, target, status} |

Element Finding Tools

| Tool | Description | Parameters | Key Returns |
| --- | --- | --- | --- |
| find_element | Find single element | selector (required) | Element details with attributes |
| find_all | Find all matching elements | selector (required) | {elements: [{selector, tag_name, text, visible, attributes}]} |
| find_elements_containing_text | Find elements with text | text (required), exact_match, case_sensitive, visible_only | {elements: [{selector, xpath, tag, text, clickable}]} |
| get_text | Get element text | selector (required) | Text content string |
| get_attribute | Get element attribute | selector, attribute (required) | Attribute value |
| get_value | Get input value | selector (required) | Input value |
| is_visible | Check element visibility | selector (required) | Boolean |
| has_element | Check element exists | selector (required), wait | Boolean |
| has_text | Check text exists | text (required), wait | Boolean |

Form Interaction Tools

| Tool | Description | Parameters | Key Returns |
| --- | --- | --- | --- |
| fill_in | Fill input field | field, value (required) | {field, value, field_info, status} |
| select | Select dropdown option | value, dropdown_selector (required) | {selected_value, selected_text, options: [{selector, value, text}]} |
| check | Check checkbox | checkbox_selector (required) | {selector, was_checked, is_checked, element, status} |
| uncheck | Uncheck checkbox | checkbox_selector (required) | {selector, was_checked, is_checked, element, status} |
| choose | Select radio button | radio_button_selector (required) | {selector, radio, group: [{selector, value, checked}], status} |
| attach_file | Upload file | file_field_selector, file_path (required) | {field_selector, file_path, file_name, file_size, field, status} |
| click_button | Click button | button_text_or_selector (required) | {button, element, navigation, status} |
| click_link | Click link | link_text_or_selector (required) | {link, element, navigation, status} |

Page Information Tools

| Tool | Description | Returns |
| --- | --- | --- |
| get_current_url | Get current URL | Full URL string |
| get_current_path | Get current path | Path without domain |
| get_page_title | Get page title | Title string |
| get_page_source | Get HTML source | Full HTML |
| get_page_context | Get page analysis | Structured page data |

Search Tools

| Tool | Description | Parameters |
| --- | --- | --- |
| search_page | Search visible content | query (required), case_sensitive, regex, context_lines, highlight |
| search_source | Search HTML source | query (required), case_sensitive, regex, context_lines, show_line_numbers |

JavaScript Execution Tools

| Tool | Description | Parameters | Returns |
| --- | --- | --- | --- |
| execute_script | Run JavaScript | javascript_code (required) | {javascript_code, execution_time, timestamp, status} |
| evaluate_script | Run JS and return result | javascript_code (required) | Script return value |

Screenshot and Capture Tools

| Tool | Description | Parameters | Key Returns |
| --- | --- | --- | --- |
| screenshot | Take screenshot | filename, highlight_selectors, annotate, full_page | {file_path, filename, file_size, timestamp, url, title} |
| save_page | Save HTML to file | file_path (required) | {file_path, file_size, timestamp, url, title, status} |

Window Management Tools

| Tool | Description | Parameters | Key Returns |
| --- | --- | --- | --- |
| switch_to_window | Switch to window/tab | window_handle (required) | {window_handle, previous_window, current_url, title, total_windows} |
| open_new_window | Open new window/tab | None | {window_handle, total_windows, previous_windows, current_window} |
| close_window | Close window/tab | window_handle (required) | {closed_window, was_current, remaining_windows, current_window} |
| get_window_handles | Get all window handles | None | {current_window, windows: [{handle, index, is_current}], total_windows} |
| maximize_window | Maximize window | None | {size_before: {width, height}, size_after: {width, height}, status} |
| resize_window | Resize window | width, height (required) | {requested_size, size_before, size_after, status} |

Session Management Tools

| Tool | Description | Returns |
| --- | --- | --- |
| get_session_info | Get session information | Session details |

Smart Tools (experimental)

| Tool | Description | Parameters |
| --- | --- | --- |
| auto_narrate | Generate page description | focus_on |
| get_narration_history | Get narration history | None |
| visual_diff | Compare screenshots | before_path, after_path (required) |

Tool Response Structure

All tools now return structured data instead of simple strings. This makes it easier to:

  • Extract specific information from responses
  • Check operation success/failure
  • Access element properties and metadata
  • Navigate to specific elements using returned selectors

Example responses:

// visit tool response
{
  "url": "https://example.com",
  "current_url": "https://example.com/",
  "title": "Example Domain",
  "status": "success"
}

// find_all tool response with selectors
{
  "selector": ".item",
  "count": 3,
  "elements": [
    {
      "index": 0,
      "selector": ".item:nth-of-type(1)",
      "tag_name": "div",
      "text": "Item 1",
      "visible": true,
      "attributes": {"class": "item active"}
    },
    // ... more elements
  ]
}

// select tool response with option selectors
{
  "dropdown_selector": "#country",
  "selected_value": "US",
  "selected_text": "United States",
  "options": [
    {
      "selector": "#country option:nth-of-type(1)",
      "value": "US",
      "text": "United States",
      "selected": true
    },
    // ... more options
  ],
  "status": "selected"
}
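
Because each element in a multi-element response carries its own selector, a client can filter the results and pass a selector straight to a follow-up tool such as click. The helper below is a hypothetical sketch that works on an already-parsed find_all response shaped like the example above.

```ruby
require "json"

# Collect the selectors of visible elements from a find_all-style response.
def visible_selectors(response)
  response.fetch("elements", [])
          .select { |element| element["visible"] }
          .map { |element| element["selector"] }
end

# Sample data modeled on the find_all example; the second item is invented
# here to show the filtering.
find_all_response = JSON.parse(<<~PAYLOAD)
  {
    "selector": ".item",
    "count": 2,
    "elements": [
      {"index": 0, "selector": ".item:nth-of-type(1)", "tag_name": "div",
       "text": "Item 1", "visible": true, "attributes": {"class": "item active"}},
      {"index": 1, "selector": ".item:nth-of-type(2)", "tag_name": "div",
       "text": "Item 2", "visible": false, "attributes": {"class": "item"}}
    ]
  }
PAYLOAD

clickable = visible_selectors(find_all_response)
```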

Example Tool Calls

Here are examples using curl with the HTTP server:

# Navigate to a URL
curl -X POST http://localhost:4567/ \
  -H "Content-Type: application/json" \
  -H "X-Session-ID: alice" \
  -d '{"jsonrpc": "2.0", "id": 1, "method": "tools/call", 
       "params": {"name": "visit", "arguments": {"url": "https://example.com"}}}'

# Take an annotated screenshot
curl -X POST http://localhost:4567/ \
  -H "Content-Type: application/json" \
  -H "X-Session-ID: alice" \
  -d '{"jsonrpc": "2.0", "id": 2, "method": "tools/call",
       "params": {"name": "screenshot", 
                  "arguments": {"filename": "example", 
                              "highlight_selectors": [".error", ".warning"],
                              "annotate": true,
                              "full_page": true}}}'

# Search page content with highlighting
curl -X POST http://localhost:4567/ \
  -H "Content-Type: application/json" \
  -H "X-Session-ID: alice" \
  -d '{"jsonrpc": "2.0", "id": 3, "method": "tools/call",
       "params": {"name": "search_page",
                  "arguments": {"query": "error|warning",
                              "regex": true,
                              "highlight": true}}}'
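
The same requests can be built from Ruby with the standard library. The sketch below constructs, but does not send, a tools/call request equivalent to the first curl example. `build_tool_call` is a hypothetical helper, and the default URL assumes the server is running locally on the default port.

```ruby
require "json"
require "net/http"
require "uri"

# Build (without sending) a JSON-RPC tools/call request for the HTTP server.
# Hypothetical helper; base_url assumes a local server on the default port.
def build_tool_call(name, arguments, session_id:, id: 1, base_url: "http://localhost:4567/")
  uri = URI(base_url)
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request["X-Session-ID"] = session_id
  payload = {
    "jsonrpc" => "2.0",
    "id" => id,
    "method" => "tools/call",
    "params" => { "name" => name, "arguments" => arguments }
  }
  request.body = JSON.generate(payload)
  [uri, request]
end

uri, request = build_tool_call("visit", { "url" => "https://example.com" }, session_id: "alice")

# To send it (requires a running `hbt start` server):
#   response = Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
#   puts response.body
```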

Environment Variables

  • HBT_SINGLE_SESSION=true - Force single session mode in HTTP server
  • HBT_SHOW_HEADERS=true - Enable request header logging in HTTP server
  • HBT_SESSION_ID=<session_name> - Enable session persistence in stdio mode

Logging

  • HTTP mode: Logs to stdout
  • Stdio mode: Logs to .hbt/logs/PID.log to avoid interfering with MCP protocol

Tool calls are logged with format:

INFO -- HBT: CALL: ToolName [] {args} -> result
ERROR -- HBT: ERROR: ToolName [] {args} -> error_message
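
When scanning these logs, a small parser can split a line into its parts. The pattern below is a sketch based only on the two shapes shown above: it is not taken from the gem, it ignores any logger prefix before the level, it leaves the bracketed field uninterpreted, and it assumes the arguments never contain the sequence " -> ".

```ruby
# Hypothetical parser for the log line shapes shown above.
LOG_LINE = /(?<level>INFO|ERROR) -- HBT: (?<kind>CALL|ERROR): (?<tool>\S+) \[(?<tag>[^\]]*)\] (?<args>.*?) -> (?<result>.*)\z/

# Return a hash of the named parts, or nil if the line does not match.
def parse_log_line(line)
  match = LOG_LINE.match(line.chomp)
  return nil unless match

  match.named_captures
end
```

The returned hash has string keys (`"level"`, `"kind"`, `"tool"`, `"tag"`, `"args"`, `"result"`), so filtering for failures is a matter of selecting entries whose `"kind"` is `"ERROR"`.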

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt.

To install this gem onto your local machine, run bundle exec rake install.

Running Tests and Linting

# Run tests
rake test

# Run linter
rake rubocop

# Run linter with auto-fix
bundle exec rubocop -A

# Run both tests and linter (default task)
rake

Recent Improvements

Version 0.1.0

  • Structured tool responses - All tools now return rich JSON objects instead of simple strings
  • Element selectors in arrays - Tools returning multiple elements include unique selectors for each
  • Session persistence - Both stdio and single-session HTTP modes support persistent sessions
  • Strict session management - Multi-session mode requires X-Session-ID header (no auto-creation)
  • Improved logging - Fixed stdio mode logging to properly write to .hbt/logs/PID.log
  • DRY refactoring - Extracted common functionality into SessionPersistence and DirectorySetup modules
  • Better error handling - Tools return structured error information
  • Enhanced tool responses:
    • Navigation tools return before/after URLs and navigation status
    • Form tools return element state before/after interaction
    • Window tools return comprehensive window state information
    • Screenshot tool returns file metadata
    • All element-finding tools return complete element information

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/parruda/headless_browser_tool.