selenium-mcp-server

krishnapollu/selenium-mcp-server


Selenium MCP Server

A powerful Model Context Protocol (MCP) server that brings Selenium WebDriver automation to AI assistants. This server enables AI tools like Claude Desktop to control web browsers programmatically, making web automation accessible through natural language commands.

What This Does

Ever wanted to tell an AI assistant to "go to Google, search for something, and take a screenshot"? This MCP server makes that possible. It provides a bridge between AI assistants and web browsers, allowing for sophisticated web automation workflows.

Key Features

🚀 Multiple Browser Sessions

Run multiple browsers simultaneously - perfect for comparing different websites or handling complex workflows that require multiple browser instances.

🎯 Smart Element Interaction

  • Find and interact with elements using various locator strategies
  • Explicit waits with configurable timeouts, including waiting until an element is visible or clickable
  • Force click with JavaScript fallback when normal clicks fail
  • Type text with configurable speed (useful for avoiding detection)
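
A rough sketch of how a force-click fallback is commonly written with Selenium; the server's actual implementation may differ. `click_with_fallback` is a hypothetical helper, and the `ImportError` shim exists only so the snippet runs without Selenium installed.

```python
try:
    from selenium.common.exceptions import ElementClickInterceptedException
except ImportError:  # shim so the sketch runs without Selenium installed
    class ElementClickInterceptedException(Exception):
        """Stand-in for Selenium's exception of the same name."""


def click_with_fallback(driver, element, force_click=True):
    """Try a native click; optionally fall back to a JavaScript click.

    Hypothetical helper, not the server's own code.
    Returns "native" or "javascript" to show which path was taken.
    """
    try:
        element.click()
        return "native"
    except ElementClickInterceptedException:
        if not force_click:
            raise
        # WebDriver hands `element` to the script as arguments[0].
        driver.execute_script("arguments[0].click();", element)
        return "javascript"
```

A JavaScript click bypasses the overlay check that a native click performs, so it is best kept as an opt-in fallback rather than the default.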

⚡ JavaScript Execution

Execute custom JavaScript code directly in the browser - great for advanced interactions, data extraction, or custom automation logic.

📸 Screenshot & File Operations

Take screenshots (including full-page captures) and upload files with ease.

🛡️ Robust Error Handling

Specific error messages that actually help you debug issues, rather than generic failures.

📊 Session Management

List, switch between, and manage multiple browser sessions with detailed metadata tracking.

Quick Start

1. Install the Package

From the root of a local clone of this repository, run:

pip install -e .

The server uses webdriver-manager to automatically handle browser drivers, so you don't need to manually download ChromeDriver or GeckoDriver.

2. Configure Your MCP Client

Use this simple, generic configuration:

{
  "mcpServers": {
    "selenium": {
      "command": "python",
      "args": ["-m", "selenium_mcp_server"]
    }
  }
}

Configuration Locations:

  • Cursor AI: %USERPROFILE%\.cursor\mcp.json (Windows)
  • Claude Desktop: %APPDATA%\Claude\claude_desktop_config.json (Windows)
  • Other MCP Clients: Check your client's documentation
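
If the client's config file already lists other MCP servers, the "selenium" entry has to be merged in rather than pasted over the file. The sketch below shows one way to do that; `add_selenium_server` is a hypothetical helper, and the path you pass in depends on your client.

```python
import json
from pathlib import Path

# The same entry shown in the configuration above.
SELENIUM_ENTRY = {"command": "python", "args": ["-m", "selenium_mcp_server"]}


def add_selenium_server(config_path):
    """Add the "selenium" entry to an MCP client config, keeping other servers."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    config.setdefault("mcpServers", {})["selenium"] = SELENIUM_ENTRY
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Restart the MCP client after changing its config so the new server entry is picked up.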

3. Test the Server

python -m selenium_mcp_server

Note:

  • The configuration is generic and works across all platforms
  • No hardcoded paths required
  • Ready-to-use configuration file: config/mcp_client_config.json
  • Detailed setup instructions: config/configuration_guide.md

Available Tools

Browser Management

| Tool | Description | Key Parameters |
| --- | --- | --- |
| start_browser | Launch a new browser session | browser, options, session_name |
| list_sessions | Show all active sessions | None |
| switch_session | Switch to a different session | session_id |
| close_session | Close a specific session | session_id (optional) |
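
To illustrate how the four session tools relate to each other, here is a minimal in-memory registry. It is a sketch only: `Session`, `SessionRegistry`, the short ID format, and the rule that an omitted `session_id` closes the active session are all assumptions, not the server's actual code.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: str
    name: str
    browser: str
    driver: object  # a WebDriver in real use; any object works here
    created_at: float = field(default_factory=time.time)


class SessionRegistry:
    """Hypothetical registry mirroring start/list/switch/close."""

    def __init__(self):
        self.sessions = {}
        self.active_id = None

    def start(self, driver, browser, session_name=None):
        session_id = uuid.uuid4().hex[:8]
        name = session_name or session_id
        self.sessions[session_id] = Session(session_id, name, browser, driver)
        self.active_id = session_id  # the newest session becomes active
        return session_id

    def list_sessions(self):
        return [
            {"session_id": s.session_id, "name": s.name, "browser": s.browser,
             "active": s.session_id == self.active_id}
            for s in self.sessions.values()
        ]

    def switch(self, session_id):
        if session_id not in self.sessions:
            raise KeyError(f"unknown session: {session_id}")
        self.active_id = session_id

    def close(self, session_id=None):
        # Assumption: with no session_id, the active session is closed.
        session_id = session_id or self.active_id
        session = self.sessions.pop(session_id)
        quit_driver = getattr(session.driver, "quit", None)
        if callable(quit_driver):
            quit_driver()  # WebDriver.quit() ends the browser process
        if self.active_id == session_id:
            self.active_id = next(iter(self.sessions), None)
```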

Navigation & Page Info

| Tool | Description | Key Parameters |
| --- | --- | --- |
| navigate | Go to a URL | url, wait_for_load |
| get_page_info | Get page details | include_title, include_url, include_source |

Element Interaction

| Tool | Description | Key Parameters |
| --- | --- | --- |
| find_element | Find an element | by, value, timeout, wait_for_clickable |
| click_element | Click an element | by, value, timeout, force_click |
| send_keys | Type text | by, value, text, clear_first, type_speed |
| get_element_text | Get element text | by, value, timeout |
| wait_for_element | Wait for element | by, value, timeout, wait_for_visible |

Advanced Actions

| Tool | Description | Key Parameters |
| --- | --- | --- |
| hover | Hover over element | by, value, timeout |
| drag_and_drop | Drag and drop | by, value, targetBy, targetValue |
| double_click | Double click | by, value, timeout |
| right_click | Right click | by, value, timeout |
| press_key | Press keyboard key | key |
| execute_script | Run JavaScript | script, arguments |

File Operations

| Tool | Description | Key Parameters |
| --- | --- | --- |
| upload_file | Upload a file | by, value, filePath, timeout |
| take_screenshot | Take screenshot | outputPath, full_page |

Real-World Examples

Example 1: Google Search Automation

[
  {
    "name": "start_browser",
    "arguments": {
      "browser": "chrome",
      "options": {"headless": false},
      "session_name": "search_session"
    }
  },
  {
    "name": "navigate",
    "arguments": {
      "url": "https://www.google.com",
      "wait_for_load": true
    }
  },
  {
    "name": "send_keys",
    "arguments": {
      "by": "name",
      "value": "q",
      "text": "Selenium automation tutorial",
      "clear_first": true
    }
  },
  {
    "name": "press_key",
    "arguments": {"key": "Enter"}
  },
  {
    "name": "take_screenshot",
    "arguments": {
      "outputPath": "search_results.png",
      "full_page": true
    }
  }
]
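
Workflows like the one above are plain lists of `{"name", "arguments"}` objects, so typos in tool names are easy to catch before sending anything to the server. The checker below is a hypothetical helper; `KNOWN_TOOLS` is copied from the tables under Available Tools.

```python
KNOWN_TOOLS = {
    "start_browser", "list_sessions", "switch_session", "close_session",
    "navigate", "get_page_info",
    "find_element", "click_element", "send_keys", "get_element_text",
    "wait_for_element",
    "hover", "drag_and_drop", "double_click", "right_click", "press_key",
    "execute_script",
    "upload_file", "take_screenshot",
}


def validate_workflow(steps):
    """Return a list of problems found in a list of tool-call dicts."""
    problems = []
    for index, step in enumerate(steps):
        name = step.get("name")
        if name not in KNOWN_TOOLS:
            problems.append(f"step {index}: unknown tool {name!r}")
        if not isinstance(step.get("arguments"), dict):
            problems.append(f"step {index}: 'arguments' must be an object")
    return problems
```

An empty list means every step names a documented tool and carries an arguments object; it does not check the arguments themselves.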

Example 2: Multi-Session Workflow

[
  {
    "name": "start_browser",
    "arguments": {
      "browser": "chrome",
      "session_name": "main"
    }
  },
  {
    "name": "start_browser",
    "arguments": {
      "browser": "firefox",
      "options": {"headless": true},
      "session_name": "background"
    }
  },
  {
    "name": "list_sessions",
    "arguments": {}
  }
]

Example 3: JavaScript Data Extraction

[
  {
    "name": "navigate",
    "arguments": {"url": "https://example.com"}
  },
  {
    "name": "execute_script",
    "arguments": {
      "script": "return Array.from(document.querySelectorAll('h1, h2, h3')).map(h => h.textContent);"
    }
  }
]

Locator Strategies

The server supports the following Selenium locator strategies, passed as the by parameter:

  • id: Find by element ID (fastest)
  • css: Find by CSS selector (most flexible)
  • xpath: Find by XPath (most powerful)
  • name: Find by name attribute
  • tag: Find by tag name
  • class: Find by class name
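
As an illustration, the short names above could be translated to Selenium's `By` constants as follows. This mapping is an assumption about how such a server might work, not its actual code; the string values are those of `selenium.webdriver.common.by.By`, written out so the sketch runs without Selenium installed.

```python
# String values of selenium.webdriver.common.by.By constants.
LOCATORS = {
    "id": "id",                # By.ID
    "css": "css selector",     # By.CSS_SELECTOR
    "xpath": "xpath",          # By.XPATH
    "name": "name",            # By.NAME
    "tag": "tag name",         # By.TAG_NAME
    "class": "class name",     # By.CLASS_NAME
}


def resolve_locator(by, value):
    """Translate a (by, value) pair into the tuple find_element accepts."""
    try:
        return LOCATORS[by.lower()], value
    except KeyError:
        raise ValueError(f"unsupported locator strategy: {by!r}") from None
```

With a real driver the result can be unpacked directly, for example `driver.find_element(*resolve_locator("name", "q"))`.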

Error Handling

The server provides meaningful error messages instead of generic failures:

  • ā° Timeout errors: When elements don't appear within the specified time
  • šŸ” Element not found: When locators don't match any elements
  • šŸ–±ļø Click intercepted: When elements are covered by other elements
  • 🚫 Session errors: When browser startup fails
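
A sketch of how the error categories above could be derived from Selenium exceptions. The message wording and the `describe_error` helper are hypothetical; the dictionary keys are real class names from `selenium.common.exceptions`, matched by name so the snippet runs without Selenium installed.

```python
# Keyed by exception class name; the names match selenium.common.exceptions.
FRIENDLY_MESSAGES = {
    "TimeoutException": "Timed out waiting for the element",
    "NoSuchElementException": "No element matches the locator",
    "ElementClickInterceptedException": "Another element is covering the click target",
    "SessionNotCreatedException": "The browser session could not be started",
}


def describe_error(exc, by=None, value=None):
    """Turn an exception into a message that names the locator involved."""
    fallback = f"Unexpected error: {exc}"
    summary = FRIENDLY_MESSAGES.get(type(exc).__name__, fallback)
    if by and value:
        summary += f" (by={by!r}, value={value!r})"
    return summary
```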

Common Use Cases

Web Scraping

Use execute_script to extract data from complex pages, or get_element_text for simple text extraction.

Form Automation

Fill out forms with send_keys, handle file uploads, and submit with click_element.
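
The form flow described above can be expressed as a list of tool calls. The builder below is a hypothetical convenience, using only parameter names from the Available Tools tables; the locators in the usage note are made up.

```python
def build_form_steps(fields, submit, upload=None):
    """Build tool calls that fill a form, optionally upload a file, then submit.

    fields: list of (by, value, text) tuples, one per input to fill.
    submit: (by, value) locator of the submit button.
    upload: optional (by, value, file_path) for a file input.
    """
    steps = [
        {"name": "send_keys",
         "arguments": {"by": by, "value": value, "text": text,
                       "clear_first": True}}
        for by, value, text in fields
    ]
    if upload is not None:
        by, value, file_path = upload
        steps.append({"name": "upload_file",
                      "arguments": {"by": by, "value": value,
                                    "filePath": file_path}})
    by, value = submit
    steps.append({"name": "click_element",
                  "arguments": {"by": by, "value": value}})
    return steps
```

For example, `build_form_steps([("name", "email", "a@example.com")], ("css", "button[type=submit]"))` yields one send_keys call followed by one click_element call.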

Testing

Take screenshots, verify element presence, and automate user workflows.

Monitoring

Set up automated checks that navigate to pages and verify content.

Troubleshooting

Browser Won't Start

  • Make sure Chrome or Firefox is installed
  • Check that webdriver-manager can access the internet
  • Try running with headless: false to see what's happening

Elements Not Found

  • Double-check your locator strategy and value
  • Use browser dev tools to verify the element exists
  • Try increasing the timeout value
  • Use wait_for_element to ensure the page is fully loaded
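
To make the role of the timeout concrete, here is a simplified stand-in for the polling that an explicit wait performs (Selenium's own `WebDriverWait` works along similar lines). `wait_until` is a hypothetical helper, not part of the server.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)
```

A larger timeout only costs time when the element never appears; when it does appear, the wait returns on the next poll.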

Permission Issues

  • Ensure the script has write permissions for screenshot directories
  • Use absolute paths for file uploads
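
Both checks can be done up front before calling take_screenshot or upload_file. The two helpers below are hypothetical and use only the standard library.

```python
import os
from pathlib import Path


def check_screenshot_path(output_path):
    """Return the absolute screenshot path, or raise if its directory is unusable."""
    path = Path(output_path).expanduser().resolve()
    if not path.parent.is_dir():
        raise FileNotFoundError(f"directory does not exist: {path.parent}")
    if not os.access(path.parent, os.W_OK):
        raise PermissionError(f"no write permission for: {path.parent}")
    return path


def check_upload_path(file_path):
    """Return the absolute path of a file to upload, or raise if it is missing."""
    path = Path(file_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"file to upload not found: {path}")
    return path
```

Passing `str(check_upload_path(...))` as filePath avoids surprises when the server's working directory differs from yours.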

Performance Issues

  • Use headless mode for faster execution
  • Close unused sessions with close_session
  • Consider using type_speed to avoid being detected as a bot
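
Slower typing trades speed for more human-like input. The sketch below shows the idea with a per-character delay; whether type_speed is actually a delay in seconds is an assumption, and `type_slowly` is a hypothetical helper that works with any object exposing `clear()` and `send_keys()`, as Selenium's WebElement does.

```python
import time


def type_slowly(element, text, delay_seconds=0.1, clear_first=False):
    """Send `text` one character at a time, pausing between keystrokes.

    Assumption: the delay is per character, in seconds.
    """
    if clear_first:
        element.clear()
    for index, char in enumerate(text):
        if index:  # no pause before the first character
            time.sleep(delay_seconds)
        element.send_keys(char)
```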

Project Structure

selenium-mcp-server/
├── src/                    # Source code
│   ├── selenium_mcp_server.py
│   ├── main.py            # Main entry point
│   └── run_server.py      # Server runner
├── scripts/                # Utility scripts
│   ├── cleanup.py         # Cleanup utility
│   └── install_dependencies.py # Dependency installer
├── tests/                  # Test files
│   ├── run_tests.py       # Test runner
│   └── *.py              # Individual tests
├── docs/                   # Documentation
├── examples/               # Example usage
├── config/                 # Configuration files
├── README.md              # Main documentation
├── requirements.txt       # Dependencies
├── setup.py               # Package setup
└── .gitignore             # Git exclusions

Development

Running Tests

# Use the test runner (recommended)
python tests/run_tests.py

# Or run individual tests
python tests/interactive_test.py
python tests/test_browser_management.py
python tests/test_error_handling.py
python tests/test_selenium_mcp.py

Utility Scripts

# Install dependencies
python scripts/install_dependencies.py

# Clean up project
python scripts/cleanup.py

Continuous Integration

This project includes GitHub Actions CI that automatically runs tests on every push and pull request. The CI workflow:

  • ✅ Tests server initialization
  • ✅ Tests basic browser functionality
  • ✅ Tests error handling
  • ✅ Checks Python syntax
  • ✅ Tests package installation
  • ✅ Runs on Ubuntu with Chrome and Firefox

See .github/workflows/ci.yml for details.

Debug Mode

Enable detailed logging by modifying the logging level in the script:

logging.basicConfig(level=logging.DEBUG)
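
If you would rather not edit the script each time, the level can be read from the environment instead. This is a sketch only: the `SELENIUM_MCP_LOG_LEVEL` variable and the `configure_logging` helper are hypothetical and not something the server reads today.

```python
import logging
import os


def configure_logging(default_level="INFO"):
    """Set the root log level from SELENIUM_MCP_LOG_LEVEL (hypothetical variable)."""
    level_name = os.environ.get("SELENIUM_MCP_LOG_LEVEL", default_level).upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO  # fall back on unrecognised names
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace handlers configured earlier (Python 3.8+)
    )
    return level
```

`logging.basicConfig` writes to stderr by default, which keeps log output separate from anything the server writes to stdout.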

Contributing

Found a bug? Have an idea for a new feature? Feel free to open an issue or submit a pull request. This project is actively maintained and welcomes contributions.

License

MIT License - feel free to use this in your own projects.


Note: This server is designed for legitimate automation tasks. Please respect websites' terms of service and robots.txt files when using this tool.