selenium-mcp-server-ai

jyotiprakashb83/selenium-mcp-server-ai

3.2

If you are the rightful owner of selenium-mcp-server-ai and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to henry@mcphub.com.

The Selenium MCP Server with AI is a Java-based application that translates natural language test steps into Selenium WebDriver commands using a Language Model (LLM).

Selenium MCP Server with AI

This JAVA MAVEN TESTNG Project implements a Selenium Master Control Program (MCP) server in Java that executes browser automation tasks by converting natural language test steps into Selenium WebDriver commands using a Language Model (LLM). The server supports local Ollama or remote LLM endpoints and is configurable via property files. A TestNG-based client is included to send test prompts and log results.

Project Structure

  • src/main/java/McpServer.java: Main server class that handles HTTP requests, LLM interaction, and Selenium command execution.
  • src/main/java/LlmClient.java: Handles communication with the LLM (Ollama or remote API).
  • src/test/java/SeleniumTestClient.java: TestNG client that sends test steps to the server and logs results.
  • config.properties: Configuration file for server settings (port, browser, LLM endpoint).
  • selenium_operations.properties: Defines supported Selenium operations and their required parameters.
  • pom.xml: Maven configuration with dependencies.

Application Flow:

  • Step 1: Client input natural language test steps to MCP Server
  • Step 2: MCP Server processes the input and LLM is called with an augmented PROMPT
  • Step 3: LLM Responds with command for MCP
  • Step 4: MCP Server uses the response and perform Selenium commands on browser

Prerequisites

  • Java: 17 or later
  • Maven: 3.6.0 or later
  • WebDriver: ChromeDriver or FirefoxDriver (matching your browser version) installed and added to PATH
  • LLM: Either:
    • Local Ollama running on http://localhost:11434 (default) with a model like llama3
    • A remote LLM endpoint (e.g., OpenAI-compatible API) with an API key
  • Browser: Chrome or Firefox installed
  • TestNG: Included via Maven, no separate installation needed

Setup

STEP 1:

  • Clone the Project:
  • git clone
  • cd selenium-mcp

Install Dependencies:Run the following command to download dependencies: mvn install

Configure the Server:Edit config.properties in the project root to set:

  • server.port: HTTP server port (default: 8080)
  • browser: Browser type (chrome or firefox, default: chrome)
  • llm.endpoint: LLM API URL (default: http://localhost:11434 for Ollama)
  • llm.model: LLM model name (default: llama3)
  • llm.apiKey: API key for remote LLM (leave empty for Ollama)

Sample config.properties:

STEP 2:

  • start server using McpServer.java
  • Run selenium client file to see test execution SeleniumTestClient.java

Configure Selenium Operations:The selenium_operations.properties file defines supported Selenium commands and their parameters. It includes operations like start_browser, navigate, click_element, and more. Modify this file to add or change operations if needed. Example

  • selenium_operations.properties:
  • start_browser=browser,headless
  • navigate=url
  • click_element=by,value,timeout

... (other operations)

Set Up Ollama (if using local LLM):

Install Ollama (see Ollama documentation). Start Ollama server:

ollama serve

Ensure the specified model (e.g., llama3) is pulled:ollama pull llama3

Running the Server

Start the Server:Run the McpServer class:

mvn exec:java -Dexec.mainClass="McpServer"

Or, use your IDE to run McpServer.java. The server will start on the configured port (default: http://localhost:8080) and listen for POST requests to /execute.

Verify Server:Check the console for: MCP Server started on port 8080

Running the Client

Run Tests:Execute the TestNG client to send test steps and generate results: mvn test

Or, run SeleniumTestClient.java in your IDE with the TestNG plugin. The client sends a sample test prompt (e.g., navigating to https://example.com, clicking a link, and verifying text) and logs results using TestNG.

View Test Results:TestNG generates reports in the test-output/ directory (HTML and XML formats) with pass/fail status and details. Example test steps in SeleniumTestClient.java:

Input natural language test steps = "Navigate to en.wikipedia.org. Search for India. Take a screenshot";

Example LLM Output The LLM must convert test steps into a JSON array of commands matching the operations in selenium_operations.properties. Example:

[

{"type": "start_browser", "browser": "chrome", "headless": "false"},

{"type": "navigate", "url": "https://example.com"},

{"type": "click_element", "by": "xpath", "value": "//a[text()='More information']", "timeout": "10000"},

{"type": "get_element_text", "by": "xpath", "value": "//body", "timeout": "10000"},

{"type": "close_session"}

]

Supported Selenium Operations: The server supports the following operations (defined in selenium_operations.properties):

  • start_browser: Launches a browser session (browser, headless).
  • navigate: Navigates to a URL (url).
  • find_element: Locates an element (by, value, timeout).
  • click_element: Clicks an element (by, value, timeout).
  • send_keys: Sends text to an element (by, value, text, timeout).
  • get_element_text: Retrieves element text (by, value, timeout).
  • hover: Hovers over an element (by, value, timeout).
  • drag_and_drop: Drags one element to another (by, value, targetBy, targetValue, timeout).
  • double_click: Double-clicks an element (by, value, timeout).
  • right_click: Right-clicks an element (by, value, timeout).
  • press_key: Simulates a key press (key).
  • upload_file: Uploads a file (by, value, filePath, timeout).
  • take_screenshot: Captures a screenshot as base64 (outputPath optional).
  • close_session: Closes the current browser session.

Notes

LLM Configuration: Ensure the LLM returns structured JSON commands matching the operations in selenium_operations.properties. Adjust the prompt in McpServer.java if needed. WebDriver Compatibility: Ensure ChromeDriver or GeckoDriver matches your browser version. File Uploads: The upload_file operation requires a valid file path accessible to the server. Screenshots: The take_screenshot operation returns base64 data to avoid server-side file system dependencies. Error Handling: The server catches and reports errors for each command in the response. Extending Operations: Add new operations to selenium_operations.properties and update executeCommand in McpServer.java as needed.

Troubleshooting

Server Fails to Start: Check config.properties for valid settings and ensure the LLM endpoint is accessible. WebDriver Errors: Verify ChromeDriver/GeckoDriver is in PATH and compatible with your browser. LLM Issues: Ensure Ollama is running or the remote LLM endpoint and API key are correct. Test Failures: Check TestNG reports in test-output/ for detailed error messages.

License This project is licensed under the MIT License.