Alae-Eddine-Akesbi/AgenticMCP-Selenium
If you are the rightful owner of AgenticMCP-Selenium and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.
The AI-Powered Browser Automation project leverages LangChain and Selenium MCP to automate web browsers using natural language commands.
🤖 AI-Powered Browser Automation with LangChain and Selenium MCP
📖 About
This project provides a powerful and flexible solution for automating web browsers using natural language commands. It combines a LangChain agent with a Selenium-based MCP server to create a system that can perform complex web automation tasks based on simple, conversational instructions. The project is designed for seamless integration with n8n, allowing you to build sophisticated automation workflows with ease. It's an ideal starting point for developers looking to leverage AI for web scraping, automated testing, and other browser-based tasks.
✨ Key Features
- Natural Language Control: Use simple, conversational language to control a web browser.
- Modular Architecture: The LangChain agent and Selenium MCP server are decoupled, making the system easy to maintain and extend.
- Seamless n8n Integration: Comes with a pre-built n8n workflow for rapid setup and deployment.
- Comprehensive Toolset: A rich set of browser automation tools for handling a wide range of tasks.
- Session Persistence: Browser sessions are maintained across server restarts, allowing for long-running, stateful automations.
🏛️ Project Architecture
The project consists of two main components:
-
langchain-agent: A Python application built with LangChain and Streamlit that provides a chat-based interface for the user. It uses a powerful language model (like Google's Gemini) to interpret user commands and orchestrate the automation tasks. -
mcp-selenium: A Node.js server that implements the Model Context Protocol (MCP). It acts as a bridge between the LangChain agent and Selenium WebDriver, exposing a set of tools that the agent can use to control a web browser.
The two components communicate over HTTP, with the LangChain agent sending commands to the MCP server, which then executes them in the browser.
🚀 Getting Started
Prerequisites
- Node.js and npm
- Python and pip
- An API key for a supported language model (e.g., Google Gemini)
1. Set Up the mcp-selenium Server
The mcp-selenium server is the engine of the project, responsible for executing the browser automation tasks.
Installation:
cd mcp-selenium
npm install
Running the Server:
node http-server.js
The server will start on http://localhost:3000.
2. Set Up the langchain-agent
The langchain-agent provides the user interface and the "brains" of the operation.
Installation:
cd langchain-agent
pip install -r requirements.txt
Configuration:
Create a .env file in the langchain-agent directory and add your language model's API key:
GEMINI_API_KEY=your_api_key_here
Running the Agent:
streamlit run app.py
This will launch a web-based chat interface where you can interact with the agent.
🐳 Docker Usage
This project is also available as a Docker image, which simplifies the setup process by bundling the langchain-agent and mcp-selenium server into a single container.
Building the Docker Image:
docker build -t mcp-selenium-agent .
Running the Docker Container:
docker run -p 3000:3000 -p 8501:8501 mcp-selenium-agent
This will start both the mcp-selenium server on port 3000 and the langchain-agent on port 8501. You can access the agent's web interface by navigating to http://localhost:8501 in your browser.
🔗 n8n Integration
This project is designed to be used with n8n, a powerful workflow automation tool. We've provided a pre-built workflow (selenium_workflow.json) to get you started quickly.
Setting Up the Workflow:
- Import the
selenium_workflow.jsonfile into your n8n instance. - Configure the Google Gemini Chat Model node with your API credentials.
- Ensure the MCP Client node is pointing to the correct URL for your
mcp-seleniumserver (http://127.0.0.1:3000by default). - Activate the workflow.
You can now interact with the agent directly from the n8n chat interface.
🧰 Available Tools
The mcp-selenium server provides a comprehensive set of tools for browser automation. Here's a complete list:
Browser Management
start_browser: Launches a new browser session.navigate: Navigates to a specific URL.close_session: Closes the current browser session.
Page Interaction
get_page_source: Retrieves the HTML source of the current page.get_element_text: Gets the visible text of a specific element.take_screenshot: Takes a screenshot of the visible part of the page.
Element Interaction
find_element: Checks if an element exists on the page.click_element: Clicks on a specific element.send_keys: Types text into an input field.upload_file: Uploads a file to a file input field.
Advanced Interactions
hover: Hovers the mouse over an element.drag_and_drop: Drags an element and drops it onto another.double_click: Performs a double-click on an element.right_click: Performs a right-click on an element.press_key: Simulates pressing a key on the keyboard.
📜 License
This project is licensed under the MIT License. See the file for details.