tashifkhan/agentic-browser
Agentic Browser — Adaptive, Model-Agnostic Web Automation
The Open Agent Browser Extension Powered by Python & MCP
Mission: Build an intelligent browser agent that doesn’t just understand the web — it acts on it. Fully model-agnostic, privacy-respecting, and BYOKeys-ready.
Overview
Agentic Browser is a next-generation browser extension powered by a Python MCP (Model Context Protocol) server that bridges modern LLM reasoning with real browser interactivity.
Unlike typical AI assistants, this agent:
- Understands complex web content,
- Takes actions (like filling forms, navigating, comparing data),
- And adapts to any preferred model backend — OpenAI, Anthropic, Ollama, local LLaMA, Mistral, etc.
It’s your agent, your browser, your keys.
Architecture
Key Architectural Principles
- Model-Agnostic: Works with any LLM backend that supports API-style calls (OpenAI-compatible, Anthropic, Ollama, LM Studio).
- BYOKeys: No vendor lock-in. Users supply their own API keys via `.env` or runtime UI input.
- MCP-Compliant: Uses the Model Context Protocol for secure and structured interaction.
- Declarative Action System: The model declares browser actions (e.g. `click`, `fill_form`, `extract`), and the extension executes them safely (see the sketch below).
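To make the declarative action idea concrete, here is a minimal sketch of what a declared action might look like before the extension executes it. The `BrowserAction` dataclass, field names, and `declare` helper are illustrative assumptions, not the project's actual schema.

```python
# Hypothetical shape of a declared action the backend forwards to the extension.
# The field names and helper below are illustrative, not the project's real schema.
from dataclasses import dataclass, asdict
import json

ALLOWED_ACTIONS = {"click", "fill_form", "extract"}

@dataclass
class BrowserAction:
    action: str                # one of ALLOWED_ACTIONS
    selector: str              # CSS selector the extension should target
    value: str | None = None   # payload for fill_form; unused for click/extract

def declare(action: BrowserAction) -> str:
    """Validate an action before it is handed to the extension for execution."""
    if action.action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action.action}")
    return json.dumps(asdict(action))

print(declare(BrowserAction("fill_form", "#search", "wireless headphones")))
```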
Core Objectives
1. Model-Agnostic Agent Backend
Create a flexible, LLM-agnostic backend using Python, LangChain, and the Model Context Protocol (MCP) framework.
Allows seamless switching across models (OpenRouter, Ollama, Anthropic, OpenAI, or local inference models).
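As a rough illustration of what backend switching could look like, assuming the langchain-openai, langchain-anthropic, and langchain-ollama provider packages (the `make_llm` helper itself is hypothetical):

```python
def make_llm(provider: str, model: str):
    """Return a LangChain chat model for the chosen backend (illustrative helper)."""
    if provider == "openai":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model=model)        # reads OPENAI_API_KEY from the environment
    if provider == "anthropic":
        from langchain_anthropic import ChatAnthropic
        return ChatAnthropic(model=model)     # reads ANTHROPIC_API_KEY from the environment
    if provider == "ollama":
        from langchain_ollama import ChatOllama
        return ChatOllama(model=model)        # local inference, no key required
    raise ValueError(f"unknown provider: {provider}")

llm = make_llm("ollama", "llama3")
print(llm.invoke("Summarise this page in one sentence.").content)
```

Because each adapter pulls its key from the local environment, this pattern also keeps the BYOKeys promise: the backend never needs keys baked into the codebase.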
2. Secure Browser Extension
Design a robust and secure browser extension using the WebExtensions API, ensuring compatibility across Chrome, Firefox, and other Chromium-based browsers.
3. Advanced Agent Workflows
Support sophisticated agentic workflows through Retrieval Augmented Generation (RAG), persistent memory, and automated multi-step browsing tasks like form filling, search synthesis, and citation retrieval.
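A toy sketch of such a multi-step flow using LangGraph is shown below. The state fields and node bodies are placeholders; in the real agent the nodes would call the extension and the configured LLM instead.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class BrowseState(TypedDict):
    query: str
    page_text: str
    answer: str

def fetch(state: BrowseState) -> dict:
    # Placeholder: the real node would ask the extension to load and extract a page.
    return {"page_text": f"<extracted content for: {state['query']}>"}

def synthesize(state: BrowseState) -> dict:
    # Placeholder: the real node would call the configured LLM with retrieved context.
    return {"answer": f"Synthesis based on {len(state['page_text'])} characters of page text."}

graph = StateGraph(BrowseState)
graph.add_node("fetch", fetch)
graph.add_node("synthesize", synthesize)
graph.set_entry_point("fetch")
graph.add_edge("fetch", "synthesize")
graph.add_edge("synthesize", END)
workflow = graph.compile()

print(workflow.invoke({"query": "compare two laptops", "page_text": "", "answer": ""}))
```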
4. Guardrails & Transparency
Implement strong security and transparency layers:
- User approval before every actionable operation
- Comprehensive activity logs
- Intelligent content filtering
- Safe domain allowlisting and indirect prompt injection (IPI) protection
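A minimal sketch of how an approval gate plus domain allowlist could be wired on the backend (all names and the allowlist contents here are illustrative):

```python
import logging
from urllib.parse import urlparse

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agentic-browser.audit")

ALLOWED_DOMAINS = {"example.com", "wikipedia.org"}  # illustrative allowlist

def user_approves(action: str, url: str) -> bool:
    # In the extension this would be a UI prompt; here we ask on the console.
    return input(f"Allow '{action}' on {url}? [y/N] ").strip().lower() == "y"

def guarded_execute(action: str, url: str) -> None:
    host = urlparse(url).hostname or ""
    if not any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS):
        log.warning("blocked %s on non-allowlisted domain %s", action, host)
        return
    if not user_approves(action, url):
        log.info("user rejected %s on %s", action, url)
        return
    log.info("executing %s on %s", action, url)
    # ...dispatch the approved action to the browser extension here...

guarded_execute("fill_form", "https://en.wikipedia.org/wiki/Main_Page")
```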
5. Open-Source Extensibility
Adopt a modular, community-driven architecture encouraging open innovation and integration of new capabilities, workflows, and extensions over time.
Technical Stack: Components & Technologies
| Component | Functionality | Technologies / Frameworks |
|---|---|---|
| Agent Orchestration | Task planning, retrieval-augmented reasoning, complex multi-step workflows | LangChain, LangGraph |
| Browser Control | DOM inspection, navigation, form filling, input injection, and content extraction | WebExtensions API (Chrome / Firefox) |
| LLM Adapters | Model-agnostic routing, adapter layer for multi-provider compatibility | OpenRouter, Ollama, Anthropic, OpenAI, Hugging Face APIs |
| Backend Agent | Core logic execution, action orchestration, safety and state management | Python MCP Server |
| Retrieval & Citation | Web data extraction, embedding-based retrieval, factual grounding | Vector Databases (FAISS / Pinecone) |
| Safety & Guardrails | Logging, data protection, domain-level security enforcement | Secure Audit System, Activity Logger |
Core Features
Model-Agnostic Intelligence
Works with any LLM provider — OpenAI, Anthropic, Mistral, Ollama, LM Studio, or custom deployments — using a unified adapter layer.
Bring Your Own Keys (BYOKeys)
No vendor dependency. Users supply their own API keys securely via a local `.env` file or UI input; keys never leave the local context.
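A minimal sketch of local key loading, assuming the python-dotenv package (the variable names are illustrative):

```python
# Keys stay on the user's machine: read them from a local .env file at startup.
import os
from dotenv import load_dotenv

load_dotenv()  # loads .env from the working directory into the process environment

openai_key = os.getenv("OPENAI_API_KEY")
anthropic_key = os.getenv("ANTHROPIC_API_KEY")

if not (openai_key or anthropic_key):
    print("No provider keys found; falling back to a local model (e.g. Ollama).")
```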
Web Interaction Engine
Real-time DOM inspection and manipulation for safe, human-approved automation — including form filling, data extraction, and structured web actions.
Retrieval & Grounded Reasoning
Leverages RAG pipelines to incorporate external data and enhance factual grounding, improving contextual accuracy in responses.
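An illustrative retrieval step, assuming the langchain-community FAISS wrapper and a Hugging Face sentence-transformer for embeddings (the snippet texts are placeholders for extracted page content):

```python
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

snippets = [
    "Laptop A: 16 GB RAM, 512 GB SSD, $999.",
    "Laptop B: 8 GB RAM, 256 GB SSD, $649.",
    "Both models ship with a one-year warranty.",
]

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
store = FAISS.from_texts(snippets, embeddings)

# Retrieve the most relevant snippets to ground the model's answer.
for doc in store.similarity_search("Which laptop has more memory?", k=2):
    print(doc.page_content)
```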
Secure Architecture
Every action is validated, logged, and requires explicit permission, ensuring responsible automation and explainability.
Extensible Agent Tools
Developers can easily extend agent capabilities by adding Python tools, context managers, or new browser-side actions.
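For example, a new server-side tool can be registered with the official MCP Python SDK's FastMCP helper; the tool below is purely illustrative and not part of the project's actual tool set:

```python
# Registering an extra agent tool on the Python MCP server.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("agentic-browser-extras")

@mcp.tool()
def word_count(text: str) -> int:
    """Count words in extracted page text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```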
Roadmap
- Add visual DOM debugger panel
- Multi-model round-robin support (reasoning blending)
- Offline LLM embedding-based retrieval
- GUI for managing keys/providers
- Fine-grained content permissions
Contributing
Contributions are very welcome!
If you’re into LLM orchestration, WebExtension APIs, or intelligent web automation, this project is an open canvas.
Please:
- Fork the repo
- Create a feature branch
- Submit a well-documented PR
License
Released under the GNU GPL v3.0 License: free to use, modify, and distribute, provided derivatives remain under the same license.