crawl4ai-mcp
crawl4ai-mcp provides a set of web crawling and internet search tools, implemented as an MCP (Model Context Protocol) server. Built with crawl4ai and the mcp SDK, this project enables LLMs to perform internet searches and scrape websites for data, extending their capabilities with real-time web access.
Project Structure
crawl4ai-mcp/
├── .gitignore
├── .pre-commit-config.yaml
├── .python-version
├── LICENSE
├── README.md
├── pyproject.toml
├── src/
│   └── crawl4ai_mcp/
│       ├── __init__.py
│       ├── config.py
│       ├── main.py
│       ├── server.py
│       └── types.py
├── tests/
└── uv.lock
Features
- Google Search: Perform Google searches and retrieve the top 10 results in markdown format.
- Deep Crawling:
  - Perform Breadth-First Search (BFS) deep crawls.
  - Execute Best-First deep crawls, prioritizing pages based on keywords.
  - Configurable max_depth, max_pages, and include_external links.
- Multi-URL Crawling: Crawl a list of specified URLs concurrently.
- Configurable Settings: Adjust browser type, headless mode, verbose logging, screenshot capture, word count threshold, cache mode, and content return type (HTML or Markdown).
- Stealth Mode: Includes settings for random user agents, user simulation, timezone, and geolocation to mimic real user behavior (see the sketch after this list).
- Flexible Output: Returns content in either raw HTML for your LLM to parse, or processed into Markdown.
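The stealth settings correspond to knobs crawl4ai itself exposes. A minimal sketch of how they could be wired up (user_agent_mode, simulate_user, and override_navigator are crawl4ai options; the fetch helper and the exact combination this server uses are assumptions):

```python
# Sketch only: wiring stealth options into crawl4ai. The fetch helper
# and the exact combination used by this server are assumptions.
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

browser_cfg = BrowserConfig(
    browser_type="chromium",
    headless=True,
    user_agent_mode="random",  # randomize the user agent
)
run_cfg = CrawlerRunConfig(
    simulate_user=True,        # simulate mouse/keyboard activity
    override_navigator=True,   # patch navigator properties to look less headless
)


async def fetch(url: str) -> str:
    async with AsyncWebCrawler(config=browser_cfg) as crawler:
        result = await crawler.arun(url=url, config=run_cfg)
        return result.markdown
```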
Installation
- Clone the repository:
  git clone https://github.com/crawl4ai/crawl4ai-mcp.git
  cd crawl4ai-mcp
- Install with uv:
  uv tool install crawl4ai-mcp
- Install Playwright browsers (if not already installed):
  playwright install chromium  # or firefox or webkit
Configuration
The application's settings are managed via environment variables, prefixed with C4AI_. Key settings include:
- C4AI_BROWSER_TYPE: chromium, firefox, or webkit (default: chromium)
- C4AI_HEADLESS: true or false (default: true)
- C4AI_VERBOSE: true or false (default: false)
- C4AI_SCREENSHOT: true or false (default: false)
- C4AI_WORD_COUNT_THRESHOLD: Minimum word count for content to be returned (default: 10)
- C4AI_CACHE_MODE: enabled, disabled, read_only, write_only, or bypass (default: bypass)
- C4AI_MAX_DEPTH: Maximum depth for deep crawling (default: 2)
- C4AI_MAX_PAGES: Maximum number of pages to crawl in deep strategies (default: 50)
- C4AI_INCLUDE_EXTERNAL: Whether to include external links in deep crawling (default: false)
- C4AI_CONTENT_TYPE: html or markdown (default: markdown)
Example:
C4AI_BROWSER_TYPE=firefox C4AI_HEADLESS=false python src/crawl4ai_mcp/main.py
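The repository's config.py is not reproduced here, but the C4AI_ prefix suggests a single settings object. A plausible sketch using pydantic-settings (the library choice is an assumption; the fields and defaults mirror the variables listed above):

```python
# Hypothetical sketch of config.py; pydantic-settings is an assumption,
# but the fields and defaults follow the C4AI_* variables listed above.
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="C4AI_")

    browser_type: str = "chromium"    # C4AI_BROWSER_TYPE
    headless: bool = True             # C4AI_HEADLESS
    verbose: bool = False             # C4AI_VERBOSE
    screenshot: bool = False          # C4AI_SCREENSHOT
    word_count_threshold: int = 10    # C4AI_WORD_COUNT_THRESHOLD
    cache_mode: str = "bypass"        # C4AI_CACHE_MODE
    max_depth: int = 2                # C4AI_MAX_DEPTH
    max_pages: int = 50               # C4AI_MAX_PAGES
    include_external: bool = False    # C4AI_INCLUDE_EXTERNAL
    content_type: str = "markdown"    # C4AI_CONTENT_TYPE


settings = Settings()  # each field is overridable via its C4AI_* variable
```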
Running the Server
The application can be run as a FastMCP server, supporting different transport mechanisms.
To run the server as a streamable HTTP server:
python src/crawl4ai_mcp/main.py --transport http
To run the server using standard I/O (default):
python src/crawl4ai_mcp/main.py
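main.py itself is not shown, but a FastMCP server supporting both invocations usually reduces to a single run() call. A minimal sketch, assuming the standard mcp SDK (the --transport flag mirrors the commands above; the tool body is a placeholder):

```python
# Minimal sketch of a FastMCP entry point with selectable transport;
# the real main.py may differ. Tool bodies are placeholders.
import argparse

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("crawl4ai-mcp")


@mcp.tool()
async def crawl(urls: list[str]) -> list[str]:
    """Placeholder; the real tool returns MCPCrawlResult objects."""
    raise NotImplementedError


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--transport", choices=["stdio", "http"], default="stdio")
    args = parser.parse_args()
    # FastMCP names its streamable HTTP transport "streamable-http".
    mcp.run(transport="streamable-http" if args.transport == "http" else "stdio")
```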
Available Tools (API)
The following asynchronous tools are exposed by the FastMCP server for use by LLMs:
google_search(query: str) -> MCPCrawlResult
Performs a Google search and returns a markdown page of the top 10 results.
query: The search query string.
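The README does not show the implementation, but since the server is built on crawl4ai, one plausible shape is to crawl the results page directly and return crawl4ai's markdown rendering (the URL construction, the num parameter, and the return type here are all assumptions):

```python
# Hypothetical sketch of google_search; the actual implementation
# is not shown in this README.
from urllib.parse import quote_plus

from crawl4ai import AsyncWebCrawler


async def google_search(query: str) -> str:
    # Request ~10 results and let crawl4ai turn the page into markdown.
    url = f"https://www.google.com/search?q={quote_plus(query)}&num=10"
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
        return result.markdown
```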
deep_crawl(url: str, keywords: list[str] | None = None) -> list[MCPCrawlResult]
Crawls a website deeply, optionally using keywords to prioritize pages.
url: The URL to start crawling from.
keywords: An optional list of keywords used to prioritize pages during the crawl. If None, a Breadth-First Search (BFS) strategy is used.
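crawl4ai ships both strategies this tool switches between. A sketch of the selection logic, assuming crawl4ai's deep-crawling API and the max_depth/max_pages/include_external defaults from the Configuration section (the server's actual wiring is an assumption):

```python
# Sketch of deep_crawl's strategy selection; BFSDeepCrawlStrategy,
# BestFirstCrawlingStrategy, and KeywordRelevanceScorer are crawl4ai's,
# but how this server wires them together is an assumption.
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.deep_crawling import BestFirstCrawlingStrategy, BFSDeepCrawlStrategy
from crawl4ai.deep_crawling.scorers import KeywordRelevanceScorer


async def deep_crawl(url: str, keywords: list[str] | None = None):
    if keywords:
        # Best-First: follow the highest-scoring links first.
        strategy = BestFirstCrawlingStrategy(
            max_depth=2,
            max_pages=50,
            include_external=False,
            url_scorer=KeywordRelevanceScorer(keywords=keywords),
        )
    else:
        # No keywords: plain breadth-first traversal.
        strategy = BFSDeepCrawlStrategy(max_depth=2, max_pages=50, include_external=False)

    async with AsyncWebCrawler() as crawler:
        # With a deep-crawl strategy attached, arun() returns a list of results.
        return await crawler.arun(url=url, config=CrawlerRunConfig(deep_crawl_strategy=strategy))
```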
crawl(urls: list[str]) -> list[MCPCrawlResult]
Crawls multiple URLs concurrently and returns their content.
urls: A list of URLs to crawl.
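For reference, here is how an MCP client can call these tools over stdio using the official mcp Python SDK (the server command path is taken from the run instructions above):

```python
# Example client session against this server over stdio.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    params = StdioServerParameters(command="python", args=["src/crawl4ai_mcp/main.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # google_search, deep_crawl, crawl
            result = await session.call_tool("crawl", {"urls": ["https://example.com"]})
            print(result.content)


asyncio.run(main())
```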
Developer setup
To set up the project locally, follow these steps:
- Create and activate a virtual environment using uv (recommended, as indicated by uv.lock):
  uv venv
  source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`
- Install dependencies from pyproject.toml and uv.lock:
  uv sync
- Install pre-commit hooks:
  pre-commit install