jeffmm/crawl4ai-mcp
crawl4ai-mcp is a model context protocol server that extends LLM capabilities with real-time web access through web crawling and internet search tools.
crawl4ai-mcp
crawl4ai-mcp provides a set of web crawling and internet search tools, implemented as an MCP (Model Context Protocol) server. Built using crawl4ai and mcp, this project enables LLMs to perform internet searches and scrape websites for data, extending their capabilities with real-time web access.
Project Structure
crawl4ai-mcp/
├── .gitignore
├── .pre-commit-config.yaml
├── .python-version
├── LICENSE
├── README.md
├── pyproject.toml
├── src/
│   └── crawl4ai_mcp/
│       ├── __init__.py
│       ├── config.py
│       ├── main.py
│       ├── server.py
│       └── types.py
├── tests/
└── uv.lock
Features
- Google Search: Perform Google searches and retrieve the top 10 results in markdown format.
- Deep Crawling:
  - Perform Breadth-First Search (BFS) deep crawls.
  - Execute Best-First deep crawls, prioritizing pages based on keywords (see the sketch after this list).
  - Configurable max_depth, max_pages, and include_external links.
- Multi-URL Crawling: Crawl a list of specified URLs concurrently.
- Configurable Settings: Adjust browser type, headless mode, verbose logging, screenshot capture, word count threshold, cache mode, and content return type (HTML or Markdown).
- Stealth Mode: Includes settings for random user agents, user simulation, timezone, and geolocation to mimic real user behavior.
- Flexible Output: Returns content in either raw HTML for your LLM to parse, or processed into Markdown.
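The two deep-crawl modes above correspond to the deep-crawl strategies provided by the crawl4ai library. The sketch below is not taken from this project's source; it only illustrates how such strategies are typically wired up in crawl4ai (class names such as BFSDeepCrawlStrategy, BestFirstCrawlingStrategy, and KeywordRelevanceScorer come from recent crawl4ai releases and may differ by version):

```python
import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.deep_crawling import BestFirstCrawlingStrategy, BFSDeepCrawlStrategy
from crawl4ai.deep_crawling.scorers import KeywordRelevanceScorer


async def main() -> None:
    # Plain BFS crawl: visit pages level by level up to max_depth / max_pages.
    bfs = BFSDeepCrawlStrategy(max_depth=2, max_pages=50, include_external=False)

    # Best-First crawl: score candidate links by keyword relevance and visit
    # the highest-scoring pages first.
    best_first = BestFirstCrawlingStrategy(
        max_depth=2,
        max_pages=50,
        url_scorer=KeywordRelevanceScorer(keywords=["mcp", "crawling"], weight=0.7),
    )

    config = CrawlerRunConfig(deep_crawl_strategy=best_first)  # or deep_crawl_strategy=bfs
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun("https://docs.crawl4ai.com/", config=config)
        for result in results:
            print(result.url)


asyncio.run(main())
```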
Installation
- Clone the repository:
  git clone https://github.com/crawl4ai/crawl4ai-mcp.git
  cd crawl4ai-mcp
- Install with uv:
  uv tool install crawl4ai-mcp
- Install Playwright browsers (if not already installed):
  playwright install chromium # or firefox or webkit
Configuration
The application's settings are managed via environment variables, prefixed with C4AI_. Key settings include:
- C4AI_BROWSER_TYPE: chromium, firefox, or webkit (default: chromium)
- C4AI_HEADLESS: true or false (default: true)
- C4AI_VERBOSE: true or false (default: false)
- C4AI_SCREENSHOT: true or false (default: false)
- C4AI_WORD_COUNT_THRESHOLD: Minimum word count for content to be returned (default: 10)
- C4AI_CACHE_MODE: enabled, disabled, read_only, write_only, or bypass (default: bypass)
- C4AI_MAX_DEPTH: Maximum depth for deep crawling (default: 2)
- C4AI_MAX_PAGES: Maximum number of pages to crawl in deep strategies (default: 50)
- C4AI_INCLUDE_EXTERNAL: Whether to include external links in deep crawling (default: false)
- C4AI_CONTENT_TYPE: html or markdown (default: markdown)
Example:
C4AI_BROWSER_TYPE=firefox C4AI_HEADLESS=false python src/crawl4ai_mcp/main.py
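The project's actual config.py is not reproduced here, but prefix-based settings like these are commonly implemented with pydantic-settings. A minimal sketch, assuming that approach (the class and field names below are illustrative, not necessarily the project's real ones):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Crawl4AISettings(BaseSettings):
    # Each field can be overridden by an environment variable named
    # C4AI_<FIELD_NAME>, e.g. C4AI_BROWSER_TYPE=firefox.
    model_config = SettingsConfigDict(env_prefix="C4AI_")

    browser_type: str = "chromium"
    headless: bool = True
    verbose: bool = False
    screenshot: bool = False
    word_count_threshold: int = 10
    cache_mode: str = "bypass"
    max_depth: int = 2
    max_pages: int = 50
    include_external: bool = False
    content_type: str = "markdown"


settings = Crawl4AISettings()  # reads C4AI_* variables from the environment
print(settings.browser_type)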
Running the Server
The application can be run as a FastMCP server, supporting different transport mechanisms.
To run the server as a streamable HTTP server:
python src/crawl4ai_mcp/main.py --transport http
To run the server using standard I/O (default):
python src/crawl4ai_mcp/main.py
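Under the hood, a FastMCP server selects its transport when run() is called. The snippet below is a generic illustration of that pattern using the mcp Python SDK, not the project's actual main.py; the placeholder tool body, the argument parsing, and the mapping of the --transport flag to the SDK's transport names are assumptions:

```python
import argparse

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("crawl4ai-mcp")


@mcp.tool()
async def crawl(urls: list[str]) -> str:
    """Placeholder tool body; the real server delegates to crawl4ai."""
    return f"would crawl {len(urls)} URL(s)"


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--transport", choices=["stdio", "http"], default="stdio")
    args = parser.parse_args()
    # stdio is the default; "http" here maps to the SDK's streamable HTTP transport.
    mcp.run(transport="streamable-http" if args.transport == "http" else "stdio")
```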
Available Tools (API)
The following asynchronous tools are exposed by the FastMCP server for use by LLMs:
google_search(query: str) -> MCPCrawlResult
Performs a Google search and returns a markdown page of the top 10 results.
- query: The search query string.
deep_crawl(url: str, keywords: list[str] | None = None) -> list[MCPCrawlResult]
Crawl a website deeply, optionally using keywords to prioritize pages.
- url: The URL to start crawling from.
- keywords: An optional list of keywords to prioritize pages during the crawl. If None, a Breadth-First Search (BFS) strategy is used.
crawl(urls: list[str]) -> list[MCPCrawlResult]
Crawl multiple URLs and return their content.
- urls: A list of URLs to crawl.
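As a usage sketch, these tools can be exercised from any MCP client. The example below uses the stdio client from the mcp Python SDK; the server command mirrors the "Running the Server" section, while the query string is an arbitrary example and the exact shape of the returned MCPCrawlResult payload is not shown here:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch the server over stdio, the default transport.
    server = StdioServerParameters(command="python", args=["src/crawl4ai_mcp/main.py"])

    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])  # google_search, deep_crawl, crawl

            result = await session.call_tool("google_search", {"query": "model context protocol"})
            print(result.content)


asyncio.run(main())
```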
Developer setup
To set up the project locally, follow these steps:
- Create and activate a virtual environment using uv (recommended, as indicated by uv.lock):
  uv venv
  source .venv/bin/activate # On Windows, use `.venv\Scripts\activate`
- Install dependencies from pyproject.toml and uv.lock:
  uv sync
- Install pre-commit hooks:
  pre-commit install