Crawl4AI-RAG-MCP-Server
Crawl4AI RAG MCP Server is a powerful implementation of the Model Context Protocol (MCP) integrated with Crawl4AI and Supabase, providing AI agents and AI coding assistants with advanced web crawling and RAG capabilities.
The Crawl4AI RAG MCP Server lets AI agents crawl websites, store the content in a vector database (Supabase), and perform Retrieval-Augmented Generation (RAG) over the crawled content. It supports smart URL detection, recursive crawling, parallel processing, content chunking, and vector search. It is particularly useful for AI coding assistants and agents that need access to up-to-date web content for decision-making and content generation. The server can be deployed with Docker or run directly with Python, and it connects to MCP clients over SSE or stdio transport.
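To make recursive crawling and parallel processing concrete, here is a minimal sketch, not the server's actual code. It assumes a hypothetical in-memory site (`FAKE_SITE`) in place of real network fetches, and the names `fetch_links` and `crawl_recursive` are invented for illustration. Each depth level is fetched concurrently, and only links on the starting domain are followed.

```python
import asyncio
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "site": page URL -> hrefs found on that page.
FAKE_SITE = {
    "https://example.com/": ["/docs", "/blog", "https://other.org/x"],
    "https://example.com/docs": ["/docs/api", "/"],
    "https://example.com/blog": ["/docs"],
    "https://example.com/docs/api": [],
}


async def fetch_links(url: str) -> list[str]:
    """Stand-in for a real page fetch; returns absolute links on the page."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return [urljoin(url, href) for href in FAKE_SITE.get(url, [])]


async def crawl_recursive(start_url: str, max_depth: int = 3) -> list[str]:
    """Breadth-first crawl of internal links, one parallel batch per depth level."""
    domain = urlparse(start_url).netloc
    visited: set[str] = set()
    frontier = [start_url]
    for _ in range(max_depth):
        # De-duplicate the frontier (keeping order) and skip pages already seen.
        batch = [u for u in dict.fromkeys(frontier) if u not in visited]
        if not batch:
            break
        visited.update(batch)
        # Fetch every page in this level concurrently.
        results = await asyncio.gather(*(fetch_links(u) for u in batch))
        # Keep only internal links that have not been visited yet.
        frontier = [
            link
            for links in results
            for link in links
            if urlparse(link).netloc == domain and link not in visited
        ]
    return sorted(visited)
```

Calling `asyncio.run(crawl_recursive("https://example.com/"))` visits the four `example.com` pages and skips the external `other.org` link. The `max_depth` cap is an assumed safeguard against unbounded crawls.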
Features
- Smart URL Detection: Automatically detects and handles different URL types (regular webpages, sitemaps, text files).
- Recursive Crawling: Follows internal links to discover content.
- Parallel Processing: Efficiently crawls multiple pages simultaneously.
- Content Chunking: Intelligently splits content by headers and size for better processing.
- Vector Search: Performs RAG over crawled content, optionally filtering by data source for precision.
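The URL detection and chunking features above could be approximated as follows. This is a sketch under stated assumptions, not the project's implementation: the function names `detect_url_type` and `chunk_markdown`, the path-based heuristics, and the `max_chars` default are all hypothetical, and the crawled content is assumed to be Markdown.

```python
import re
from urllib.parse import urlparse


def detect_url_type(url: str) -> str:
    """Classify a URL as 'sitemap', 'text_file', or 'webpage' from its path."""
    path = urlparse(url).path.lower()
    if path.endswith(".xml") and "sitemap" in path:
        return "sitemap"
    if path.endswith(".txt"):
        return "text_file"
    return "webpage"


def chunk_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Split at Markdown headers first, then cap each section at max_chars."""
    # Zero-width split before every line that starts with 1-6 '#' and a space.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks: list[str] = []
    for section in sections:
        section = section.strip()
        # Sections longer than max_chars are cut into fixed-size pieces.
        for start in range(0, len(section), max_chars):
            piece = section[start:start + max_chars].strip()
            if piece:
                chunks.append(piece)
    return chunks
```

Splitting on headers first keeps each chunk topically coherent, and the size cap keeps every chunk within whatever limit the embedding step imposes. A production chunker would likely prefer paragraph or sentence boundaries over the hard character cut used here.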
Tools
`crawl_single_page`
Quickly crawls a single web page and stores its content in the vector database.
`smart_crawl_url`
Intelligently crawls an entire website based on the type of URL provided (a sitemap, full.txt, or regular web pages that require recursive crawling).
`get_available_sources`
Returns a list of all available sources (domain names) in the database.
`perform_rag_query`
Uses semantic search, with optional source filtering, to find relevant content.
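The idea behind listing sources and running a source-filtered semantic search can be sketched in plain Python. This is an illustration only: the real server stores embeddings in Supabase, while this example uses a hypothetical in-memory `STORE` with hand-written three-dimensional vectors, and the names `Chunk`, `available_sources`, and `rag_query` are assumptions rather than the server's API.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    source: str             # domain the chunk was crawled from
    text: str
    embedding: list[float]  # toy vector; a real system would use a model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def available_sources(chunks: list[Chunk]) -> list[str]:
    """Distinct source domains, sorted alphabetically."""
    return sorted({c.source for c in chunks})


def rag_query(chunks: list[Chunk], query_embedding: list[float],
              source: Optional[str] = None, top_k: int = 2) -> list[Chunk]:
    """Rank chunks by similarity to the query, optionally within one source."""
    candidates = [c for c in chunks if source is None or c.source == source]
    ranked = sorted(candidates,
                    key=lambda c: cosine(c.embedding, query_embedding),
                    reverse=True)
    return ranked[:top_k]


# Hypothetical stored chunks from two crawled domains.
STORE = [
    Chunk("docs.example.com", "How to install", [1.0, 0.0, 0.0]),
    Chunk("docs.example.com", "API reference", [0.0, 1.0, 0.0]),
    Chunk("blog.example.org", "Install tips", [0.9, 0.1, 0.0]),
]
```

With the query vector `[1.0, 0.0, 0.0]`, an unfiltered search ranks "How to install" first and "Install tips" second. Passing `source="blog.example.org"` restricts the results to that domain, which is the precision benefit the source filter is meant to provide.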