pnocera/crawl4ai-rust-mcp
A high-performance Rust MCP server designed for web crawling and retrieval-augmented generation, featuring zero-copy architecture and SIMD acceleration.
crawl4ai-rust-mcp
A high-performance Rust MCP (Model Context Protocol) server for web crawling and RAG (Retrieval-Augmented Generation) capabilities. Built with zero-copy architecture, SIMD acceleration, and memory-efficient storage.
Features
- Zero-Copy Protocol: Efficient data transfer using rkyv serialization
- Intelligent Web Crawling: Polite crawling with rate limiting and robots.txt support
- Local Embeddings: GPU-accelerated text embeddings with Candle framework
- 32x Memory Reduction: Binary quantization for vector storage
- SIMD-Accelerated Search: Hardware-optimized vector similarity search
- Hybrid Storage: RocksDB + DuckDB + memory-mapped files
- Knowledge Graphs: Memgraph integration for relationship tracking
- MCP Protocol: Full compliance with Model Context Protocol
Architecture
crawl4ai-rust-mcp/
├── mcp-server/ # Axum HTTP server with SSE/WebSocket support
├── mcp-protocol/ # MCP protocol types and serialization
├── crawler/ # Web crawler with JavaScript support
├── embeddings/ # Local embeddings generation
├── vector-store/ # Qdrant integration with binary quantization
├── graph-store/ # Knowledge graph storage
├── storage/ # Hybrid storage layer
└── search/ # SIMD-accelerated similarity search
Quick Start
Prerequisites
- Rust 1.70+
- Docker (for Qdrant and Memgraph)
- CUDA/Metal (optional, for GPU acceleration)
Installation
- Clone the repository:
git clone https://github.com/yourusername/crawl4ai-rust-mcp.git
cd crawl4ai-rust-mcp
- Start required services:
# Qdrant vector database
docker run -p 6333:6333 qdrant/qdrant
# Memgraph (optional, for knowledge graphs)
docker run -p 7687:7687 memgraph/memgraph
- Build the project:
cargo build --release
- Run the server:
cargo run --bin mcp-server
Configuration
Set environment variables to configure the server:
MCP_HOST=0.0.0.0
MCP_PORT=8080
MCP_STORAGE_PATH=./data
QDRANT_URL=http://localhost:6333
MEMGRAPH_URL=bolt://localhost:7687
RUST_LOG=debug
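As a rough illustration of how these variables might be consumed, the sketch below loads them into a struct using only the standard library. The `Config` type, its field names, and the fallback values (taken from the example values above) are assumptions for illustration, not the project's actual configuration code. `RUST_LOG` is left out because it is normally read by the logging setup rather than by application code.

```rust
use std::env;

/// Hypothetical settings struct mirroring the variables listed above.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub host: String,
    pub port: u16,
    pub storage_path: String,
    pub qdrant_url: String,
    pub memgraph_url: String,
}

impl Config {
    /// Build a `Config` from any key -> value lookup.
    /// Unset keys fall back to the example values shown above
    /// (assumed defaults; the real server may use different ones).
    pub fn from_lookup<F>(lookup: F) -> Result<Config, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        let get = |key: &str, default: &str| lookup(key).unwrap_or_else(|| default.to_string());

        // The port is the only value that needs parsing and validation.
        let port_raw = get("MCP_PORT", "8080");
        let port = port_raw
            .parse::<u16>()
            .map_err(|e| format!("MCP_PORT={port_raw:?} is not a valid port: {e}"))?;

        Ok(Config {
            host: get("MCP_HOST", "0.0.0.0"),
            port,
            storage_path: get("MCP_STORAGE_PATH", "./data"),
            qdrant_url: get("QDRANT_URL", "http://localhost:6333"),
            memgraph_url: get("MEMGRAPH_URL", "bolt://localhost:7687"),
        })
    }

    /// Read the settings from the process environment.
    pub fn from_env() -> Result<Config, String> {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

Taking the lookup as a closure keeps the parsing logic testable without mutating the process environment; `Config::from_env()` is the entry point a binary would call at startup.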
MCP Tools
The server implements 8 MCP tools:
1. crawl_single_page
Crawl a single web page and extract content.
2. smart_crawl_url
Intelligently crawl multiple pages from a starting URL.
3. get_available_sources
List all crawled domains and their metadata.
4. perform_rag_query
Perform semantic search across indexed content.
5. search_code_examples
Search specifically for code snippets and examples.
6. parse_github_repository
Parse GitHub repositories into a knowledge graph.
7. check_ai_script_hallucinations
Validate AI-generated code against known patterns.
8. query_knowledge_graph
Query the knowledge graph for relationships and insights.
API Endpoints
- POST /mcp: Execute MCP tools
- GET /mcp/stream: Server-sent events for streaming responses
- POST /api/crawl: Direct crawling endpoint
- POST /api/search: Direct search endpoint
- WS /ws: WebSocket for bidirectional communication
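To make the `POST /mcp` endpoint more concrete, here is a minimal sketch of a request body for it. It assumes the endpoint accepts standard MCP JSON-RPC 2.0 messages, where a tool is invoked with the `tools/call` method. The `url` argument name and the helper functions `json_escape` and `tool_call_request` are hypothetical; the README does not document the tools' input schemas.

```rust
/// Escape a string so it can be embedded in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a JSON-RPC 2.0 `tools/call` request body for a tool that takes
/// a single `url` argument (argument name assumed, not taken from the README).
pub fn tool_call_request(id: u64, tool: &str, url: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{{"url":"{}"}}}}}}"#,
        id,
        json_escape(tool),
        json_escape(url)
    )
}
```

Calling `tool_call_request(1, "crawl_single_page", "https://example.com")` yields a body that could be sent with any HTTP client to `http://localhost:8080/mcp`, given the host and port from the Configuration section. In a real client a JSON library such as serde_json would be the more usual choice than hand-built strings.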
Development
Building
# Build all crates
cargo build --release
# Build specific crate
cargo build --package mcp-server
Testing
# Run all tests
cargo test --workspace
# Run specific crate tests
cargo test --package crawler
# Run with property-based testing
cargo test --features proptest
Running Examples
# Crawler example
cargo run --example crawler_example --package crawler
# Embeddings example
cargo run --example embeddings_example --package embeddings
# Vector store example
cargo run --example vector_store_example --package vector-store
Performance
- Binary Quantization: 32x memory reduction for vector storage
- SIMD Operations: Hardware-accelerated similarity search
- Zero-Copy Design: Minimal memory allocations during data transfer
- Async Architecture: Handles thousands of concurrent operations
- Memory-Mapped Files: Efficient handling of large documents
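The 32x figure follows from storing one bit per dimension instead of a 32-bit float. The sketch below shows the general idea under simple assumptions: sign-based quantization (positive components become 1) and Hamming distance as the similarity measure. The function names `quantize` and `hamming` are illustrative, and this is not the project's or Qdrant's actual implementation.

```rust
/// Pack a float vector into bits: a component becomes 1 if it is
/// strictly positive and 0 otherwise. 64 dimensions fit in one u64.
pub fn quantize(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Hamming distance between two packed vectors: the number of bit
/// positions in which they differ. Smaller means more similar.
pub fn hamming(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same packed length");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

For a 384-dimensional embedding, the float form takes 384 × 4 = 1536 bytes while the packed form takes 6 × 8 = 48 bytes, a ratio of 32. The XOR and `count_ones` loop is also the kind of code that can map onto hardware popcount or SIMD instructions when the build target supports them, which is one reason binary vectors are fast to compare.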
Use Cases
- AI Assistants: Build context-aware AI applications
- Documentation Search: Index and search technical documentation
- Research Tools: Crawl and analyze academic papers
- Code Intelligence: Parse and understand code repositories
- Enterprise Search: Build internal knowledge bases
Contributing
Contributions are welcome! Please read our contributing guidelines and submit pull requests to our repository.
License
This project is licensed under the MIT License - see the LICENSE file for details.