crawl4ai-rust-mcp

pnocera/crawl4ai-rust-mcp

A high-performance Rust MCP (Model Context Protocol) server that provides web crawling and retrieval-augmented generation (RAG) capabilities. It is built with a zero-copy architecture, SIMD acceleration, and memory-efficient storage.

Features

  • 🚀 Zero-Copy Protocol: Efficient data transfer using rkyv serialization
  • 🔍 Intelligent Web Crawling: Polite crawling with rate limiting and robots.txt support
  • 🧠 Local Embeddings: GPU-accelerated text embeddings with Candle framework
  • 💾 32x Memory Reduction: Binary quantization for vector storage
  • ⚡ SIMD-Accelerated Search: Hardware-optimized vector similarity search
  • 🗄️ Hybrid Storage: RocksDB + DuckDB + memory-mapped files
  • 🔗 Knowledge Graphs: Memgraph integration for relationship tracking
  • 🌐 MCP Protocol: Full compliance with Model Context Protocol
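
The README does not describe how polite crawling is implemented. As a rough illustration only, a per-host minimum-delay limiter could look like the sketch below. `HostRateLimiter`, `reserve`, and the `min_delay` setting are hypothetical names chosen for this example, not the crawler crate's API.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-host limiter: enforces a minimum delay between
/// requests to the same host. Not the crawler crate's actual API.
pub struct HostRateLimiter {
    min_delay: Duration,
    /// Earliest instant at which each host may be contacted again.
    next_allowed: HashMap<String, Instant>,
}

impl HostRateLimiter {
    pub fn new(min_delay: Duration) -> Self {
        Self { min_delay, next_allowed: HashMap::new() }
    }

    /// Returns how long the caller should wait before contacting `host`
    /// and reserves the following slot for that host.
    pub fn reserve(&mut self, host: &str, now: Instant) -> Duration {
        // Use the reserved slot if it lies in the future, otherwise go now.
        let slot = match self.next_allowed.get(host) {
            Some(&t) if t > now => t,
            _ => now,
        };
        self.next_allowed.insert(host.to_string(), slot + self.min_delay);
        slot.duration_since(now)
    }
}
```

A caller would sleep for the returned duration before issuing the request. Passing `now` explicitly keeps the limiter deterministic and easy to test. Hosts are tracked independently, so a slow site does not hold up requests to other sites.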

Architecture

crawl4ai-rust-mcp/
├── mcp-server/         # Axum HTTP server with SSE/WebSocket support
├── mcp-protocol/       # MCP protocol types and serialization
├── crawler/            # Web crawler with JavaScript support
├── embeddings/         # Local embeddings generation
├── vector-store/       # Qdrant integration with binary quantization
├── graph-store/        # Knowledge graph storage
├── storage/            # Hybrid storage layer
└── search/             # SIMD-accelerated similarity search

Quick Start

Prerequisites

  • Rust 1.70+
  • Docker (for Qdrant and Memgraph)
  • CUDA/Metal (optional, for GPU acceleration)

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/crawl4ai-rust-mcp.git
cd crawl4ai-rust-mcp
  2. Start required services:
# Qdrant vector database
docker run -p 6333:6333 qdrant/qdrant

# Memgraph (optional, for knowledge graphs)
docker run -p 7687:7687 memgraph/memgraph
  3. Build the project:
cargo build --release
  4. Run the server:
cargo run --bin mcp-server

Configuration

Set environment variables to configure the server:

MCP_HOST=0.0.0.0
MCP_PORT=8080
MCP_STORAGE_PATH=./data
QDRANT_URL=http://localhost:6333
MEMGRAPH_URL=bolt://localhost:7687
RUST_LOG=debug
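
For readers wiring these variables into their own tooling, here is a minimal sketch of reading them with the standard library only. The `ServerConfig` struct and both functions are hypothetical. Using the example values above as fallbacks is an assumption, since the README does not state the server's real defaults.

```rust
use std::env;

/// Settings mirrored from the environment variables listed above.
/// The struct and the fallback values are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct ServerConfig {
    pub host: String,
    pub port: u16,
    pub storage_path: String,
    pub qdrant_url: String,
    pub memgraph_url: String,
}

/// Builds a config from any key lookup, so it can be exercised without
/// touching the real process environment.
pub fn config_from<F>(lookup: F) -> Result<ServerConfig, String>
where
    F: Fn(&str) -> Option<String>,
{
    let get = |key: &str, default: &str| lookup(key).unwrap_or_else(|| default.to_string());
    // The port is the only value that needs parsing and validation.
    let port_raw = get("MCP_PORT", "8080");
    let port = port_raw
        .parse::<u16>()
        .map_err(|e| format!("invalid MCP_PORT {:?}: {}", port_raw, e))?;
    Ok(ServerConfig {
        host: get("MCP_HOST", "0.0.0.0"),
        port,
        storage_path: get("MCP_STORAGE_PATH", "./data"),
        qdrant_url: get("QDRANT_URL", "http://localhost:6333"),
        memgraph_url: get("MEMGRAPH_URL", "bolt://localhost:7687"),
    })
}

/// Reads the real process environment.
pub fn config_from_env() -> Result<ServerConfig, String> {
    config_from(|key| env::var(key).ok())
}
```

`RUST_LOG` is left out on purpose: it is normally consumed by the logging setup, not by application config.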

MCP Tools

The server implements 8 MCP tools:

1. crawl_single_page

Crawl a single web page and extract content.

2. smart_crawl_url

Intelligently crawl multiple pages from a starting URL.

3. get_available_sources

List all crawled domains and their metadata.

4. perform_rag_query

Perform semantic search across indexed content.

5. search_code_examples

Search specifically for code snippets and examples.

6. parse_github_repository

Parse GitHub repositories into a knowledge graph.

7. check_ai_script_hallucinations

Validate AI-generated code against known patterns.

8. query_knowledge_graph

Query the knowledge graph for relationships and insights.
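
MCP clients invoke tools through JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request body with the standard library only. The helper names are hypothetical, and the `url` argument used in the usage note is an assumption, because the README does not list each tool's parameters.

```rust
/// Escapes a string for embedding in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a JSON-RPC 2.0 `tools/call` request with string-valued arguments.
/// Hypothetical helper; argument names depend on the tool's real schema.
pub fn tools_call_request(id: u64, tool: &str, args: &[(&str, &str)]) -> String {
    let fields: Vec<String> = args
        .iter()
        .map(|(k, v)| format!("\"{}\":\"{}\"", json_escape(k), json_escape(v)))
        .collect();
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"tools/call\",\"params\":{{\"name\":\"{}\",\"arguments\":{{{}}}}}}}",
        id,
        json_escape(tool),
        fields.join(",")
    )
}
```

For example, `tools_call_request(1, "crawl_single_page", &[("url", "https://example.com")])` produces a request whose `params.name` is `crawl_single_page`. In a real client a JSON library such as serde_json would replace the manual escaping.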

API Endpoints

  • POST /mcp - Execute MCP tools
  • GET /mcp/stream - Server-sent events for streaming responses
  • POST /api/crawl - Direct crawling endpoint
  • POST /api/search - Direct search endpoint
  • WS /ws - WebSocket for bidirectional communication
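
As a quick way to exercise `POST /mcp` without extra dependencies, the following sketch formats and sends a raw HTTP/1.1 request over a TCP socket. It assumes the endpoint accepts a JSON body. The function names are hypothetical, and the README does not specify the response format, so the raw response is returned unparsed.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Formats a minimal HTTP/1.1 POST carrying a JSON body.
pub fn build_post(host: &str, path: &str, body: &str) -> String {
    format!(
        "POST {path} HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\nContent-Length: {len}\r\nConnection: close\r\n\r\n{body}",
        path = path,
        host = host,
        len = body.len(), // byte length, as Content-Length requires
        body = body
    )
}

/// Sends the request to `addr` (for example "localhost:8080") and returns
/// the raw response text: status line, headers, and body.
pub fn post_json(addr: &str, path: &str, body: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(build_post(addr, path, body).as_bytes())?;
    let mut response = String::new();
    // `Connection: close` lets us read until the server closes the socket.
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With the server running on the configured port, `post_json("localhost:8080", "/mcp", body)` would send `body` to the tool endpoint. A production client would more likely use an HTTP library that handles redirects, TLS, and chunked responses.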

Development

Building

# Build all crates
cargo build --release

# Build specific crate
cargo build --package mcp-server

Testing

# Run all tests
cargo test --workspace

# Run specific crate tests
cargo test --package crawler

# Run with property-based testing
cargo test --features proptest

Running Examples

# Crawler example
cargo run --example crawler_example --package crawler

# Embeddings example
cargo run --example embeddings_example --package embeddings

# Vector store example
cargo run --example vector_store_example --package vector-store

Performance

  • Binary Quantization: 32x memory reduction for vector storage compared with 32-bit floats (one bit per dimension)
  • SIMD Operations: Hardware-accelerated similarity search
  • Zero-Copy Design: Minimal memory allocations during data transfer
  • Async Architecture: Handles thousands of concurrent operations
  • Memory-Mapped Files: Efficient handling of large documents
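
The 32x figure follows from storing one bit per dimension where a 32-bit float would otherwise be used. The sketch below illustrates the idea with sign-based quantization and a Hamming-distance comparison. It is a simplified stand-in under that assumption, not the project's vector-store or search code, and the function names are hypothetical.

```rust
/// Packs the sign of each component into one bit (1 if the value is > 0.0).
pub fn quantize(v: &[f32]) -> Vec<u64> {
    let mut bits = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            bits[i / 64] |= 1u64 << (i % 64);
        }
    }
    bits
}

/// Hamming distance between two packed vectors of equal length.
/// XOR marks differing bits; `count_ones` tallies them per 64-bit word.
pub fn hamming(a: &[u64], b: &[u64]) -> u32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same packed length");
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

A 1024-dimensional `f32` vector occupies 4096 bytes, while its packed form occupies 16 words of 8 bytes, or 128 bytes, which is the 32x ratio. On targets that support it, `count_ones` can compile down to a hardware population-count instruction, which is one reason bit-packed comparison is fast.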

Use Cases

  • 🤖 AI Assistants: Build context-aware AI applications
  • 📚 Documentation Search: Index and search technical documentation
  • 🔬 Research Tools: Crawl and analyze academic papers
  • 💻 Code Intelligence: Parse and understand code repositories
  • 🔍 Enterprise Search: Build internal knowledge bases

Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests to our repository.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments