project-rag

Brainwires/project-rag

3.3

If you are the rightful owner of project-rag and would like to certify it and/or have it hosted online, please leave a comment on the right or send an email to dayong@mcphub.com.

A Rust-based Model Context Protocol (MCP) server that provides AI assistants with powerful RAG (Retrieval-Augmented Generation) capabilities for understanding massive codebases.

Tools
6
Resources
0
Prompts
0

Project RAG - MCP Server for Code Understanding

Tests Coverage Rust

A Rust-based Model Context Protocol (MCP) server that provides AI assistants with powerful RAG (Retrieval-Augmented Generation) capabilities for understanding massive codebases.

Overview

This MCP server enables AI assistants to efficiently search and understand large projects by:

  • Creating semantic embeddings of code files
  • Storing them in a local vector database
  • Providing fast semantic search capabilities
  • Supporting incremental updates for efficiency

Features

  • Local-First: All processing happens locally using fastembed-rs (no API keys required)
  • Hybrid Search: Combines vector similarity with BM25 keyword matching using Reciprocal Rank Fusion (RRF) for optimal results
  • AST-Based Chunking: Uses Tree-sitter to extract semantic units (functions, classes, methods) for 12 languages
  • Comprehensive File Support: Indexes 40+ file types including code, documentation (with PDF→Markdown conversion), and configuration files
  • Git History Search: Search commit history with smart on-demand indexing (default: 10 commits, only indexes deeper as needed)
  • Multi-Project Support: Index and query multiple codebases simultaneously with project filtering
  • Smart Indexing: Automatically performs full indexing for new codebases or incremental updates for previously indexed ones
  • Concurrent Access Protection: Safe lock management prevents index corruption when multiple agents try to index simultaneously
  • Stable Embedded Database: LanceDB vector database (default, no external dependencies) with optional Qdrant support
  • Language Detection: Automatic detection of 40+ file types (programming languages, documentation formats, and config files)
  • Advanced Filtering: Search by file type, language, or path patterns
  • Respects .gitignore: Automatically excludes ignored files during indexing
  • Slash Commands: 6 convenient slash commands via MCP Prompts

MCP Slash Commands

The server provides 6 slash commands for quick access in Claude Code:

  1. /project:index - Index a codebase directory (automatically performs full or incremental)
  2. /project:query - Search the indexed codebase
  3. /project:stats - Get index statistics
  4. /project:clear - Clear all indexed data
  5. /project:search - Advanced search with filters
  6. /project:git-search - Search git commit history with on-demand indexing

See for detailed usage.

Supported File Types

Project RAG automatically indexes and searches 40+ file types across three categories:

Programming Languages (24 languages)

Supports AST-based semantic chunking for these languages:

  • Rust (.rs)
  • Python (.py)
  • JavaScript (.js, .mjs, .cjs), TypeScript (.ts), JSX (.jsx), TSX (.tsx)
  • Go (.go)
  • Java (.java)
  • C (.c), C++ (.cpp, .cc, .cxx), C/C++ Headers (.h, .hpp)
  • C# (.cs)
  • Swift (.swift)
  • Kotlin (.kt, .kts)
  • Scala (.scala)
  • Ruby (.rb)
  • PHP (.php)
  • Shell (.sh, .bash)
  • SQL (.sql)
  • HTML (.html, .htm)
  • CSS (.css), SCSS (.scss, .sass)

Documentation Formats (8 formats)

With special handling for rich content:

  • Markdown (.md, .markdown)
  • PDF (.pdf) - Automatically converted to Markdown with table preservation
  • reStructuredText (.rst)
  • AsciiDoc (.adoc, .asciidoc)
  • Org Mode (.org)
  • Plain Text (.txt)
  • Log Files (.log)

PDF Conversion Features:

  • Extracts text content using pdf-extract library
  • Converts to Markdown format automatically
  • Preserves table structures (detects tab/space-separated columns)
  • Detects and formats headings (ALL CAPS lines and section markers)
  • Handles multi-column layouts intelligently
  • Chunks like any other text file (50 lines per chunk by default)

Configuration Files (8 formats)

For complete project understanding:

  • JSON (.json)
  • YAML (.yaml, .yml)
  • TOML (.toml)
  • XML (.xml)
  • INI (.ini)
  • Config files (.conf, .config, .cfg)
  • Properties (.properties)
  • Environment (.env)

Example Use Cases

# Index documentation PDFs in your project
query_codebase("API authentication flow")  # Finds content in .pdf, .md, .rst files

# Search configuration files
query_codebase("database connection string")  # Finds .yaml, .toml, .env, .conf files

# Find code implementations
search_by_filters(query="JWT validation", file_extensions=["rs", "go"])

MCP Tools

The server provides 6 tools that can be used directly:

  1. index_codebase - Smartly index a codebase directory

    • Automatically performs full indexing for new codebases
    • Automatically performs incremental updates for previously indexed codebases
    • Respects .gitignore and exclude patterns
    • Returns mode information (full or incremental)
  2. query_codebase - Hybrid semantic + keyword search across the indexed code

    • Combines vector similarity with BM25 keyword matching (enabled by default)
    • Returns relevant code chunks with both vector and keyword scores
    • Configurable result limit and score threshold
    • Optional project filtering for multi-project setups
  3. get_statistics - Get statistics about the indexed codebase

    • File counts, chunk counts, embedding counts
    • Language breakdown
  4. clear_index - Clear all indexed data

    • Deletes the entire vector database collection
    • Prepares for fresh indexing
  5. search_by_filters - Advanced hybrid search with filters

    • Always uses hybrid search for best results
    • Filter by file extensions (e.g., ["rs", "toml"])
    • Filter by programming languages
    • Filter by path patterns
    • Optional project filtering
  6. search_git_history - Search git commit history using semantic search

    • Automatically indexes commits on-demand (default: 10 commits, configurable)
    • Searches commit messages, diffs, author info, and changed files
    • Smart caching: only indexes new commits as needed
    • Regex filtering by author name/email and file paths
    • Date range filtering (ISO 8601 or Unix timestamp)
    • Branch selection support

Prerequisites

  • Rust: 1.83+ with Rust 2024 edition support
  • protobuf-compiler: Required for building (install via sudo apt-get install protobuf-compiler on Ubuntu/Debian)

Vector Database Options

LanceDB (Default - Embedded, Stable)

No additional setup needed! LanceDB is an embedded vector database that runs directly in the application. It stores data in ./.lancedb directory by default.

Why LanceDB is the default:

  • Embedded - No external dependencies or servers required
  • Stable - Production-proven with ACID transactions
  • Feature-rich - Full SQL-like filtering capabilities
  • Hybrid search built-in - Tantivy BM25 + LanceDB vector with Reciprocal Rank Fusion
  • Columnar storage - Efficient for large datasets with Apache Arrow
  • Zero-copy - Memory-mapped files for fast queries

Qdrant (Optional - Server-Based)

To use Qdrant instead of LanceDB, build with the qdrant-backend feature:

cargo build --release --no-default-features --features qdrant-backend

Then start a Qdrant instance:

Using Docker (Recommended):

docker run -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_data:/qdrant/storage \
    qdrant/qdrant

Using Docker Compose:

version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - ./qdrant_data:/qdrant/storage

Or download standalone: https://qdrant.tech/documentation/guides/installation/

Installation

# Navigate to the project
cd project-rag

# Install protobuf compiler (Ubuntu/Debian)
sudo apt-get install protobuf-compiler

# Build the release binary (with default LanceDB backend - stable and embedded!)
cargo build --release

# Or build with Qdrant backend (requires external server)
cargo build --release --no-default-features --features qdrant-backend

# The binary will be at target/release/project-rag

Usage

Running as MCP Server

The server communicates over stdio following the MCP protocol:

./target/release/project-rag

Configuring in Claude Code

Add the MCP server to Claude Code using the CLI:

# Navigate to the project directory first
cd /path/to/project-rag

# Add the MCP server to Claude Code
claude mcp add project --command "$(pwd)/target/release/project-rag"

# Or with logging enabled
claude mcp add project --command "$(pwd)/target/release/project-rag" --env RUST_LOG=info

After adding, restart Claude Code to load the server. The slash commands (/project:index, /project:query, etc.) will be available immediately.

Configuring in Claude Desktop

Add to your Claude Desktop config:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json Linux: ~/.config/Claude/claude_desktop_config.json Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "project-rag": {
      "command": "/absolute/path/to/project-rag/target/release/project-rag",
      "env": {
        "RUST_LOG": "info"
      }
    }
  }
}

Note: Claude Code and Claude Desktop are different products with different configuration methods.

Example Tool Usage

Index a codebase:

{
  "path": "/path/to/your/project",
  "include_patterns": ["**/*.rs", "**/*.toml"],
  "exclude_patterns": ["**/target/**", "**/node_modules/**"],
  "max_file_size": 1048576
}

Query the codebase:

{
  "query": "How does authentication work?",
  "limit": 10,
  "min_score": 0.7
}

Advanced filtered search:

{
  "query": "database connection pool",
  "limit": 5,
  "min_score": 0.75,
  "file_extensions": ["rs"],
  "languages": ["Rust"],
  "path_patterns": ["src/db"]
}

Index (or re-index) a codebase:

{
  "path": "/path/to/your/project",
  "include_patterns": [],
  "exclude_patterns": []
}

Note: This automatically performs a full index for new codebases or an incremental update for previously indexed ones.

Architecture

project-rag/
├── src/
│   ├── bm25_search.rs      # Tantivy BM25 keyword search with RRF fusion
│   ├── embedding/          # FastEmbed integration for local embeddings
│   │   ├── mod.rs          # EmbeddingProvider trait
│   │   └── fastembed_manager.rs  # all-MiniLM-L6-v2 implementation
│   ├── vector_db/          # Vector database implementations
│   │   ├── mod.rs          # VectorDatabase trait
│   │   ├── lance_client.rs # LanceDB + Tantivy hybrid search (default)
│   │   └── qdrant_client.rs  # Qdrant implementation (optional)
│   ├── indexer/            # File walking and code chunking
│   │   ├── mod.rs          # Module exports
│   │   ├── file_walker.rs  # Directory traversal with .gitignore + 40+ file types
│   │   ├── chunker.rs      # Chunking strategies (AST-based, fixed-lines, sliding window)
│   │   ├── ast_parser.rs   # Tree-sitter AST parsing for 12 languages
│   │   └── pdf_extractor.rs # PDF to Markdown converter with table support
│   ├── mcp_server.rs       # MCP server with 6 tools
│   ├── types.rs            # Request/Response types with JSON schema
│   ├── main.rs             # Binary entry point with stdio transport
│   └── lib.rs              # Library root
├── Cargo.toml              # Rust 2024 edition with dependencies
├── README.md               # This file
├── NOTES.md                # Development notes and known issues
├── TEST_RESULTS.md         # Unit test results (10 tests passing)
└── COVERAGE_ANALYSIS.md    # Detailed test coverage analysis

Configuration

Environment Variables

  • RUST_LOG - Set logging level (options: error, warn, info, debug, trace)
    • Example: RUST_LOG=debug cargo run

Qdrant Configuration

  • Currently hardcoded to http://localhost:6334
  • Future: Add configuration file support

Embedding Model

  • Default: all-MiniLM-L6-v2 (384 dimensions)
  • First run downloads model (~50MB) to cache

Chunking Strategy

  • Default: Hybrid AST-based with fallback to fixed-lines
  • AST Parsing: Extracts semantic units (functions, classes, methods) for Rust, Python, JavaScript, TypeScript, Go, Java, Swift, C, C++, C#, Ruby, PHP
  • Fallback: 50 lines per chunk for unsupported languages
  • Alternative: Sliding window with configurable overlap

Technical Details

Embeddings

  • Model: all-MiniLM-L6-v2 (Sentence Transformers)
  • Dimensions: 384
  • Library: fastembed-rs with ONNX runtime
  • Performance: ~500 embeddings/second

Vector Database

  • Engine: Qdrant
  • Distance Metric: Cosine similarity
  • Index: HNSW for fast approximate nearest neighbor search
  • Payload: Stores file path, project, line numbers, language, hash, timestamp, content

Hybrid Search

  • Vector Similarity: Semantic understanding via embeddings (LanceDB or Qdrant)
  • Keyword Matching: Full-text BM25 search via Tantivy inverted index
  • Fusion Algorithm: Reciprocal Rank Fusion (RRF) with k=60 constant
  • BM25 Parameters: Uses Tantivy's optimized BM25 implementation
  • Ranking: RRF combines both rankings using 1/(k+rank) formula
  • Performance: Both indexes queried in parallel for fast results

Concurrent Access & Lock Safety

The BM25 (Tantivy) index uses file-based locks to prevent concurrent writes. The system includes smart lock management to handle multiple agents safely:

Stale Lock Detection:

  • Lock files are checked for staleness (>5 minutes old)
  • Uses file modification timestamps to detect crashed processes
  • Fresh locks (<5 minutes) are treated as active

Automatic Recovery:

  • When indexing fails with a lock error, the system checks if locks are stale
  • Stale locks (from crashes): Automatically cleaned up and indexing retries
  • Active locks (from running agents): Returns clear error message asking to wait

Error Messages:

  • Active indexing detected: "BM25 index is currently being used by another process. Please wait and try again later."
  • Stale locks cleaned: Logs warnings and retries automatically

Thread Safety:

  • In-process synchronization: Mutex prevents concurrent writers within the same process
  • Cross-process safety: File-based locks prevent concurrent writers across different processes
  • Read operations are always safe and never blocked

Best Practices:

  • If you see the "currently being used" error, wait for the other indexing operation to complete
  • Indexing operations typically complete in seconds to minutes depending on codebase size
  • Multiple agents can safely perform search operations simultaneously (reads are never locked)

Code Chunking

  • Default: Hybrid AST-based chunking
  • AST Support: Rust, Python, JavaScript, TypeScript, Go, Java, Swift, C, C++, C#, Ruby, PHP
  • Fallback: 50 lines per chunk for unsupported languages
  • Metadata: Tracks start/end lines, language, file hash, project

File Processing

  • Binary Detection: 30% non-printable byte threshold (PDFs handled specially)
  • Language Detection: 40+ file types supported (code, docs, configs)
  • PDF Processing: Automatic text extraction and Markdown conversion with table preservation
  • Hash Algorithm: SHA256 for change detection (works for all file types including PDFs)
  • .gitignore Support: Uses ignore crate

Development

Running Tests

# Run all unit tests (343 tests with ~94% coverage)
cargo test --lib

# Run specific module tests
cargo test --lib types::tests
cargo test --lib chunker::tests
cargo test --lib pdf_extractor::tests  # PDF to Markdown conversion tests
cargo test --lib bm25_search::tests    # Includes concurrent access & lock safety tests
cargo test --lib config::tests         # Includes validation & env override tests
cargo test --lib indexing::tests       # Includes error path & edge case tests

# Run with output
cargo test --lib -- --nocapture

# Run with code coverage
cargo llvm-cov --lib --html
# Open target/llvm-cov/html/index.html to view coverage report

Building

# Debug build
cargo build

# Release build (optimized)
cargo build --release

# Check without building
cargo check

Code Quality

# Format code
cargo fmt

# Lint with clippy
cargo clippy

# Fix clippy warnings
cargo clippy --fix

Debugging

# Run with debug logging
RUST_LOG=debug cargo run

# Run with trace logging
RUST_LOG=trace cargo run

Performance

Benchmarks (Typical Hardware)

  • Indexing Speed: ~1000 files/minute

    • Depends on file size and complexity
    • Includes file I/O, hashing, chunking, embedding generation
  • Search Latency: 20-30ms per query

    • ~95% recall with HNSW index
    • Sub-50ms for most queries
  • Memory Usage:

    • Base: ~100MB
    • Embedding model: ~50MB
    • Per 10k chunks: ~40MB (embeddings + metadata)
  • Storage:

    • Embeddings: ~1.5KB per chunk (384 floats)
    • Typical project (1000 files): ~75MB in Qdrant

Optimization Tips

  1. Adjust chunk size: Smaller chunks = more precise but slower indexing
  2. Use filters: Pre-filter by language/extension for faster searches
  3. Batch processing: Default 32 chunks per batch is optimal for most systems
  4. Incremental updates: Use after initial index to save time

Current Status

✅ Production Ready - 100% Complete

  • Core architecture with modular design
  • All 6 MCP tools implemented and working
  • All 6 MCP slash commands implemented
  • Hybrid search - Vector similarity + Full BM25 with IDF
  • AST-based chunking - Semantic code extraction for 12 languages
  • Multi-project support - Index and query multiple codebases
  • Persistent hash cache - Fast incremental updates across restarts
  • Concurrent access protection - Smart lock management prevents index corruption
  • FastEmbed integration for local embeddings
  • Qdrant vector database integration
  • File walking with .gitignore support
  • Language detection (40+ file types: code, docs, configs)
  • PDF to Markdown conversion with table preservation
  • SHA256-based change detection
  • 343 unit tests passing (including PDF extraction, BM25/RRF, and lock safety tests)
  • Comprehensive documentation
  • Full MCP prompts support enabled
  • Hybrid search with Tantivy BM25 + LanceDB vector using RRF

📋 Known Limitations

  1. Qdrant API Changes

    • Requires builder patterns (UpsertPointsBuilder, SearchPointsBuilder, etc.)
    • All builders implemented correctly
  2. FastEmbed Mutability

    • Uses unsafe workaround for mutable model access
    • Works correctly but should be refactored to use Arc<Mutex<>>
  3. Async Trait Warnings

    • 9 harmless warnings about async fn in public traits
    • Cosmetic issue, does not affect functionality

Limitations

Current Limitations

  • Qdrant Backend: Requires external Qdrant server when using qdrant-backend feature

    • Default LanceDB backend is fully embedded with no external dependencies
  • Model Download: First run downloads ~50MB model

    • Future: Include model in binary or provide offline installer
  • Path Filtering: Currently post-query filtering (not optimized)

    • Future: Add Qdrant payload indexing for path patterns
  • No Configuration File: All settings hardcoded

    • Future: Add TOML/YAML config support

Scale Limitations

  • Large Codebases: Projects with 100k+ files may take significant time to index

    • Mitigation: Use incremental updates
  • Memory: Very large indexes (1M+ chunks) may require significant RAM

    • Typical project (5k files) uses <500MB total

Troubleshooting

Index Lock Errors

Error: "BM25 index is currently being used by another process"

This means another agent or process is actively indexing. This is expected behavior to prevent index corruption.

Solutions:

  1. Wait: Let the current indexing operation complete (typically seconds to minutes)
  2. Check processes: Verify no other Claude Code/Desktop instances are running indexing operations
  3. Force cleanup (last resort): If you're certain no other process is running, manually remove stale locks:
    rm ~/.local/share/project-rag/lancedb/lancedb_bm25/.tantivy-*.lock
    

Note: The system automatically detects and cleans up stale locks (>5 minutes old) from crashed processes. You should rarely need manual intervention.

Qdrant Connection Fails

# Check if Qdrant is running
curl http://localhost:6334/health

# View Qdrant logs
docker logs <container-id>

Model Download Fails

# Pre-download model
python -c "from fastembed import TextEmbedding; TextEmbedding()"

# Or set HuggingFace mirror
export HF_ENDPOINT=https://hf-mirror.com

Out of Memory

# Reduce batch size (edit source)
# Or index in smaller chunks
# Or use smaller embedding model

Slow Indexing

# Check disk I/O
# Reduce max_file_size
# Use exclude_patterns to skip unnecessary files

Future Enhancements

High Priority

  • Add comprehensive integration tests
  • Configuration file support (TOML)
  • Cache IDF statistics to disk for faster startup

Medium Priority

  • Embedded vector DB option (no external dependencies)
  • Support for more embedding models
  • Performance benchmarks and profiling
  • AST support for more languages (Kotlin, Perl, Scala, etc.)

Low Priority

  • Web UI for testing/debugging
  • Metrics and monitoring endpoints
  • Multi-language documentation
  • Alternative transport mechanisms (HTTP, WebSocket)

License

MIT License - see LICENSE file for details

Contributing

Contributions welcome! Please ensure:

  1. Code Quality:

    • Source files stay under 600 lines (enforced)
    • Code is formatted with cargo fmt
    • Clippy lints pass (cargo clippy)
  2. Testing:

    • Add tests for new functionality
    • Existing tests pass (cargo test)
    • Update documentation
  3. Commits:

    • Clear, descriptive commit messages
    • One logical change per commit
    • Reference issues where applicable

Support

Acknowledgments

  • rmcp: Official Rust Model Context Protocol SDK
  • Qdrant: High-performance vector database
  • FastEmbed: Fast local embedding generation
  • Claude: For MCP protocol and testing

Built with ❤️ using Rust 2024 Edition