Hybrid RAG Project

Python 3.9+ · License: MIT · Code style: black

A generalized Retrieval-Augmented Generation (RAG) system with hybrid search capabilities that works with any documents you provide. Combines semantic (dense vector) search and keyword (sparse BM25) search for optimal document retrieval, with an MCP server API for easy integration.

🎯 Key Features: Multi-format support • Local LLM • Claude Desktop integration • Structured data queries • Document-type-aware retrieval

🚀 Quick Start (No MCP Required!)

You don't need Claude Desktop or MCP to use this project! Just run:

# 1. Make sure Ollama is running
ollama serve

# 2. Activate virtual environment
source .venv/bin/activate

# 3. Start conversational demo (recommended)
python scripts/demos/conversational.py

# Or use the shortcut
./scripts/bin/ask.sh

That's it! Ask questions about the 43,835 document chunks in the sample dataset.

📖 See the documentation in the docs/ folder for complete usage instructions.


Overview

This project implements a hybrid RAG system that combines:

  • Semantic Search: Dense vector embeddings for understanding meaning and context
  • Keyword Search: BM25 sparse retrieval for exact keyword matching
  • Hybrid Fusion: Reciprocal Rank Fusion (RRF) to combine results from both methods
  • MCP Server: Both REST API and Model Context Protocol server for Claude integration
  • Multi-format Support: Automatically loads documents from various file formats

The hybrid approach ensures better retrieval accuracy by leveraging the strengths of both search methods.

Features

  • Vector-based semantic search using Chroma and Ollama embeddings
  • BM25 keyword search for exact term matching
  • Ensemble retriever with Reciprocal Rank Fusion (RRF)
  • Integration with local Ollama LLM for answer generation
  • Support for multiple document formats (TXT, PDF, MD, DOCX, CSV)
  • Automated document loading from data directory
  • RESTful API server with /ingest and /query endpoints
  • Model Context Protocol (MCP) server for Claude Desktop/API integration
  • Configuration-driven architecture (no hardcoded values)
  • Persistent vector store for faster subsequent queries

Architecture

User Documents → data/ directory
                      ↓
            Document Loader
                      ↓
Query → Hybrid Retriever → [Vector Retriever + BM25 Retriever]
                         → RRF Fusion
                         → Retrieved Context
                         → LLM (Ollama)
                         → Final Answer
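
In code, this pipeline maps onto standard LangChain components. A minimal sketch (not the project's exact source; the one-document corpus is a stand-in):

# Minimal sketch of the retrieval pipeline above using the LangChain
# components named under Features; not the project's exact source code.
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_ollama import OllamaEmbeddings

docs = [Document(page_content="Example text.", metadata={"source": "example.txt"})]

embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(docs, embeddings, persist_directory="./chroma_db")

bm25 = BM25Retriever.from_documents(docs)
bm25.k = 2  # keyword_search_k from config.yaml

hybrid = EnsembleRetriever(
    retrievers=[vectorstore.as_retriever(search_kwargs={"k": 2}), bm25],
    weights=[0.5, 0.5],  # equal weighting in the rank fusion
)
results = hybrid.invoke("example query")
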

Prerequisites

  1. Python 3.9+
  2. Ollama installed and running locally
  3. Required Ollama models:
    • llama3.1:latest (or another LLM model)
    • nomic-embed-text (or another embedding model)

Installing Ollama

Visit ollama.ai to download and install Ollama for your platform.

After installation, pull the required models:

ollama pull llama3.1:latest
ollama pull nomic-embed-text

Verify Ollama is running:

curl http://localhost:11434/api/tags

Installation

  1. Clone the repository:
git clone <your-repo-url>
cd hybrid-rag-project
  2. Create a virtual environment:
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Project Structure

hybrid-rag-project/
├── src/
│   └── hybrid_rag/            # Core application package
│       ├── __init__.py        # Package initialization
│       ├── document_loader.py # Document loading utility
│       ├── structured_query.py # CSV query engine
│       └── utils.py           # Logging and utility functions
├── scripts/
│   ├── run_demo.py            # Main demonstration script
│   ├── mcp_server.py          # REST API server
│   └── mcp_server_claude.py   # MCP server for Claude integration
├── config/
│   ├── config.yaml            # Configuration file
│   └── claude_desktop_config.json # Sample Claude Desktop MCP config
├── docs/
│   ├── INSTALLATION.md        # Detailed installation guide
│   ├── STRUCTURED_QUERIES.md  # CSV query documentation
│   ├── ASYNC_INGESTION.md     # Async ingestion guide
│   └── SHUTDOWN.md            # Shutdown handling guide
├── data/                      # Sample data files (13 files included)
│   ├── *.csv                  # 7 CSV files (structured data)
│   ├── *.md                   # 5 Markdown files (unstructured)
│   └── *.txt                  # 1 Text file (technical specs)
├── chroma_db/                 # Vector store (auto-created)
├── tests/                     # Unit tests
│   └── extract_fields_tests.py
├── setup.py                   # Package setup file
├── requirements.txt           # Python dependencies
├── TESTING_RESULTS.md         # Comprehensive test results
├── CONTRIBUTING.md            # Contribution guidelines
├── CHANGELOG.md               # Version history
├── LICENSE                    # MIT License
└── README.md                  # This file

Sample Data (UCSC Extension Project)

This repository includes 13 sample data files for demonstration and testing purposes. These files represent a realistic business scenario for TechVision Electronics and are designed to showcase the system's capabilities across multiple document types.

📊 Included Sample Files

Structured Data (CSV) - 7 files:

  • product_catalog.csv - Product inventory with specifications (5,000 rows)
  • inventory_levels.csv - Stock levels and warehouse data (10,000 rows)
  • sales_orders_november.csv - Monthly sales transactions (8,000 rows)
  • warranty_claims_q4.csv - Customer warranty claims (3,000 rows)
  • production_schedule_dec2024.csv - Manufacturing schedule (4,000 rows)
  • supplier_pricing.csv - Vendor pricing information (6,000 rows)
  • shipping_manifests.csv - Shipping and logistics data (5,000 rows)

Unstructured Data (Markdown) - 5 files:

  • customer_feedback_q4_2024.md - Customer reviews and feedback (600 chunks)
  • market_analysis_2024.md - Market research and trends (400 chunks)
  • quality_control_report_nov2024.md - QC findings and issues (501 chunks)
  • return_policy_procedures.md - Policy documentation (300 chunks)
  • support_tickets_summary.md - Technical support summary (700 chunks)

Text Data - 1 file:

  • product_specifications.txt - Technical specifications (334 chunks)

Total Dataset:

  • 41,000 CSV rows (ingested as 41,000 row-level document chunks)
  • 2,835 text/markdown chunks (chunked at 1000 chars with 200 char overlap)
  • 43,835 total searchable document chunks
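
For reference, the 1000-character/200-overlap split quoted above looks like this with LangChain's character splitter (the exact splitter class the project uses is an assumption):

# Sketch of the 1000-char / 200-overlap split described above; the exact
# splitter class used by the project is an assumption.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.documents import Document

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents([Document(page_content="..." * 2000)])
print(len(chunks))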

🎯 Purpose

These sample files are included to:

  1. Demonstrate the system's hybrid search capabilities
  2. Test both semantic (vector) and lexical (keyword) retrieval
  3. Validate document-type-aware retrieval architecture
  4. Provide immediate working examples without additional setup
  5. Showcase cross-document query synthesis

📖 Testing Results

Comprehensive testing results are documented in TESTING_RESULTS.md, showing:

  • 100% retrieval success rate across all document types
  • 17 test queries with detailed results
  • Performance metrics and comparative analysis
  • Semantic vs Lexical vs Hybrid search comparison

💡 Using the Sample Data

Quick Start:

# 1. Run setup
./setup.sh

# 2. The sample data is already in data/ - ready to use!

# 3. Run the demo
python scripts/run_demo.py

# 4. Or use Claude Desktop
# Configure MCP server and query: "What are the prices in the product catalog?"

For Production Use: To use your own data instead:

  1. Remove or backup the sample files from data/
  2. Add your own documents (TXT, PDF, MD, DOCX, CSV)
  3. Re-run ingestion
  4. Optionally uncomment data exclusions in .gitignore

Configuration

All settings are managed in config/config.yaml:

# Ollama Configuration
ollama:
  base_url: "http://localhost:11434"
  embedding_model: "nomic-embed-text"
  llm_model: "llama3.1:latest"

# Data Configuration
data:
  directory: "./data"
  supported_formats:
    - "txt"
    - "pdf"
    - "md"
    - "docx"
    - "csv"

# Retrieval Configuration
retrieval:
  vector_search_k: 2
  keyword_search_k: 2

# MCP Server Configuration
mcp_server:
  host: "0.0.0.0"
  port: 8000

# Vector Store Configuration
vector_store:
  persist_directory: "./chroma_db"

Modify this file to:

  • Use different Ollama models
  • Change the data directory location
  • Adjust retrieval parameters (k values)
  • Configure server host/port
  • Change vector store persistence location
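
The scripts read these settings at startup. A minimal sketch of that lookup with pyyaml (listed under Dependencies):

# Minimal sketch of reading config/config.yaml with pyyaml; key names
# match the file shown above.
import yaml

with open("config/config.yaml") as f:
    config = yaml.safe_load(f)

print(config["ollama"]["llm_model"])                # "llama3.1:latest"
print(config["retrieval"]["vector_search_k"])       # 2
print(config["vector_store"]["persist_directory"])  # "./chroma_db"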

Usage

Option 1: Command Line Script

  1. Add your documents to the data/ directory:
cp /path/to/your/documents/*.pdf data/
cp /path/to/your/documents/*.txt data/
  2. Run the script:
python scripts/run_demo.py

The script will:

  • Load all supported documents from the data/ directory
  • Initialize Ollama embeddings and LLM
  • Create vector and BM25 retrievers
  • Build the hybrid RAG chain
  • Execute sample queries and display results

Option 2: REST API Server

  1. Start the REST API server:
python scripts/mcp_server.py

The server will start on http://localhost:8000

To stop the server: Press Ctrl+C for graceful shutdown

  2. Ingest documents (do this first):
curl -X POST http://localhost:8000/ingest

Response:

{
  "status": "success",
  "message": "Documents ingested successfully",
  "documents_loaded": 15
}
  3. Query documents:
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the main topic of these documents?"}'

Response:

{
  "answer": "Based on the documents...",
  "context": [
    {
      "content": "Document text...",
      "source": "example.pdf",
      "type": ".pdf"
    }
  ]
}
  4. Check server status:
curl http://localhost:8000/status

API Endpoints

| Endpoint | Method | Description                          |
|----------|--------|--------------------------------------|
| /        | GET    | Health check                         |
| /ingest  | POST   | Load documents from data/ directory  |
| /query   | POST   | Query documents with hybrid search   |
| /status  | GET    | Get system status and configuration  |
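
The endpoints are just as easy to call from Python. A sketch using the requests package (not in requirements.txt; any HTTP client works):

# Sketch of calling the REST API from Python; `requests` is not in
# requirements.txt, so install it separately (any HTTP client works).
import requests

BASE = "http://localhost:8000"

print(requests.post(f"{BASE}/ingest").json())  # load documents first

resp = requests.post(
    f"{BASE}/query",
    json={"query": "What is the main topic of these documents?"},
)
print(resp.json()["answer"])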

Option 3: Claude Desktop/API via MCP

The MCP (Model Context Protocol) server allows Claude to directly query your local RAG system.

Setup for Claude Desktop
  1. First, add documents to your data directory:
cp /path/to/your/documents/*.pdf data/
  2. Edit the config/claude_desktop_config.json file to use the correct absolute path:
{
  "mcpServers": {
    "hybrid-rag": {
      "command": "python",
      "args": [
        "/absolute/path/to/hybrid-rag-project/scripts/mcp_server_claude.py"
      ],
      "env": {
        "PYTHONPATH": "/absolute/path/to/hybrid-rag-project"
      }
    }
  }
}
  3. Add this configuration to Claude Desktop:

    On macOS:

    # Copy the configuration
    mkdir -p ~/Library/Application\ Support/Claude
    # Edit the file and add your MCP server configuration
    nano ~/Library/Application\ Support/Claude/claude_desktop_config.json
    

    On Windows:

    %APPDATA%\Claude\claude_desktop_config.json
    

    On Linux:

    ~/.config/Claude/claude_desktop_config.json
    
  4. Restart Claude Desktop


  5. In Claude Desktop, you'll now see the MCP tools available. You can ask Claude:

    • "Use the ingest_documents tool to load my documents"
    • "Query my documents about [your question]"
    • "Check the status of the RAG system"

Available MCP Tools

Claude will have access to these tools:

Document Ingestion & Search:

  • ingest_documents: Start loading and indexing documents asynchronously from the data/ directory
  • get_ingestion_status: Monitor the progress of document ingestion (percentage, current file, stage)
  • query_documents: Query the documents using hybrid search (semantic + keyword)
  • get_status: Check the RAG system status

Structured Data Queries (for CSV files):

  • list_datasets: List all available CSV datasets with columns and row counts
  • count_by_field: Count rows where a field matches a value (e.g., "count people named Michael")
  • filter_dataset: Get all rows matching field criteria (e.g., "all people from Company X")
  • get_dataset_stats: Get statistics about a dataset (rows, columns, memory usage)
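
Semantically, these tools behave like simple dataframe operations. A hedged sketch of count_by_field and filter_dataset, assuming a pandas-backed engine and a hypothetical data/contacts.csv:

# Hedged sketch of count_by_field / filter_dataset semantics, assuming a
# pandas-backed engine; data/contacts.csv is a hypothetical dataset.
import pandas as pd

df = pd.read_csv("data/contacts.csv")

field, value = "First Name", "Michael"
matches = df[df[field].astype(str).str.lower() == str(value).lower()]

print(f"Count: {len(matches)} out of {len(df)} total rows")  # count_by_field
print(matches.head(100))                                     # filter_dataset (truncated)
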
Async Ingestion with Progress Tracking

The ingestion process now runs asynchronously with real-time progress updates:

  • Non-blocking: Ingestion runs in the background
  • Progress tracking: See percentage complete (0-100%)
  • File-level updates: Know which file is currently being processed
  • Stage information: Loading files (0-80%) → Building index (80-100%) → Completed
  • Status monitoring: Check progress at any time with get_ingestion_status
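
A rough sketch of this pattern: a background thread does the work while a shared status dictionary records progress (the server's actual implementation may differ):

# Rough sketch of background ingestion with progress reporting; the
# server's actual implementation may differ.
import threading
import time

status = {"stage": "idle", "progress": 0, "current_file": None}

def ingest(files):
    for i, path in enumerate(files, start=1):
        status.update(stage="loading_files", current_file=path,
                      progress=int(80 * i / len(files)))  # loading files: 0-80%
        time.sleep(0.1)                                   # placeholder for real work
    status.update(stage="building_index", current_file=None, progress=90)
    time.sleep(0.1)                                       # placeholder for index build
    status.update(stage="completed", progress=100)

threading.Thread(target=ingest, args=(["a.pdf", "b.md"],), daemon=True).start()
# get_ingestion_status effectively returns a snapshot of `status`
print(status)
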
Example Usage with Claude
You: "Please start ingesting my documents"
Claude: [Uses ingest_documents tool]
        "Ingestion started. Use get_ingestion_status to monitor progress."

You: "Check the ingestion status"
Claude: [Uses get_ingestion_status tool]
        "Ingestion Status: In Progress
         Progress: 45%
         Stage: loading_files
         Files Processed: 9/20
         Current File: document.pdf
         Documents Loaded: 15"

You: "Check status again"
Claude: [Uses get_ingestion_status tool]
        "Ingestion Status: Completed ✅
         Progress: 100%
         Total Files Processed: 20
         Total Documents Loaded: 35

         You can now use query_documents to search the documents."

You: "What are the main topics in my documents?"
Claude: [Uses query_documents tool with your question]
        "Based on the documents, the main topics are..."

Structured Data Queries

For CSV files, use structured query tools for exact counts and filtering:

You: "List available datasets"
Claude: [Uses list_datasets tool]
        "Available Datasets:
         📊 contacts
            Rows: 24,697
            Columns (7): First Name, Last Name, URL, Email Address, Company, Position, Connected On"

You: "Count how many people are named Michael in the contacts dataset"
Claude: [Uses count_by_field tool with dataset="contacts", field="First Name", value="Michael"]
        "Count Result:
         Dataset: contacts
         Field: First Name
         Value: Michael
         Count: 226 out of 24,697 total rows (0.92%)"

You: "Show me all the Michaels"
Claude: [Uses filter_dataset tool]
        "Filter Results:
         Found: 226 rows
         Showing: 100 rows (truncated to 100)

         [1] First Name: Michael | Last Name: Randel | Company: Randel Consulting Associates ..."

When to use each approach:

  • Structured queries (count_by_field, filter_dataset): For exact counts, filtering, and structured data
  • Semantic search (query_documents): For conceptual questions, understanding content, summarization

Supported File Formats

The system automatically loads and processes these formats:

  • .txt - Plain text files
  • .pdf - PDF documents
  • .md - Markdown files
  • .docx - Microsoft Word documents
  • .csv - CSV files

Simply drop any supported files into the data/ directory!

How It Works

Document Loading

The DocumentLoaderUtility class:

  1. Scans the data/ directory recursively
  2. Identifies supported file formats
  3. Uses appropriate loaders for each format
  4. Adds metadata (source file, file type) to each document
  5. Returns a list of Document objects ready for indexing
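
A condensed sketch of that flow, covering a two-format subset of what DocumentLoaderUtility handles:

# Condensed sketch of the loading flow above; a two-format subset of the
# five formats DocumentLoaderUtility supports.
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader, TextLoader

LOADERS = {".txt": TextLoader, ".pdf": PyPDFLoader}

def load_documents(data_dir="data"):
    docs = []
    for path in Path(data_dir).rglob("*"):             # 1. scan recursively
        loader_cls = LOADERS.get(path.suffix.lower())  # 2. supported format?
        if loader_cls is None:
            continue
        for doc in loader_cls(str(path)).load():       # 3. format-specific loader
            doc.metadata.update(source=path.name, type=path.suffix)  # 4. metadata
            docs.append(doc)
    return docs                                        # 5. ready for indexing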

Hybrid Retrieval

The EnsembleRetriever uses Reciprocal Rank Fusion (RRF) to:

  1. Retrieve top-k results from vector search (semantic)
  2. Retrieve top-k results from BM25 search (keyword)
  3. Assign reciprocal rank scores to each result
  4. Combine scores to produce a unified ranking
  5. Return the most relevant documents overall
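
The fusion step itself is small. A minimal illustration of RRF (EnsembleRetriever performs the weighted equivalent internally):

# Minimal illustration of Reciprocal Rank Fusion; EnsembleRetriever
# performs the weighted equivalent internally.
def rrf(rankings, k=60):
    """rankings: ranked lists of doc ids; returns ids sorted by fused score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf([["d1", "d2", "d3"],    # vector (semantic) ranking
             ["d3", "d1", "d4"]])   # BM25 (keyword) ranking
print(fused)  # ["d1", "d3", "d2", "d4"]: d1 ranks highly in both lists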

This approach handles:

  • Semantic queries ("How do I request time off?")
  • Keyword queries ("PTO form HR-42")
  • Complex queries benefiting from both methods

Customization

Using Different Models

Edit config/config.yaml to change models:

ollama:
  embedding_model: "your-embedding-model"
  llm_model: "your-llm-model"

Adjusting Retrieval Parameters

Modify the k values in config/config.yaml:

retrieval:
  vector_search_k: 5   # Return top 5 from semantic search
  keyword_search_k: 5  # Return top 5 from keyword search

Adding More File Format Support

Edit src/hybrid_rag/document_loader.py to add more loaders:

self.supported_loaders = {
    '.txt': TextLoader,
    '.pdf': PyPDFLoader,
    '.json': JSONLoader,  # Add this
    # ... more formats
}

Customizing the Prompt

Edit the prompt template in scripts/run_demo.py or scripts/mcp_server.py:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("""
Your custom prompt here...

<context>
{context}
</context>

Question: {input}
""")

Development Workflow

  1. Add documents to data/ directory
  2. Modify configuration in config/config.yaml as needed
  3. Test with command line: python scripts/run_demo.py
  4. Deploy MCP server: python scripts/mcp_server.py
  5. Integrate via API in your applications

Troubleshooting

"Error connecting to Ollama"

  • Ensure Ollama is installed and running
  • Check that the Ollama service is accessible at the configured URL
  • Verify models are downloaded: ollama list

"No documents found in data directory"

  • Add files to the data/ directory
  • Ensure files have supported extensions (.txt, .pdf, .md, .docx, .csv)
  • Check the config/config.yaml data directory path is correct

"ModuleNotFoundError"

  • Ensure virtual environment is activated: source .venv/bin/activate
  • Reinstall dependencies: pip install -r requirements.txt

Poor Retrieval Results

  • Add more relevant documents to the data/ directory
  • Adjust k values in config/config.yaml
  • Try different embedding models
  • Ensure query terminology matches document content

API Errors

  • Ensure you call /ingest before /query
  • Check server logs for detailed error messages
  • Verify Ollama is running and accessible
  • Check that documents were successfully loaded

Example: Complete Workflow

# 1. Activate environment
source .venv/bin/activate

# 2. Add your documents
cp ~/my-docs/*.pdf data/

# 3. Start MCP server
python scripts/mcp_server.py &

# 4. Ingest documents
curl -X POST http://localhost:8000/ingest

# 5. Query your documents
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Summarize the key points"}'

# 6. Check status
curl http://localhost:8000/status

Dependencies

Core libraries:

  • langchain: Framework for LLM applications
  • langchain-community: Community integrations
  • langchain-ollama: Ollama integration
  • chromadb: Vector database for embeddings
  • rank-bm25: BM25 implementation for keyword search
  • fastapi: Web framework for API
  • uvicorn: ASGI server
  • pyyaml: YAML configuration parsing

Document loaders:

  • pypdf: PDF processing
  • python-docx: Word document processing
  • unstructured: Markdown and other formats

Performance Tips

  1. Vector Store Persistence: The vector store is persisted to disk (chroma_db/) after ingestion, making subsequent queries faster.

  2. Batch Processing: When adding many documents, use the /ingest endpoint once rather than multiple times.

  3. Retrieval Parameters: Lower k values (e.g., 2-3) are faster and often sufficient for small document sets.

  4. Model Selection: Smaller embedding models are faster but may sacrifice some accuracy.

License

This project is released under the MIT License (see LICENSE). It is provided as-is for educational and demonstration purposes.

Contributing

Feel free to submit issues, fork the repository, and create pull requests for any improvements.

Changelog

Version 2.0.0

  • Generalized system to work with any documents
  • Added data/ directory for document ingestion
  • Created DocumentLoaderUtility for multi-format support
  • Restructured project to follow Python best practices (src layout)
  • Moved all configuration to config/ directory
  • Moved all documentation to docs/ directory
  • Created proper Python package structure with setup.py
  • Organized scripts into scripts/ directory
  • Updated all import paths and documentation

Version 1.0.0

  • Initial implementation with sample HR documents
  • Basic hybrid search with vector and BM25 retrievers