RAG Server with MCP Integration

A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and Model Context Protocol (MCP) server.

Features

Core Capabilities

  • Document Storage: Upload and store text (.txt) and Markdown (.md) documents
  • Hierarchical Chunking: Structure-aware chunking for markdown that preserves document hierarchy
  • Vector Search: Efficient similarity search using Qdrant vector database
  • Google AI Integration: Uses Google AI Studio for embeddings (text-embedding-004) and generation (gemini-1.5-flash)
  • REST API: FastAPI-based REST API with automatic OpenAPI documentation
  • MCP Server: Model Context Protocol server for seamless integration with Claude and other MCP clients
  • OpenAI-Compatible API: Supports OpenAI-compatible chat completions for web UI integration
  • Code Indexing: Index and search source code repositories with semantic understanding
  • Smart Query Routing: Automatic query classification and routing to appropriate retrieval methods

Advanced Features

  • Tag-Based Organization: Organize documents with multiple tags for easy categorization
  • Section-Aware Retrieval: Query specific sections of documentation (e.g., "Installation > Prerequisites")
  • Markdown Structure Preservation: Automatic extraction of heading hierarchy with breadcrumb paths
  • Context-Enhanced Answers: LLM receives section context for more accurate responses
  • Flexible Filtering: Filter documents by tags and/or section paths during queries
  • Document Structure API: Explore table of contents and section organization
  • GitHub Integration: Parse and extract content from GitHub URLs
  • Reference Following: Automatically follow documentation references for comprehensive answers
  • Multi-Mode Retrieval: Choose between standard, enhanced, or smart query modes
  • Rate Limiting: Built-in rate limiting for API endpoints

Project Structure

mcp-rag-docs/
   config/
      __init__.py
      settings.py                # Configuration and settings
   rag_server/
      __init__.py
      models.py                  # Pydantic models for API
      openai_api.py              # OpenAI-compatible API endpoints
      openai_models.py           # OpenAI API models
      rag_system.py              # Core RAG system logic
      server.py                  # FastAPI server
      smart_query.py             # Smart query routing
   mcp_server/
      __init__.py
      server.py                  # MCP server implementation
   utils/
      __init__.py
      code_indexer.py            # Source code indexing
      code_index_store.py        # Code index storage
      document_processor.py      # Document processing
      embeddings.py              # Google AI embeddings
      frontmatter_parser.py      # YAML frontmatter parsing
      github_parser.py           # GitHub URL parsing
      google_api_client.py       # Google AI API client
      hierarchical_chunker.py    # Hierarchical document chunking
      markdown_parser.py         # Markdown parsing
      query_classifier.py        # Query type classification
      rate_limit_store.py        # Rate limiting
      reference_extractor.py     # Extract doc references
      retrieval_router.py        # Multi-mode retrieval routing
      source_extractor.py        # Extract source code snippets
      text_chunker.py            # Text chunking utility
      vector_store.py            # Qdrant vector store wrapper
   build_code_index.py          # Build code index from repository
   check_github_urls.py         # Validate GitHub URLs
   check_status.py              # System status checker
   example_usage.py             # Example usage scripts
   ingest_docs.py               # Document ingestion utility
   main.py                      # Main entry point
   .env.example                 # Example environment variables
   docker-compose.yml           # Docker setup for Qdrant
   pyproject.toml               # Project dependencies

Installation

Prerequisites

  • Python 3.13 or higher
  • Google AI Studio API key (create one at https://aistudio.google.com)

Setup

  1. Clone or navigate to the project directory

  2. Install dependencies

# Using pip
pip install -e .

# Or using uv (recommended)
uv pip install -e .

  3. Configure environment variables

# Copy the example env file
cp .env.example .env

# Edit .env and add your Google API key
GOOGLE_API_KEY=your_api_key_here

  4. Start Qdrant (optional, using Docker)

docker-compose up -d

Usage

Running the FastAPI Server

Start the REST API server:

python -m rag_server.server

The server will start at http://localhost:8000. Visit http://localhost:8000/docs for interactive API documentation.

API Endpoints

Core Endpoints:

  • POST /documents - Upload a document
  • POST /query - Query the RAG system (standard mode)
  • POST /query-enhanced - Query with automatic reference following
  • POST /smart-query - Smart query with automatic routing
  • GET /documents - List all documents
  • DELETE /documents/{doc_id} - Delete a document
  • GET /stats - Get system statistics
  • GET /health - Health check
  • GET /tags - List all available tags
  • GET /documents/{doc_id}/sections - Get document structure

OpenAI-Compatible Endpoints:

  • POST /v1/chat/completions - OpenAI-compatible chat completions
  • GET /v1/models - List available models

Example Usage with curl

# Upload a document
curl -X POST "http://localhost:8000/documents" \
  -F "file=@example.txt"

# Upload with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# Query the RAG system
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the main topic of the documents?", "top_k": 5}'

# Smart query with automatic routing
curl -X POST "http://localhost:8000/smart-query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a Dagster asset?"}'

# OpenAI-compatible chat completion
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rag-smart",
    "messages": [{"role": "user", "content": "What is an asset in Dagster?"}],
    "stream": false
  }'

# List documents
curl "http://localhost:8000/documents"

# Get statistics
curl "http://localhost:8000/stats"

Running the MCP Server

The MCP server allows integration with Claude and other MCP-compatible clients.

python -m mcp_server.server

MCP Tools Available

  1. query_rag - Query the RAG system with a question
  2. query_rag_enhanced - Query with automatic reference following
  3. smart_query - Smart query with automatic routing and classification
  4. add_document - Add a document to the RAG system
  5. list_documents - List all stored documents
  6. delete_document - Delete a document by ID
  7. get_rag_stats - Get system statistics
  8. get_tags - List all available tags
  9. get_document_structure - Get document table of contents

Using with Claude Desktop

Add to your Claude Desktop configuration (claude_desktop_config.json):

{
  "mcpServers": {
    "rag": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/mcp-rag-docs",
        "run",
        "python",
        "-m",
        "mcp_server.server"
      ]
    }
  }
}

See the additional documentation section below for a quick setup guide.

Configuration

All configuration is managed through environment variables (defined in .env):

Variable                  Description                         Default
GOOGLE_API_KEY            Google AI Studio API key            (required)
CHUNK_SIZE                Size of text chunks in characters   1000
CHUNK_OVERLAP             Overlap between chunks              200
TOP_K_RESULTS             Number of chunks to retrieve        5
QDRANT_PATH               Path to Qdrant storage              ./qdrant_storage
QDRANT_COLLECTION_NAME    Qdrant collection name              documents
FASTAPI_HOST              FastAPI server host                 0.0.0.0
FASTAPI_PORT              FastAPI server port                 8000
EMBEDDING_MODEL           Google embedding model              text-embedding-004
LLM_MODEL                 Google LLM model                    gemini-1.5-flash
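
For illustration only, a settings module along these lines would match the table above. This is a sketch assuming pydantic-settings, and is not necessarily how config/settings.py is implemented:

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Reads variables from .env; names are matched case-insensitively
    model_config = SettingsConfigDict(env_file=".env")

    google_api_key: str                        # GOOGLE_API_KEY, required
    chunk_size: int = 1000
    chunk_overlap: int = 200
    top_k_results: int = 5
    qdrant_path: str = "./qdrant_storage"
    qdrant_collection_name: str = "documents"
    fastapi_host: str = "0.0.0.0"
    fastapi_port: int = 8000
    embedding_model: str = "text-embedding-004"
    llm_model: str = "gemini-1.5-flash"

settings = Settings()  # raises a validation error if GOOGLE_API_KEY is missing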

Architecture

Document Processing Pipeline

  1. Upload - User uploads a .txt or .md file
  2. Processing - Document is read and metadata extracted (including frontmatter)
  3. Chunking - Text is split using hierarchical chunking for markdown or standard chunking for plain text (see the sketch below)
  4. Embedding - Each chunk is converted to a vector using Google AI embeddings
  5. Storage - Vectors and metadata are stored in Qdrant
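
To illustrate the chunking step, here is a hypothetical breadcrumb chunker (a simplified sketch, not the project's hierarchical_chunker.py) that attaches the full heading path to each markdown section:

import re

def breadcrumb_chunks(markdown_text: str):
    """Split markdown at headings, tagging each chunk with its heading path."""
    path = []     # current heading stack, e.g. ["Installation", "Prerequisites"]
    chunks = []
    body = []

    def flush():
        if body:
            chunks.append({"section_path": " > ".join(path), "text": "\n".join(body)})
            body.clear()

    for line in markdown_text.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            flush()
            level = len(m.group(1))
            del path[level - 1:]          # pop headings at this level or deeper
            path.append(m.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks

doc = "# Installation\n## Prerequisites\nPython 3.13+\n## Setup\nRun pip install."
for chunk in breadcrumb_chunks(doc):
    print(chunk["section_path"], "->", chunk["text"].strip())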

Query Pipeline

Standard Query
  1. Query - User submits a question
  2. Embedding - Question is converted to a vector
  3. Retrieval - Similar chunks are retrieved from Qdrant
  4. Generation - Context is provided to Google AI Studio model
  5. Response - Answer is generated and returned with sources
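
A condensed sketch of these five steps, assuming the google-generativeai and qdrant-client packages (the "text" payload key and the exact wiring are assumptions; the project's actual implementation lives in rag_server/rag_system.py):

import google.generativeai as genai
from qdrant_client import QdrantClient

genai.configure(api_key="YOUR_GOOGLE_API_KEY")
client = QdrantClient(path="./qdrant_storage")

question = "How do I create a Dagster asset?"

# 2. Embed the question
emb = genai.embed_content(model="models/text-embedding-004", content=question)

# 3. Retrieve the most similar chunks from Qdrant
hits = client.search(
    collection_name="documents",
    query_vector=emb["embedding"],
    limit=5,
)
context = "\n\n".join(h.payload["text"] for h in hits)  # "text" key assumed

# 4./5. Generate an answer grounded in the retrieved context
model = genai.GenerativeModel("gemini-1.5-flash")
answer = model.generate_content(f"Context:\n{context}\n\nQuestion: {question}")
print(answer.text)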

Smart Query
  1. Classification - Query is classified (documentation, code, conceptual, etc.)
  2. Routing - Automatically selects best retrieval strategy
  3. Multi-Source - May combine documentation search, code search, and direct answers
  4. Synthesis - Generates comprehensive answer from multiple sources

Code Indexing

The system can index source code repositories:

# Build code index
python build_code_index.py /path/to/repo

# Query code through the API or MCP server

Code is indexed with:

  • Class and function definitions
  • Docstrings and comments
  • File structure and imports
  • Semantic embeddings for natural language queries
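
As a rough illustration of the first two bullets, Python's standard ast module can pull definitions and docstrings out of source code (a standalone sketch, unrelated to the project's code_indexer.py):

import ast

def extract_definitions(source: str):
    """Yield (kind, name, docstring) for every class and function in the source."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            yield kind, node.name, ast.get_docstring(node) or ""

src = '''
class Pipeline:
    """Runs a sequence of steps."""
    def run(self):
        """Execute every step in order."""
'''
for kind, name, doc in extract_definitions(src):
    print(f"{kind} {name}: {doc}")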

Development

Running Tests

# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest

# Run specific test files
pytest test_openai_api.py
pytest test_mcp_integration.py

Code Style

The project follows Python best practices with type hints and docstrings.

Troubleshooting

Common Issues

Issue: GOOGLE_API_KEY not found

  • Solution: Ensure you've created a .env file and added your Google API key

Issue: Unsupported file type

  • Solution: Only .txt and .md files are supported. Convert other formats first.

Issue: Collection already exists error

  • Solution: Delete the qdrant_storage/ directory to reset the database

Issue: MCP server not connecting

  • Solution: Check that the path in your MCP config is correct and the .env file is in the project root

Advanced Usage

Tag-Based Organization

Organize your documents with tags for easy categorization and filtering:

# Upload document with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# List all available tags
curl "http://localhost:8000/tags"

# Query only dagster-related documents
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a pipeline?", "tags": ["dagster"]}'

# List documents filtered by tags
curl "http://localhost:8000/documents?tags=dagster,python"

Hierarchical Document Structure

For markdown documents, the system automatically preserves heading hierarchy:

# Get document structure (table of contents)
curl "http://localhost:8000/documents/{doc_id}/sections"

# Query specific section
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the prerequisites?", "section_path": "Installation > Prerequisites"}'

Section-Aware Queries

The system includes section context when generating answers:

# Example: Markdown document structure
# Installation
#   Prerequisites
#     Python Version
#   Setup Steps

# When you query about "Python version requirements"
# The system will:
# 1. Retrieve relevant chunks from "Installation > Prerequisites > Python Version"
# 2. Include section path in context sent to LLM
# 3. Cite sources with full section paths

Smart Query Modes

The system supports three query modes:

  1. Standard (/query) - Basic vector search and retrieval
  2. Enhanced (/query-enhanced) - Follows documentation references automatically
  3. Smart (/smart-query) - Automatic classification and routing

Use the OpenAI-compatible API to access different modes:

# Standard mode
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-standard", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

# Enhanced mode with reference following
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-enhanced", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

# Smart mode with automatic routing
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-smart", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

MCP Tools

The MCP server provides enhanced tools for Claude and other MCP clients:

query_rag - Query with optional tags and section filtering

{
  "question": "How do I deploy?",
  "tags": ["dagster"],
  "section_path": "Deployment"
}

smart_query - Smart query with automatic routing

{
  "question": "What is an asset and how do I use it?"
}

add_document - Upload with tags

{
  "file_path": "/path/to/doc.md",
  "tags": ["dagster", "docs"]
}

get_tags - List all tags

get_document_structure - Get table of contents

{
  "doc_id": "abc123"
}
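
Outside Claude Desktop, these tools can be driven programmatically over stdio. A sketch assuming the official MCP Python SDK (the mcp package); the tool name and arguments mirror the query_rag example above:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the MCP server as a subprocess and talk to it over stdio
    params = StdioServerParameters(command="python", args=["-m", "mcp_server.server"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "query_rag",
                {"question": "How do I deploy?", "tags": ["dagster"]},
            )
            print(result.content)

asyncio.run(main())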

API Reference

Enhanced Endpoints

POST /documents

  • Body: file (multipart), tags (comma-separated string)
  • Response: Document info with tags and chunk count

POST /query

  • Body: {"question": "...", "tags": [...], "section_path": "..."}
  • Response: Answer with section-aware sources

POST /smart-query

  • Body: {"question": "..."}
  • Response: Smart answer with automatic routing and classification

GET /tags

  • Response: {"tags": [...], "total": N}

GET /documents/{doc_id}/sections

  • Response: Document structure with section hierarchy

GET /documents?tags=tag1,tag2

  • Query filtered by tags
  • Response: List of matching documents

POST /v1/chat/completions

  • OpenAI-compatible chat completion endpoint
  • Supports models: rag-standard, rag-enhanced, rag-smart
  • Supports streaming with stream: true

GET /v1/models

  • List available RAG models

Additional Documentation

  • Quick setup guide for MCP integration
  • Detailed MCP server setup
  • OpenAI-compatible API documentation
  • Smart query routing guide
  • Multi-mode retrieval documentation
  • Code indexing and search guide
  • Rate limiting configuration
  • Test coverage and testing guide

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

  • Google AI Studio for embeddings and LLM capabilities
  • Qdrant for vector database
  • FastAPI for the REST API framework
  • Anthropic MCP for the Model Context Protocol