RAG Server with MCP Integration

A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and Model Context Protocol (MCP) server.

Features

Core Capabilities

  • Document Storage: Upload and store text (.txt) and Markdown (.md) documents
  • Hierarchical Chunking: Structure-aware chunking for markdown that preserves document hierarchy
  • Vector Search: Efficient similarity search using Qdrant vector database
  • Google AI Integration: Uses Google AI Studio for embeddings (text-embedding-004) and generation (gemini-1.5-flash)
  • REST API: FastAPI-based REST API with automatic OpenAPI documentation
  • MCP Server: Model Context Protocol server for seamless integration with Claude and other MCP clients
  • OpenAI-Compatible API: Supports OpenAI-compatible chat completions for web UI integration
  • Code Indexing: Index and search source code repositories with semantic understanding
  • Smart Query Routing: Automatic query classification and routing to appropriate retrieval methods

Advanced Features

  • Tag-Based Organization: Organize documents with multiple tags for easy categorization
  • Section-Aware Retrieval: Query specific sections of documentation (e.g., "Installation > Prerequisites")
  • Markdown Structure Preservation: Automatic extraction of heading hierarchy with breadcrumb paths
  • Context-Enhanced Answers: LLM receives section context for more accurate responses
  • Flexible Filtering: Filter documents by tags and/or section paths during queries
  • Document Structure API: Explore table of contents and section organization
  • GitHub Integration: Parse and extract content from GitHub URLs
  • Reference Following: Automatically follow documentation references for comprehensive answers
  • Multi-Mode Retrieval: Choose between standard, enhanced, or smart query modes
  • Rate Limiting: Built-in rate limiting for API endpoints

Project Structure

mcp-rag-docs/
   config/
      __init__.py
      settings.py                # Configuration and settings
   rag_server/
      __init__.py
      models.py                  # Pydantic models for API
      openai_api.py              # OpenAI-compatible API endpoints
      openai_models.py           # OpenAI API models
      rag_system.py              # Core RAG system logic
      server.py                  # FastAPI server
      smart_query.py             # Smart query routing
   mcp_server/
      __init__.py
      server.py                  # MCP server implementation
   utils/
      __init__.py
      code_indexer.py            # Source code indexing
      code_index_store.py        # Code index storage
      document_processor.py      # Document processing
      embeddings.py              # Google AI embeddings
      frontmatter_parser.py      # YAML frontmatter parsing
      github_parser.py           # GitHub URL parsing
      google_api_client.py       # Google AI API client
      hierarchical_chunker.py    # Hierarchical document chunking
      markdown_parser.py         # Markdown parsing
      query_classifier.py        # Query type classification
      rate_limit_store.py        # Rate limiting
      reference_extractor.py     # Extract doc references
      retrieval_router.py        # Multi-mode retrieval routing
      source_extractor.py        # Extract source code snippets
      text_chunker.py            # Text chunking utility
      vector_store.py            # Qdrant vector store wrapper
   build_code_index.py          # Build code index from repository
   check_github_urls.py         # Validate GitHub URLs
   check_status.py              # System status checker
   example_usage.py             # Example usage scripts
   ingest_docs.py               # Document ingestion utility
   main.py                      # Main entry point
   .env.example                 # Example environment variables
   docker-compose.yml           # Docker setup for Qdrant
   pyproject.toml               # Project dependencies

Installation

Prerequisites

  • Python 3.13 or higher
  • Google AI Studio API key (create one at https://aistudio.google.com)

Setup

  1. Clone or navigate to the project directory

  2. Install dependencies

# Using pip
pip install -e .

# Or using uv (recommended)
uv pip install -e .

  3. Configure environment variables

# Copy the example env file
cp .env.example .env

# Edit .env and add your Google API key
GOOGLE_API_KEY=your_api_key_here

  4. Start Qdrant (optional, using Docker)

docker-compose up -d

Usage

Running the FastAPI Server

Start the REST API server:

python -m rag_server.server

The server will start at http://localhost:8000. Visit http://localhost:8000/docs for interactive API documentation.

API Endpoints

Core Endpoints:

  • POST /documents - Upload a document
  • POST /query - Query the RAG system (standard mode)
  • POST /query-enhanced - Query with automatic reference following
  • POST /smart-query - Smart query with automatic routing
  • GET /documents - List all documents
  • DELETE /documents/{doc_id} - Delete a document
  • GET /stats - Get system statistics
  • GET /health - Health check
  • GET /tags - List all available tags
  • GET /documents/{doc_id}/sections - Get document structure

OpenAI-Compatible Endpoints:

  • POST /v1/chat/completions - OpenAI-compatible chat completions
  • GET /v1/models - List available models

Example Usage with curl

# Upload a document
curl -X POST "http://localhost:8000/documents" \
  -F "file=@example.txt"

# Upload with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# Query the RAG system
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the main topic of the documents?", "top_k": 5}'

# Smart query with automatic routing
curl -X POST "http://localhost:8000/smart-query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a Dagster asset?"}'

# OpenAI-compatible chat completion
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rag-smart",
    "messages": [{"role": "user", "content": "What is an asset in Dagster?"}],
    "stream": false
  }'

# List documents
curl "http://localhost:8000/documents"

# Get statistics
curl "http://localhost:8000/stats"

Running the MCP Server

The MCP server allows integration with Claude and other MCP-compatible clients.

python -m mcp_server.server

MCP Tools Available

  1. query_rag - Query the RAG system with a question
  2. query_rag_enhanced - Query with automatic reference following
  3. smart_query - Smart query with automatic routing and classification
  4. add_document - Add a document to the RAG system
  5. list_documents - List all stored documents
  6. delete_document - Delete a document by ID
  7. get_rag_stats - Get system statistics
  8. get_tags - List all available tags
  9. get_document_structure - Get document table of contents

Using with Claude Desktop

Add to your Claude Desktop configuration (claude_desktop_config.json):

{
  "mcpServers": {
    "rag": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/mcp-rag-docs",
        "run",
        "python",
        "-m",
        "mcp_server.server"
      ]
    }
  }
}

See the additional documentation section below for a quick setup guide.

Configuration

All configuration is managed through environment variables (defined in .env):

Variable                  Description                         Default
GOOGLE_API_KEY            Google AI Studio API key            (required)
CHUNK_SIZE                Size of text chunks in characters   1000
CHUNK_OVERLAP             Overlap between chunks              200
TOP_K_RESULTS             Number of chunks to retrieve        5
QDRANT_PATH               Path to Qdrant storage              ./qdrant_storage
QDRANT_COLLECTION_NAME    Qdrant collection name              documents
FASTAPI_HOST              FastAPI server host                 0.0.0.0
FASTAPI_PORT              FastAPI server port                 8000
EMBEDDING_MODEL           Google embedding model              text-embedding-004
LLM_MODEL                 Google LLM model                    gemini-1.5-flash
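
For illustration only, a settings module along these lines would match the table above. This is a sketch assuming pydantic-settings, and is not necessarily how config/settings.py is implemented:

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Reads variables from .env; names are matched case-insensitively
    model_config = SettingsConfigDict(env_file=".env")

    google_api_key: str                        # GOOGLE_API_KEY, required
    chunk_size: int = 1000
    chunk_overlap: int = 200
    top_k_results: int = 5
    qdrant_path: str = "./qdrant_storage"
    qdrant_collection_name: str = "documents"
    fastapi_host: str = "0.0.0.0"
    fastapi_port: int = 8000
    embedding_model: str = "text-embedding-004"
    llm_model: str = "gemini-1.5-flash"

settings = Settings()  # raises a validation error if GOOGLE_API_KEY is missing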

Architecture

Document Processing Pipeline

  1. Upload - User uploads a .txt or .md file
  2. Processing - Document is read and metadata extracted (including frontmatter)
  3. Chunking - Text is split using hierarchical chunking for markdown or standard chunking for plain text (see the sketch below)
  4. Embedding - Each chunk is converted to a vector using Google AI embeddings
  5. Storage - Vectors and metadata are stored in Qdrant
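
To illustrate the chunking step, here is a hypothetical breadcrumb chunker (a simplified sketch, not the project's hierarchical_chunker.py) that attaches the full heading path to each markdown section:

import re

def breadcrumb_chunks(markdown_text: str):
    """Split markdown at headings, tagging each chunk with its heading path."""
    path = []     # current heading stack, e.g. ["Installation", "Prerequisites"]
    chunks = []
    body = []

    def flush():
        if body:
            chunks.append({"section_path": " > ".join(path), "text": "\n".join(body)})
            body.clear()

    for line in markdown_text.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if m:
            flush()
            level = len(m.group(1))
            del path[level - 1:]          # pop headings at this level or deeper
            path.append(m.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks

doc = "# Installation\n## Prerequisites\nPython 3.13+\n## Setup\nRun pip install."
for chunk in breadcrumb_chunks(doc):
    print(chunk["section_path"], "->", chunk["text"].strip())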

Query Pipeline

Standard Query
  1. Query - User submits a question
  2. Embedding - Question is converted to a vector
  3. Retrieval - Similar chunks are retrieved from Qdrant
  4. Generation - Context is provided to Google AI Studio model
  5. Response - Answer is generated and returned with sources
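
A condensed sketch of these five steps, assuming the google-generativeai and qdrant-client packages (the "text" payload key and the exact wiring are assumptions; the project's actual implementation lives in rag_server/rag_system.py):

import google.generativeai as genai
from qdrant_client import QdrantClient

genai.configure(api_key="YOUR_GOOGLE_API_KEY")
client = QdrantClient(path="./qdrant_storage")

question = "How do I create a Dagster asset?"

# 2. Embed the question
emb = genai.embed_content(model="models/text-embedding-004", content=question)

# 3. Retrieve the most similar chunks from Qdrant
hits = client.search(
    collection_name="documents",
    query_vector=emb["embedding"],
    limit=5,
)
context = "\n\n".join(h.payload["text"] for h in hits)  # "text" key assumed

# 4./5. Generate an answer grounded in the retrieved context
model = genai.GenerativeModel("gemini-1.5-flash")
answer = model.generate_content(f"Context:\n{context}\n\nQuestion: {question}")
print(answer.text)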

Smart Query
  1. Classification - Query is classified (documentation, code, conceptual, etc.)
  2. Routing - Automatically selects best retrieval strategy
  3. Multi-Source - May combine documentation search, code search, and direct answers
  4. Synthesis - Generates comprehensive answer from multiple sources

Code Indexing

The system can index source code repositories:

# Build code index
python build_code_index.py /path/to/repo

# Query code through the API or MCP server

Code is indexed with:

  • Class and function definitions
  • Docstrings and comments
  • File structure and imports
  • Semantic embeddings for natural language queries
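
As a rough illustration of the first two bullets, Python's standard ast module can pull definitions and docstrings out of source code (a standalone sketch, unrelated to the project's code_indexer.py):

import ast

def extract_definitions(source: str):
    """Yield (kind, name, docstring) for every class and function in the source."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            yield kind, node.name, ast.get_docstring(node) or ""

src = '''
class Pipeline:
    """Runs a sequence of steps."""
    def run(self):
        """Execute every step in order."""
'''
for kind, name, doc in extract_definitions(src):
    print(f"{kind} {name}: {doc}")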

Development

Running Tests

# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest

# Run specific test files
pytest test_openai_api.py
pytest test_mcp_integration.py

Code Style

The project follows Python best practices with type hints and docstrings.

Troubleshooting

Common Issues

Issue: GOOGLE_API_KEY not found

  • Solution: Ensure you've created a .env file and added your Google API key

Issue: Unsupported file type

  • Solution: Only .txt and .md files are supported. Convert other formats first.

Issue: Collection already exists error

  • Solution: Delete the qdrant_storage/ directory to reset the database

Issue: MCP server not connecting

  • Solution: Check that the path in your MCP config is correct and the .env file is in the project root

Advanced Usage

Tag-Based Organization

Organize your documents with tags for easy categorization and filtering:

# Upload document with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# List all available tags
curl "http://localhost:8000/tags"

# Query only dagster-related documents
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a pipeline?", "tags": ["dagster"]}'

# List documents filtered by tags
curl "http://localhost:8000/documents?tags=dagster,python"

Hierarchical Document Structure

For markdown documents, the system automatically preserves heading hierarchy:

# Get document structure (table of contents)
curl "http://localhost:8000/documents/{doc_id}/sections"

# Query specific section
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the prerequisites?", "section_path": "Installation > Prerequisites"}'

Section-Aware Queries

The system includes section context when generating answers:

# Example: Markdown document structure
# Installation
#   Prerequisites
#     Python Version
#   Setup Steps

# When you query about "Python version requirements"
# The system will:
# 1. Retrieve relevant chunks from "Installation > Prerequisites > Python Version"
# 2. Include section path in context sent to LLM
# 3. Cite sources with full section paths

Smart Query Modes

The system supports three query modes:

  1. Standard (/query) - Basic vector search and retrieval
  2. Enhanced (/query-enhanced) - Follows documentation references automatically
  3. Smart (/smart-query) - Automatic classification and routing

Use the OpenAI-compatible API to access different modes:

# Standard mode
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-standard", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

# Enhanced mode with reference following
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-enhanced", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

# Smart mode with automatic routing
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-smart", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

MCP Tools

The MCP server provides enhanced tools for Claude and other MCP clients:

query_rag - Query with optional tags and section filtering

{
  "question": "How do I deploy?",
  "tags": ["dagster"],
  "section_path": "Deployment"
}

smart_query - Smart query with automatic routing

{
  "question": "What is an asset and how do I use it?"
}

add_document - Upload with tags

{
  "file_path": "/path/to/doc.md",
  "tags": ["dagster", "docs"]
}

get_tags - List all tags

get_document_structure - Get table of contents

{
  "doc_id": "abc123"
}
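
Outside Claude Desktop, these tools can be driven programmatically over stdio. A sketch assuming the official MCP Python SDK (the mcp package); the tool name and arguments mirror the query_rag example above:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Launch the MCP server as a subprocess and talk to it over stdio
    params = StdioServerParameters(command="python", args=["-m", "mcp_server.server"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "query_rag",
                {"question": "How do I deploy?", "tags": ["dagster"]},
            )
            print(result.content)

asyncio.run(main())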

API Reference

Enhanced Endpoints

POST /documents

  • Body: file (multipart), tags (comma-separated string)
  • Response: Document info with tags and chunk count

POST /query

  • Body: {"question": "...", "tags": [...], "section_path": "..."}
  • Response: Answer with section-aware sources

POST /smart-query

  • Body: {"question": "..."}
  • Response: Smart answer with automatic routing and classification

GET /tags

  • Response: {"tags": [...], "total": N}

GET /documents/{doc_id}/sections

  • Response: Document structure with section hierarchy

GET /documents?tags=tag1,tag2

  • Query filtered by tags
  • Response: List of matching documents

POST /v1/chat/completions

  • OpenAI-compatible chat completion endpoint
  • Supports models: rag-standard, rag-enhanced, rag-smart
  • Supports streaming with stream: true

GET /v1/models

  • List available RAG models

Additional Documentation

  • Quick setup guide for MCP integration
  • Detailed MCP server setup
  • OpenAI-compatible API documentation
  • Smart query routing guide
  • Multi-mode retrieval documentation
  • Code indexing and search guide
  • Rate limiting configuration
  • Test coverage and testing guide

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

  • Google AI Studio for embeddings and LLM capabilities
  • Qdrant for vector database
  • FastAPI for the REST API framework
  • Anthropic MCP for the Model Context Protocol