RAG Server with MCP Integration
A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and Model Context Protocol (MCP) server.
Features
Core Capabilities
- Document Storage: Upload and store text (.txt) and Markdown (.md) documents
- Hierarchical Chunking: Structure-aware chunking for markdown that preserves document hierarchy
- Vector Search: Efficient similarity search using Qdrant vector database
- Google AI Integration: Uses Google AI Studio for embeddings (text-embedding-004) and generation (gemini-1.5-flash)
- REST API: FastAPI-based REST API with automatic OpenAPI documentation
- MCP Server: Model Context Protocol server for seamless integration with Claude and other MCP clients
- OpenAI-Compatible API: Supports OpenAI-compatible chat completions for web UI integration
- Code Indexing: Index and search source code repositories with semantic understanding
- Smart Query Routing: Automatic query classification and routing to appropriate retrieval methods
Advanced Features
- Tag-Based Organization: Organize documents with multiple tags for easy categorization
- Section-Aware Retrieval: Query specific sections of documentation (e.g., "Installation > Prerequisites")
- Markdown Structure Preservation: Automatic extraction of heading hierarchy with breadcrumb paths
- Context-Enhanced Answers: LLM receives section context for more accurate responses
- Flexible Filtering: Filter documents by tags and/or section paths during queries
- Document Structure API: Explore table of contents and section organization
- GitHub Integration: Parse and extract content from GitHub URLs
- Reference Following: Automatically follow documentation references for comprehensive answers
- Multi-Mode Retrieval: Choose between standard, enhanced, or smart query modes
- Rate Limiting: Built-in rate limiting for API endpoints
Project Structure
mcp-rag-docs/
  config/
    __init__.py
    settings.py               # Configuration and settings
  rag_server/
    __init__.py
    models.py                 # Pydantic models for API
    openai_api.py             # OpenAI-compatible API endpoints
    openai_models.py          # OpenAI API models
    rag_system.py             # Core RAG system logic
    server.py                 # FastAPI server
    smart_query.py            # Smart query routing
  mcp_server/
    __init__.py
    server.py                 # MCP server implementation
  utils/
    __init__.py
    code_indexer.py           # Source code indexing
    code_index_store.py       # Code index storage
    document_processor.py     # Document processing
    embeddings.py             # Google AI embeddings
    frontmatter_parser.py     # YAML frontmatter parsing
    github_parser.py          # GitHub URL parsing
    google_api_client.py      # Google AI API client
    hierarchical_chunker.py   # Hierarchical document chunking
    markdown_parser.py        # Markdown parsing
    query_classifier.py       # Query type classification
    rate_limit_store.py       # Rate limiting
    reference_extractor.py    # Extract doc references
    retrieval_router.py       # Multi-mode retrieval routing
    source_extractor.py       # Extract source code snippets
    text_chunker.py           # Text chunking utility
    vector_store.py           # Qdrant vector store wrapper
  build_code_index.py         # Build code index from repository
  check_github_urls.py        # Validate GitHub URLs
  check_status.py             # System status checker
  example_usage.py            # Example usage scripts
  ingest_docs.py              # Document ingestion utility
  main.py                     # Main entry point
  .env.example                # Example environment variables
  docker-compose.yml          # Docker setup for Qdrant
  pyproject.toml              # Project dependencies
Installation
Prerequisites
- Python 3.13 or higher
- Google AI Studio API key (get one at https://aistudio.google.com/)
Setup
- Clone or navigate to the project directory
- Install dependencies
# Using pip
pip install -e .
# Or using uv (recommended)
uv pip install -e .
- Configure environment variables
# Copy the example env file
cp .env.example .env
# Edit .env and add your Google API key
GOOGLE_API_KEY=your_api_key_here
- Start Qdrant (optional - using Docker)
docker-compose up -d
Usage
Running the FastAPI Server
Start the REST API server:
python -m rag_server.server
The server will start at http://localhost:8000. Visit http://localhost:8000/docs for interactive API documentation.
API Endpoints
Core Endpoints:
- POST /documents - Upload a document
- POST /query - Query the RAG system (standard mode)
- POST /query-enhanced - Query with automatic reference following
- POST /smart-query - Smart query with automatic routing
- GET /documents - List all documents
- DELETE /documents/{doc_id} - Delete a document
- GET /stats - Get system statistics
- GET /health - Health check
- GET /tags - List all available tags
- GET /documents/{doc_id}/sections - Get document structure
OpenAI-Compatible Endpoints:
- POST /v1/chat/completions - OpenAI-compatible chat completions
- GET /v1/models - List available models
Example Usage with curl
# Upload a document
curl -X POST "http://localhost:8000/documents" \
-F "file=@example.txt"
# Upload with tags
curl -X POST "http://localhost:8000/documents" \
-F "file=@dagster-docs.md" \
-F "tags=dagster,python,orchestration"
# Query the RAG system
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"question": "What is the main topic of the documents?", "top_k": 5}'
# Smart query with automatic routing
curl -X POST "http://localhost:8000/smart-query" \
-H "Content-Type: application/json" \
-d '{"question": "How do I create a Dagster asset?"}'
# OpenAI-compatible chat completion
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "rag-smart",
"messages": [{"role": "user", "content": "What is an asset in Dagster?"}],
"stream": false
}'
# List documents
curl "http://localhost:8000/documents"
# Get statistics
curl "http://localhost:8000/stats"
Running the MCP Server
The MCP server allows integration with Claude and other MCP-compatible clients.
python -m mcp_server.server
MCP Tools Available
- query_rag - Query the RAG system with a question
- query_rag_enhanced - Query with automatic reference following
- smart_query - Smart query with automatic routing and classification
- add_document - Add a document to the RAG system
- list_documents - List all stored documents
- delete_document - Delete a document by ID
- get_rag_stats - Get system statistics
- get_tags - List all available tags
- get_document_structure - Get document table of contents
Using with Claude Desktop
Add to your Claude Desktop configuration (claude_desktop_config.json):
{
"mcpServers": {
"rag": {
"command": "uv",
"args": [
"--directory",
"/path/to/mcp-rag-docs",
"run",
"python",
"-m",
"mcp_server.server"
]
}
}
}
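On macOS this file typically lives at ~/Library/Application Support/Claude/claude_desktop_config.json; on Windows, at %APPDATA%\Claude\claude_desktop_config.json.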
See the Additional Documentation section below for a quick setup guide.
Configuration
All configuration is managed through environment variables (defined in .env):
| Variable | Description | Default |
|---|---|---|
| GOOGLE_API_KEY | Google AI Studio API key | (required) |
| CHUNK_SIZE | Size of text chunks in characters | 1000 |
| CHUNK_OVERLAP | Overlap between chunks | 200 |
| TOP_K_RESULTS | Number of chunks to retrieve | 5 |
| QDRANT_PATH | Path to Qdrant storage | ./qdrant_storage |
| QDRANT_COLLECTION_NAME | Qdrant collection name | documents |
| FASTAPI_HOST | FastAPI server host | 0.0.0.0 |
| FASTAPI_PORT | FastAPI server port | 8000 |
| EMBEDDING_MODEL | Google embedding model | text-embedding-004 |
| LLM_MODEL | Google LLM model | gemini-1.5-flash |
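A complete .env mirroring the table might look like this (only GOOGLE_API_KEY is required; the other values shown are the defaults):
GOOGLE_API_KEY=your_api_key_here
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K_RESULTS=5
QDRANT_PATH=./qdrant_storage
QDRANT_COLLECTION_NAME=documents
FASTAPI_HOST=0.0.0.0
FASTAPI_PORT=8000
EMBEDDING_MODEL=text-embedding-004
LLM_MODEL=gemini-1.5-flash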
Architecture
Document Processing Pipeline
- Upload - User uploads a .txt or .md file
- Processing - Document is read and metadata extracted (including frontmatter)
- Chunking - Text is split using hierarchical chunking for markdown or standard chunking for text
- Embedding - Each chunk is converted to a vector using Google AI embeddings
- Storage - Vectors and metadata are stored in Qdrant
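In code, the pipeline boils down to chunk, embed, store. A minimal sketch using the google-generativeai and qdrant-client packages (illustrative only; the real implementation lives in rag_server/ and utils/, and uses hierarchical chunking for markdown rather than the naive chunking shown here):
# Sketch of the ingestion pipeline: chunk -> embed -> store
import uuid
import google.generativeai as genai
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

genai.configure(api_key="your_api_key_here")
client = QdrantClient(path="./qdrant_storage")
if not client.collection_exists("documents"):
    # text-embedding-004 produces 768-dimensional vectors
    client.create_collection("documents",
        vectors_config=VectorParams(size=768, distance=Distance.COSINE))

def ingest(text, chunk_size=1000, overlap=200):
    # Naive fixed-size chunking with overlap
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size - overlap)]
    points = []
    for chunk in chunks:
        emb = genai.embed_content(model="models/text-embedding-004", content=chunk)
        points.append(PointStruct(id=str(uuid.uuid4()),
                                  vector=emb["embedding"],
                                  payload={"text": chunk}))
    client.upsert(collection_name="documents", points=points)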
Query Pipeline
Standard Query
- Query - User submits a question
- Embedding - Question is converted to a vector
- Retrieval - Similar chunks are retrieved from Qdrant
- Generation - Context is provided to Google AI Studio model
- Response - Answer is generated and returned with sources
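A matching sketch of the standard query path (again illustrative; the real logic is in rag_server/rag_system.py):
# Sketch of the standard query path: embed -> retrieve -> generate
import google.generativeai as genai
from qdrant_client import QdrantClient

genai.configure(api_key="your_api_key_here")
client = QdrantClient(path="./qdrant_storage")

def query(question, top_k=5):
    emb = genai.embed_content(model="models/text-embedding-004", content=question)
    hits = client.search(collection_name="documents",
                         query_vector=emb["embedding"], limit=top_k)
    context = "\n\n".join(h.payload["text"] for h in hits)
    model = genai.GenerativeModel("gemini-1.5-flash")
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return model.generate_content(prompt).text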
Smart Query
- Classification - Query is classified (documentation, code, conceptual, etc.)
- Routing - Automatically selects best retrieval strategy
- Multi-Source - May combine documentation search, code search, and direct answers
- Synthesis - Generates comprehensive answer from multiple sources
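The classifier's internals aren't documented here; as a rough mental model, a keyword-based router might look like the following (purely hypothetical; see utils/query_classifier.py for the actual logic):
# Hypothetical keyword router; the real classifier may differ entirely
def classify(question):
    q = question.lower()
    if any(k in q for k in ("function", "class", "implementation", "source")):
        return "code"           # route to the code index
    if any(k in q for k in ("how do i", "install", "configure", "example")):
        return "documentation"  # route to the docs vector search
    return "conceptual"         # synthesize from multiple sources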
Code Indexing
The system can index source code repositories:
# Build code index
python build_code_index.py /path/to/repo
# Query code through the API or MCP server
Code is indexed with:
- Class and function definitions
- Docstrings and comments
- File structure and imports
- Semantic embeddings for natural language queries
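As a sketch of what indexing "class and function definitions" can mean in practice, Python's standard-library ast module extracts them directly (illustrative; the project's extractor in utils/code_indexer.py may work differently):
# Pull definitions and docstrings out of a Python file for embedding
import ast
from pathlib import Path

def extract_definitions(path):
    tree = ast.parse(Path(path).read_text())
    defs = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defs.append({
                "name": node.name,
                "kind": type(node).__name__,
                "line": node.lineno,
                "docstring": ast.get_docstring(node) or "",
            })
    return defs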
Development
Running Tests
# Install test dependencies
pip install pytest pytest-asyncio httpx
# Run tests
pytest
# Run specific test files
pytest test_openai_api.py
pytest test_mcp_integration.py
Code Style
The project follows Python best practices with type hints and docstrings.
Troubleshooting
Common Issues
Issue: GOOGLE_API_KEY not found
- Solution: Ensure you've created a .env file and added your Google API key
Issue: Unsupported file type
- Solution: Only .txt and .md files are supported. Convert other formats first.
Issue: Collection already exists error
- Solution: Delete the qdrant_storage/ directory to reset the database
Issue: MCP server not connecting
- Solution: Check that the path in your MCP config is correct and the .env file is in the project root
Advanced Usage
Tag-Based Organization
Organize your documents with tags for easy categorization and filtering:
# Upload document with tags
curl -X POST "http://localhost:8000/documents" \
-F "file=@dagster-docs.md" \
-F "tags=dagster,python,orchestration"
# List all available tags
curl "http://localhost:8000/tags"
# Query only dagster-related documents
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"question": "How do I create a pipeline?", "tags": ["dagster"]}'
# List documents filtered by tags
curl "http://localhost:8000/documents?tags=dagster,python"
Hierarchical Document Structure
For markdown documents, the system automatically preserves heading hierarchy:
# Get document structure (table of contents)
curl "http://localhost:8000/documents/{doc_id}/sections"
# Query specific section
curl -X POST "http://localhost:8000/query" \
-H "Content-Type: application/json" \
-d '{"question": "What are the prerequisites?", "section_path": "Installation > Prerequisites"}'
Section-Aware Queries
The system includes section context when generating answers:
# Example: Markdown document structure
# Installation
# Prerequisites
# Python Version
# Setup Steps
# When you query about "Python version requirements"
# The system will:
# 1. Retrieve relevant chunks from "Installation > Prerequisites > Python Version"
# 2. Include section path in context sent to LLM
# 3. Cite sources with full section paths
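One way to picture how paths like "Installation > Prerequisites > Python Version" arise: keep a stack of open headings while scanning the markdown. A minimal sketch (not the project's actual hierarchical_chunker.py):
# Minimal breadcrumb extraction from markdown headings
import re

def breadcrumbs(markdown):
    stack = []   # (heading level, title)
    paths = []
    for line in markdown.splitlines():
        m = re.match(r"^(#+)\s+(.*)", line)
        if m:
            level, title = len(m.group(1)), m.group(2).strip()
            while stack and stack[-1][0] >= level:
                stack.pop()  # close headings at the same or deeper level
            stack.append((level, title))
            paths.append(" > ".join(t for _, t in stack))
    return paths
# breadcrumbs("# Installation\n## Prerequisites\n### Python Version")
# -> ["Installation", "Installation > Prerequisites",
#     "Installation > Prerequisites > Python Version"]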
Smart Query Modes
The system supports three query modes:
- Standard (/query) - Basic vector search and retrieval
- Enhanced (/query-enhanced) - Follows documentation references automatically
- Smart (/smart-query) - Automatic classification and routing
Use the OpenAI-compatible API to access different modes:
# Standard mode
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model": "rag-standard", "messages": [{"role": "user", "content": "What is Dagster?"}]}'
# Enhanced mode with reference following
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model": "rag-enhanced", "messages": [{"role": "user", "content": "What is Dagster?"}]}'
# Smart mode with automatic routing
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model": "rag-smart", "messages": [{"role": "user", "content": "What is Dagster?"}]}'
MCP Tools
The MCP server provides enhanced tools for Claude and other MCP clients:
query_rag - Query with optional tags and section filtering
{
"question": "How do I deploy?",
"tags": ["dagster"],
"section_path": "Deployment"
}
smart_query - Smart query with automatic routing
{
"question": "What is an asset and how do I use it?"
}
add_document - Upload with tags
{
"file_path": "/path/to/doc.md",
"tags": ["dagster", "docs"]
}
get_tags - List all tags
get_document_structure - Get table of contents
{
"doc_id": "abc123"
}
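Outside of Claude Desktop, the same tools can be exercised with the official mcp Python SDK over stdio. A minimal sketch (assumes the mcp package is installed and the command is run from the project root):
# Call the query_rag tool through the MCP Python SDK
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(command="python", args=["-m", "mcp_server.server"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "query_rag",
                {"question": "How do I deploy?", "tags": ["dagster"]},
            )
            print(result.content)

asyncio.run(main())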
API Reference
Enhanced Endpoints
POST /documents
- Body: file (multipart), tags (comma-separated string)
- Response: Document info with tags and chunk count
POST /query
- Body: {"question": "...", "tags": [...], "section_path": "..."}
- Response: Answer with section-aware sources
POST /smart-query
- Body: {"question": "..."}
- Response: Smart answer with automatic routing and classification
GET /tags
- Response: {"tags": [...], "total": N}
GET /documents/{doc_id}/sections
- Response: Document structure with section hierarchy
GET /documents?tags=tag1,tag2
- Query filtered by tags
- Response: List of matching documents
POST /v1/chat/completions
- OpenAI-compatible chat completion endpoint
- Supports models: rag-standard, rag-enhanced, rag-smart
- Supports streaming with stream: true
GET /v1/models
- List available RAG models
Additional Documentation
- Quick setup guide for MCP integration
- Detailed MCP server setup
- OpenAI-compatible API documentation
- Smart query routing guide
- Multi-mode retrieval documentation
- Code indexing and search guide
- Rate limiting configuration
- Test coverage and testing guide
License
MIT License
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Acknowledgments
- Google AI Studio for embeddings and LLM capabilities
- Qdrant for vector database
- FastAPI for the REST API framework
- Anthropic MCP for the Model Context Protocol