alejandro-ao/RAG-MCP
The RAG MCP Server is a Retrieval Augmented Generation server that utilizes FastMCP and ChromaDB to manage document ingestion and retrieval using vector embeddings.
RAG MCP Server
A Retrieval Augmented Generation (RAG) MCP server built with FastMCP
This server uses LlamaParse for parsing and extracting text from various file formats, including PDFs, Word documents, and PowerPoints. This allows for easy and efficient ETL (Extract, Transform, Load) of your documents into the vector database.
Note: To use LlamaParse for parsing documents, you will need a LlamaParse API key. You can get one from the LlamaParse website.
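For reference, a direct LlamaParse call from Python looks roughly like the sketch below. This assumes the llama-parse package is installed and the API key is exported as LLAMA_CLOUD_API_KEY; the file path is illustrative.
import os
from llama_parse import LlamaParse

parser = LlamaParse(
    api_key=os.environ["LLAMA_CLOUD_API_KEY"],  # key obtained from the LlamaParse website
    result_type="text",                         # plain-text extraction
)
documents = parser.load_data("data/example.pdf")  # illustrative path inside the data directory
print(documents[0].text[:200])                    # preview the extracted text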
How it Works
The server automatically ingests files at startup from a designated data directory. Here's a breakdown of the process:
- File Ingestion: When the server starts, it looks for files in the data directory.
- Parsing with LlamaParse: If a LLAMA_CLOUD_API_KEY is set, the server uses LlamaParse to extract text from supported file types (.pdf, .docx, .pptx, etc.). If the key is not set, parsing will be limited.
- Vectorization: The extracted text is then converted into vector embeddings.
- Database Persistence: These embeddings are stored in a local ChromaDB database, which is persisted on disk.
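Conceptually, the vectorization and persistence steps reduce to something like this sketch, which relies on ChromaDB's default embedding function; the chunk contents and IDs are placeholders, not the server's actual internals.
# Conceptual sketch of vectorization + persistence (not the server's actual code)
import os
import chromadb

db_dir = os.path.expanduser("~/.local/share/rag-server")  # default database location
client = chromadb.PersistentClient(path=db_dir)
collection = client.get_or_create_collection("rag_documents")

# Chunks would come from the parsed documents; placeholder content here
chunks = ["First chunk of extracted text...", "Second chunk of extracted text..."]

# ChromaDB embeds each chunk with its default embedding function and persists it on disk
collection.add(
    documents=chunks,
    ids=[f"example-doc-chunk-{i}" for i in range(len(chunks))],
)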
File Locations
- Data Directory: Files to be ingested should be placed in a data directory in the project's root, or a custom path can be specified using the LLAMA_RAG_DATA_DIR environment variable.
- Database Directory: The ChromaDB database is persisted in ~/.local/share/rag-server by default, but this can be overridden with the LLAMA_RAG_DB_DIR environment variable.
Features
Tools
- query_documents: Search for relevant documents using semantic similarity
- list_ingested_files: View all files currently stored in the database
- reingest_data_directory: Reingest all files from the data directory (useful to reindex contents when new files are added)
- get_rag_status: Get comprehensive system information including server status, database configuration, data directory status, and environment variables
Resources
- None currently available
Prompts
- rag_analysis_prompt: Generate structured prompts for analyzing documents on specific topics
Quick Start
1. Installation
The recommended way to install and manage dependencies is with uv.
If you don't have uv installed, you can install it with:
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
Once uv is installed, you can sync the dependencies:
# Create a virtual environment and install dependencies
uv sync
Alternatively, you can still use pip:
# Install dependencies with pip
pip install -r requirements.txt
2. Run the Server
Once the dependencies are installed, you can run the server:
# Start the MCP server
python src/main.py
3. Test the Server
# Run the test suite
python tests/test_rag_server.py
Directory Configuration
The server supports flexible configuration for both data and database directories through environment variables:
Data Directory Configuration:
Priority Order:
1. LLAMA_RAG_DATA_DIR environment variable (highest priority)
2. ./data in current working directory (workspace-relative)
3. Error: If neither is found, the server will log an error and skip auto-ingestion
Important: Unlike the database directory, the data directory requires explicit configuration. If no data directory is found, the server will:
- Log a clear error message with setup instructions
- Skip auto-ingestion (server will still start successfully)
- Require manual configuration before documents can be ingested
Database Directory Configuration:
Priority Order:
1. LLAMA_RAG_DB_DIR environment variable (highest priority)
2. ~/.local/share/rag-server (XDG Base Directory standard)
3. ./chroma relative to current working directory (fallback)
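The resolution order for both directories can be pictured roughly as follows; this is a simplified sketch of the documented precedence, not the server's actual code.
# Simplified sketch of the documented directory precedence (illustrative only)
import os

def resolve_data_dir() -> str | None:
    if os.environ.get("LLAMA_RAG_DATA_DIR"):        # 1. explicit override
        return os.environ["LLAMA_RAG_DATA_DIR"]
    if os.path.isdir("./data"):                      # 2. workspace-relative ./data
        return "./data"
    return None                                      # 3. no data dir: auto-ingestion is skipped

def resolve_db_dir() -> str:
    if os.environ.get("LLAMA_RAG_DB_DIR"):           # 1. explicit override
        return os.environ["LLAMA_RAG_DB_DIR"]
    xdg_default = os.path.expanduser("~/.local/share/rag-server")
    try:
        os.makedirs(xdg_default, exist_ok=True)      # 2. XDG Base Directory default
        return xdg_default
    except OSError:
        return "./chroma"                            # 3. last-resort fallback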
Usage Examples:
# Using environment variable (recommended)
export LLAMA_RAG_DATA_DIR=/path/to/your/documents
python rag_server.py
# Using current directory data folder
mkdir data
cp your_documents/* data/
python rag_server.py
# Error case - no configuration
# Server starts but logs: "No data directory found. Please either..."
python rag_server.py
# Use custom database directory only
LLAMA_RAG_DB_DIR=/path/to/your/database python rag_server.py
# Use both custom directories
LLAMA_RAG_DATA_DIR=~/Documents/rag-data LLAMA_RAG_DB_DIR=~/Documents/rag-db python rag_server.py
Testing:
# Test with temporary directories
LLAMA_RAG_DATA_DIR=/tmp/test_data LLAMA_RAG_DB_DIR=/tmp/test_db python rag_server.py
For detailed configuration options, see .
Usage Examples
Ingesting Documents
# The server will chunk your document automatically
result = ingest_file(
    file_path="sample_document.txt",
    chunk_size=1000,  # Characters per chunk
    overlap=200       # Overlap between chunks
)
Querying Documents
# Search for relevant information
results = query_documents(
    query="What is machine learning?",
    n_results=5,
    include_metadata=True
)
Checking System Status
# Get current system information
status = get_rag_status()
# Returns: {"status": "active", "total_documents": 42, ...}
Architecture
Components
- FastMCP Server: High-level MCP server framework
- ChromaDB: Local vector database for document storage
- Sentence Transformers: Embedding model for semantic search
Data Flow
Text File → Chunking → Embeddings → ChromaDB → Query → Relevant Chunks
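The query half of this flow is a ChromaDB similarity search under the hood. The sketch below is self-contained with illustrative documents; the collection name and storage path match those used elsewhere in this README.
# Standalone sketch of the query step (illustrative data)
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("rag_documents")

collection.add(
    documents=[
        "Machine learning is a subfield of AI.",
        "Overlap preserves context at chunk boundaries.",
    ],
    ids=["chunk-1", "chunk-2"],
)

results = collection.query(query_texts=["What is machine learning?"], n_results=2)
print(results["documents"][0])  # most relevant chunks, best match first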
File Structure
mcp-rag/
├── rag_server.py         # Main MCP server implementation
├── requirements.txt      # Python dependencies
├── test_rag_server.py    # Test suite
├── sample_document.txt   # Example document for testing
├── README.md             # This file
└── chroma_db/            # ChromaDB persistent storage (created automatically)
Configuration
Environment Variables
The server uses sensible defaults, but you can customize:
- Database Location: Modify persist_directory in rag_server.py
- Collection Name: Change rag_documents to your preferred name
- Chunk Settings: Adjust default chunk_size and overlap parameters
ChromaDB Settings
import chromadb
from chromadb.config import Settings

# Persistent storage configuration
chroma_client = chromadb.PersistentClient(
    path="./chroma_db",
    settings=Settings(
        anonymized_telemetry=False,
        allow_reset=True
    )
)
Integration with MCP Clients
Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "rag-server": {
      "command": "python",
      "args": ["/path/to/your/rag_server.py"],
      "cwd": "/path/to/your/mcp-rag"
    }
  }
}
Cursor IDE
Add to your MCP configuration:
{
  "mcpServers": {
    "rag-server": {
      "command": "python",
      "args": ["rag_server.py"],
      "cwd": "/path/to/mcp-rag"
    }
  }
}
Development
Testing with MCP Inspector
FastMCP includes a built-in web interface for testing:
# Install with CLI tools
pip install "fastmcp[cli]"
# Run with inspector
fastmcp dev rag_server.py
# Open browser to http://127.0.0.1:6274
Adding New Tools
@mcp.tool
def your_new_tool(param: str) -> str:
    """
    Description of your tool.

    Args:
        param: Description of parameter

    Returns:
        Description of return value
    """
    # Your implementation here
    return "result"
Adding Resources
@mcp.resource("your://resource-uri")
def your_resource() -> dict:
    """
    Description of your resource.
    """
    return {"data": "value"}
Troubleshooting
Common Issues
- Import Errors
pip install --upgrade fastmcp chromadb
- ChromaDB Permission Issues
# Ensure write permissions for chroma_db directory
chmod -R 755 ./chroma_db
- Memory Issues with Large Files
  - Reduce chunk_size parameter
  - Process files in smaller batches
  - Monitor system memory usage
- Slow Query Performance
  - Reduce n_results parameter
  - Consider using more specific queries
  - Check ChromaDB index status
Logging
The server includes comprehensive logging:
import logging
logging.basicConfig(level=logging.DEBUG) # Enable debug logging
Performance Considerations
Optimization Tips
- Chunk Size: Balance between context and performance (500-2000 characters)
- Overlap: Prevent context loss at chunk boundaries (10-20% of chunk size)
- Query Results: Limit n_results to avoid overwhelming responses (3-10 results)
- File Size: Consider splitting very large files before ingestion
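To make the chunk size and overlap trade-off concrete, a simple character-based chunker could look like the sketch below; it is illustrative only, and the server's own chunking may differ.
# Character-based chunking with overlap (illustrative sketch)
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    # 200 characters of overlap on 1000-character chunks is 20%, the upper end of the suggested range
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("example text " * 500)  # ~6,500 characters of dummy input
print(len(chunks), len(chunks[0]))          # number of chunks, size of the first chunk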
Scaling
For production use:
- Consider ChromaDB's client-server mode
- Implement batch processing for large document sets
- Add caching for frequently accessed documents
- Monitor disk space for the vector database
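For instance, switching to ChromaDB's client-server mode mainly changes how the client connects. The sketch below assumes a Chroma server is already running on localhost:8000 (for example, started with chroma run); the collection name and batch contents are illustrative.
# Connect to a running Chroma server instead of local persistent storage
import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_or_create_collection("rag_documents")

# Batch ingestion: add documents in modest batches rather than one huge call
batch = ["chunk one ...", "chunk two ..."]
collection.add(documents=batch, ids=[f"batch-0-{i}" for i in range(len(batch))])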
Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
License
This project is open source. Feel free to use, modify, and distribute according to your needs.
Built with ❤️ using FastMCP and ChromaDB