alejandro-ao/simple-mcp-rag
The RAG MCP Server is a Retrieval Augmented Generation server built with FastMCP and ChromaDB, designed to manage document ingestion and retrieval using semantic search.
RAG MCP Server
A Retrieval Augmented Generation (RAG) MCP server built with FastMCP
Features
🔧 Tools
- query_documents: Search for relevant documents using semantic similarity
- list_ingested_files: View all files currently stored in the database
- reingest_data_directory: Reingest all files from the data directory (useful to reindex contents when new files are added)
- get_rag_status: Get comprehensive system information including server status, database configuration, data directory status, and environment variables
📊 Resources
- None currently available
💬 Prompts
- rag_analysis_prompt: Generate structured prompts for analyzing documents on specific topics
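As a rough illustration (not the exact code in rag_server.py), such a prompt could be defined with FastMCP's @mcp.prompt decorator; the prompt text below is an assumption:

from fastmcp import FastMCP

mcp = FastMCP("rag-server")  # in rag_server.py this instance already exists

@mcp.prompt
def rag_analysis_prompt(topic: str) -> str:
    """Generate a structured analysis prompt for documents on a given topic."""
    return (
        f"Use the query_documents tool to retrieve passages about '{topic}', "
        "then summarize the key points and cite the source files."
    )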
Quick Start
1. Installation
# Install dependencies
pip install -r requirements.txt
# Or install manually
pip install fastmcp chromadb sentence-transformers
2. Run the Server
# Start the MCP server
python rag_server.py
# Or use FastMCP CLI for development with inspector
fastmcp dev rag_server.py
3. Test the Server
# Run the test suite
python test_rag_server.py
Directory Configuration
The server supports flexible configuration for both data and database directories through environment variables:
Data Directory Configuration:
Priority Order:
1. LLAMA_RAG_DATA_DIR environment variable (highest priority)
2. ./data in the current working directory (workspace-relative)
3. Error: if neither is found, the server will log an error and skip auto-ingestion
Important: Unlike the database directory, the data directory requires explicit configuration. If no data directory is found, the server will:
- Log a clear error message with setup instructions
- Skip auto-ingestion (server will still start successfully)
- Require manual configuration before documents can be ingested
Database Directory Configuration:
Priority Order:
1. LLAMA_RAG_DB_DIR environment variable (highest priority)
2. ~/.local/share/rag-server (XDG Base Directory standard)
3. ./chroma relative to the current working directory (fallback)
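A minimal sketch of how this resolution order might look in code; the helper names below are illustrative, not the exact functions in rag_server.py:

import os
from pathlib import Path

def resolve_data_dir():
    """Data directory: LLAMA_RAG_DATA_DIR first, then ./data, else None."""
    env_dir = os.environ.get("LLAMA_RAG_DATA_DIR")
    if env_dir:
        return Path(env_dir)
    workspace_data = Path.cwd() / "data"
    if workspace_data.is_dir():
        return workspace_data
    return None  # caller logs an error and skips auto-ingestion

def resolve_db_dir():
    """Database directory: LLAMA_RAG_DB_DIR, then the XDG location, then ./chroma."""
    env_dir = os.environ.get("LLAMA_RAG_DB_DIR")
    if env_dir:
        return Path(env_dir)
    xdg_dir = Path.home() / ".local" / "share" / "rag-server"
    if xdg_dir.is_dir():
        return xdg_dir
    return Path.cwd() / "chroma"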
Usage Examples:
# Using environment variable (recommended)
export LLAMA_RAG_DATA_DIR=/path/to/your/documents
python rag_server.py
# Using current directory data folder
mkdir data
cp your_documents/* data/
python rag_server.py
# Error case - no configuration
# Server starts but logs: "No data directory found. Please either..."
python rag_server.py
# Use custom database directory only
LLAMA_RAG_DB_DIR=/path/to/your/database python rag_server.py
# Use both custom directories
LLAMA_RAG_DATA_DIR=~/Documents/rag-data LLAMA_RAG_DB_DIR=~/Documents/rag-db python rag_server.py
Testing:
# Test with temporary directories
LLAMA_RAG_DATA_DIR=/tmp/test_data LLAMA_RAG_DB_DIR=/tmp/test_db python rag_server.py
For detailed configuration options, see .
Usage Examples
Ingesting Documents
# The server will chunk your document automatically
result = ingest_file(
    file_path="sample_document.txt",
    chunk_size=1000,  # Characters per chunk
    overlap=200,      # Overlap between chunks
)
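The exact chunking code lives in rag_server.py; a minimal sketch of character-based chunking with overlap (hypothetical helper) looks like this:

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap at the boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

With chunk_size=1000 and overlap=200, each new chunk starts 800 characters after the previous one, so text near a chunk boundary appears in both neighboring chunks.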
Querying Documents
# Search for relevant information
results = query_documents(
    query="What is machine learning?",
    n_results=5,
    include_metadata=True,
)
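Under the hood, query_documents is expected to map onto ChromaDB's collection.query; a rough sketch of that call (the wrapping is an assumption, the field names follow the ChromaDB API):

results = collection.query(
    query_texts=["What is machine learning?"],  # embedded by the collection's embedding function
    n_results=5,
    include=["documents", "metadatas", "distances"],
)
# results["documents"][0] holds the matching chunks for the first query text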
Checking System Status
# Get current system information
status = get_rag_status()
# Returns: {"status": "active", "total_documents": 42, ...}
Architecture
Components
- FastMCP Server: High-level MCP server framework
- ChromaDB: Local vector database for document storage
- Sentence Transformers: Embedding model for semantic search
Data Flow
Text File → Chunking → Embeddings → ChromaDB → Query → Relevant Chunks
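Assuming the components above (sentence-transformers for embeddings, a persistent ChromaDB collection named rag_documents), the flow can be sketched end to end; the model name and ID scheme are assumptions:

import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("rag_documents")

text = open("sample_document.txt", encoding="utf-8").read()
chunks = [text[i:i + 1000] for i in range(0, len(text), 800)]  # 1000-char chunks, 200 overlap

collection.add(
    ids=[f"sample_document.txt-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=model.encode(chunks).tolist(),
)

hits = collection.query(
    query_embeddings=model.encode(["What is machine learning?"]).tolist(),
    n_results=3,
)
print(hits["documents"][0])  # the most relevant chunks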
File Structure
mcp-rag/
├── rag_server.py # Main MCP server implementation
├── requirements.txt # Python dependencies
├── test_rag_server.py # Test suite
├── sample_document.txt # Example document for testing
├── README.md # This file
└── chroma_db/ # ChromaDB persistent storage (created automatically)
Configuration
Environment Variables
The server uses sensible defaults, but you can customize:
- Database Location: Modify persist_directory in rag_server.py
- Collection Name: Change rag_documents to your preferred name
- Chunk Settings: Adjust default chunk_size and overlap parameters
ChromaDB Settings
# Persistent storage configuration
chroma_client = chromadb.PersistentClient(
    path="./chroma_db",
    settings=Settings(
        anonymized_telemetry=False,
        allow_reset=True,
    ),
)
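One common pattern (not necessarily the exact code in rag_server.py) is to attach a sentence-transformers embedding function to the collection, so add and query calls embed text automatically; the model name is an assumption:

from chromadb.utils import embedding_functions

embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"  # adjust to match your setup
)
collection = chroma_client.get_or_create_collection(  # chroma_client from the block above
    name="rag_documents",
    embedding_function=embedding_fn,
)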
Integration with MCP Clients
Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "rag-server": {
      "command": "python",
      "args": ["/path/to/your/rag_server.py"],
      "cwd": "/path/to/your/mcp-rag"
    }
  }
}
Cursor IDE
Add to your MCP configuration:
{
  "mcpServers": {
    "rag-server": {
      "command": "python",
      "args": ["rag_server.py"],
      "cwd": "/path/to/mcp-rag"
    }
  }
}
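For scripted testing outside an IDE, FastMCP also ships a Python client; a hedged sketch (the tool name matches the list above, the exact Client API may vary with your FastMCP version):

import asyncio
from fastmcp import Client

async def main():
    # Spawns rag_server.py over stdio and speaks MCP to it
    async with Client("rag_server.py") as client:
        tools = await client.list_tools()
        print([tool.name for tool in tools])
        result = await client.call_tool(
            "query_documents",
            {"query": "What is machine learning?", "n_results": 3},
        )
        print(result)

asyncio.run(main())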
Development
Testing with MCP Inspector
FastMCP includes a built-in web interface for testing:
# Install with CLI tools
pip install "fastmcp[cli]"
# Run with inspector
fastmcp dev rag_server.py
# Open browser to http://127.0.0.1:6274
Adding New Tools
@mcp.tool
def your_new_tool(param: str) -> str:
    """
    Description of your tool.

    Args:
        param: Description of parameter

    Returns:
        Description of return value
    """
    # Your implementation here
    return "result"
Adding Resources
@mcp.resource("your://resource-uri")
def your_resource() -> dict:
    """
    Description of your resource.
    """
    return {"data": "value"}
Troubleshooting
Common Issues
- Import Errors
  pip install --upgrade fastmcp chromadb
- ChromaDB Permission Issues
  # Ensure write permissions for the chroma_db directory
  chmod -R 755 ./chroma_db
- Memory Issues with Large Files
  - Reduce the chunk_size parameter
  - Process files in smaller batches
  - Monitor system memory usage
- Slow Query Performance
  - Reduce the n_results parameter
  - Consider using more specific queries
  - Check ChromaDB index status
Logging
The server includes comprehensive logging:
import logging
logging.basicConfig(level=logging.DEBUG) # Enable debug logging
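One caveat worth noting: when the server runs over stdio (as in the Claude Desktop and Cursor configurations above), stdout carries MCP protocol messages, so log output should go to stderr or a file. A sketch:

import logging
import sys

logging.basicConfig(
    level=logging.DEBUG,
    stream=sys.stderr,  # keep stdout free for MCP protocol traffic
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)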
Performance Considerations
Optimization Tips
- Chunk Size: Balance between context and performance (500-2000 characters)
- Overlap: Prevent context loss at chunk boundaries (10-20% of chunk size)
- Query Results: Limit n_results to avoid overwhelming responses (3-10 results)
- File Size: Consider splitting very large files before ingestion
Scaling
For production use:
- Consider ChromaDB's client-server mode (see the sketch below)
- Implement batch processing for large document sets
- Add caching for frequently accessed documents
- Monitor disk space for the vector database
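For the client-server option, ChromaDB can run as a standalone server and the RAG server connects to it over HTTP; a sketch, with example host and port:

import chromadb

# Assumes a Chroma server is already running, e.g. started separately
# with the `chroma run` CLI pointed at your database directory.
chroma_client = chromadb.HttpClient(host="localhost", port=8000)
collection = chroma_client.get_or_create_collection("rag_documents")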
Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
License
This project is open source. Feel free to use, modify, and distribute according to your needs.
Built with ❤️ using FastMCP and ChromaDB