alejandro-ao/simple-mcp-rag
The RAG MCP Server is a Retrieval Augmented Generation server built with FastMCP and ChromaDB, designed to manage document ingestion and retrieval using semantic search.
RAG MCP Server
A Retrieval Augmented Generation (RAG) MCP server built with FastMCP
Features
🔧 Tools
- query_documents: Search for relevant documents using semantic similarity
- list_ingested_files: View all files currently stored in the database
- reingest_data_directory: Reingest all files from the data directory (useful to reindex contents when new files are added)
- get_rag_status: Get comprehensive system information including server status, database configuration, data directory status, and environment variables
📊 Resources
- None currently available
💬 Prompts
- rag_analysis_prompt: Generate structured prompts for analyzing documents on specific topics
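As a rough illustration (not the exact code in rag_server.py), such a prompt could be defined with FastMCP's @mcp.prompt decorator; the prompt text below is an assumption:

from fastmcp import FastMCP

mcp = FastMCP("rag-server")  # in rag_server.py this instance already exists

@mcp.prompt
def rag_analysis_prompt(topic: str) -> str:
    """Generate a structured analysis prompt for documents on a given topic."""
    return (
        f"Use the query_documents tool to retrieve passages about '{topic}', "
        "then summarize the key points and cite the source files."
    )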
Quick Start
1. Installation
# Install dependencies
pip install -r requirements.txt
# Or install manually
pip install fastmcp chromadb sentence-transformers
2. Run the Server
# Start the MCP server
python rag_server.py
# Or use FastMCP CLI for development with inspector
fastmcp dev rag_server.py
3. Test the Server
# Run the test suite
python test_rag_server.py
Directory Configuration
The server supports flexible configuration for both data and database directories through environment variables:
Data Directory Configuration:
Priority Order:
1. LLAMA_RAG_DATA_DIR environment variable (highest priority)
2. ./data in the current working directory (workspace-relative)
3. Error: if neither is found, the server will log an error and skip auto-ingestion
Important: Unlike the database directory, the data directory requires explicit configuration. If no data directory is found, the server will:
- Log a clear error message with setup instructions
- Skip auto-ingestion (server will still start successfully)
- Require manual configuration before documents can be ingested
Database Directory Configuration:
Priority Order:
1. LLAMA_RAG_DB_DIR environment variable (highest priority)
2. ~/.local/share/rag-server (XDG Base Directory standard)
3. ./chroma relative to the current working directory (fallback)
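A minimal sketch of how this resolution order might look in code; the helper names below are illustrative, not the exact functions in rag_server.py:

import os
from pathlib import Path

def resolve_data_dir():
    """Data directory: LLAMA_RAG_DATA_DIR first, then ./data, else None."""
    env_dir = os.environ.get("LLAMA_RAG_DATA_DIR")
    if env_dir:
        return Path(env_dir)
    workspace_data = Path.cwd() / "data"
    if workspace_data.is_dir():
        return workspace_data
    return None  # caller logs an error and skips auto-ingestion

def resolve_db_dir():
    """Database directory: LLAMA_RAG_DB_DIR, then the XDG location, then ./chroma."""
    env_dir = os.environ.get("LLAMA_RAG_DB_DIR")
    if env_dir:
        return Path(env_dir)
    xdg_dir = Path.home() / ".local" / "share" / "rag-server"
    if xdg_dir.is_dir():
        return xdg_dir
    return Path.cwd() / "chroma"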
Usage Examples:
# Using environment variable (recommended)
export LLAMA_RAG_DATA_DIR=/path/to/your/documents
python rag_server.py
# Using current directory data folder
mkdir data
cp your_documents/* data/
python rag_server.py
# Error case - no configuration
# Server starts but logs: "No data directory found. Please either..."
python rag_server.py
# Use custom database directory only
LLAMA_RAG_DB_DIR=/path/to/your/database python rag_server.py
# Use both custom directories
LLAMA_RAG_DATA_DIR=~/Documents/rag-data LLAMA_RAG_DB_DIR=~/Documents/rag-db python rag_server.py
Testing:
# Test with temporary directories
LLAMA_RAG_DATA_DIR=/tmp/test_data LLAMA_RAG_DB_DIR=/tmp/test_db python rag_server.py
For detailed configuration options, see .
Usage Examples
Ingesting Documents
# The server will chunk your document automatically
result = ingest_file(
    file_path="sample_document.txt",
    chunk_size=1000,  # Characters per chunk
    overlap=200,      # Overlap between chunks
)
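The exact chunking code lives in rag_server.py; a minimal sketch of character-based chunking with overlap (hypothetical helper) looks like this:

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks that overlap at the boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

With chunk_size=1000 and overlap=200, each new chunk starts 800 characters after the previous one, so text near a chunk boundary appears in both neighboring chunks.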
Querying Documents
# Search for relevant information
results = query_documents(
    query="What is machine learning?",
    n_results=5,
    include_metadata=True,
)
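Under the hood, query_documents is expected to map onto ChromaDB's collection.query; a rough sketch of that call (the wrapping is an assumption, the field names follow the ChromaDB API):

results = collection.query(
    query_texts=["What is machine learning?"],  # embedded by the collection's embedding function
    n_results=5,
    include=["documents", "metadatas", "distances"],
)
# results["documents"][0] holds the matching chunks for the first query text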
Checking System Status
# Get current system information
status = get_rag_status()
# Returns: {"status": "active", "total_documents": 42, ...}
Architecture
Components
- FastMCP Server: High-level MCP server framework
- ChromaDB: Local vector database for document storage
- Sentence Transformers: Embedding model for semantic search
Data Flow
Text File → Chunking → Embeddings → ChromaDB → Query → Relevant Chunks
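Assuming the components above (sentence-transformers for embeddings, a persistent ChromaDB collection named rag_documents), the flow can be sketched end to end; the model name and ID scheme are assumptions:

import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("rag_documents")

text = open("sample_document.txt", encoding="utf-8").read()
chunks = [text[i:i + 1000] for i in range(0, len(text), 800)]  # 1000-char chunks, 200 overlap

collection.add(
    ids=[f"sample_document.txt-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=model.encode(chunks).tolist(),
)

hits = collection.query(
    query_embeddings=model.encode(["What is machine learning?"]).tolist(),
    n_results=3,
)
print(hits["documents"][0])  # the most relevant chunks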
File Structure
mcp-rag/
├── rag_server.py # Main MCP server implementation
├── requirements.txt # Python dependencies
├── test_rag_server.py # Test suite
├── sample_document.txt # Example document for testing
├── README.md # This file
└── chroma_db/ # ChromaDB persistent storage (created automatically)
Configuration
Environment Variables
The server uses sensible defaults, but you can customize:
- Database Location: Modify persist_directory in rag_server.py
- Collection Name: Change rag_documents to your preferred name
- Chunk Settings: Adjust default chunk_size and overlap parameters
ChromaDB Settings
# Persistent storage configuration
chroma_client = chromadb.PersistentClient(
    path="./chroma_db",
    settings=Settings(
        anonymized_telemetry=False,
        allow_reset=True,
    ),
)
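One common pattern (not necessarily the exact code in rag_server.py) is to attach a sentence-transformers embedding function to the collection, so add and query calls embed text automatically; the model name is an assumption:

from chromadb.utils import embedding_functions

embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"  # adjust to match your setup
)
collection = chroma_client.get_or_create_collection(  # chroma_client from the block above
    name="rag_documents",
    embedding_function=embedding_fn,
)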
Integration with MCP Clients
Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "rag-server": {
      "command": "python",
      "args": ["/path/to/your/rag_server.py"],
      "cwd": "/path/to/your/mcp-rag"
    }
  }
}
Cursor IDE
Add to your MCP configuration:
{
  "mcpServers": {
    "rag-server": {
      "command": "python",
      "args": ["rag_server.py"],
      "cwd": "/path/to/mcp-rag"
    }
  }
}
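For scripted testing outside an IDE, FastMCP also ships a Python client; a hedged sketch (the tool name matches the list above, the exact Client API may vary with your FastMCP version):

import asyncio
from fastmcp import Client

async def main():
    # Spawns rag_server.py over stdio and speaks MCP to it
    async with Client("rag_server.py") as client:
        tools = await client.list_tools()
        print([tool.name for tool in tools])
        result = await client.call_tool(
            "query_documents",
            {"query": "What is machine learning?", "n_results": 3},
        )
        print(result)

asyncio.run(main())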
Development
Testing with MCP Inspector
FastMCP includes a built-in web interface for testing:
# Install with CLI tools
pip install "fastmcp[cli]"
# Run with inspector
fastmcp dev rag_server.py
# Open browser to http://127.0.0.1:6274
Adding New Tools
@mcp.tool
def your_new_tool(param: str) -> str:
    """
    Description of your tool.

    Args:
        param: Description of parameter

    Returns:
        Description of return value
    """
    # Your implementation here
    return "result"
Adding Resources
@mcp.resource("your://resource-uri")
def your_resource() -> dict:
    """
    Description of your resource.
    """
    return {"data": "value"}
Troubleshooting
Common Issues
- Import Errors
  pip install --upgrade fastmcp chromadb
- ChromaDB Permission Issues
  # Ensure write permissions for the chroma_db directory
  chmod -R 755 ./chroma_db
- Memory Issues with Large Files
  - Reduce the chunk_size parameter
  - Process files in smaller batches
  - Monitor system memory usage
- Slow Query Performance
  - Reduce the n_results parameter
  - Consider using more specific queries
  - Check ChromaDB index status
Logging
The server includes comprehensive logging:
import logging
logging.basicConfig(level=logging.DEBUG) # Enable debug logging
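One caveat worth noting: when the server runs over stdio (as in the Claude Desktop and Cursor configurations above), stdout carries MCP protocol messages, so log output should go to stderr or a file. A sketch:

import logging
import sys

logging.basicConfig(
    level=logging.DEBUG,
    stream=sys.stderr,  # keep stdout free for MCP protocol traffic
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)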
Performance Considerations
Optimization Tips
- Chunk Size: Balance between context and performance (500-2000 characters)
- Overlap: Prevent context loss at chunk boundaries (10-20% of chunk size)
- Query Results: Limit n_results to avoid overwhelming responses (3-10 results)
- File Size: Consider splitting very large files before ingestion
Scaling
For production use:
- Consider ChromaDB's client-server mode (see the sketch below)
- Implement batch processing for large document sets
- Add caching for frequently accessed documents
- Monitor disk space for the vector database
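For the client-server option, ChromaDB can run as a standalone server and the RAG server connects to it over HTTP; a sketch, with example host and port:

import chromadb

# Assumes a Chroma server is already running, e.g. started separately
# with the `chroma run` CLI pointed at your database directory.
chroma_client = chromadb.HttpClient(host="localhost", port=8000)
collection = chroma_client.get_or_create_collection("rag_documents")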
Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
License
This project is open source. Feel free to use, modify, and distribute according to your needs.
Built with ❤️ using FastMCP and ChromaDB