Document RAG MCP Server
A Retrieval Augmented Generation (RAG) system using Model Context Protocol (MCP) that allows you to ingest documents and query them using AI.
Features
- Document Ingestion: Load text documents and create vector embeddings
- Semantic Search: Query documents using natural language
- Persistent Storage: ChromaDB vector database for persistent document storage
- MCP Architecture: FastMCP server with LangGraph client integration
- AI-Powered: Support for multiple AI providers:
  - Google Gemini for embeddings and conversational AI
  - Ollama for local LLM execution (privacy-focused)
Setup
- Python Requirements:
  - Python 3.11 or higher is required
  - This project has been tested with Python 3.11
- Create a virtual environment (recommended):

  ```
  python3.11 -m venv .venv
  source .venv/bin/activate   # On macOS/Linux
  # or
  .venv\Scripts\activate      # On Windows
  ```
- Install Dependencies:

  ```
  pip install -r requirements.txt
  ```
- Configure LLM Provider:

  Choose between Google Gemini (cloud-based) or Ollama (local) as your LLM provider:

  Option A: Google Gemini (Default)

  ```
  # Copy the example environment file
  cp .env.example .env
  # Edit .env and set LLM_PROVIDER=gemini
  # Add your Google Gemini API key
  # Get your API key from: https://makersuite.google.com/app/apikey
  ```

  Option B: Ollama (Local)

  ```
  # Copy the example environment file
  cp .env.example .env
  # Edit .env and set LLM_PROVIDER=ollama
  # Make sure Ollama is installed and running locally
  # Install Ollama from: https://ollama.ai
  ```
- Update .env file:

  For Google Gemini:

  ```
  LLM_PROVIDER=gemini
  GOOGLE_GEMINI_API_KEY=your_actual_google_gemini_api_key_here
  ```

  For Ollama:

  ```
  LLM_PROVIDER=ollama
  OLLAMA_BASE_URL=http://localhost:11434
  OLLAMA_MODEL=llama3.2
  OLLAMA_EMBEDDING_MODEL=nomic-embed-text
  ```
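This README does not show how rag_mcp_server.py consumes these settings; as a rough, non-authoritative sketch, the variables above could be loaded with python-dotenv along these lines (the fallback defaults here are assumptions, not taken from the project):

```python
# config_sketch.py -- illustrative only; the project's actual loading code may differ
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # read variables from the .env file in the project root

LLM_PROVIDER = os.getenv("LLM_PROVIDER", "gemini")          # "gemini" or "ollama"
GOOGLE_GEMINI_API_KEY = os.getenv("GOOGLE_GEMINI_API_KEY")  # required for Gemini
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.2")
OLLAMA_EMBEDDING_MODEL = os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text")

if LLM_PROVIDER == "gemini" and not GOOGLE_GEMINI_API_KEY:
    raise RuntimeError("LLM_PROVIDER=gemini but GOOGLE_GEMINI_API_KEY is not set")
```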
Ollama Setup
If you choose to use Ollama, follow these additional steps:
- Install Ollama:

  ```
  # Visit https://ollama.ai and download for your platform
  # Or use a package manager (macOS):
  brew install ollama
  ```

- Start Ollama service:

  ```
  ollama serve
  ```

- Pull required models:

  ```
  # Pull the chat model (choose one):
  ollama pull llama3.2          # Recommended: Latest Llama model
  # OR
  ollama pull llama2            # Alternative: Stable Llama2 model

  # Pull the embedding model (required):
  ollama pull nomic-embed-text
  ```
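With the models pulled, you can optionally confirm that the embedding endpoint responds before starting the server. This is a minimal sketch assuming the langchain-ollama package is available in your environment (the README does not confirm it is a project dependency):

```python
# verify_ollama_embeddings.py -- optional sanity check, not part of this project
from langchain_ollama import OllamaEmbeddings  # pip install langchain-ollama

embeddings = OllamaEmbeddings(
    model="nomic-embed-text",           # must match OLLAMA_EMBEDDING_MODEL
    base_url="http://localhost:11434",  # must match OLLAMA_BASE_URL
)

vector = embeddings.embed_query("hello world")
print(f"Embedding length: {len(vector)}")  # nomic-embed-text returns 768-dimensional vectors
```

If this raises a 404, the embedding model has not been pulled yet (see Troubleshooting below).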
Usage
- Activate Virtual Environment:

  ```
  source .venv/bin/activate  # On macOS/Linux
  # or
  .venv\Scripts\activate     # On Windows
  ```

- Start the MCP Server:

  ```
  python rag_mcp_server.py
  ```

- Start the Client (in a separate terminal):

  ```
  # Activate the virtual environment in the new terminal too
  source .venv/bin/activate
  python mcp_client.py
  ```

- Ingest a Document:

  ```
  You: ingest_document /path/to/your/document.txt
  # OR provide the file path directly:
  You: /Users/username/Documents/sample_doc.md
  ```

- Query the Document:

  ```
  You: What are the main topics covered in this document?
  You: How can I get IT support?
  You: What are the working hours?
  ```

- Exit and Deactivate:

  ```
  # To exit the client, type 'exit', 'quit', or 'q'
  # To deactivate the virtual environment when done:
  deactivate
  ```
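Besides the interactive mcp_client.py, the server's tools can also be exercised directly over stdio with the MCP Python SDK. The following is a standalone sketch, not code from this repository; it assumes the mcp package is installed and that it is run from the project directory:

```python
# call_tools_sketch.py -- illustrative standalone MCP client, not part of this repo
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(command="python", args=["rag_mcp_server.py"])

async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Ingest a document, then query the vector store
            ingest = await session.call_tool(
                "ingest_document", {"file_path": "/path/to/your/document.txt"}
            )
            print(ingest.content)

            answer = await session.call_tool(
                "query_rag_store", {"query": "What are the main topics covered?"}
            )
            print(answer.content)

asyncio.run(main())
```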
Available Tools
- ingest_document(file_path: str): Loads and processes a text document into the vector store
- query_rag_store(query: str): Searches the vector store for relevant content based on a query
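For orientation, tools with these signatures are typically registered on a FastMCP server via the @mcp.tool() decorator. The snippet below is a simplified sketch rather than the project's actual rag_mcp_server.py; the tool bodies are placeholders and the server name is made up:

```python
# fastmcp_tools_sketch.py -- simplified illustration of the tool surface
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("document-rag")  # server name is arbitrary here

@mcp.tool()
def ingest_document(file_path: str) -> str:
    """Load a text document, split it, embed it, and store it in the vector store."""
    # Real implementation: read the file, chunk it, embed the chunks, add them to ChromaDB
    return f"Ingested {file_path}"

@mcp.tool()
def query_rag_store(query: str) -> str:
    """Return the most relevant stored chunks for a natural-language query."""
    # Real implementation: embed the query and run a similarity search against ChromaDB
    return f"Results for: {query}"

if __name__ == "__main__":
    mcp.run()  # defaults to stdio transport
```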
Architecture
- RAG Server: FastMCP-based server providing document ingestion and querying tools
- Client: LangGraph-based conversational agent with support for:
  - Google Gemini (cloud-based)
  - Ollama (local, privacy-focused)
- Vector Store: ChromaDB for persistent document storage
- Embeddings: Provider-specific embedding models:
  - Google Gemini: models/embedding-001
  - Ollama: nomic-embed-text
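As a rough illustration of how this provider-specific wiring could look (assuming the langchain-chroma, langchain-ollama, and langchain-google-genai packages; the project's actual module layout, collection name, and chunking logic are not shown in this README):

```python
# embeddings_sketch.py -- illustrative provider selection, not the project's code
import os

from langchain_chroma import Chroma

def build_vector_store(provider: str) -> Chroma:
    if provider == "ollama":
        from langchain_ollama import OllamaEmbeddings
        embeddings = OllamaEmbeddings(
            model=os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
            base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434"),
        )
    else:  # default: Google Gemini
        from langchain_google_genai import GoogleGenerativeAIEmbeddings
        embeddings = GoogleGenerativeAIEmbeddings(
            model="models/embedding-001",
            google_api_key=os.getenv("GOOGLE_GEMINI_API_KEY"),
        )

    # Persist embeddings on disk so documents survive restarts
    return Chroma(
        collection_name="documents",        # collection name is an assumption
        embedding_function=embeddings,
        persist_directory="rag_chroma_db",  # matches the directory named in Troubleshooting
    )
```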
Troubleshooting
Common Issues
- 404 Error with Ollama Embeddings:

  ```
  INFO HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 404 Not Found"
  ```

  Solution: Pull the embedding model:

  ```
  ollama pull nomic-embed-text
  ```

- ChromaDB Reset: If you need to clear all ingested documents:

  ```
  rm -rf rag_chroma_db
  ```

- Switching Between Providers:
  - Edit .env file and change LLM_PROVIDER
  - Restart the MCP client
  - Clear ChromaDB if switching embedding models
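The rm -rf reset above can also be done from Python, which is handy on Windows; the directory name matches the one used by the persistent store:

```python
# reset_chroma_sketch.py -- delete the persisted ChromaDB data (cross-platform)
import shutil

shutil.rmtree("rag_chroma_db", ignore_errors=True)
print("Cleared rag_chroma_db; re-ingest documents after switching embedding models.")
```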
Testing Configuration
Use the provided test script to verify your setup:
```
python test_llm_providers.py
```
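The contents of test_llm_providers.py are not reproduced here; conceptually, such a check just instantiates the configured chat model and sends a trivial prompt. A hedged sketch (the Gemini model name below is an assumption, not taken from this project):

```python
# smoke_test_sketch.py -- conceptual provider check, not the bundled test script
import os

from dotenv import load_dotenv

load_dotenv()
provider = os.getenv("LLM_PROVIDER", "gemini")

if provider == "ollama":
    from langchain_ollama import ChatOllama
    llm = ChatOllama(model=os.getenv("OLLAMA_MODEL", "llama3.2"))
else:
    from langchain_google_genai import ChatGoogleGenerativeAI
    llm = ChatGoogleGenerativeAI(
        model="gemini-1.5-flash",  # assumed model name; adjust to your setup
        google_api_key=os.getenv("GOOGLE_GEMINI_API_KEY"),
    )

print(llm.invoke("Reply with the single word OK.").content)
```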
MCP Inspector
To inspect and debug your MCP server tools and capabilities:
```
# Activate virtual environment and run MCP Inspector
source .venv/bin/activate && npx @modelcontextprotocol/inspector /path/to/your/project/.venv/bin/python rag_mcp_server.py
```
Replace /path/to/your/project/ with your actual project path. This will open a web interface where you can:
- View available tools (ingest_document and query_rag_store)
- Test tool functionality interactively
- Debug tool schemas and responses
- Validate your MCP server implementation
Security
- API keys are loaded from environment variables only
- The .env file is excluded from version control
- Never commit API keys to the repository
Requirements
- Python 3.11+
- For Google Gemini: Google Gemini API key
- For Ollama: Local Ollama installation with required models
- Required packages listed in requirements.txt
Additional Features
- Provider Flexibility: Switch between cloud-based and local AI providers
- Privacy Options: Use Ollama for completely local document processing
- Persistent Storage: ChromaDB maintains document embeddings across sessions
- Automatic Model Detection: Server automatically uses the correct embedding model for the selected provider