Document RAG MCP Server

A Retrieval Augmented Generation (RAG) system using Model Context Protocol (MCP) that allows you to ingest documents and query them using AI.

Features

  • Document Ingestion: Load text documents and create vector embeddings
  • Semantic Search: Query documents using natural language
  • Persistent Storage: ChromaDB vector database for persistent document storage
  • MCP Architecture: FastMCP server with LangGraph client integration
  • AI-Powered: Support for multiple AI providers:
    • Google Gemini for embeddings and conversational AI
    • Ollama for local LLM execution (privacy-focused)

Setup

  1. Python Requirements:

    • Python 3.11 or higher is required
    • This project has been tested with Python 3.11
  2. Create a virtual environment (recommended):

    python3.11 -m venv .venv
    source .venv/bin/activate  # On macOS/Linux
    # or
    .venv\Scripts\activate     # On Windows
    
  3. Install Dependencies:

    pip install -r requirements.txt
    
  4. Configure LLM Provider:

    Choose between Google Gemini (cloud-based) and Ollama (local) as your LLM provider:

    Option A: Google Gemini (Default)

    # Copy the example environment file
    cp .env.example .env
    
    # Edit .env and set LLM_PROVIDER=gemini
    # Add your Google Gemini API key
    # Get your API key from: https://makersuite.google.com/app/apikey
    

    Option B: Ollama (Local)

    # Copy the example environment file
    cp .env.example .env
    
    # Edit .env and set LLM_PROVIDER=ollama
    # Make sure Ollama is installed and running locally
    # Install Ollama from: https://ollama.ai
    
  5. Update .env file:

    For Google Gemini:

    LLM_PROVIDER=gemini
    GOOGLE_GEMINI_API_KEY=your_actual_google_gemini_api_key_here
    

    For Ollama:

    LLM_PROVIDER=ollama
    OLLAMA_BASE_URL=http://localhost:11434
    OLLAMA_MODEL=llama3.2
    OLLAMA_EMBEDDING_MODEL=nomic-embed-text
    
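Both the server and the client read these variables at startup. As a rough sketch of how the provider switch might be consumed (illustrative only, assuming the python-dotenv package; not the project's actual code):

    import os

    from dotenv import load_dotenv  # from the python-dotenv package

    load_dotenv()  # pull LLM_PROVIDER and friends from .env into the environment

    provider = os.getenv("LLM_PROVIDER", "gemini").lower()
    if provider == "gemini":
        if not os.getenv("GOOGLE_GEMINI_API_KEY"):
            raise SystemExit("GOOGLE_GEMINI_API_KEY is not set in .env")
    elif provider == "ollama":
        base_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
        chat_model = os.getenv("OLLAMA_MODEL", "llama3.2")
    else:
        raise SystemExit(f"Unsupported LLM_PROVIDER: {provider}")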

Ollama Setup

If you choose to use Ollama, follow these additional steps:

  1. Install Ollama:

    # Visit https://ollama.ai and download for your platform
    # Or use package manager (macOS):
    brew install ollama
    
  2. Start Ollama service:

    ollama serve
    
  3. Pull required models:

    # Pull the chat model (choose one):
    ollama pull llama3.2  # Recommended: Latest Llama model
    # OR
    ollama pull llama2    # Alternative: Stable Llama2 model
    
    # Pull the embedding model (required):
    ollama pull nomic-embed-text
    
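To confirm the daemon is reachable and both models were pulled, you can query Ollama's REST API, which lists locally installed models at /api/tags. A minimal check, assuming the requests package is installed:

    import requests

    # Ask the local Ollama daemon which models are installed.
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
    installed = [m["name"] for m in resp.json().get("models", [])]
    print("Installed models:", installed)
    # Expect the chat model (e.g. "llama3.2:latest") and the embedding
    # model ("nomic-embed-text:latest") to appear in this list.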

Usage

  1. Activate Virtual Environment:

    source .venv/bin/activate  # On macOS/Linux
    # or
    .venv\Scripts\activate     # On Windows
    
  2. Start the MCP Server:

    python rag_mcp_server.py
    
  3. Start the Client (in a separate terminal):

    # Activate the virtual environment in the new terminal too
    source .venv/bin/activate
    python mcp_client.py
    
  4. Ingest a Document:

    You: ingest_document /path/to/your/document.txt
    # OR provide the file path directly:
    You: /Users/username/Documents/sample_doc.md
    
  5. Query the Document:

    You: What are the main topics covered in this document?
    You: How can I get IT support?
    You: What are the working hours?
    
  6. Exit and Deactivate:

    # To exit the client, type 'exit', 'quit', or 'q'
    # To deactivate the virtual environment when done:
    deactivate
    

Available Tools

  • ingest_document(file_path: str): Loads and processes a text document into the vector store
  • query_rag_store(query: str): Searches the vector store for relevant content based on a query
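
FastMCP registers plain Python functions as tools via a decorator. The stub below only illustrates the general shape of these declarations; the signatures match the tools above, but the bodies are placeholders, not the project's implementation:

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("doc-rag")  # the server name here is illustrative

    @mcp.tool()
    def ingest_document(file_path: str) -> str:
        """Load a text document, chunk it, and store embeddings."""
        # The real tool splits the file and adds the chunks to ChromaDB.
        return f"Ingested {file_path}"

    @mcp.tool()
    def query_rag_store(query: str) -> str:
        """Return the stored chunks most relevant to the query."""
        # The real tool runs a similarity search against the vector store.
        return f"Top matches for: {query}"

    if __name__ == "__main__":
        mcp.run()  # serves over stdio by default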

Architecture

  • RAG Server: FastMCP-based server providing document ingestion and querying tools
  • Client: LangGraph-based conversational agent with support for:
    • Google Gemini (cloud-based)
    • Ollama (local, privacy-focused)
  • Vector Store: ChromaDB for persistent document storage
  • Embeddings: Provider-specific embedding models:
    • Google Gemini: models/embedding-001
    • Ollama: nomic-embed-text
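
As a sketch of how that provider-specific wiring can look with the LangChain integration packages (langchain-chroma, langchain-google-genai, and langchain-ollama are assumptions here; the project's actual code may differ):

    import os

    from langchain_chroma import Chroma
    from langchain_google_genai import GoogleGenerativeAIEmbeddings
    from langchain_ollama import OllamaEmbeddings

    def build_vector_store() -> Chroma:
        """Choose the embedding model that matches LLM_PROVIDER."""
        if os.getenv("LLM_PROVIDER", "gemini") == "ollama":
            embeddings = OllamaEmbeddings(
                model=os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
                base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434"),
            )
        else:
            embeddings = GoogleGenerativeAIEmbeddings(
                model="models/embedding-001",
                google_api_key=os.getenv("GOOGLE_GEMINI_API_KEY"),
            )
        # The same on-disk directory the troubleshooting section deletes on reset.
        return Chroma(embedding_function=embeddings, persist_directory="rag_chroma_db")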

Troubleshooting

Common Issues

  1. 404 Error with Ollama Embeddings:

    INFO HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 404 Not Found"
    

    Solution: Pull the embedding model:

    ollama pull nomic-embed-text
    
  2. ChromaDB Reset: If you need to clear all ingested documents:

    rm -rf rag_chroma_db
    
  3. Switching Between Providers:

    • Edit .env file and change LLM_PROVIDER
    • Restart the MCP client
    • Clear ChromaDB when switching embedding models, since vectors produced by different embedding models are not comparable

Testing Configuration

Use the provided test script to verify your setup:

python test_llm_providers.py

MCP Inspector

To inspect and debug your MCP server tools and capabilities:

# Activate virtual environment and run MCP Inspector
source .venv/bin/activate && npx @modelcontextprotocol/inspector /path/to/your/project/.venv/bin/python rag_mcp_server.py

Replace /path/to/your/project/ with your actual project path. This will open a web interface where you can:

  • View available tools (ingest_document and query_rag_store)
  • Test tool functionality interactively
  • Debug tool schemas and responses
  • Validate your MCP server implementation

Security

  • API keys are loaded from environment variables only
  • The .env file is excluded from version control
  • Never commit API keys to the repository
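
A minimal pattern that follows these rules, shown here as an illustration rather than the project's code: fail fast when the key is absent and never log more than a short suffix of it.

    import os

    # Populated from the git-ignored .env file, never hard-coded.
    key = os.environ.get("GOOGLE_GEMINI_API_KEY")
    if not key:
        raise SystemExit("GOOGLE_GEMINI_API_KEY is missing; set it in .env")
    print(f"Using Gemini key ending in ...{key[-4:]}")  # never print the full key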

Requirements

  • Python 3.11+
  • For Google Gemini: Google Gemini API key
  • For Ollama: Local Ollama installation with required models
  • Required packages listed in requirements.txt

Additional Features

  • Provider Flexibility: Switch between cloud-based and local AI providers
  • Privacy Options: Use Ollama for completely local document processing
  • Persistent Storage: ChromaDB maintains document embeddings across sessions
  • Automatic Model Detection: The server selects the embedding model that matches the configured provider