Document RAG MCP Server

A Retrieval Augmented Generation (RAG) system using Model Context Protocol (MCP) that allows you to ingest documents and query them using AI.

Features

  • Document Ingestion: Load text documents and create vector embeddings
  • Semantic Search: Query documents using natural language
  • Persistent Storage: ChromaDB vector database for persistent document storage
  • MCP Architecture: FastMCP server with LangGraph client integration
  • AI-Powered: Support for multiple AI providers:
    • Google Gemini for embeddings and conversational AI
    • Ollama for local LLM execution (privacy-focused)

Setup

  1. Python Requirements:

    • Python 3.11 or higher is required
    • This project has been tested with Python 3.11
  2. Create a virtual environment (recommended):

    python3.11 -m venv .venv
    source .venv/bin/activate  # On macOS/Linux
    # or
    .venv\Scripts\activate     # On Windows
    
  3. Install Dependencies:

    pip install -r requirements.txt
    
  4. Configure LLM Provider:

    Choose between Google Gemini (cloud-based) and Ollama (local) as your LLM provider:

    Option A: Google Gemini (Default)

    # Copy the example environment file
    cp .env.example .env
    
    # Edit .env and set LLM_PROVIDER=gemini
    # Add your Google Gemini API key
    # Get your API key from: https://makersuite.google.com/app/apikey
    

    Option B: Ollama (Local)

    # Copy the example environment file
    cp .env.example .env
    
    # Edit .env and set LLM_PROVIDER=ollama
    # Make sure Ollama is installed and running locally
    # Install Ollama from: https://ollama.ai
    
  5. Update .env file:

    For Google Gemini:

    LLM_PROVIDER=gemini
    GOOGLE_GEMINI_API_KEY=your_actual_google_gemini_api_key_here
    

    For Ollama:

    LLM_PROVIDER=ollama
    OLLAMA_BASE_URL=http://localhost:11434
    OLLAMA_MODEL=llama3.2
    OLLAMA_EMBEDDING_MODEL=nomic-embed-text
    
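Both the server and the client read these variables at startup. As a rough sketch of how the provider switch might be consumed (illustrative only, assuming the python-dotenv package; not the project's actual code):

    import os

    from dotenv import load_dotenv  # from the python-dotenv package

    load_dotenv()  # pull LLM_PROVIDER and friends from .env into the environment

    provider = os.getenv("LLM_PROVIDER", "gemini").lower()
    if provider == "gemini":
        if not os.getenv("GOOGLE_GEMINI_API_KEY"):
            raise SystemExit("GOOGLE_GEMINI_API_KEY is not set in .env")
    elif provider == "ollama":
        base_url = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
        chat_model = os.getenv("OLLAMA_MODEL", "llama3.2")
    else:
        raise SystemExit(f"Unsupported LLM_PROVIDER: {provider}")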

Ollama Setup

If you choose to use Ollama, follow these additional steps:

  1. Install Ollama:

    # Visit https://ollama.ai and download for your platform
    # Or use package manager (macOS):
    brew install ollama
    
  2. Start Ollama service:

    ollama serve
    
  3. Pull required models:

    # Pull the chat model (choose one):
    ollama pull llama3.2  # Recommended: Latest Llama model
    # OR
    ollama pull llama2    # Alternative: Stable Llama2 model
    
    # Pull the embedding model (required):
    ollama pull nomic-embed-text
    
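To confirm the daemon is reachable and both models were pulled, you can query Ollama's REST API, which lists locally installed models at /api/tags. A minimal check, assuming the requests package is installed:

    import requests

    # Ask the local Ollama daemon which models are installed.
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
    installed = [m["name"] for m in resp.json().get("models", [])]
    print("Installed models:", installed)
    # Expect the chat model (e.g. "llama3.2:latest") and the embedding
    # model ("nomic-embed-text:latest") to appear in this list.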

Usage

  1. Activate Virtual Environment:

    source .venv/bin/activate  # On macOS/Linux
    # or
    .venv\Scripts\activate     # On Windows
    
  2. Start the MCP Server:

    python rag_mcp_server.py
    
  3. Start the Client (in a separate terminal):

    # Activate the virtual environment in the new terminal too
    source .venv/bin/activate
    python mcp_client.py
    
  4. Ingest a Document:

    You: ingest_document /path/to/your/document.txt
    # OR provide the file path directly:
    You: /Users/username/Documents/sample_doc.md
    
  5. Query the Document:

    You: What are the main topics covered in this document?
    You: How can I get IT support?
    You: What are the working hours?
    
  6. Exit and Deactivate:

    # To exit the client, type 'exit', 'quit', or 'q'
    # To deactivate the virtual environment when done:
    deactivate
    

Available Tools

  • ingest_document(file_path: str): Loads and processes a text document into the vector store
  • query_rag_store(query: str): Searches the vector store for relevant content based on a query
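
FastMCP registers plain Python functions as tools via a decorator. The stub below only illustrates the general shape of these declarations; the signatures match the tools above, but the bodies are placeholders, not the project's implementation:

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("doc-rag")  # the server name here is illustrative

    @mcp.tool()
    def ingest_document(file_path: str) -> str:
        """Load a text document, chunk it, and store embeddings."""
        # The real tool splits the file and adds the chunks to ChromaDB.
        return f"Ingested {file_path}"

    @mcp.tool()
    def query_rag_store(query: str) -> str:
        """Return the stored chunks most relevant to the query."""
        # The real tool runs a similarity search against the vector store.
        return f"Top matches for: {query}"

    if __name__ == "__main__":
        mcp.run()  # serves over stdio by default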

Architecture

  • RAG Server: FastMCP-based server providing document ingestion and querying tools
  • Client: LangGraph-based conversational agent with support for:
    • Google Gemini (cloud-based)
    • Ollama (local, privacy-focused)
  • Vector Store: ChromaDB for persistent document storage
  • Embeddings: Provider-specific embedding models:
    • Google Gemini: models/embedding-001
    • Ollama: nomic-embed-text
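
As a sketch of how that provider-specific wiring can look with the LangChain integration packages (langchain-chroma, langchain-google-genai, and langchain-ollama are assumptions here; the project's actual code may differ):

    import os

    from langchain_chroma import Chroma
    from langchain_google_genai import GoogleGenerativeAIEmbeddings
    from langchain_ollama import OllamaEmbeddings

    def build_vector_store() -> Chroma:
        """Choose the embedding model that matches LLM_PROVIDER."""
        if os.getenv("LLM_PROVIDER", "gemini") == "ollama":
            embeddings = OllamaEmbeddings(
                model=os.getenv("OLLAMA_EMBEDDING_MODEL", "nomic-embed-text"),
                base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434"),
            )
        else:
            embeddings = GoogleGenerativeAIEmbeddings(
                model="models/embedding-001",
                google_api_key=os.getenv("GOOGLE_GEMINI_API_KEY"),
            )
        # The same on-disk directory the troubleshooting section deletes on reset.
        return Chroma(embedding_function=embeddings, persist_directory="rag_chroma_db")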

Troubleshooting

Common Issues

  1. 404 Error with Ollama Embeddings:

    INFO HTTP Request: POST http://localhost:11434/api/embed "HTTP/1.1 404 Not Found"
    

    Solution: Pull the embedding model:

    ollama pull nomic-embed-text
    
  2. ChromaDB Reset: If you need to clear all ingested documents:

    rm -rf rag_chroma_db
    
  3. Switching Between Providers:

    • Edit .env file and change LLM_PROVIDER
    • Restart the MCP client
    • Clear ChromaDB when switching embedding models, since vectors produced by different embedding models are not comparable

Testing Configuration

Use the provided test script to verify your setup:

python test_llm_providers.py

MCP Inspector

To inspect and debug your MCP server tools and capabilities:

# Activate virtual environment and run MCP Inspector
source .venv/bin/activate && npx @modelcontextprotocol/inspector /path/to/your/project/.venv/bin/python rag_mcp_server.py

Replace /path/to/your/project/ with your actual project path. This will open a web interface where you can:

  • View available tools (ingest_document and query_rag_store)
  • Test tool functionality interactively
  • Debug tool schemas and responses
  • Validate your MCP server implementation

Security

  • API keys are loaded from environment variables only
  • The .env file is excluded from version control
  • Never commit API keys to the repository
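
A minimal pattern that follows these rules, shown here as an illustration rather than the project's code: fail fast when the key is absent and never log more than a short suffix of it.

    import os

    # Populated from the git-ignored .env file, never hard-coded.
    key = os.environ.get("GOOGLE_GEMINI_API_KEY")
    if not key:
        raise SystemExit("GOOGLE_GEMINI_API_KEY is missing; set it in .env")
    print(f"Using Gemini key ending in ...{key[-4:]}")  # never print the full key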

Requirements

  • Python 3.11+
  • For Google Gemini: Google Gemini API key
  • For Ollama: Local Ollama installation with required models
  • Required packages listed in requirements.txt

Additional Features

  • Provider Flexibility: Switch between cloud-based and local AI providers
  • Privacy Options: Use Ollama for completely local document processing
  • Persistent Storage: ChromaDB maintains document embeddings across sessions
  • Automatic Model Detection: The server selects the embedding model that matches the configured provider