
Simple Local RAG System

A multi-modal Retrieval-Augmented Generation (RAG) system with FAISS vector database, featuring a Streamlit frontend, FastAPI backend, and MCP server integration.

✨ Features

  • PDF Document Upload: Upload and index PDF documents in a FAISS vector database
  • Question Answering: Ask questions and get AI-powered answers based on indexed documents
  • Conversation Memory: Maintains context across multiple questions in a conversation
  • Document Management: List and view all indexed documents with metadata
  • MCP Server: Expose functionality as MCP tools for other applications
  • Persistent Storage: FAISS indices and metadata are saved to disk for persistence
  • Source Attribution: Answers include references to source documents

🏗️ Architecture

High-Level Architecture

┌─────────────────┐
│  Streamlit UI   │  ← User Interface
└────────┬────────┘
         │ HTTP REST API
┌────────▼────────┐
│  FastAPI Backend│  ← Main API Server
└────────┬────────┘
         │
    ┌────┴────┐
    │         │
┌───▼───┐ ┌──▼──────┐
│ FAISS │ │ OpenAI  │
│ Vector│ │ API     │
│ Store │ │         │
└───────┘ └─────────┘

┌─────────────────┐
│   MCP Server    │  ← External Integration
└─────────────────┘

Component Details

1. Backend (FastAPI)

The backend provides REST API endpoints for:

  • Document upload and processing
  • Document listing
  • Question answering with RAG
  • Conversation management

Key Modules:

  • main.py: FastAPI application with route handlers
  • vector_store.py: FAISS vector database operations
  • pdf_processor.py: PDF text extraction and chunking
  • rag.py: Retrieval-Augmented Generation logic with conversation memory
  • models.py: Pydantic models for request/response validation
2. Frontend (Streamlit)

Interactive web interface with:

  • Document upload interface
  • Document list viewer
  • Chat interface with conversation history
  • Source attribution display
3. MCP Server

Model Context Protocol server exposing three tools:

  • list_documents: Get all indexed documents
  • upload_document: Upload and index new documents
  • ask_question: Query documents with conversation support
4. Vector Store (FAISS)
  • Index Type: L2 distance with normalized vectors (cosine similarity)
  • Embedding Model: OpenAI text-embedding-3-small (1536 dimensions)
  • Storage: Persistent on-disk storage with metadata
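
The note that L2 distance over normalized vectors is equivalent to cosine similarity follows from the identity ‖a − b‖² = 2 − 2·cos(a, b) for unit vectors: ranking by smallest L2 distance then matches ranking by largest cosine. A dependency-free sketch (FAISS does the same thing over dense 1536-dimensional vectors):

```python
import math

def normalize(v):
    # Scale a vector to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def l2_sq(a, b):
    # Squared L2 distance, the quantity FAISS ranks by.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
assert abs(l2_sq(a, b) - (2 - 2 * cosine(a, b))) < 1e-9
```
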
5. RAG Pipeline
  1. Document Processing:

    • Extract text from PDF
    • Split into overlapping chunks (1000 chars, 200 overlap)
    • Generate embeddings for each chunk
    • Store in FAISS index
  2. Query Processing:

    • Generate embedding for user question
    • Search FAISS for similar chunks (top-k retrieval)
    • Retrieve conversation history (if conversation_id provided)
    • Generate answer using OpenAI LLM with context
    • Update conversation history
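
The two stages above can be sketched end to end. The real pipeline uses OpenAI embeddings and FAISS; the bag-of-words `embed` and linear-scan `top_k` below are simplified stand-ins, though the chunk sizes match the values quoted above (1000 characters, 200 overlap):

```python
from collections import Counter

def chunk_text(text, size=1000, overlap=200):
    # Split text into overlapping chunks (step = size - overlap).
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    # Stand-in for text-embedding-3-small: a sparse bag-of-words vector.
    return Counter(text.lower().split())

def score(query_vec, chunk_vec):
    # Sparse dot product; FAISS uses L2 over normalized dense vectors instead.
    return sum(count * chunk_vec[term] for term, count in query_vec.items())

def top_k(question, chunks, k=5):
    # Rank all chunks against the question and keep the k best.
    q = embed(question)
    return sorted(chunks, key=lambda c: score(q, embed(c)), reverse=True)[:k]
```
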

📁 Project Structure

simple_local_rag/
├── backend/
│   ├── app/
│   │   ├── __init__.py
│   │   ├── main.py              # FastAPI application
│   │   ├── models.py            # Pydantic models
│   │   ├── vector_store.py      # FAISS operations
│   │   ├── pdf_processor.py     # PDF processing
│   │   └── rag.py               # RAG logic with memory
│   └── pyproject.toml           # UV dependencies
├── frontend/
│   ├── streamlit_app.py         # Streamlit UI
│   └── requirements.txt         # Frontend dependencies
├── mcp/
│   ├── __init__.py
│   ├── server.py                # MCP server
│   └── pyproject.toml           # MCP dependencies
├── data/
│   ├── uploads/                 # Temporary PDF storage
│   ├── faiss_index.index        # FAISS vector index
│   ├── metadata.json            # Document metadata
│   └── chunks.pkl               # Chunk metadata
├── .env                         # Environment variables (create this)
├── .gitignore
└── README.md

🔧 Installation

Prerequisites

  • Python 3.9 or later
  • UV package manager (installation covered in step 2 below)
  • An OpenAI API key

Setup Steps

  1. Clone or navigate to the project directory:

    cd simple_local_rag
    
  2. Install UV (if not already installed):

    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  3. Create virtual environment and install backend dependencies:

    cd backend
    uv venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    uv pip install -e .
    cd ..
    
  4. Install frontend dependencies:

    cd frontend
    uv venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    uv pip install -r requirements.txt
    cd ..
    
  5. Install MCP server dependencies (optional):

    cd mcp
    uv venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    uv pip install -e .
    cd ..
    
  6. Create .env file in the root directory:

    cp .env.example .env
    # Edit .env and add your OpenAI API key
    

⚙️ Configuration

Create a .env file in the root directory with the following variables:

# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Backend API Configuration
API_HOST=0.0.0.0
API_PORT=8000

# Frontend Configuration
STREAMLIT_SERVER_PORT=8501

# MCP Server Configuration
MCP_SERVER_PORT=8001
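
These variables are typically read with a library such as python-dotenv; for illustration only, an equivalent dependency-free parser of the simple KEY=VALUE format above:

```python
def parse_env(lines):
    # Parse simple KEY=VALUE lines, skipping blanks and comments.
    values = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

config = parse_env([
    "# Backend API Configuration",
    "API_HOST=0.0.0.0",
    "API_PORT=8000",
])
```

In practice you would pass `open(".env")` as the iterable, or simply use python-dotenv's `load_dotenv()`.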

🚀 Usage

Starting the Backend

cd backend
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python -m app.main

Or using uvicorn directly:

uvicorn app.main:app --host 0.0.0.0 --port 8000

The API will be available at http://localhost:8000

Starting the Frontend

cd frontend
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
streamlit run streamlit_app.py

The UI will be available at http://localhost:8501

Starting the MCP Server

cd mcp
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python server.py

The MCP server will be available at http://localhost:8001

📡 API Endpoints

Health Check

  • GET /health
    • Returns API health status

Upload Document

  • POST /upload
    • Upload a PDF file for indexing
    • Request: Multipart form data with file field
    • Response: Document ID, filename, chunk count

List Documents

  • GET /documents
    • Get list of all indexed documents
    • Response: List of documents with metadata

Query Documents

  • POST /query
    • Ask a question about indexed documents
    • Request Body:
      {
        "question": "What is the main topic?",
        "conversation_id": "optional-conversation-id",
        "top_k": 5
      }
      
    • Response: Answer, conversation_id, relevant chunks, sources

Get Conversation

  • GET /conversation/{conversation_id}
    • Get full conversation history
    • Response: All messages in the conversation

🔌 MCP Server

The MCP server exposes three tools:

1. list_documents

List all indexed documents with descriptions.

Example MCP call:

{
  "tool": "list_documents",
  "arguments": {}
}

2. upload_document

Upload and index a new PDF document.

Example MCP call:

{
  "tool": "upload_document",
  "arguments": {
    "file_path": "/path/to/document.pdf",
    "filename": "document.pdf"
  }
}

3. ask_question

Ask questions about indexed documents with conversation support.

Example MCP call:

{
  "tool": "ask_question",
  "arguments": {
    "question": "What is the main topic?",
    "conversation_id": "optional-id",
    "top_k": 5
  }
}

MCP HTTP Endpoints

  • GET /tools: Get available MCP tools
  • POST /tools/call: Execute an MCP tool
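
The actual routing lives in mcp/server.py; one common shape for the POST /tools/call dispatcher is a name-to-handler table, shown here as an illustrative sketch (the placeholder handlers are hypothetical, not the project's real implementations):

```python
def list_documents():
    # Placeholder: the real handler queries the vector store.
    return {"documents": []}

def ask_question(question, conversation_id=None, top_k=5):
    # Placeholder: the real handler runs the RAG pipeline.
    return {"answer": "...", "conversation_id": conversation_id or "new-id"}

TOOLS = {
    "list_documents": list_documents,
    "ask_question": ask_question,
}

def call_tool(request):
    # Request body shape: {"tool": "...", "arguments": {...}}
    handler = TOOLS.get(request["tool"])
    if handler is None:
        return {"error": f"unknown tool: {request['tool']}"}
    return handler(**request.get("arguments", {}))
```
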

💬 Conversation Memory

The system maintains conversation context through conversation_id:

  1. First Question: No conversation_id needed - system generates one
  2. Follow-up Questions: Use the returned conversation_id for context
  3. Context Window: Last 20 messages (10 exchanges) are maintained
  4. Storage: In-memory (can be persisted to database in production)
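
The 20-message window can be kept with a simple list slice on every append; a minimal in-memory sketch of this scheme (not the project's actual code):

```python
from collections import defaultdict
import uuid

MAX_MESSAGES = 20  # last 10 question/answer exchanges

conversations = defaultdict(list)

def new_conversation():
    # Generate an id for the first question, as the system does.
    return str(uuid.uuid4())

def add_message(conversation_id, role, content):
    history = conversations[conversation_id]
    history.append({"role": role, "content": content})
    # Drop everything except the most recent MAX_MESSAGES entries.
    del history[:-MAX_MESSAGES]
    return history
```
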

Example Flow

import requests  # minimal client; assumes the backend is running on localhost:8000

def ask_question(question, conversation_id=None, top_k=5):
    payload = {"question": question, "top_k": top_k}
    if conversation_id:
        payload["conversation_id"] = conversation_id
    return requests.post("http://localhost:8000/query", json=payload).json()

# First question
response1 = ask_question("What is machine learning?")
conversation_id = response1["conversation_id"]

# Follow-up question (maintains context)
response2 = ask_question("Can you give examples?", conversation_id=conversation_id)

# Conversation history now includes both questions and answers

🛠️ Technology Stack

  • Backend Framework: FastAPI
  • Vector Database: FAISS (CPU version)
  • Embeddings: OpenAI text-embedding-3-small
  • LLM: OpenAI GPT-4o-mini
  • PDF Processing: PyPDF2
  • Frontend: Streamlit
  • Package Manager: UV
  • Language: Python 3.9+

📝 Notes

  • Memory Limitation: Conversation history is stored in-memory. For production, consider using a database (Redis, PostgreSQL, etc.)
  • Vector Store: FAISS indices are stored on disk and persist across restarts
  • Chunking Strategy: Documents are split into 1000-character chunks with 200-character overlap
  • Embedding Dimension: 1536 (OpenAI text-embedding-3-small)
  • Error Handling: Basic error handling is implemented; enhance for production use

🔒 Security Considerations

  • Store API keys securely in .env file (never commit to version control)
  • Add authentication/authorization for production deployments
  • Validate file uploads (type, size limits)
  • Implement rate limiting for API endpoints
  • Use HTTPS in production
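
The upload-validation point might look like the following sketch (the size limit and checks are illustrative, not the project's actual values):

```python
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # illustrative 20 MB cap
ALLOWED_EXTENSIONS = {".pdf"}

def validate_upload(filename, data):
    # Reject unexpected file types and oversized payloads before indexing.
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported file type: {ext or 'none'}"
    if len(data) > MAX_UPLOAD_BYTES:
        return False, "file too large"
    if not data.startswith(b"%PDF-"):
        return False, "not a valid PDF header"
    return True, "ok"
```
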

🤝 Contributing

This is a simple local RAG implementation. Feel free to extend it with:

  • Support for more document types (DOCX, TXT, etc.)
  • Advanced chunking strategies
  • Database persistence for conversations
  • User authentication
  • Multi-tenant support
  • Advanced retrieval strategies (reranking, hybrid search)

📄 License

This project is provided as-is for educational and development purposes.