
Simple Local RAG System

A multi-modal Retrieval-Augmented Generation (RAG) system with FAISS vector database, featuring a Streamlit frontend, FastAPI backend, and MCP server integration.

✨ Features

  • PDF Document Upload: Upload and index PDF documents in a FAISS vector database
  • Question Answering: Ask questions and get AI-powered answers based on indexed documents
  • Conversation Memory: Maintains context across multiple questions in a conversation
  • Document Management: List and view all indexed documents with metadata
  • MCP Server: Expose functionality as MCP tools for other applications
  • Persistent Storage: FAISS indices and metadata are saved to disk for persistence
  • Source Attribution: Answers include references to source documents

🏗️ Architecture

High-Level Architecture

┌─────────────────┐
│  Streamlit UI   │  ← User Interface
└────────┬────────┘
         │ HTTP REST API
┌────────▼────────┐
│  FastAPI Backend│  ← Main API Server
└────────┬────────┘
         │
    ┌────┴────┐
    │         │
┌───▼───┐ ┌──▼──────┐
│ FAISS │ │ OpenAI  │
│ Vector│ │ API     │
│ Store │ │         │
└───────┘ └─────────┘

┌─────────────────┐
│   MCP Server    │  ← External Integration
└─────────────────┘

Component Details

1. Backend (FastAPI)

The backend provides REST API endpoints for:

  • Document upload and processing
  • Document listing
  • Question answering with RAG
  • Conversation management

Key Modules:

  • main.py: FastAPI application with route handlers
  • vector_store.py: FAISS vector database operations
  • pdf_processor.py: PDF text extraction and chunking
  • rag.py: Retrieval-Augmented Generation logic with conversation memory
  • models.py: Pydantic models for request/response validation
2. Frontend (Streamlit)

Interactive web interface with:

  • Document upload interface
  • Document list viewer
  • Chat interface with conversation history
  • Source attribution display
3. MCP Server

Model Context Protocol server exposing three tools:

  • list_documents: Get all indexed documents
  • upload_document: Upload and index new documents
  • ask_question: Query documents with conversation support
4. Vector Store (FAISS)
  • Index Type: L2 distance with normalized vectors (cosine similarity)
  • Embedding Model: OpenAI text-embedding-3-small (1536 dimensions)
  • Storage: Persistent on-disk storage with metadata
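
The note that L2 distance over normalized vectors is equivalent to cosine similarity follows from the identity ‖a − b‖² = 2 − 2·cos(a, b) for unit vectors: ranking by smallest L2 distance then matches ranking by largest cosine. A dependency-free sketch (FAISS does the same thing over dense 1536-dimensional vectors):

```python
import math

def normalize(v):
    # Scale a vector to unit length.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def l2_sq(a, b):
    # Squared L2 distance, the quantity FAISS ranks by.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
assert abs(l2_sq(a, b) - (2 - 2 * cosine(a, b))) < 1e-9
```
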
5. RAG Pipeline
  1. Document Processing:

    • Extract text from PDF
    • Split into overlapping chunks (1000 chars, 200 overlap)
    • Generate embeddings for each chunk
    • Store in FAISS index
  2. Query Processing:

    • Generate embedding for user question
    • Search FAISS for similar chunks (top-k retrieval)
    • Retrieve conversation history (if conversation_id provided)
    • Generate answer using OpenAI LLM with context
    • Update conversation history
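
The two stages above can be sketched end to end. The real pipeline uses OpenAI embeddings and FAISS; the bag-of-words `embed` and linear-scan `top_k` below are simplified stand-ins, though the chunk sizes match the values quoted above (1000 characters, 200 overlap):

```python
from collections import Counter

def chunk_text(text, size=1000, overlap=200):
    # Split text into overlapping chunks (step = size - overlap).
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    # Stand-in for text-embedding-3-small: a sparse bag-of-words vector.
    return Counter(text.lower().split())

def score(query_vec, chunk_vec):
    # Sparse dot product; FAISS uses L2 over normalized dense vectors instead.
    return sum(count * chunk_vec[term] for term, count in query_vec.items())

def top_k(question, chunks, k=5):
    # Rank all chunks against the question and keep the k best.
    q = embed(question)
    return sorted(chunks, key=lambda c: score(q, embed(c)), reverse=True)[:k]
```
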

📁 Project Structure

simple_local_rag/
├── backend/
│   ├── app/
│   │   ├── __init__.py
│   │   ├── main.py              # FastAPI application
│   │   ├── models.py            # Pydantic models
│   │   ├── vector_store.py      # FAISS operations
│   │   ├── pdf_processor.py     # PDF processing
│   │   └── rag.py               # RAG logic with memory
│   └── pyproject.toml           # UV dependencies
├── frontend/
│   ├── streamlit_app.py         # Streamlit UI
│   └── requirements.txt         # Frontend dependencies
├── mcp/
│   ├── __init__.py
│   ├── server.py                # MCP server
│   └── pyproject.toml           # MCP dependencies
├── data/
│   ├── uploads/                 # Temporary PDF storage
│   ├── faiss_index.index        # FAISS vector index
│   ├── metadata.json            # Document metadata
│   └── chunks.pkl               # Chunk metadata
├── .env                         # Environment variables (create this)
├── .gitignore
└── README.md

🔧 Installation

Prerequisites

  • Python 3.9 or later
  • UV package manager (installation covered in step 2 below)
  • An OpenAI API key

Setup Steps

  1. Clone or navigate to the project directory:

    cd simple_local_rag
    
  2. Install UV (if not already installed):

    curl -LsSf https://astral.sh/uv/install.sh | sh
    
  3. Create virtual environment and install backend dependencies:

    cd backend
    uv venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    uv pip install -e .
    cd ..
    
  4. Install frontend dependencies:

    cd frontend
    uv venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    uv pip install -r requirements.txt
    cd ..
    
  5. Install MCP server dependencies (optional):

    cd mcp
    uv venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    uv pip install -e .
    cd ..
    
  6. Create .env file in the root directory:

    cp .env.example .env
    # Edit .env and add your OpenAI API key
    

⚙️ Configuration

Create a .env file in the root directory with the following variables:

# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Backend API Configuration
API_HOST=0.0.0.0
API_PORT=8000

# Frontend Configuration
STREAMLIT_SERVER_PORT=8501

# MCP Server Configuration
MCP_SERVER_PORT=8001
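
These variables are typically read with a library such as python-dotenv; for illustration only, an equivalent dependency-free parser of the simple KEY=VALUE format above:

```python
def parse_env(lines):
    # Parse simple KEY=VALUE lines, skipping blanks and comments.
    values = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

config = parse_env([
    "# Backend API Configuration",
    "API_HOST=0.0.0.0",
    "API_PORT=8000",
])
```

In practice you would pass `open(".env")` as the iterable, or simply use python-dotenv's `load_dotenv()`.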

🚀 Usage

Starting the Backend

cd backend
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python -m app.main

Or using uvicorn directly:

uvicorn app.main:app --host 0.0.0.0 --port 8000

The API will be available at http://localhost:8000

Starting the Frontend

cd frontend
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
streamlit run streamlit_app.py

The UI will be available at http://localhost:8501

Starting the MCP Server

cd mcp
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
python server.py

The MCP server will be available at http://localhost:8001

📡 API Endpoints

Health Check

  • GET /health
    • Returns API health status

Upload Document

  • POST /upload
    • Upload a PDF file for indexing
    • Request: Multipart form data with file field
    • Response: Document ID, filename, chunk count

List Documents

  • GET /documents
    • Get list of all indexed documents
    • Response: List of documents with metadata

Query Documents

  • POST /query
    • Ask a question about indexed documents
    • Request Body:
      {
        "question": "What is the main topic?",
        "conversation_id": "optional-conversation-id",
        "top_k": 5
      }
      
    • Response: Answer, conversation_id, relevant chunks, sources

Get Conversation

  • GET /conversation/{conversation_id}
    • Get full conversation history
    • Response: All messages in the conversation

🔌 MCP Server

The MCP server exposes three tools:

1. list_documents

List all indexed documents with descriptions.

Example MCP call:

{
  "tool": "list_documents",
  "arguments": {}
}

2. upload_document

Upload and index a new PDF document.

Example MCP call:

{
  "tool": "upload_document",
  "arguments": {
    "file_path": "/path/to/document.pdf",
    "filename": "document.pdf"
  }
}

3. ask_question

Ask questions about indexed documents with conversation support.

Example MCP call:

{
  "tool": "ask_question",
  "arguments": {
    "question": "What is the main topic?",
    "conversation_id": "optional-id",
    "top_k": 5
  }
}

MCP HTTP Endpoints

  • GET /tools: Get available MCP tools
  • POST /tools/call: Execute an MCP tool
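
The actual routing lives in mcp/server.py; one common shape for the POST /tools/call dispatcher is a name-to-handler table, shown here as an illustrative sketch (the placeholder handlers are hypothetical, not the project's real implementations):

```python
def list_documents():
    # Placeholder: the real handler queries the vector store.
    return {"documents": []}

def ask_question(question, conversation_id=None, top_k=5):
    # Placeholder: the real handler runs the RAG pipeline.
    return {"answer": "...", "conversation_id": conversation_id or "new-id"}

TOOLS = {
    "list_documents": list_documents,
    "ask_question": ask_question,
}

def call_tool(request):
    # Request body shape: {"tool": "...", "arguments": {...}}
    handler = TOOLS.get(request["tool"])
    if handler is None:
        return {"error": f"unknown tool: {request['tool']}"}
    return handler(**request.get("arguments", {}))
```
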

💬 Conversation Memory

The system maintains conversation context through conversation_id:

  1. First Question: No conversation_id needed - system generates one
  2. Follow-up Questions: Use the returned conversation_id for context
  3. Context Window: Last 20 messages (10 exchanges) are maintained
  4. Storage: In-memory (can be persisted to database in production)
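
The 20-message window can be kept with a simple list slice on every append; a minimal in-memory sketch of this scheme (not the project's actual code):

```python
from collections import defaultdict
import uuid

MAX_MESSAGES = 20  # last 10 question/answer exchanges

conversations = defaultdict(list)

def new_conversation():
    # Generate an id for the first question, as the system does.
    return str(uuid.uuid4())

def add_message(conversation_id, role, content):
    history = conversations[conversation_id]
    history.append({"role": role, "content": content})
    # Drop everything except the most recent MAX_MESSAGES entries.
    del history[:-MAX_MESSAGES]
    return history
```
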

Example Flow

import requests  # minimal client; assumes the backend is running on localhost:8000

def ask_question(question, conversation_id=None, top_k=5):
    payload = {"question": question, "top_k": top_k}
    if conversation_id:
        payload["conversation_id"] = conversation_id
    return requests.post("http://localhost:8000/query", json=payload).json()

# First question
response1 = ask_question("What is machine learning?")
conversation_id = response1["conversation_id"]

# Follow-up question (maintains context)
response2 = ask_question("Can you give examples?", conversation_id=conversation_id)

# Conversation history now includes both questions and answers

🛠️ Technology Stack

  • Backend Framework: FastAPI
  • Vector Database: FAISS (CPU version)
  • Embeddings: OpenAI text-embedding-3-small
  • LLM: OpenAI GPT-4o-mini
  • PDF Processing: PyPDF2
  • Frontend: Streamlit
  • Package Manager: UV
  • Language: Python 3.9+

📝 Notes

  • Memory Limitation: Conversation history is stored in-memory. For production, consider using a database (Redis, PostgreSQL, etc.)
  • Vector Store: FAISS indices are stored on disk and persist across restarts
  • Chunking Strategy: Documents are split into 1000-character chunks with 200-character overlap
  • Embedding Dimension: 1536 (OpenAI text-embedding-3-small)
  • Error Handling: Basic error handling is implemented; enhance for production use

🔒 Security Considerations

  • Store API keys securely in .env file (never commit to version control)
  • Add authentication/authorization for production deployments
  • Validate file uploads (type, size limits)
  • Implement rate limiting for API endpoints
  • Use HTTPS in production
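
The upload-validation point might look like the following sketch (the size limit and checks are illustrative, not the project's actual values):

```python
MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # illustrative 20 MB cap
ALLOWED_EXTENSIONS = {".pdf"}

def validate_upload(filename, data):
    # Reject unexpected file types and oversized payloads before indexing.
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported file type: {ext or 'none'}"
    if len(data) > MAX_UPLOAD_BYTES:
        return False, "file too large"
    if not data.startswith(b"%PDF-"):
        return False, "not a valid PDF header"
    return True, "ok"
```
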

🤝 Contributing

This is a simple local RAG implementation. Feel free to extend it with:

  • Support for more document types (DOCX, TXT, etc.)
  • Advanced chunking strategies
  • Database persistence for conversations
  • User authentication
  • Multi-tenant support
  • Advanced retrieval strategies (reranking, hybrid search)

📄 License

This project is provided as-is for educational and development purposes.