Simple Local RAG System
A multi-modal Retrieval-Augmented Generation (RAG) system with FAISS vector database, featuring a Streamlit frontend, FastAPI backend, and MCP server integration.
📋 Table of Contents
- Features
- Architecture
- Project Structure
- Installation
- Configuration
- Usage
- API Endpoints
- MCP Server
- Conversation Memory
- Technology Stack
✨ Features
- PDF Document Upload: Upload and index PDF documents in a FAISS vector database
- Question Answering: Ask questions and get AI-powered answers based on indexed documents
- Conversation Memory: Maintains context across multiple questions in a conversation
- Document Management: List and view all indexed documents with metadata
- MCP Server: Expose functionality as MCP tools for other applications
- Persistent Storage: FAISS indices and metadata are saved to disk for persistence
- Source Attribution: Answers include references to source documents
🏗️ Architecture
High-Level Architecture
┌─────────────────┐
│ Streamlit UI │ ← User Interface
└────────┬────────┘
│ HTTP REST API
┌────────▼────────┐
│ FastAPI Backend│ ← Main API Server
└────────┬────────┘
│
┌────┴────┐
│ │
┌───▼───┐ ┌──▼──────┐
│ FAISS │ │ OpenAI │
│ Vector│ │ API │
│ Store │ │ │
└───────┘ └─────────┘
┌─────────────────┐
│ MCP Server │ ← External Integration
└─────────────────┘
Component Details
1. Backend (FastAPI)
The backend provides REST API endpoints for:
- Document upload and processing
- Document listing
- Question answering with RAG
- Conversation management
Key Modules:
- main.py: FastAPI application with route handlers
- vector_store.py: FAISS vector database operations
- pdf_processor.py: PDF text extraction and chunking
- rag.py: Retrieval-Augmented Generation logic with conversation memory
- models.py: Pydantic models for request/response validation
2. Frontend (Streamlit)
Interactive web interface with:
- Document upload interface
- Document list viewer
- Chat interface with conversation history
- Source attribution display
3. MCP Server
Model Context Protocol server exposing three tools:
- list_documents: Get all indexed documents
- upload_document: Upload and index new documents
- ask_question: Query documents with conversation support
4. Vector Store (FAISS)
- Index Type: L2 distance with normalized vectors (cosine similarity)
- Embedding Model: OpenAI text-embedding-3-small (1536 dimensions)
- Storage: Persistent on-disk storage with metadata
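The "L2 distance with normalized vectors" choice works because, for unit vectors, squared L2 distance is a monotone function of cosine similarity: ||a − b||² = 2 − 2·cos(a, b). Ranking by smallest L2 distance therefore matches ranking by largest cosine similarity. A plain-Python sanity check of that identity (no FAISS required; the helper names are illustrative):

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def l2_sq(a, b):
    """Squared L2 distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    """Dot product; equals cosine similarity for unit vectors."""
    return sum(x * y for x, y in zip(a, b))

a = normalize([3.0, 4.0])
b = normalize([1.0, 2.0])

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
assert abs(l2_sq(a, b) - (2 - 2 * cosine(a, b))) < 1e-9
```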
5. RAG Pipeline
1. Document Processing:
   - Extract text from PDF
   - Split into overlapping chunks (1000 chars, 200 overlap)
   - Generate embeddings for each chunk
   - Store in FAISS index

2. Query Processing:
   - Generate embedding for the user question
   - Search FAISS for similar chunks (top-k retrieval)
   - Retrieve conversation history (if a conversation_id is provided)
   - Generate an answer using the OpenAI LLM with context
   - Update conversation history
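The chunking step above amounts to a sliding window with a fixed stride of chunk_size − overlap. A minimal character-based sketch (the function name and signature are illustrative, not the actual pdf_processor.py API):

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into overlapping chunks. Defaults mirror the strategy
    described above; this is a sketch, not the project's real code."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "".join(chr(65 + i % 26) for i in range(2500))
chunks = chunk_text(text)
# Chunks start at 0, 800, 1600, 2400; adjacent chunks share 200 characters.
```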
📁 Project Structure
simple_local_rag/
├── backend/
│ ├── app/
│ │ ├── __init__.py
│ │ ├── main.py # FastAPI application
│ │ ├── models.py # Pydantic models
│ │ ├── vector_store.py # FAISS operations
│ │ ├── pdf_processor.py # PDF processing
│ │ └── rag.py # RAG logic with memory
│ └── pyproject.toml # UV dependencies
├── frontend/
│ ├── streamlit_app.py # Streamlit UI
│ └── requirements.txt # Frontend dependencies
├── mcp/
│ ├── __init__.py
│ ├── server.py # MCP server
│ └── pyproject.toml # MCP dependencies
├── data/
│ ├── uploads/ # Temporary PDF storage
│ ├── faiss_index.index # FAISS vector index
│ ├── metadata.json # Document metadata
│ └── chunks.pkl # Chunk metadata
├── .env # Environment variables (create this)
├── .gitignore
└── README.md
🔧 Installation
Prerequisites
- Python 3.9 or higher
- UV package manager (install from https://github.com/astral-sh/uv)
- OpenAI API key
Setup Steps
1. Clone or navigate to the project directory:
   cd simple_local_rag

2. Install UV (if not already installed):
   curl -LsSf https://astral.sh/uv/install.sh | sh

3. Create a virtual environment and install backend dependencies:
   cd backend
   uv venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   uv pip install -e .
   cd ..

4. Install frontend dependencies:
   cd frontend
   uv venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   uv pip install -r requirements.txt
   cd ..

5. Install MCP server dependencies (optional):
   cd mcp
   uv venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   uv pip install -e .
   cd ..

6. Create a .env file in the root directory:
   cp .env.example .env
   # Edit .env and add your OpenAI API key
⚙️ Configuration
Create a .env file in the root directory with the following variables:
# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here
# Backend API Configuration
API_HOST=0.0.0.0
API_PORT=8000
# Frontend Configuration
STREAMLIT_SERVER_PORT=8501
# MCP Server Configuration
MCP_SERVER_PORT=8001
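Assuming the backend reads these settings from environment variables (the real code may load .env via python-dotenv), a sketch of how they might be consumed; the function and key names here are hypothetical:

```python
import os

def load_config(env):
    """Read settings from an environment mapping, applying the defaults above."""
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env")
    return {
        "openai_api_key": env["OPENAI_API_KEY"],
        "api_host": env.get("API_HOST", "0.0.0.0"),
        "api_port": int(env.get("API_PORT", "8000")),
        "streamlit_port": int(env.get("STREAMLIT_SERVER_PORT", "8501")),
        "mcp_port": int(env.get("MCP_SERVER_PORT", "8001")),
    }

# In the app this would be load_config(os.environ); a plain dict works for testing.
cfg = load_config({"OPENAI_API_KEY": "sk-example", "API_PORT": "9000"})
```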
🚀 Usage
Starting the Backend
cd backend
source .venv/bin/activate # On Windows: .venv\Scripts\activate
python -m app.main
Or using uvicorn directly:
uvicorn app.main:app --host 0.0.0.0 --port 8000
The API will be available at http://localhost:8000
Starting the Frontend
cd frontend
source .venv/bin/activate # On Windows: .venv\Scripts\activate
streamlit run streamlit_app.py
The UI will be available at http://localhost:8501
Starting the MCP Server
cd mcp
source .venv/bin/activate # On Windows: .venv\Scripts\activate
python server.py
The MCP server will be available at http://localhost:8001
📡 API Endpoints
Health Check
- GET /health - Returns API health status

Upload Document
- POST /upload - Upload a PDF file for indexing
- Request: Multipart form data with a file field
- Response: Document ID, filename, chunk count

List Documents
- GET /documents - Get list of all indexed documents
- Response: List of documents with metadata

Query Documents
- POST /query - Ask a question about indexed documents
- Request Body:
  {
    "question": "What is the main topic?",
    "conversation_id": "optional-conversation-id",
    "top_k": 5
  }
- Response: Answer, conversation_id, relevant chunks, sources

Get Conversation
- GET /conversation/{conversation_id} - Get full conversation history
- Response: All messages in the conversation
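As a sketch of how a client might call the /query endpoint using only the standard library (the ask helper and its defaults are illustrative, not part of the project):

```python
import json
from typing import Optional
from urllib import request

def build_query_payload(question, conversation_id=None, top_k=5):
    """Assemble the JSON body expected by POST /query."""
    body = {"question": question, "top_k": top_k}
    if conversation_id is not None:
        body["conversation_id"] = conversation_id
    return json.dumps(body).encode("utf-8")

def ask(question, conversation_id=None, base_url="http://localhost:8000"):
    """POST a question to the backend and return the decoded JSON response."""
    req = request.Request(
        base_url + "/query",
        data=build_query_payload(question, conversation_id),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```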
🔌 MCP Server
The MCP server exposes three tools:
1. list_documents
List all indexed documents with descriptions.
Example MCP call:
{
"tool": "list_documents",
"arguments": {}
}
2. upload_document
Upload and index a new PDF document.
Example MCP call:
{
"tool": "upload_document",
"arguments": {
"file_path": "/path/to/document.pdf",
"filename": "document.pdf"
}
}
3. ask_question
Ask questions about indexed documents with conversation support.
Example MCP call:
{
"tool": "ask_question",
"arguments": {
"question": "What is the main topic?",
"conversation_id": "optional-id",
"top_k": 5
}
}
MCP HTTP Endpoints
- GET /tools: Get available MCP tools
- POST /tools/call: Execute an MCP tool
💬 Conversation Memory
The system maintains conversation context through conversation_id:
- First Question: No conversation_id needed - the system generates one
- Follow-up Questions: Use the returned conversation_id for context
- Context Window: The last 20 messages (10 exchanges) are maintained
- Storage: In-memory (can be persisted to a database in production)
Example Flow
# First question (ask_question here stands in for POST /query)
response1 = ask_question("What is machine learning?")
conversation_id = response1["conversation_id"]

# Follow-up question (maintains context)
response2 = ask_question("Can you give examples?", conversation_id=conversation_id)

# The conversation history now includes both questions and answers
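The 20-message window described above can be modeled with one bounded deque per conversation; this is an illustrative sketch of the behavior, not the actual rag.py implementation:

```python
from collections import defaultdict, deque

class ConversationMemory:
    """In-memory store keeping the last N messages per conversation_id
    (a sketch of the behavior described above)."""

    def __init__(self, max_messages=20):
        # deque(maxlen=...) silently drops the oldest message on overflow
        self.histories = defaultdict(lambda: deque(maxlen=max_messages))

    def add(self, conversation_id, role, content):
        self.histories[conversation_id].append({"role": role, "content": content})

    def get(self, conversation_id):
        return list(self.histories[conversation_id])

memory = ConversationMemory()
for i in range(15):  # 15 exchanges = 30 messages; only the last 20 survive
    memory.add("abc", "user", f"question {i}")
    memory.add("abc", "assistant", f"answer {i}")
```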
🛠️ Technology Stack
- Backend Framework: FastAPI
- Vector Database: FAISS (CPU version)
- Embeddings: OpenAI text-embedding-3-small
- LLM: OpenAI GPT-4o-mini
- PDF Processing: PyPDF2
- Frontend: Streamlit
- Package Manager: UV
- Language: Python 3.9+
📝 Notes
- Memory Limitation: Conversation history is stored in-memory. For production, consider using a database (Redis, PostgreSQL, etc.)
- Vector Store: FAISS indices are stored on disk and persist across restarts
- Chunking Strategy: Documents are split into 1000-character chunks with 200-character overlap
- Embedding Dimension: 1536 (OpenAI text-embedding-3-small)
- Error Handling: Basic error handling is implemented; enhance for production use
🔒 Security Considerations
- Store API keys securely in the .env file (never commit to version control)
- Add authentication/authorization for production deployments
- Validate file uploads (type, size limits)
- Implement rate limiting for API endpoints
- Use HTTPS in production
📚 Additional Resources
- FAISS Documentation
- FastAPI Documentation
- Streamlit Documentation
- OpenAI API Documentation
- MCP Specification
🤝 Contributing
This is a simple local RAG implementation. Feel free to extend it with:
- Support for more document types (DOCX, TXT, etc.)
- Advanced chunking strategies
- Database persistence for conversations
- User authentication
- Multi-tenant support
- Advanced retrieval strategies (reranking, hybrid search)
📄 License
This project is provided as-is for educational and development purposes.