RAGPlayground
RAGPlayground is a project for experimenting with Retrieval-Augmented Generation (RAG) pipelines. It integrates PostgreSQL + pgvector for semantic search, Hugging Face models for embeddings and generation, and an MCP server that allows GitHub Copilot Agent Mode to interact with the knowledge base.
The project demonstrates two workflows:
- Classic RAG: Hugging Face LLMs are used to generate responses based on retrieved context.
- Agent Mode: GitHub Copilot uses MCP (Model Context Protocol) to query the knowledge base.
Features
- End-to-end RAG pipeline (data ingestion → embeddings → database → retrieval → LLM response)
- PostgreSQL with pgvector for vector similarity search (see the retrieval sketch after this list)
- SentenceTransformers for embeddings
- Hugging Face Inference API for LLM chat completion
- MCP server for Copilot Agent integration
- Automatic database and table creation if not present
- Interactive CLI loop for queries with colored prompts
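For illustration, here is a minimal sketch of the retrieval step: the query is embedded with SentenceTransformers and matched against stored vectors using pgvector's cosine-distance operator. The documents table, its columns, and the connection settings are assumptions for the sketch, not the project's actual schema.

```python
# Hypothetical retrieval sketch; table/column names and connection
# settings are assumptions, not the repo's actual schema.
import psycopg2
from pgvector.psycopg2 import register_vector
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-en-v1.5")  # 768-dim embeddings

def retrieve(query: str, top_k: int = 5) -> list[str]:
    vec = model.encode(query)  # numpy array in the same space as stored docs
    conn = psycopg2.connect(dbname="ragplayground", user="postgres", host="localhost")
    register_vector(conn)  # lets psycopg2 send numpy arrays as pgvector values
    with conn, conn.cursor() as cur:
        # <=> is pgvector's cosine-distance operator; smallest distance = most similar.
        cur.execute(
            "SELECT content FROM documents ORDER BY embedding <=> %s LIMIT %s",
            (vec, top_k),
        )
        return [row[0] for row in cur.fetchall()]
```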
RAG Pipeline Flow
Dataset download → preprocessing → embedding generation → PostgreSQL (pgvector) storage → cosine-similarity retrieval → response from a Hugging Face LLM or the Copilot Agent
Project Structure
RAGPlayground/
├── README.md
├── LICENSE
├── main.py
├── requirements.txt
└── src/
    ├── db/
    │   └── rag.py           # Database setup, table creation, embeddings insertion
    ├── extract/
    │   └── processing.py    # Dataset download, preprocessing, embeddings generation
    ├── llm/
    │   └── llm.py           # Hugging Face client, context retrieval, LLM querying
    └── mcp/
        └── server.py        # MCP server for GitHub Copilot Agent
Installation
- Clone the repository
  git clone https://github.com/ahmednezar/RAGPlayground.git
  cd RAGPlayground
- Install dependencies
  pip install -r requirements.txt
- Install PostgreSQL and enable the pgvector extension. Inside psql:
  CREATE EXTENSION IF NOT EXISTS vector;
- Configure environment variables in a .env file (see the setup sketch below):
  HF_TOKEN=your_huggingface_token_here
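The automatic database and table creation mentioned under Features presumably happens on first run. A rough sketch of what that setup might look like, assuming psycopg2, python-dotenv, and a documents table sized for bge-base embeddings (all assumptions, not the repo's actual code):

```python
# Hypothetical setup sketch; the repo's src/db/rag.py may differ.
import psycopg2
from dotenv import load_dotenv

load_dotenv()  # reads HF_TOKEN (and any DB settings) from .env

conn = psycopg2.connect(dbname="ragplayground", user="postgres", host="localhost")
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    # 768 matches the embedding size of BAAI/bge-base-en-v1.5.
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS documents (
            id SERIAL PRIMARY KEY,
            content TEXT NOT NULL,
            embedding vector(768)
        )
        """
    )
```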
Dataset
The project uses the AG News Classification Dataset from Kaggle. It is downloaded automatically using kagglehub when you run the pipeline, then preprocessed into embeddings.
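A sketch of what that ingestion step might look like; the Kaggle dataset handle, CSV file name, and column name are assumptions about how src/extract/processing.py is written:

```python
# Hypothetical ingestion sketch; dataset handle, file, and column
# names are assumptions, not taken from processing.py.
import os
import kagglehub
import pandas as pd
from sentence_transformers import SentenceTransformer

# kagglehub caches the download and returns the local directory path.
path = kagglehub.dataset_download("amananandrai/ag-news-classification-dataset")
df = pd.read_csv(os.path.join(path, "train.csv"))

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
embeddings = model.encode(df["Description"].tolist(), show_progress_bar=True)
```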
How to Use
Running with Hugging Face Models (Classic RAG)
python main.py --llm mistralai/Mistral-7B-Instruct-v0.3 --embedding BAAI/bge-base-en-v1.5
Example:
[RagPlayground] Enter your query: Tell me about soccer news
[RagPlayground] Response: <contextual answer based on retrieved docs>
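Internally, classic RAG mode presumably joins the retrieved documents into a prompt and asks the Hugging Face Inference API for a chat completion. A hedged sketch, reusing the hypothetical retrieve() helper from the Features section (the prompt wording is invented, not the repo's):

```python
# Hypothetical generation sketch; src/llm/llm.py may structure this differently.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    token=os.environ["HF_TOKEN"],
)

def answer(query: str) -> str:
    context = "\n\n".join(retrieve(query))  # retrieve() from the earlier sketch
    messages = [
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ]
    return client.chat_completion(messages, max_tokens=512).choices[0].message.content
```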
Running in MCP (Copilot Agent Mode)
python main.py --mcp --embedding BAAI/bge-base-en-v1.5
- Starts an MCP server at localhost:5001.
- Copilot Agent will use the retrieve_documents tool to fetch context before answering (see the sketch below).
- The agent is guided by instructions defined under .github/instructions/.
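A minimal sketch of such a server, assuming the official MCP Python SDK's FastMCP helper; the real src/mcp/server.py may be wired differently, and the port setting is an assumption based on the localhost:5001 note above:

```python
# Hypothetical MCP server sketch (pip install mcp); server.py may differ.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("RAGPlayground", port=5001)

@mcp.tool()
def retrieve_documents(query: str, top_k: int = 5) -> list[str]:
    """Return the top_k documents most similar to the query."""
    return retrieve(query, top_k)  # retrieve() from the earlier sketch

if __name__ == "__main__":
    mcp.run(transport="sse")  # serve over HTTP/SSE on the configured port
```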
Architecture
- Data Layer: Downloads AG News dataset, generates embeddings.
- Storage Layer: Embeddings stored in PostgreSQL with pgvector.
- Retrieval Layer: Similarity search using cosine distance (see the formula below).
- Generation Layer: Either a Hugging Face LLM or the Copilot Agent via MCP.
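For reference, the cosine distance computed by pgvector's <=> operator is

```latex
d_{\cos}(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
```

so a distance of 0 means the query and document embeddings point in the same direction, and ordering by embedding <=> query ascending returns the most semantically similar rows first.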