RAG-MCP-Server
This project intends to
- build a pipeline for custom embedding, and
- serve it as an API and MCP server for LLM applications.
Design
(ticked: resources that have been set up via IaC)
How it Works
- Files are dropped into the preprocessing S3 bucket
  - supports txt for now
- The Transform_and_Load Lambda chunks and embeds each document as it is saved
  - uses nomic-embed-text served on an Ollama EC2 instance
- Postgres saves the embeddings and serves as the vector store
  - via the vector extension
  - via LangChain
- The MCP server exposes a retriever tool for client usage
  - uses the MCP Python SDK, Docker, Fargate
  - via sse for now
- Host & MCP client
  - to be done
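The trigger path above (S3 event delivered through SQS into the Lambda, then chunking before embedding) can be sketched in pure Python. This is a minimal illustration, not the project's actual handler: the fixed-size chunker is a stand-in for whatever chunking the Lambda really does, and all names are hypothetical.

```python
import json


def parse_s3_records(sqs_event: dict) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from an SQS-wrapped S3 event notification."""
    pairs = []
    for msg in sqs_event.get("Records", []):
        body = json.loads(msg["body"])  # the S3 notification is JSON in the SQS body
        for rec in body.get("Records", []):
            pairs.append((rec["s3"]["bucket"]["name"], rec["s3"]["object"]["key"]))
    return pairs


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Naive fixed-size chunking with overlap (stand-in for semantic chunking)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Each `(bucket, key)` pair would then be fetched, chunked, embedded, and inserted into the vector store.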
Folder Structure
- infra/ Terraform IaC for all infra
  - mains/ each folder under mains is a separate Terraform project
    - extract/ IaC for resources used in the extract portion
    - transform_and_load/ IaC for resources from S3 all the way to Postgres as vector store
    - llm_app/ IaC for building up the serving portion
  - modules/ Terraform IaC modules used by projects in mains
    - container_on_ecs_fargate/ Docker image --> workload endpoint
    - ollama_on_ec2/ set up an EC2 with Ollama and deploy the chosen model
    - s3_event_to_sqs/ set up event delivery for S3 object creation/deletion
- src/ business logic and source code
  - lambda_layers/ zipped packages for psycopg, langchain, etc.
  - transform_and_load/ source code to transform data in S3, embed, and insert into the vector DB
  - llm_app/ code for the MCP / API serving
    - mcp_server/ code and Dockerfile
- test/ test cases and automation
- docs/ documentation and images
Random notes
Nomic as Embedding model
- nomic-embed-text:v1.5
- vector size: 768
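For intuition on what the store ranks by over those 768-dim vectors, here is cosine similarity in plain Python (in practice pgvector computes this server-side; its cosine *distance* is 1 minus this value):

```python
import math

DIM = 768  # nomic-embed-text:v1.5 output size


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```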
Using Ollama for hosting
- Self-hosted model for better data security management
- Embedding models are generally not heavy
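A sketch of how the Lambda could call the self-hosted model over Ollama's HTTP API. The host name is a placeholder, and the wiring is an assumption about this project (it may use LangChain's Ollama integration instead); only the payload builder is exercised here, since `embed` needs a reachable server.

```python
import json
import urllib.request

OLLAMA_URL = "http://ollama-host:11434"  # hypothetical EC2 host


def build_embed_request(text: str, model: str = "nomic-embed-text:v1.5") -> dict:
    """Payload for Ollama's /api/embeddings endpoint."""
    return {"model": model, "prompt": text}


def embed(text: str) -> list[float]:
    """POST to Ollama and return the embedding (requires a running server)."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=json.dumps(build_embed_request(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]
```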
Postgres as Vector store
Manually turn on the vector extension on RDS:
```
# Use an EC2 as bastion
dnf install postgresql17
psql --host=xxx.ap-southeast-1.rds.amazonaws.com --port=5432 --dbname=xxx --username=xxx
# ***** enter password
SHOW rds.extensions;
CREATE EXTENSION vector;
```
DB Schema for langchain-postgres.PGVectorStore:
```sql
CREATE TABLE "public"."embedding_table_name_here" (
    "langchain_id" UUID PRIMARY KEY,
    "content" TEXT NOT NULL,
    "embedding" vector(768) NOT NULL,
    "langchain_metadata" JSON
);
```
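Against that schema, a retrieval query might look like the following, using pgvector's `<=>` cosine-distance operator (query vector abbreviated; in the project this lookup goes through LangChain rather than raw SQL):

```sql
-- top-5 chunks nearest to a query embedding (cosine distance)
SELECT content, embedding <=> '[0.1, 0.2, ...]'::vector AS distance
FROM "public"."embedding_table_name_here"
ORDER BY distance
LIMIT 5;
```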
Langchain & Unstructured to glue up everything
- LangChain to integrate data loading, embedding, and vector store management
- Unstructured to extract data from many file types; it also chunks files by semantics
Lambda as data transformation workload
- Serverless solution that integrates with S3, SQS
- Limitation on dependency package size (langchain, unstructured, and psycopg together exceeded the limit)
- *Need other solution
Fargate as MCP server
- RAG MCP server Docker image: xuyangbo/rag-mcp-server
- Need to replace sse with streamable-http
- The MCP Python SDK integrates well with Starlette for serving
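A rough sketch of how the retriever tool could be shaped. The scoring function below is a toy keyword ranker standing in for the real vector-store lookup, and the commented FastMCP wiring assumes the MCP Python SDK's `FastMCP` server class; `load_docs()` is hypothetical.

```python
def retrieve(query: str, docs: dict[str, str], k: int = 3) -> list[str]:
    """Toy retriever: rank documents by how many query words they contain."""
    words = query.lower().split()
    scored = sorted(
        docs.items(),
        key=lambda kv: -sum(w in kv[1].lower() for w in words),
    )
    return [doc_id for doc_id, _ in scored[:k]]


# Wiring it as an MCP tool (requires the `mcp` package; sketch, not the
# project's actual server code):
#
# from mcp.server.fastmcp import FastMCP
#
# mcp = FastMCP("rag-mcp-server")
#
# @mcp.tool()
# def retriever(query: str) -> list[str]:
#     return retrieve(query, load_docs())  # load_docs() is hypothetical
#
# mcp.run(transport="sse")  # to be swapped for streamable-http
```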