
RAG-MCP-Server

This project intends to

  • build a pipeline for custom embeddings, and
  • serve them as an API and an MCP server for LLM applications.

Design

(ticked: resources have been provisioned via IaC)

design.v1 (architecture diagram)

How it Works

  • Files are dropped into the preprocessing S3 bucket
    • only txt is supported for now
  • The Transform_and_Load Lambda chunks and embeds each document as it is saved (see the sketch after this list)
    • uses nomic-embed-text served by Ollama on an EC2
  • Postgres stores the embeddings and serves as the vector store
    • via the vector (pgvector) extension
    • via LangChain
  • The MCP server exposes a retriever tool for client usage
    • built with the MCP Python SDK, Docker, and Fargate
    • via sse for now
  • Host & MCP client
    • to be done
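
A minimal sketch of the Transform_and_Load Lambda entry point, assuming the S3-to-SQS wiring provided by the s3_event_to_sqs module; chunk_embed_store is a hypothetical helper standing in for the actual chunk/embed/insert logic:
import json
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Each SQS record wraps an S3 object-created event
    for record in event["Records"]:
        s3_event = json.loads(record["body"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            text = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
            chunk_embed_store(text)  # hypothetical: chunk, embed via Ollama, insert into Postgres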

Folder Structure

  • infra/ Terraform IaC for all infra

    • mains/ each folder under mains/ is a separate Terraform project
      • extract/ IaC for resources used in the extract portion
      • transform_and_load/ IaC for resources from S3 all the way to Postgres as the vector store
      • llm_app/ IaC for building up the serving portion
    • modules/ Terraform IaC modules used by projects in main
      • container_on_ecs_fargate/ Docker image --> Workload endpoint
      • ollama_on_ec2/ set up an EC2 with Ollama and deploy a chosen model
      • s3_event_to_sqs/ set up event delivery for S3 object creation/deletion
  • src/ business logic and source code

    • lambda_layers/ zipped packages for psycopg, langchain, etc.
    • transform_and_load/ source code to transform data in S3, embed, and insert to vector db
    • llm_app/ code for the MCP / API serving
      • mcp_server/ code and Dockerfile
  • test/ For test cases and automation

  • docs/ For documentation and images

Random notes

Nomic as Embedding model

  • nomic-embed-text:v1.5
  • vector size: 768
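
A quick sanity check, assuming Ollama listens on its default port (the host URL is a placeholder): the embedding length must match the vector(768) column in Postgres.
import requests

OLLAMA_URL = "http://ollama-host:11434"  # placeholder for the Ollama EC2 address

resp = requests.post(
    f"{OLLAMA_URL}/api/embeddings",
    json={"model": "nomic-embed-text:v1.5", "prompt": "hello vector store"},
)
embedding = resp.json()["embedding"]
assert len(embedding) == 768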

Using Ollama for hosting

  • Self-hosted model for better data security management
  • Embedding models are generally not heavy to run

Postgres as Vector store

Manually enable the vector extension on RDS:
# Use an EC2 as bastion
dnf install postgresql17
psql --host=xxx.ap-southeast-1.rds.amazonaws.com --port=5432 --dbname=xxx --username=xxx
# ***** enter password

SHOW rds.extensions;
CREATE EXTENSION vector;
DB schema for langchain-postgres PGVectorStore:
CREATE TABLE "public"."embedding_table_name_here" (
    "langchain_id" UUID PRIMARY KEY,
    "content" TEXT NOT NULL,
    "embedding" vector(768) NOT NULL,
    "langchain_metadata" JSON
);
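
A minimal sketch of writing to and reading from this table via langchain-postgres; the connection string, table name, and Ollama host are placeholders, and the table is assumed to exist already:
from langchain_ollama import OllamaEmbeddings
from langchain_postgres import PGEngine, PGVectorStore

engine = PGEngine.from_connection_string(
    "postgresql+psycopg://user:pass@xxx.ap-southeast-1.rds.amazonaws.com:5432/xxx"
)
embeddings = OllamaEmbeddings(
    model="nomic-embed-text:v1.5",
    base_url="http://ollama-host:11434",  # placeholder for the Ollama EC2
)
store = PGVectorStore.create_sync(
    engine=engine,
    table_name="embedding_table_name_here",
    embedding_service=embeddings,
)
store.add_texts(["a chunked passage"])             # embeds and inserts rows
docs = store.similarity_search("some query", k=4)  # nearest-neighbour retrieval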

LangChain & Unstructured to glue everything together

  • LangChain integrates data loading, embedding, and vector store management
  • Unstructured extracts data from many file types and also chunks files by semantics (see the sketch below)
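
A sketch of both steps with Unstructured (the file name is a placeholder):
from unstructured.partition.auto import partition
from unstructured.chunking.title import chunk_by_title

elements = partition(filename="doc.txt")   # auto-detects the file type and extracts elements
chunks = chunk_by_title(elements)          # groups elements into semantic chunks
texts = [chunk.text for chunk in chunks]   # plain text ready for embedding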

Lambda as data transformation workload

  • Serverless solution that integrates with S3 and SQS
  • Limitation on dependency package size (langchain, unstructured, and psycopg together exceeded the limit)
  • *Need another solution for the heavy dependencies

Fargate as MCP server

  • RAG MCP server Docker image: xuyangbo/rag-mcp-server
  • sse needs to be replaced with streamable-http eventually
  • MCP integrates well with Starlette for serving (Python SDK); a minimal sketch follows
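
A minimal sketch of the retriever tool using FastMCP from the MCP Python SDK, served over sse; the store is assumed to be the PGVectorStore from the Postgres sketch above:
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("rag-mcp-server")

@mcp.tool()
def retrieve(query: str) -> list[str]:
    """Return the most relevant stored chunks for a query."""
    docs = store.similarity_search(query, k=4)  # store: PGVectorStore, set up as sketched earlier
    return [doc.page_content for doc in docs]

if __name__ == "__main__":
    mcp.run(transport="sse")  # later: transport="streamable-http"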