Daniel-Barta/mcp-rag-server
mcp-rag-server (local RAG MCP server for any repository)
`mcp-rag-server` is a lightweight, zero-network (after the initial model download) Retrieval-Augmented Generation helper you can plug into any client that speaks the Model Context Protocol (MCP). GitHub Copilot Agent mode in Visual Studio / VS Code is just one option; you can also use the official MCP Inspector, future MCP-aware IDEs, or custom tooling.
It indexes a target repository directory, chunks the content (default chunk size 800 characters with 120-character overlap; both configurable via `CHUNK_SIZE` / `CHUNK_OVERLAP`), builds local embeddings using `@xenova/transformers`, and exposes these MCP tools:

- `rag_query`: semantic search returning scored snippets (path, score, snippet)
- `read_file`: secure file read (optional line range) constrained to `REPO_ROOT`
- `list_files`: list directory contents (files and subdirectories) with optional recursion, depth, and extension filtering
Two transports are supported (select with `MCP_TRANSPORT=stdio|http`):

- `stdio`: simplest integration for IDEs that spawn a process (backward-compatible default)
- `http` (Streamable HTTP): recommended for large repos or the first run, so you can watch logs and poll readiness before attaching a client. Enable via `MCP_TRANSPORT=http`. Includes DNS rebinding protection by default.
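DNS rebinding protection boils down to validating the incoming `Host` header against an allow-list. The actual check lives in the MCP SDK's HTTP transport; the following is only an illustrative sketch, with the default host list taken from this README:

```typescript
// Sketch of a Host-header allow-list check, as used by DNS rebinding
// protection (illustrative; the real logic is in the MCP SDK transport).
function isHostAllowed(hostHeader: string, allowed: string[]): boolean {
  const host = hostHeader.toLowerCase();
  const bare = host.replace(/:\d+$/, ""); // also accept the host without its port
  return allowed.some((a) => a.toLowerCase() === host || a.toLowerCase() === bare);
}

// Documented defaults: localhost and 127.0.0.1, with and without the port.
const defaultAllowed = ["localhost", "127.0.0.1", "localhost:3000", "127.0.0.1:3000"];
```

A browser page on an attacker's domain that re-resolves its DNS to 127.0.0.1 still sends its own `Host` header, which this check rejects.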
Features
- Pure local embedding inference (no external API calls) via `@xenova/transformers`
- Multi-language source + docs support (configurable via `ALLOWED_EXT`)
- Excluded folder patterns (configurable via `EXCLUDED_FOLDERS`)
- Fast glob file discovery and overlapping chunking for better recall
- Simple cosine similarity ranking (optionally swap in ANN later)
- Pluggable model selection via `MODEL_NAME` (see guidance below)
- Optional persistent JSON index with warm start and incremental reindexing via `INDEX_STORE_PATH`
- Incremental change detection (additions / deletions / file size changes) to avoid full rebuilds
- Stdio or Streamable HTTP transport (with optional host allow-list / DNS rebinding protection)
- Safe path handling (rejects attempts to escape `REPO_ROOT`)
- Minimal dependencies; quick startup after the first model load
- Ready for extension: add new MCP tools or ANN / hybrid retrieval backends

Planned / nice-to-have: hybrid BM25 + embedding search, ANN acceleration (HNSW / IVF), per-language tokenizer heuristics, batched / parallel embedding, semantic-boundary-aware chunking.
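The cosine similarity ranking mentioned above can be sketched in a few lines. This is an illustrative reconstruction, not the server's actual code; field names (`path`, `score`, `snippet`) mirror the `rag_query` response shape:

```typescript
// Illustrative cosine-similarity ranking over chunk embeddings.
type Scored = { path: string; score: number; snippet: string };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  // Guard against zero-norm vectors.
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Score every chunk against the query embedding, keep the top k.
function rank(
  query: number[],
  chunks: { path: string; vec: number[]; text: string }[],
  topK: number
): Scored[] {
  return chunks
    .map((c) => ({ path: c.path, score: cosine(query, c.vec), snippet: c.text.slice(0, 200) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A brute-force scan like this is linear in the number of chunks, which is why the README lists ANN acceleration (HNSW / IVF) as a planned upgrade.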
Requirements
- Node.js 18+
- Visual Studio 2022 17.14+ with GitHub Copilot (Agent mode enabled), if using the Visual Studio integration
- Path to your repository (`REPO_ROOT`)
Install
```bash
npm install
npm run build
```
Run (local test)
Build, then start (stdio transport by default). Use either `npm start` or invoke the built file directly.

Windows PowerShell:

```powershell
npm run build
$env:REPO_ROOT="C:\path\to\your-repo"; node dist/index.js
# Or:
$env:REPO_ROOT="C:\path\to\your-repo"; npm start
```

macOS / Linux (bash/zsh):

```bash
npm run build
export REPO_ROOT="/path/to/your-repo"; node dist/index.js
# Or:
export REPO_ROOT="/path/to/your-repo"; npm start
```

Optionally set a model cache to speed up subsequent runs (the first start downloads the model once):

```bash
export TRANSFORMERS_CACHE="/path/to/cache"       # macOS/Linux
```

```powershell
$env:TRANSFORMERS_CACHE="C:\path\to\cache"       # Windows PowerShell
```
Streamable HTTP mode (recommended for large initial indexes)
Run the MCP server as an HTTP endpoint and attach your IDE only after "Embeddings ready." appears in the log (this avoids client timeouts on a cold start):

```powershell
npm run build
$env:REPO_ROOT="C:\path\to\your-repo"; $env:MCP_TRANSPORT="http"; npm start
```

```bash
export REPO_ROOT="/path/to/your-repo"; MCP_TRANSPORT=http npm start
```
The default HTTP bind is http://127.0.0.1:3000/mcp. Override it with the `HOST` and `MCP_PORT` environment variables. A readiness endpoint is available at http://127.0.0.1:3000/health, returning JSON like:
```json
{
  "version": "0.x.y",
  "repoRoot": "C:/abs/path",
  "modelName": "<embedding model>",
  "transport": "stdio" | "http",
  "ready": true | false,
  "startedAt": "2025-01-01T00:00:00.000Z",
  "indexing": {
    "filesDiscovered": 123,
    "chunksTotal": 456,
    "chunksEmbedded": 456
  }
}
```
`ready` flips to true only once all discovered chunks have embeddings (after a cold build or after an incremental update completes).
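A client can poll `/health` until the server is ready before attaching. The sketch below is an assumption about how you might consume the endpoint (the field names match the JSON example above); the poll loop accepts any fetch-like function so it is easy to test or adapt:

```typescript
// Readiness check against the /health response shape shown above.
interface Health {
  ready: boolean;
  indexing: { filesDiscovered: number; chunksTotal: number; chunksEmbedded: number };
}

function isReady(h: Health): boolean {
  // ready is authoritative; the chunk counters let you report progress too.
  return h.ready && h.indexing.chunksEmbedded >= h.indexing.chunksTotal;
}

// Poll until ready, e.g. getHealth = () => fetch("http://127.0.0.1:3000/health").then(r => r.json())
async function waitUntilReady(
  getHealth: () => Promise<Health>,
  intervalMs = 2000,
  maxAttempts = 150
): Promise<void> {
  for (let i = 0; i < maxAttempts; i++) {
    if (isReady(await getHealth())) return;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error("server did not become ready in time");
}
```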
Instructions endpoint
The server also exposes GET `/instructions`, which serves the Markdown file `docs/copilot-instructions.md` with all occurrences of `<FOLDER_INFO_NAME>` replaced by the `FOLDER_INFO_NAME` value from your environment (default `REPO_ROOT`).
Notes:
- Start the server from the repository root so `docs/copilot-instructions.md` resolves via the current working directory.
- The response content type is `text/markdown; charset=utf-8`.
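The placeholder substitution the endpoint performs amounts to a global string replace. A minimal sketch (the real handler also reads the file and sets the content type):

```typescript
// Replace every <FOLDER_INFO_NAME> occurrence with the configured label,
// as the /instructions endpoint does.
function renderInstructions(markdown: string, folderInfoName: string): string {
  // split/join replaces all occurrences without regex-escaping concerns.
  return markdown.split("<FOLDER_INFO_NAME>").join(folderInfoName);
}
```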
Linting & Formatting
- Run ESLint (check): `npm run lint`
- Auto-fix ESLint issues: `npm run lint:fix`
- Format with Prettier: `npm run format`
- Check formatting: `npm run format:check`
Test with MCP Inspector (without VS)
Use the MCP Inspector to exercise the server locally and try the tools without Visual Studio.
Windows PowerShell:

```powershell
npm run build
$env:REPO_ROOT="C:\path\to\your-repo"; npx @modelcontextprotocol/inspector node .\dist\index.js
```

Streamable HTTP via Inspector (Windows):

```powershell
npm run build
$env:REPO_ROOT="C:\path\to\your-repo"; $env:MCP_TRANSPORT="http"; npx @modelcontextprotocol/inspector http://localhost:3000/mcp --transport http
```

macOS/Linux (bash/zsh):

```bash
export REPO_ROOT="/path/to/your-repo"
npx @modelcontextprotocol/inspector node dist/index.js
```

Streamable HTTP (macOS/Linux):

```bash
export REPO_ROOT="/path/to/your-repo"; MCP_TRANSPORT=http npx @modelcontextprotocol/inspector http://localhost:3000/mcp --transport http
```
Notes:
- First run downloads the embedding model and builds embeddings; the Inspector will connect only after startup completes. Watch the terminal for progress logs printed to stderr.
- You can also put settings in a `.env` file at the project root (e.g., `REPO_ROOT`, `TRANSFORMERS_CACHE`).
In the Inspector UI:
- Click "List tools" to verify these tools are available: `rag_query`, `read_file`, `list_files`.
- Select a tool and click "Call tool". Provide JSON input as shown below.
Examples
- Semantic search over the repo
Tool: `rag_query`. Input JSON:

```json
{
  "query": "protobuf message X schema",
  "top_k": 5
}
```

The response includes an array of matches with `path`, `score`, and `snippet`.
- List files in a directory (non-recursive by default)
Tool: `list_files`. Input JSON:

```json
{
  "dir": "src",
  "recursive": false
}
```
Recursive with filters and limits (tool: `list_files`). Input JSON:

```json
{
  "dir": "src",
  "recursive": true,
  "maxDepth": 3,
  "includeExtensions": ["ts", "md"],
  "limit": 200
}
```
Response shape:
```json
{
  "entries": [
    { "path": "src/", "type": "dir" },
    { "path": "src/index.ts", "type": "file", "size": 1234 },
    { "path": "src/lib/", "type": "dir" }
  ]
}
```
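To make the `maxDepth` / `includeExtensions` / `limit` options concrete, here is a sketch of equivalent filtering over entries of the shape above. This is an assumption about the semantics (depth counted in path segments, extensions matched without the dot), not the server's actual implementation:

```typescript
// Illustrative filtering matching the list_files options shown above.
type Entry = { path: string; type: "file" | "dir"; size?: number };

function filterEntries(
  entries: Entry[],
  opts: { maxDepth?: number; includeExtensions?: string[]; limit?: number }
): Entry[] {
  const out = entries.filter((e) => {
    // Depth = number of path segments ("src/index.ts" has depth 2).
    const depth = e.path.replace(/\/$/, "").split("/").length;
    if (opts.maxDepth !== undefined && depth > opts.maxDepth) return false;
    if (e.type === "file" && opts.includeExtensions) {
      const ext = e.path.split(".").pop() ?? "";
      if (!opts.includeExtensions.includes(ext)) return false;
    }
    return true;
  });
  return opts.limit !== undefined ? out.slice(0, opts.limit) : out;
}
```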
- Read a file (optionally with a line range)
Tool: `read_file`. Input JSON (the path is relative to `REPO_ROOT`):

```json
{
  "path": "src/path/to/file.txt",
  "startLine": 1,
  "endLine": 120
}
```
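A minimal sketch of the line-range behaviour implied by the example above, assuming 1-based, inclusive `startLine`/`endLine` (the exact boundary semantics are an assumption):

```typescript
// Return the requested 1-based inclusive line range; omit both bounds to
// read the whole file.
function sliceLines(content: string, startLine?: number, endLine?: number): string {
  const lines = content.split("\n");
  const start = (startLine ?? 1) - 1;        // convert to 0-based index
  const end = endLine ?? lines.length;       // slice end is exclusive, so no -1
  return lines.slice(start, end).join("\n");
}
```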
Troubleshooting
- Slow startup: set `TRANSFORMERS_CACHE` to a fast local folder and (optionally) set `ALLOWED_EXT` (e.g., `ts,tsx,js` for TypeScript/JS only, or any list you need).
- Path errors: `path` must be relative to `REPO_ROOT`. Absolute paths are rejected for safety.
- Nothing appears in the Inspector for minutes: the server is still initializing (model download + embedding). This is expected on the first run.
- Slow warm restarts: provide `INDEX_STORE_PATH` so embeddings persist and only changed files are re-embedded.
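The "absolute paths are rejected" rule is a standard containment check: resolve the requested path against `REPO_ROOT` and refuse anything that escapes it. The following is an illustrative sketch of that pattern, not the server's actual code:

```typescript
import * as path from "path";

// Resolve a client-supplied relative path and reject escapes from the root.
function resolveSafe(repoRoot: string, relPath: string): string {
  if (path.isAbsolute(relPath)) throw new Error("absolute paths are rejected");
  const root = path.resolve(repoRoot);
  const resolved = path.resolve(root, relPath);
  // Require the resolved path to be the root itself or live under it.
  if (resolved !== root && !resolved.startsWith(root + path.sep)) {
    throw new Error("path escapes REPO_ROOT");
  }
  return resolved;
}
```

Note that `startsWith(root + path.sep)` (not just `startsWith(root)`) avoids treating `/repo-evil` as inside `/repo`.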
Environment configuration (.env)
You can configure environment variables via a local `.env` file.

Steps:

- Copy `.env.example` to `.env`.
- Edit values as needed.
Supported variables:

- `REPO_ROOT` (required): path to the repository to index.
- `FOLDER_INFO_NAME` (optional): display label used inside MCP tool descriptions for the repository root (default `REPO_ROOT`). This is purely cosmetic for client UX; it does NOT affect which directory is indexed (that is controlled only by `REPO_ROOT`). Set it if you prefer a friendlier name (e.g., `frontend-app` or `monorepo-root`) to appear in tool metadata and path guidance returned to the client.
- `TRANSFORMERS_CACHE` (optional): cache folder for model files.
- `ALLOWED_EXT` (optional): comma-separated list of file extensions to index.
- `EXCLUDED_FOLDERS` (optional): comma-separated list of folder patterns to exclude from indexing. Supports both exact folder names (e.g., `node_modules,dist,build,.git`) and basic glob patterns (e.g., `**/test/**,**/tests/**`). Files in these folders are skipped during indexing. Defaults include common build/dependency folders: `node_modules`, `dist`, `build`, `.git`, `target`, `bin`, `obj`, `.cache`, `coverage`, `.nyc_output`.
- `MCP_TRANSPORT` (optional): `http` or `stdio`.
- `VERBOSE` (optional): `true`/`1`/`yes`/`on` for more granular progress logs during indexing and embedding.
- `INDEX_STORE_PATH` (optional): path to a persisted JSON embedding index (e.g., `C:\repo\.mcp-index.json` or `/repo/.mcp-index.json`). Enables fast warm starts plus incremental reindexing (new / deleted / size-changed files only).
- `MODEL_NAME` (optional): override the default embedding model (`jinaai/jina-embeddings-v2-base-code`). Examples:
  - `MODEL_NAME=jinaai/jina-embeddings-v2-base-code` (default): balanced multilingual/code embedding model; strong for mixed natural language + source code semantic search.
  - `MODEL_NAME=Xenova/bge-base-en-v1.5`: high-quality English general-purpose text embeddings (good for documentation/wiki-style corpora).
  - `MODEL_NAME=Xenova/bge-small-en-v1.5`: faster/lighter English model when latency or memory matters more than a few points of recall.

  Any compatible sentence / feature-extraction model supported by `@xenova/transformers` should work.
- `HOST` (optional, HTTP mode): bind host (default `127.0.0.1`).
- `MCP_PORT` (optional, HTTP mode): TCP port (default `3000`).
- `ENABLE_DNS_REBINDING_PROTECTION` (optional, HTTP mode): defaults to `true`; set to `false` to disable host allow-list checks.
- `ALLOWED_HOSTS` (optional, HTTP mode): comma-separated list of hosts allowed when DNS rebinding protection is enabled. Defaults include localhost and 127.0.0.1 with/without port.
- `CHUNK_SIZE` (optional): maximum characters per chunk before embedding (default 800). Larger values reduce the total number of embeddings (faster build, less memory) but can blur fine-grained matches. Typical ranges:
  - 700–900 (balanced default)
  - 1000–1400 (large prose / long functions; fewer vectors)
  - 400–600 (fine-grained code navigation; more vectors / memory)
- `CHUNK_OVERLAP` (optional): trailing characters carried into the next chunk (default 120, about 15%). Recommended 10–20% of `CHUNK_SIZE` (e.g., 80–160 for a size of 800). Increase slightly (up to ~20–25%) if you observe answers missing cross-boundary context; decrease to speed up builds.
Safety caps: `CHUNK_SIZE` is clamped to 8000 and `CHUNK_OVERLAP` to 4000; if overlap >= size, it is automatically reduced (and logged) to preserve forward progress.
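The clamping rules can be sketched as follows. The caps and defaults come from this README; the exact amount the overlap is reduced by when it reaches the size (here, half the size) is an assumption for illustration:

```typescript
// Apply the documented safety caps to raw CHUNK_SIZE / CHUNK_OVERLAP values.
function resolveChunkConfig(rawSize?: string, rawOverlap?: string): { size: number; overlap: number } {
  const size = Math.min(parseInt(rawSize ?? "", 10) || 800, 8000);      // default 800, cap 8000
  let overlap = Math.min(parseInt(rawOverlap ?? "", 10) || 120, 4000);  // default 120, cap 4000
  if (overlap >= size) {
    // Auto-correct (the server logs this) so each step still makes progress.
    overlap = Math.floor(size / 2); // assumed reduction; README only says "reduced"
  }
  return { size, overlap };
}
```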
Persistence & Incremental Reindexing
Set `INDEX_STORE_PATH` to enable a persisted JSON index storing chunks plus embeddings. On startup:

- If the file exists and its metadata (model name, chunk size, overlap) matches, it is loaded into memory.
- The repository is rescanned; chunks for removed files are discarded, and new or size-changed files are re-chunked and re-embedded.
- The merged index is saved back (the cold-build path also persists when configured).
Benefits:

- Dramatically faster warm starts for large repositories.
- Avoids re-embedding unchanged content.

Current limitations:

- Change detection uses file size only (content edits that keep an identical size are not re-embedded yet).
- Embedding generation is sequential (no parallel batching yet).
- The store schema is minimal (version 1); future versions may add hashing or mtime heuristics.
Force a full rebuild by deleting the store file or changing chunk/model parameters.
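The size-based diff described above can be sketched as a comparison between the stored index metadata and a fresh scan. This is an illustrative reconstruction (it also demonstrates the documented limitation: a same-size edit is treated as unchanged):

```typescript
// Diff stored file metadata against the current scan: removed files are
// dropped; new or size-changed files are queued for re-chunk + re-embed.
type FileMeta = { path: string; size: number };

function diffIndex(stored: FileMeta[], current: FileMeta[]) {
  const storedByPath = new Map(stored.map((f) => [f.path, f.size] as [string, number]));
  const currentByPath = new Map(current.map((f) => [f.path, f.size] as [string, number]));
  const removed = stored.filter((f) => !currentByPath.has(f.path)).map((f) => f.path);
  const toEmbed = current
    .filter((f) => storedByPath.get(f.path) !== f.size) // new file or size changed
    .map((f) => f.path);
  return { removed, toEmbed };
}
```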
Visual Studio integration (MCP)
Copy `example.mcp.json` to:

- `%USERPROFILE%\.mcp.json` (Windows), or
- your solution root as `.mcp.json` (recommended for teams)

Adjust the paths in "command"/"args" and the `REPO_ROOT` env.
For Streamable HTTP, use a config entry like:
```json
{
  "servers": {
    "mcp-rag-server": {
      "type": "streamable-http",
      "url": "http://127.0.0.1:3000/mcp"
    }
  }
}
```
Open VS -> Copilot Chat -> switch to Agent mode -> enable "mcp-rag-server" and its tools (you will be asked to grant permission on first use). If using the HTTP transport, ensure the config entry uses `"type": "streamable-http"` and the server has finished indexing (check `/health`).
Usage in Agent mode
Sample prompt:
"Modify the C# handler for message X. Before you start, use the tool `rag_query` with the query 'message X schema' and take the found contracts into account. If you get file paths back, read them via `read_file`."
Notes
- The first run downloads and caches the model (tens of MB up to ~100 MB) and builds embeddings; this may take minutes depending on repo size.
- Logs are written to stderr (console.error) to keep MCP stdout clean.
- For very large repos, consider adding an ANN index (`hnswlib-node`) or a hybrid BM25 + embeddings setup.
Model selection guidance
Choose an embedding model based on your repository characteristics:
- `jinaai/jina-embeddings-v2-base-code` (default): use when your corpus contains a meaningful amount of source code (multi-language) mixed with README / design docs. Provides strong cross-domain alignment for code-symbol + natural-language queries.
- `Xenova/bge-base-en-v1.5`: use when the content is predominantly English natural language (docs, knowledge base) and you want slightly stronger pure-text semantic quality.
- `Xenova/bge-small-en-v1.5`: use for faster startup / lower memory on constrained machines, or when indexing very large repos where throughput matters.

Feel free to experiment: swap via `MODEL_NAME` and rebuild the embedding cache (delete any existing cached vectors if you persist them externally).
Chunk sizing guidance
Why 800 / 120? Empirically this keeps most self-contained code constructs (functions/classes) and short doc sections in a single chunk while providing enough continuity for cross-block semantic matches. Adjust based on the corpus:

- Mostly short functions or config files: smaller chunks (500–700) aid pinpoint retrieval.
- Large narrative docs / design specs: larger chunks (1000–1400) reduce the vector count without much recall loss.
- Heavily interdependent code where context spans multiple files: keep the default, or modestly raise the overlap (to ~160) rather than shrinking the size.

Rule of thumb: overlap ≈ 15% of size. Avoid overlap >= size (auto-corrected) and avoid extremely small sizes (<300) unless you have a downstream re-ranking stage.
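The sliding-window chunking described above can be sketched in a few lines. Window boundaries (character-based, last `overlap` characters repeated at the start of the next chunk) are an assumption matching the README's description, not the server's exact code:

```typescript
// Overlapping character chunker: windows of `size` chars, advancing by
// size - overlap so consecutive chunks share `overlap` characters.
function chunkText(text: string, size = 800, overlap = 120): string[] {
  const step = size - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // the last window reached the end
  }
  return chunks;
}
```

With the defaults, a 2000-character file yields three chunks, each sharing 120 characters with its neighbour, which is what lets a query match content straddling a chunk boundary.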
Step-by-step: index a Java repo (ProjectB in IntelliJ) and use it from C# (ProjectA in Visual Studio 2022)
This walkthrough shows how to index a Java project (ProjectB) and make that knowledge available to GitHub Copilot (Agent mode) while you work in a separate C# solution (ProjectA) in Visual Studio 2022.
Assumptions
- You're on Windows and use PowerShell.
- ProjectB is a Java codebase you typically open in IntelliJ IDEA (location: `C:\path\to\ProjectB`). IntelliJ does not need to be open for indexing.
- ProjectA is a C# solution you open in Visual Studio 2022 (location: `C:\path\to\ProjectA`).
1) Build this MCP server
```powershell
npm install
npm run build
```
2) Start the server in HTTP mode pointing at ProjectB
Set environment variables once in your PowerShell session, then start. The optional index store speeds up warm starts.
```powershell
$env:REPO_ROOT = "C:\path\to\ProjectB"
$env:MCP_TRANSPORT = "http"
$env:INDEX_STORE_PATH = "C:\path\to\ProjectB\.mcp-index.json"   # optional but recommended
$env:ALLOWED_EXT = "java,kt,kts,md,xml,gradle,properties"       # tailor for Java projects
# Optional: cache model files to a fast local folder
# $env:TRANSFORMERS_CACHE = "C:\model-cache"
npm start
```
Wait until the console prints "Embeddings ready." You can also confirm readiness:
- Health: http://127.0.0.1:3000/health (ready: true)
- Tools are exposed at: http://127.0.0.1:3000/mcp (for MCP clients)
Leave this window running.
3) Point Visual Studio (ProjectA) at this server
Create a `.mcp.json` next to your ProjectA solution file (or place it at `%USERPROFILE%\.mcp.json` to apply globally). Use the HTTP entry so VS doesn't need to spawn the server.
```json
{
  "servers": {
    "mcp-rag-server": {
      "type": "streamable-http",
      "url": "http://127.0.0.1:3000/mcp"
    }
  }
}
```
Open ProjectA in Visual Studio 2022, open Copilot Chat, switch to Agent mode, and enable the "mcp-rag-server". Grant permissions if prompted.
Tips
- If this is your first run on a large repo, keep the MCP server window open until indexing completes before connecting from VS. Using HTTP avoids timeouts during the cold build.
- For subsequent runs, `INDEX_STORE_PATH` makes startup much faster.
4) Use it from Copilot while coding in ProjectA
Ask Copilot to search ProjectB before answering questions or generating code in ProjectA. Example prompts:
- "Use the tool rag_query to find the Java service responsible for authentication in ProjectB; then show me the equivalent interface I should implement in C# here."
- "List files under src/main/java that reference 'Invoice' in ProjectB, then open the key file."

Behind the scenes, Copilot will call:

- `rag_query`: to locate relevant snippets from ProjectB
- `read_file`: to fetch exact code/lines
- `list_files`: to navigate directories
5) (Optional) Use MCP Inspector to sanity-check
If you want to test the tools before involving Visual Studio:
```powershell
# In a separate PowerShell
$env:REPO_ROOT = "C:\path\to\ProjectB"
$env:MCP_TRANSPORT = "http"
npm run build
npx @modelcontextprotocol/inspector http://127.0.0.1:3000/mcp --transport http
```
How to use docs/copilot-instructions.md
The file `docs/copilot-instructions.md` contains clear, copy-pastable guidance that teaches the assistant how to leverage this MCP server effectively (when to call `rag_query`, `read_file`, `list_files`, how to quote code, etc.).
There are two easy ways to use it:
- Via the server's /instructions endpoint (best with HTTP mode)

  - Ensure the server is running with `MCP_TRANSPORT=http`.
  - Optionally set a friendly label for your repo in the UI: `$env:FOLDER_INFO_NAME = "ProjectB"`
  - Open http://127.0.0.1:3000/instructions in a browser. The page renders the instructions with `<FOLDER_INFO_NAME>` replaced (e.g., "ProjectB").
  - Copy the content into Copilot Chat in Visual Studio and pin it for the current session/conversation to guide the assistant's behavior.
- Sync and store in ProjectA's .github folder (from /instructions)

  - Ensure the server is running with `MCP_TRANSPORT=http` and set a friendly label: `$env:FOLDER_INFO_NAME = "ProjectB"`
  - Create (if it does not exist) `C:\path\to\ProjectA\.github\`.
  - Pull the latest rendered instructions and save them to the repo:

    ```powershell
    $dest = "C:\path\to\ProjectA\.github\copilot-instructions.md"
    Invoke-RestMethod 'http://127.0.0.1:3000/instructions' | Set-Content -Encoding UTF8 $dest
    ```

  - Commit the file so your team can reuse it. Re-run the command above anytime you update the instructions in this server and want to refresh the checked-in copy.
Notes
- These instructions are optional but help keep Copilot disciplined: it will search before answering, cite paths, and fetch exact code lines before quoting.
- The server's tool descriptions also reference `FOLDER_INFO_NAME` to provide consistent, repo-specific guidance in tool metadata.