RAG MCP Server
A local Retrieval-Augmented Generation (RAG) server that uses OLLAMA for embeddings and provides semantic search capabilities through the Model Context Protocol (MCP).
Features
- OLLAMA Integration: Uses OLLAMA for generating text embeddings
- Vector Storage: In-memory vector store with persistent storage to JSON
- Semantic Search: Cosine similarity-based document retrieval
- Document Chunking: Automatic text chunking for large documents
- MCP Protocol: Exposes RAG functionality through standard MCP tools
- Persistent Index: Automatically saves and loads the document index
Prerequisites
- Node.js: Version 18 or higher
- OLLAMA: Must be installed and running locally
  - Download from: https://ollama.ai
  - Default URL: `http://localhost:11434`
- OLLAMA Embedding Model: Install an embedding model:

  ```bash
  ollama pull nomic-embed-text
  ```
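Before installing, you can confirm that OLLAMA is reachable and the embedding model responds by requesting a test embedding (this uses OLLAMA's standard `/api/embeddings` endpoint; swap in whichever model you pulled):

```bash
# Request a test embedding; a JSON response containing an "embedding"
# array confirms OLLAMA and the model are working.
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "hello world"}'
```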
Installation
- Install dependencies:

  ```bash
  npm install
  ```

- Build the project:

  ```bash
  npm run build
  ```
Configuration
Configure the server using environment variables:
- `OLLAMA_BASE_URL`: OLLAMA API base URL (default: `http://localhost:11434`)
- `OLLAMA_MODEL`: Embedding model to use (default: `nomic-embed-text`)
- `INDEX_PATH`: Path to save the vector index (default: `./rag-index.json`)
Example Configuration
Create a .env file:
```
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=nomic-embed-text
INDEX_PATH=./rag-index.json
```
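Internally, settings like these are typically read from `process.env` with fallbacks to the defaults listed above. A minimal sketch (the `config` object is illustrative, not the server's actual source):

```typescript
// Illustrative config loader: each setting falls back to its documented default.
const config = {
  ollamaBaseUrl: process.env.OLLAMA_BASE_URL ?? "http://localhost:11434",
  ollamaModel: process.env.OLLAMA_MODEL ?? "nomic-embed-text",
  indexPath: process.env.INDEX_PATH ?? "./rag-index.json",
};
```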
Usage
As an MCP Server
Add to your Claude Code MCP settings:
```json
{
  "mcpServers": {
    "rag": {
      "command": "node",
      "args": ["C:\\programming\\rag-mcp-server\\dist\\index.js"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "nomic-embed-text"
      }
    }
  }
}
```
Standalone Usage
Run the server directly:
```bash
npm start
```
Available Tools
1. rag_index_document
Index a document for semantic search.
Parameters:
- `content` (required): The text content to index
- `id` (optional): Document ID (auto-generated if not provided)
- `metadata` (optional): Additional metadata (e.g., source, title)
- `chunkText` (optional): Whether to chunk large texts (default: true)
Example:
```json
{
  "content": "The quick brown fox jumps over the lazy dog.",
  "metadata": {
    "source": "example.txt",
    "title": "Fox Example"
  }
}
```
2. rag_search
Search for relevant documents using semantic similarity.
Parameters:
- `query` (required): The search query
- `topK` (optional): Number of results to return (default: 5)
- `threshold` (optional): Minimum similarity score 0-1 (default: 0.0)
Example:
```json
{
  "query": "information about foxes",
  "topK": 3,
  "threshold": 0.5
}
```
3. rag_get_document
Retrieve a specific document by ID.
Parameters:
- `id` (required): The document ID
4. rag_remove_document
Remove a document from the index.
Parameters:
- `id` (required): The document ID to remove
5. rag_list_documents
List all indexed documents with their metadata.
6. rag_get_stats
Get statistics about the RAG index (total documents, chunks, etc.).
7. rag_clear_index
Clear all documents from the index. Use with caution!
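To see how these tools are called programmatically, here is a sketch using the official MCP TypeScript SDK as the client (the client setup is illustrative; the tool names and arguments are the ones documented above):

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the RAG server over stdio, the same way an MCP host like Claude Code does.
const transport = new StdioClientTransport({
  command: "node",
  args: ["dist/index.js"],
});
const client = new Client(
  { name: "rag-example-client", version: "1.0.0" },
  { capabilities: {} }
);
await client.connect(transport);

// Index a document, then search for it semantically.
await client.callTool({
  name: "rag_index_document",
  arguments: { content: "The quick brown fox jumps over the lazy dog." },
});
const results = await client.callTool({
  name: "rag_search",
  arguments: { query: "information about foxes", topK: 3 },
});
console.log(results.content);
```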
How It Works
- Indexing: Documents are chunked (if needed) and sent to OLLAMA to generate embeddings
- Storage: Embeddings and document metadata are stored in an in-memory vector store
- Persistence: The index is automatically saved to a JSON file
- Search: Query text is embedded and compared against stored embeddings using cosine similarity
- Retrieval: The most similar documents are returned with similarity scores
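The core of the search step fits in a few lines. This sketch is not the server's actual source (names like `IndexedChunk` and `embed` are illustrative), but it shows the same pipeline: embed the query with OLLAMA, then rank stored vectors by cosine similarity:

```typescript
type IndexedChunk = { id: string; text: string; embedding: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|). Higher means more similar.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed text via OLLAMA's embeddings endpoint.
async function embed(text: string): Promise<number[]> {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  const { embedding } = (await res.json()) as { embedding: number[] };
  return embedding;
}

// Rank every stored chunk against the query; keep the top K above the threshold.
async function search(store: IndexedChunk[], query: string, topK = 5, threshold = 0) {
  const queryEmbedding = await embed(query);
  return store
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```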
Architecture
```
┌─────────────┐
│     MCP     │
│   Client    │
└──────┬──────┘
       │
       │ MCP Protocol
       │
┌──────▼──────────────────────┐
│       RAG MCP Server        │
│  ┌──────────────────────┐   │
│  │     RAG Service      │   │
│  │  ┌────────┐ ┌──────┐ │   │
│  │  │ Vector │ │OLLAMA│ │   │
│  │  │ Store  │ │Client│ │   │
│  │  └────────┘ └──────┘ │   │
│  └──────────────────────┘   │
└─────────────────────────────┘
       │
       │ HTTP
       │
┌──────▼──────┐
│   OLLAMA    │
│   Server    │
└─────────────┘
```
Development
Build

```bash
npm run build
```

Watch mode

```bash
npm run dev
```

Inspect mode (for debugging)

```bash
npm run inspect
```
Testing
The project includes comprehensive unit and integration tests.
Run Tests

```bash
npm test
```

Run Tests in Watch Mode

```bash
npm run test:watch
```

Generate Coverage Report

```bash
npm run test:coverage
```
Test Categories
- Unit Tests: Fast, isolated tests with mocked dependencies
  - OLLAMA client tests
  - Vector store tests
  - RAG service tests
- Integration Tests: End-to-end tests with real OLLAMA
  - Requires OLLAMA running locally
  - Tests full indexing and search workflow
  - Can be skipped with `SKIP_INTEGRATION_TESTS=true`
See the testing documentation for more detail.
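For example, a small unit test for the similarity math might look like this (a sketch assuming a Vitest-style runner; Jest's API is identical):

```typescript
import { describe, expect, it } from "vitest"; // assumed runner; Jest works the same

// Unit under test: the cosine similarity helper sketched in "How It Works".
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

describe("cosineSimilarity", () => {
  it("scores identical vectors as 1", () => {
    expect(cosineSimilarity([1, 2, 3], [1, 2, 3])).toBeCloseTo(1);
  });

  it("scores orthogonal vectors as 0", () => {
    expect(cosineSimilarity([1, 0], [0, 1])).toBeCloseTo(0);
  });
});
```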
Recommended OLLAMA Models
For embeddings:
- `nomic-embed-text` (recommended) - 768 dimensions
- `mxbai-embed-large` - 1024 dimensions
- `all-minilm` - 384 dimensions (faster, smaller)
Install a model:
```bash
ollama pull nomic-embed-text
```
Performance Considerations
- In-memory storage: Fast but limited by RAM
- Chunk size: Default 512 tokens with 50 token overlap
- Batch processing: Documents are indexed sequentially
- Persistence: Index is saved after each modification
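The chunking behavior can be pictured as a sliding window. The sketch below splits on whitespace-delimited words rather than real tokens, so it approximates (rather than reproduces) the server's chunker:

```typescript
// Sliding-window chunking: each window is `size` words and starts
// `size - overlap` words after the previous one, so consecutive chunks
// share `overlap` words of context. (Words stand in for tokens here.)
function chunkText(text: string, size = 512, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // final window reached the end
  }
  return chunks;
}
```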
Limitations
- Vector store is in-memory (not suitable for massive datasets)
- Sequential embedding generation (no batch API support)
- Basic cosine similarity (no advanced filtering)
- No incremental updates (full document re-indexing required)
Future Enhancements
- Add support for file uploads (PDF, DOCX, TXT)
- Implement batch embedding generation
- Add metadata filtering in search
- Support for multiple vector stores
- Add hybrid search (keyword + semantic)
- Implement vector store backends (SQLite, PostgreSQL with pgvector)
License
MIT