RAG MCP Server
A local Retrieval-Augmented Generation (RAG) server that uses OLLAMA for embeddings and provides semantic search capabilities through the Model Context Protocol (MCP).
Features
- OLLAMA Integration: Uses OLLAMA for generating text embeddings
- Vector Storage: In-memory vector store with persistent storage to JSON
- Semantic Search: Cosine similarity-based document retrieval
- Document Chunking: Automatic text chunking for large documents
- MCP Protocol: Exposes RAG functionality through standard MCP tools
- Persistent Index: Automatically saves and loads the document index
Prerequisites
- Node.js: Version 18 or higher
- OLLAMA: Must be installed and running locally
  - Download from: https://ollama.ai
  - Default URL: `http://localhost:11434`
- OLLAMA Embedding Model: Install an embedding model:

  ```bash
  ollama pull nomic-embed-text
  ```
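Before installing, you can confirm that OLLAMA is reachable and the embedding model responds by requesting a test embedding (this uses OLLAMA's standard `/api/embeddings` endpoint; swap in whichever model you pulled):

```bash
# Request a test embedding; a JSON response containing an "embedding"
# array confirms OLLAMA and the model are working.
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "hello world"}'
```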
Installation
- Install dependencies:

  ```bash
  npm install
  ```

- Build the project:

  ```bash
  npm run build
  ```
Configuration
Configure the server using environment variables:
- `OLLAMA_BASE_URL`: OLLAMA API base URL (default: `http://localhost:11434`)
- `OLLAMA_MODEL`: Embedding model to use (default: `nomic-embed-text`)
- `INDEX_PATH`: Path to save the vector index (default: `./rag-index.json`)
Example Configuration
Create a .env file:
```
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=nomic-embed-text
INDEX_PATH=./rag-index.json
```
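Internally, settings like these are typically read from `process.env` with fallbacks to the defaults listed above. A minimal sketch (the `config` object is illustrative, not the server's actual source):

```typescript
// Illustrative config loader: each setting falls back to its documented default.
const config = {
  ollamaBaseUrl: process.env.OLLAMA_BASE_URL ?? "http://localhost:11434",
  ollamaModel: process.env.OLLAMA_MODEL ?? "nomic-embed-text",
  indexPath: process.env.INDEX_PATH ?? "./rag-index.json",
};
```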
Usage
As an MCP Server
Add to your Claude Code MCP settings:
```json
{
  "mcpServers": {
    "rag": {
      "command": "node",
      "args": ["C:\\programming\\rag-mcp-server\\dist\\index.js"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "nomic-embed-text"
      }
    }
  }
}
```
Standalone Usage
Run the server directly:
```bash
npm start
```
Available Tools
1. rag_index_document
Index a document for semantic search.
Parameters:
- `content` (required): The text content to index
- `id` (optional): Document ID (auto-generated if not provided)
- `metadata` (optional): Additional metadata (e.g., source, title)
- `chunkText` (optional): Whether to chunk large texts (default: true)
Example:
```json
{
  "content": "The quick brown fox jumps over the lazy dog.",
  "metadata": {
    "source": "example.txt",
    "title": "Fox Example"
  }
}
```
2. rag_search
Search for relevant documents using semantic similarity.
Parameters:
- `query` (required): The search query
- `topK` (optional): Number of results to return (default: 5)
- `threshold` (optional): Minimum similarity score 0-1 (default: 0.0)
Example:
```json
{
  "query": "information about foxes",
  "topK": 3,
  "threshold": 0.5
}
```
3. rag_get_document
Retrieve a specific document by ID.
Parameters:
- `id` (required): The document ID
4. rag_remove_document
Remove a document from the index.
Parameters:
- `id` (required): The document ID to remove
5. rag_list_documents
List all indexed documents with their metadata.
6. rag_get_stats
Get statistics about the RAG index (total documents, chunks, etc.).
7. rag_clear_index
Clear all documents from the index. Use with caution!
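To see how these tools are called programmatically, here is a sketch using the official MCP TypeScript SDK as the client (the client setup is illustrative; the tool names and arguments are the ones documented above):

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the RAG server over stdio, the same way an MCP host like Claude Code does.
const transport = new StdioClientTransport({
  command: "node",
  args: ["dist/index.js"],
});
const client = new Client(
  { name: "rag-example-client", version: "1.0.0" },
  { capabilities: {} }
);
await client.connect(transport);

// Index a document, then search for it semantically.
await client.callTool({
  name: "rag_index_document",
  arguments: { content: "The quick brown fox jumps over the lazy dog." },
});
const results = await client.callTool({
  name: "rag_search",
  arguments: { query: "information about foxes", topK: 3 },
});
console.log(results.content);
```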
How It Works
- Indexing: Documents are chunked (if needed) and sent to OLLAMA to generate embeddings
- Storage: Embeddings and document metadata are stored in an in-memory vector store
- Persistence: The index is automatically saved to a JSON file
- Search: Query text is embedded and compared against stored embeddings using cosine similarity
- Retrieval: The most similar documents are returned with similarity scores
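The core of the search step fits in a few lines. This sketch is not the server's actual source (names like `IndexedChunk` and `embed` are illustrative), but it shows the same pipeline: embed the query with OLLAMA, then rank stored vectors by cosine similarity:

```typescript
type IndexedChunk = { id: string; text: string; embedding: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|). Higher means more similar.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed text via OLLAMA's embeddings endpoint.
async function embed(text: string): Promise<number[]> {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  const { embedding } = (await res.json()) as { embedding: number[] };
  return embedding;
}

// Rank every stored chunk against the query; keep the top K above the threshold.
async function search(store: IndexedChunk[], query: string, topK = 5, threshold = 0) {
  const queryEmbedding = await embed(query);
  return store
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```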
Architecture
```
┌─────────────┐
│     MCP     │
│   Client    │
└──────┬──────┘
       │
       │ MCP Protocol
       │
┌──────▼──────────────────────┐
│       RAG MCP Server        │
│  ┌──────────────────────┐   │
│  │     RAG Service      │   │
│  │  ┌────────┐ ┌──────┐ │   │
│  │  │ Vector │ │OLLAMA│ │   │
│  │  │ Store  │ │Client│ │   │
│  │  └────────┘ └──────┘ │   │
│  └──────────────────────┘   │
└─────────────────────────────┘
       │
       │ HTTP
       │
┌──────▼──────┐
│   OLLAMA    │
│   Server    │
└─────────────┘
```
Development
Build

```bash
npm run build
```

Watch mode

```bash
npm run dev
```

Inspect mode (for debugging)

```bash
npm run inspect
```
Testing
The project includes comprehensive unit and integration tests.
Run Tests

```bash
npm test
```

Run Tests in Watch Mode

```bash
npm run test:watch
```

Generate Coverage Report

```bash
npm run test:coverage
```
Test Categories
- Unit Tests: Fast, isolated tests with mocked dependencies
  - OLLAMA client tests
  - Vector store tests
  - RAG service tests
- Integration Tests: End-to-end tests with real OLLAMA
  - Requires OLLAMA running locally
  - Tests full indexing and search workflow
  - Can be skipped with `SKIP_INTEGRATION_TESTS=true`
See the testing documentation for more detail.
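For example, a small unit test for the similarity math might look like this (a sketch assuming a Vitest-style runner; Jest's API is identical):

```typescript
import { describe, expect, it } from "vitest"; // assumed runner; Jest works the same

// Unit under test: the cosine similarity helper sketched in "How It Works".
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

describe("cosineSimilarity", () => {
  it("scores identical vectors as 1", () => {
    expect(cosineSimilarity([1, 2, 3], [1, 2, 3])).toBeCloseTo(1);
  });

  it("scores orthogonal vectors as 0", () => {
    expect(cosineSimilarity([1, 0], [0, 1])).toBeCloseTo(0);
  });
});
```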
Recommended OLLAMA Models
For embeddings:
- `nomic-embed-text` (recommended) - 768 dimensions
- `mxbai-embed-large` - 1024 dimensions
- `all-minilm` - 384 dimensions (faster, smaller)
Install a model:
```bash
ollama pull nomic-embed-text
```
Performance Considerations
- In-memory storage: Fast but limited by RAM
- Chunk size: Default 512 tokens with 50 token overlap
- Batch processing: Documents are indexed sequentially
- Persistence: Index is saved after each modification
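The chunking behavior can be pictured as a sliding window. The sketch below splits on whitespace-delimited words rather than real tokens, so it approximates (rather than reproduces) the server's chunker:

```typescript
// Sliding-window chunking: each window is `size` words and starts
// `size - overlap` words after the previous one, so consecutive chunks
// share `overlap` words of context. (Words stand in for tokens here.)
function chunkText(text: string, size = 512, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // final window reached the end
  }
  return chunks;
}
```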
Limitations
- Vector store is in-memory (not suitable for massive datasets)
- Sequential embedding generation (no batch API support)
- Basic cosine similarity (no advanced filtering)
- No incremental updates (full document re-indexing required)
Future Enhancements
- Add support for file uploads (PDF, DOCX, TXT)
- Implement batch embedding generation
- Add metadata filtering in search
- Support for multiple vector stores
- Add hybrid search (keyword + semantic)
- Implement vector store backends (SQLite, PostgreSQL with pgvector)
License
MIT