CAG-MCP by devdotbo - MCP Server

CAG MCP Server

A high-performance Cache-Augmented Generation (CAG) server implementing the Model Context Protocol (MCP) standard.

What is CAG?

Cache-Augmented Generation (CAG) is an alternative to RAG (Retrieval-Augmented Generation) that preloads documents into the model's context window or cache instead of searching a vector database at runtime. This approach offers:

Faster response times - No vector search step at query time
Better context coherence - The model sees all preloaded documents together instead of retrieved fragments
Simpler architecture - No need for embeddings or vector databases
Lower latency - Cached content is read directly from memory
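
To make the contrast with RAG concrete, here is a minimal sketch of the preloading idea. It is not taken from this project's code: the function names build_context and build_prompt, the max_chars budget, and the "###" separator are all assumptions for illustration.

```python
def build_context(documents: dict[str, str], max_chars: int = 200_000) -> str:
    """Join preloaded documents into one context string (hypothetical helper)."""
    parts = []
    total = 0
    for name, text in documents.items():
        # Stop once the assumed character budget would be exceeded.
        if total + len(text) > max_chars:
            break
        parts.append(f"### {name}\n{text}")
        total += len(text)
    return "\n\n".join(parts)


def build_prompt(context: str, question: str) -> str:
    """Prepend the preloaded context to the question; no retrieval step runs."""
    return f"{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = {"a.md": "Alpha", "b.md": "Beta"}
    print(build_prompt(build_context(docs), "What is in a.md?"))
```

The context is built once and reused for every question, which is where the savings over per-query vector search would come from.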

Features

🚀 High Performance - In-memory caching with sub-millisecond access times
🔌 MCP Compatible - Full implementation of Model Context Protocol
📚 Smart Caching - LRU eviction, size limits, and automatic content management
🔍 Advanced Search - Full-text search across cached documents
🛠️ Dual Implementation - Both Python (FastMCP) and TypeScript versions
🧪 Thoroughly Tested - TDD approach with real integration tests
🔒 Production Ready - Error handling, logging, and monitoring

Quick Start

Python Version

# Install dependencies
uv sync

# Run the server
uv run python -m cag_mcp_server

# Run tests
uv run pytest

TypeScript Version

# Install dependencies
pnpm install

# Build and run
pnpm build
pnpm start

# Run tests
pnpm test

Integration with Claude Desktop

Add the server to your Claude Desktop configuration file (claude_desktop_config.json):

{
  "mcpServers": {
    "cag-server": {
      "command": "uv",
      "args": ["run", "python", "-m", "cag_mcp_server"],
      "cwd": "/path/to/cag-mcp-server"
    }
  }
}

Restart Claude Desktop. The CAG server should then appear in the MCP menu.

Architecture

The CAG MCP server consists of:

Cache Manager - Handles document storage with size limits and LRU eviction
Document Loader - Loads and validates documents at startup
MCP Server - Implements the Model Context Protocol with resources, tools, and prompts
Search Engine - Provides fast full-text search across cached content
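
The Cache Manager's combination of a size limit and LRU eviction can be sketched as follows. This is an illustrative sketch only: the class name LRUDocumentCache, its methods, and the choice to count UTF-8 bytes are assumptions, not the project's actual implementation.

```python
from collections import OrderedDict
from typing import Optional


class LRUDocumentCache:
    """Hypothetical size-bounded cache that evicts the least recently used document."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.current_bytes = 0
        self._docs = OrderedDict()  # oldest entry first, newest last

    def put(self, name: str, text: str) -> None:
        size = len(text.encode("utf-8"))
        if size > self.max_bytes:
            raise ValueError(f"{name} is larger than the whole cache")
        # Replacing an existing entry frees its old size first.
        if name in self._docs:
            self.current_bytes -= len(self._docs.pop(name).encode("utf-8"))
        # Evict from the least recently used end until the new document fits.
        while self.current_bytes + size > self.max_bytes:
            _, evicted = self._docs.popitem(last=False)
            self.current_bytes -= len(evicted.encode("utf-8"))
        self._docs[name] = text
        self.current_bytes += size

    def get(self, name: str) -> Optional[str]:
        if name not in self._docs:
            return None
        self._docs.move_to_end(name)  # mark as most recently used
        return self._docs[name]
```

Reading a document with get counts as a use, so frequently requested documents stay cached while rarely used ones are evicted first.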

Configuration

Create a config.json file:

{
  "cache": {
    "maxSizeMB": 100,
    "evictionPolicy": "lru",
    "preloadDirectory": "./documents"
  },
  "server": {
    "logLevel": "info",
    "enableMetrics": true
  }
}
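
As a rough illustration of how a server might read this file, the sketch below parses the same keys into typed settings. The dataclass names, the parse_config function, the validation rule, and the fallback defaults (copied from the example values above) are assumptions, not documented behavior.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    max_size_mb: int = 100
    eviction_policy: str = "lru"
    preload_directory: str = "./documents"


@dataclass(frozen=True)
class ServerConfig:
    log_level: str = "info"
    enable_metrics: bool = True


def parse_config(raw: str):
    """Parse config.json text; missing keys fall back to the assumed defaults."""
    data = json.loads(raw)
    cache = data.get("cache", {})
    server = data.get("server", {})
    cache_cfg = CacheConfig(
        max_size_mb=cache.get("maxSizeMB", 100),
        eviction_policy=cache.get("evictionPolicy", "lru"),
        preload_directory=cache.get("preloadDirectory", "./documents"),
    )
    server_cfg = ServerConfig(
        log_level=server.get("logLevel", "info"),
        enable_metrics=server.get("enableMetrics", True),
    )
    # Hypothetical sanity check: a cache needs a positive size limit.
    if cache_cfg.max_size_mb <= 0:
        raise ValueError("cache.maxSizeMB must be positive")
    return cache_cfg, server_cfg
```

Calling parse_config with the contents of config.json returns a (CacheConfig, ServerConfig) pair.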

MCP Capabilities

Resources

Each cached document is exposed as an MCP resource with the URI pattern cache://filename, where filename is the name of the cached file.
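
The mapping between file names and resource URIs might look like the sketch below. The helper names are hypothetical, and percent-encoding special characters is an assumption; the project may handle such file names differently.

```python
from urllib.parse import quote, unquote

SCHEME = "cache://"


def to_resource_uri(filename: str) -> str:
    """Build a cache:// URI; special characters are percent-encoded (assumption)."""
    return SCHEME + quote(filename)


def from_resource_uri(uri: str) -> str:
    """Recover the file name from a cache:// URI."""
    if not uri.startswith(SCHEME):
        raise ValueError(f"not a cache URI: {uri}")
    return unquote(uri[len(SCHEME):])
```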

Tools

search_cache - Search across all cached documents
get_document - Retrieve a specific document
cache_stats - Get cache statistics and performance metrics
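
The behavior of these three tools can be approximated in plain Python as below. This sketch leaves out registration with an MCP SDK, and the parameter names, return shapes, and match-count ranking are assumptions rather than the server's actual schema.

```python
def search_cache(docs: dict[str, str], query: str, limit: int = 5) -> list:
    """Case-insensitive substring search, ranked by number of matches."""
    needle = query.lower()
    if not needle:
        return []
    hits = []
    for name, text in docs.items():
        count = text.lower().count(needle)
        if count:
            hits.append({"document": name, "matches": count})
    hits.sort(key=lambda hit: hit["matches"], reverse=True)
    return hits[:limit]


def get_document(docs: dict[str, str], name: str) -> str:
    """Return one cached document, or raise KeyError if it is not cached."""
    return docs[name]


def cache_stats(docs: dict[str, str]) -> dict:
    """Report the document count and total size in UTF-8 bytes."""
    total = sum(len(text.encode("utf-8")) for text in docs.values())
    return {"documents": len(docs), "total_bytes": total}
```

A real full-text engine would likely tokenize and index the documents at load time; the linear scan here only illustrates the inputs and outputs.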

Prompts

Pre-built prompt templates for common query patterns.

Development

# Setup development environment
./scripts/setup-dev.sh

# Run all tests
./scripts/integration-test.sh

# Verify MCP compatibility
./scripts/verify-mcp.sh

Testing

This project uses Test-Driven Development (TDD) without mocks. All tests use real implementations:

# Python tests
uv run pytest -xvs

# TypeScript tests
pnpm test

# Integration tests
./scripts/test-server.sh

Performance

Document access: < 1ms
Search latency: < 10ms for 1000 documents
Memory efficiency: ~1.2x document size
Startup time: < 5s for 100MB cache
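
These figures depend on hardware and workload. One way to check a number such as the document access time on your own machine is a small timing harness like the sketch below; the function name and the use of a plain dict as a stand-in for the cache are assumptions.

```python
import statistics
import time


def median_lookup_ns(cache: dict[str, str], key: str, runs: int = 10_000) -> float:
    """Return the median time, in nanoseconds, of looking up one key."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        cache[key]  # the operation being timed
        samples.append(time.perf_counter_ns() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    stand_in = {"guide.md": "x" * 1_000_000}
    print(f"median lookup: {median_lookup_ns(stand_in, 'guide.md')} ns")
```

The median is used rather than the mean so that occasional scheduler pauses do not dominate the result.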

Contributing

Read CLAUDE.md for development guidelines
Follow TDD approach - write tests first
Ensure all tests pass before submitting PR
Verify with real MCP client integration

License

MIT License - see LICENSE file for details