metamcp-rag-server

cordlesssteve/metamcp-rag-server


The metamcp-rag-server is an MCP proxy server that uses Retrieval-Augmented Generation (RAG) to select relevant tools on demand instead of loading every tool definition into context at startup.

MetaMCP RAG Server

Attribution

This repository is based on metatool-ai/metamcp - "A specification for Model Context Protocol (MCP) servers that can dynamically discover and connect to other MCP servers."

Important: This repository is intentionally disconnected from the upstream to prevent accidental pull requests. Please submit contributions to the original project.

Original Project: metatool-ai/metamcp

Overview

The MetaMCP RAG Server solves the context bloat problem that occurs when Claude Code starts up with many MCP servers. Instead of loading all MCP tools into context immediately, this server provides lazy loading and on-demand tool discovery to keep your context window clean and efficient.

The Problem

  • Multiple MCP servers consume significant context tokens on startup
  • Hundreds of tool definitions loaded whether you need them or not
  • Context window filled with unused tool schemas
  • Slower startup times and reduced available context for actual work

The Solution

The MetaMCP RAG Server acts as a smart proxy that:

  • Prevents context bloat by not auto-loading all MCP tools on startup
  • Discovers tools on-demand only when you need them
  • Uses RAG (Retrieval-Augmented Generation) to find relevant tools for your queries
  • Lazy loads MCP servers and their tools as needed
  • Maintains clean context by exposing only essential tools by default

Key Benefits

  • Startup Optimization: Faster Claude Code startup, because only a small set of tool schemas is loaded into context
  • Context Management: Keep your 200k token context window available for actual work
  • Smart Discovery: RAG-powered tool selection finds relevant tools without loading everything
  • Lazy Loading: MCP servers start only when their tools are needed
  • Query-Aware Filtering: Semantic tool selection based on what you're actually trying to do

Features

  • Context Bloat Prevention: Minimal tool exposure on startup to preserve context
  • On-Demand Tool Discovery: Tools are discovered and loaded only when needed
  • RAG-Powered Selection: Semantic search to filter relevant tools for each query
  • Multi-Server Aggregation: Manages multiple MCP servers behind a single interface
  • Lazy Server Initialization: MCP servers start on first tool request, not at startup
  • Smart Tool Routing: Routes tool calls to appropriate underlying servers
  • Graceful Degradation: Falls back to all tools if RAG service is unavailable

Installation

npm install
npm run build

Configuration

Add to your Claude Code MCP configuration:

{
  "mcpServers": {
    "metamcp-rag": {
      "command": "node",
      "args": ["./dist/index.js"],
      "env": {
        "RAG_MAX_DISCOVERY_TOOLS": "30",
        "RAG_MAX_ESSENTIAL_TOOLS": "10"
      }
    }
  }
}

Environment Variables

  • RAG_MAX_DISCOVERY_TOOLS (default: 30): Maximum tools returned by discover_tools. Higher values provide more options but increase context usage.
  • RAG_MAX_ESSENTIAL_TOOLS (default: 10): Maximum essential tools exposed by default to minimize startup context bloat.
  • RAG_SERVICE_HOST (default: 127.0.0.1): RAG service host address.
  • RAG_SERVICE_PORT (default: 8002): RAG service port.
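
As a rough illustration of how these variables and their defaults fit together, the sketch below parses them from an environment object. The names `RagConfig`, `parsePositiveInt`, and `loadRagConfig` are hypothetical, and falling back to the default on an invalid value is an assumption, not confirmed behavior of this server.

```typescript
// Hypothetical config shape; field names are illustrative, not taken from the project.
interface RagConfig {
  maxDiscoveryTools: number;
  maxEssentialTools: number;
  serviceHost: string;
  servicePort: number;
}

// Parse a positive integer, using the documented default when the value
// is missing or invalid (the fallback-on-invalid rule is an assumption).
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

function loadRagConfig(env: Record<string, string | undefined>): RagConfig {
  return {
    maxDiscoveryTools: parsePositiveInt(env.RAG_MAX_DISCOVERY_TOOLS, 30),
    maxEssentialTools: parsePositiveInt(env.RAG_MAX_ESSENTIAL_TOOLS, 10),
    serviceHost: env.RAG_SERVICE_HOST || "127.0.0.1",
    servicePort: parsePositiveInt(env.RAG_SERVICE_PORT, 8002),
  };
}
```

In a Node process this would typically be called as `loadRagConfig(process.env)`.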

Context Management Strategy

Startup Behavior

  1. Minimal Tool Exposure: Only essential tools (at most RAG_MAX_ESSENTIAL_TOOLS, 10 by default) are loaded into context on startup
  2. Lazy Server Discovery: MCP servers are discovered but not immediately connected
  3. Clean Context Window: Maximum context preserved for your actual work

On-Demand Discovery

  1. Query Analysis: When you use discover_tools, your query is analyzed
  2. RAG Filtering: Semantic search finds relevant tools from all available servers
  3. Lazy Loading: Only relevant servers are started and connected
  4. Context Efficiency: Tools loaded only when they match your needs
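
The lazy-loading step above can be pictured as a small lookup: given the tool names that RAG selected, work out which servers actually need to start. This is a minimal sketch under assumptions; `serversToStart` and the tool-to-server index are hypothetical, and the tool names in the usage note are made up for illustration.

```typescript
// Return the distinct servers that provide the selected tools.
// `toolToServer` is a hypothetical index from tool name to server name;
// tools missing from the index are skipped.
function serversToStart(
  selectedTools: string[],
  toolToServer: Map<string, string>,
): string[] {
  const servers = new Set<string>();
  for (const tool of selectedTools) {
    const server = toolToServer.get(tool);
    if (server !== undefined) servers.add(server);
  }
  return Array.from(servers);
}
```

For example, if the index maps `create_issue` and `list_repos` to `github` and `git_log` to `git`, selecting only the two GitHub tools yields a single server to start, and `git` stays offline.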

Tool Routing

  1. Smart Routing: Tool calls automatically routed to appropriate MCP server
  2. Connection Management: Maintains persistent connections to active servers
  3. Error Handling: Graceful degradation if servers become unavailable
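
One way to combine routing with persistent, lazily created connections is sketched below. The `Connection` interface and `LazyRouter` class are hypothetical stand-ins; the real server's transport and connection types are not shown in this README.

```typescript
// Hypothetical connection shape, standing in for a real MCP client connection.
interface Connection {
  call(tool: string, args: unknown): Promise<unknown>;
}

class LazyRouter {
  // One cached connection promise per server, created on first use.
  private connections = new Map<string, Promise<Connection>>();
  private toolToServer: Map<string, string>;
  private connect: (server: string) => Promise<Connection>;

  constructor(
    toolToServer: Map<string, string>,
    connect: (server: string) => Promise<Connection>,
  ) {
    this.toolToServer = toolToServer;
    this.connect = connect;
  }

  async callTool(tool: string, args: unknown): Promise<unknown> {
    const server = this.toolToServer.get(tool);
    if (server === undefined) throw new Error(`Unknown tool: ${tool}`);

    // Cache the promise so concurrent first calls share one connection attempt.
    let pending = this.connections.get(server);
    if (pending === undefined) {
      pending = this.connect(server);
      this.connections.set(server, pending);
      // Forget failed attempts so a later call can retry.
      pending.catch(() => this.connections.delete(server));
    }
    const connection = await pending;
    return connection.call(tool, args);
  }
}
```

Caching the promise rather than the resolved connection is a design choice: it avoids starting the same server twice when two tool calls arrive before the first connection completes.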

Supported MCP Servers

The server automatically manages connections to:

Core Knowledge & Analytics

  • memory: Knowledge graph and entity relationship management
  • document-organizer: Document processing and organization
  • claude-telemetry: Usage analytics and telemetry tracking

Development Workflow

  • mitosis: Session handoff and context transfer
  • github: GitHub repository and issue management
  • security-scanner: Package and repository security analysis
  • git: Git repository operations and version control

RAG Integration

The server integrates with an external RAG service for intelligent tool selection:

  • RAG Service URL: http://localhost:8002 by default (host and port are set with RAG_SERVICE_HOST and RAG_SERVICE_PORT)
  • Tool Selection: Uses semantic similarity to filter relevant tools
  • Query Extraction: Automatically extracts query context from tool arguments
  • Fallback Behavior: Returns all tools if RAG service is unavailable
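
The fallback behavior can be sketched as a try/catch around the selection request. This is an illustration only: `selectTools` and the injected `postJson` function are hypothetical, the request fields mirror the request shown under "How It Works", and the response field `selected_tools` is an assumption about the RAG service.

```typescript
// Injected HTTP helper so the sketch stays independent of any HTTP library.
// The `selected_tools` response field is an assumption.
type PostJson = (url: string, body: unknown) => Promise<{ selected_tools: string[] }>;

async function selectTools(
  query: string,
  allTools: string[],
  limit: number,
  postJson: PostJson,
  baseUrl = "http://127.0.0.1:8002",
): Promise<string[]> {
  try {
    const res = await postJson(`${baseUrl}/select-tools`, {
      query,
      available_tools: allTools,
      limit,
      similarity_threshold: 0.1,
    });
    // Enforce the limit locally in case the service returns more.
    return res.selected_tools.slice(0, limit);
  } catch {
    // RAG service unavailable or request failed: fall back to all tools.
    return allTools;
  }
}
```

Note that the fallback trades context efficiency for availability: when the RAG service is down, every tool is exposed again.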

RAG Service Endpoints

  • GET /health - Health check
  • POST /select-tools - Semantic tool selection

Query Extraction Patterns

The server extracts queries from various argument patterns:

  • Direct query arguments: query, question, content, text
  • Action-based queries: description, expression, filename
  • Composite queries: Joins all string values from arguments
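
A minimal sketch of these three patterns follows. The function name `extractQuery` is hypothetical, and several details are assumptions: the priority order of the keys, trimming whitespace, joining with a single space, and looking only at top-level argument values.

```typescript
// Argument keys listed above, checked in this (assumed) priority order.
const DIRECT_KEYS = ["query", "question", "content", "text"];
const ACTION_KEYS = ["description", "expression", "filename"];

function extractQuery(args: Record<string, unknown>): string {
  // Direct and action-based patterns: first non-empty string wins.
  for (const key of [...DIRECT_KEYS, ...ACTION_KEYS]) {
    const value = args[key];
    if (typeof value === "string" && value.trim() !== "") return value.trim();
  }
  // Composite pattern: join every non-empty top-level string value.
  return Object.values(args)
    .filter((v): v is string => typeof v === "string" && v.trim() !== "")
    .map((v) => v.trim())
    .join(" ");
}
```

For instance, arguments containing `query: " find issues "` yield `find issues`, while arguments with only `owner: "acme"` and `repo: "site"` fall through to the composite pattern and yield `acme site`.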

How It Works

1. Startup Optimization

// Minimal context consumption on startup
const essentialTools = [
  'discover_tools',  // For finding tools when needed
  'health_check',    // For service monitoring
  // ... maximum 10 essential tools
];

2. Lazy Tool Discovery

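// Tools discovered on-demand via RAG
import axios from 'axios';

// Build the service URL from the documented environment variables and defaults
const ragHost = process.env.RAG_SERVICE_HOST ?? '127.0.0.1';
const ragPort = process.env.RAG_SERVICE_PORT ?? '8002';

const response = await axios.post(`http://${ragHost}:${ragPort}/select-tools`, {
  query: userQuery,
  available_tools: allAvailableToolNames,
  limit: 30,               // matches the RAG_MAX_DISCOVERY_TOOLS default
  similarity_threshold: 0.1
});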

3. Smart Server Management

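// Servers started only when their tools are needed
if (!serverConnections[serverName]) {
  // Assumes startMCPServer registers the new connection in serverConnections
  await startMCPServer(serverName);
}
const connection = serverConnections[serverName];
const result = await sendMCPRequest(connection, 'tools/call', args);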

Performance Benefits

  • Startup Speed: 5-10x faster Claude Code startup with clean context
  • Context Preservation: 95%+ of context window available for actual work
  • Memory Efficiency: Servers loaded only when needed
  • Response Time: RAG service provides sub-second tool discovery

Development

# Install dependencies
npm install

# Build project
npm run build

# Start in development mode
npm run dev

# Test server directly
echo '{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}' | node dist/index.js

Troubleshooting

RAG Service Issues

# Check if RAG service is running
curl http://localhost:8002/health

# Start RAG service manually
cd /path/to/metaMCP-RAG/rag-tool-retriever
python rag_service.py

Context Bloat Detection

# Monitor context usage before/after
token-analyzer-mcp analyze

# Check startup time improvement
time claude-code --startup-benchmark

License

MIT