Multi-Model Comparator MCP Server

Python 3.11+ | MCP Protocol

Enterprise-grade Multi-Model Comparator MCP (Model Context Protocol) server that enables side-by-side comparison of AI models with comprehensive observability, cost tracking, and report generation.

🎯 Features

Core Capabilities

  • Multi-Model Support: OpenAI GPT, Google Gemini, Ollama local models
  • MCP Integration: Full Model Context Protocol compliance for Claude Desktop
  • Comprehensive Metrics: Performance, cost, quality scoring with observability
  • Report Generation: HTML, PDF, JSON reports with detailed analytics
  • Smart Caching: Redis-based response caching for cost optimization
  • Cost Tracking: Real-time cost calculation with provider-specific pricing (see the sketch after this list)
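
The cost math itself is simple. As a rough illustration only (not the project's actual implementation), per-request cost can be derived from token counts and per-1K-token prices such as those kept in configs/pricing.yaml; the dictionary keys and figures below are placeholders.

# Hedged sketch: estimate request cost from token usage and per-1K-token prices.
# The keys ("input_per_1k", "output_per_1k") and the numbers are placeholders,
# not the actual schema or contents of configs/pricing.yaml.
PRICING = {
    "gpt-4": {"input_per_1k": 0.03, "output_per_1k": 0.06},
    "gemini-pro": {"input_per_1k": 0.001, "output_per_1k": 0.002},
    "llama3": {"input_per_1k": 0.0, "output_per_1k": 0.0},  # local Ollama model
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated USD cost of a single model call."""
    prices = PRICING[model]
    return (
        (prompt_tokens / 1000) * prices["input_per_1k"]
        + (completion_tokens / 1000) * prices["output_per_1k"]
    )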

Enterprise Features

  • Observability: Prometheus metrics + OpenTelemetry tracing
  • Monitoring: Grafana dashboards for real-time insights
  • Containerization: Docker Compose with full stack deployment
  • High Availability: Scalable architecture with Redis clustering support
  • Security: API key management and rate limiting

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Docker & Docker Compose
  • uv package manager
  • Claude Desktop (for MCP integration)

Installation

Option 1: Claude Desktop Extension (Recommended)

# Download and install the DXT package
curl -L https://github.com/oaslananka/mcp-multimodel-comparator/releases/latest/download/multimodel-comparator.dxt -o multimodel-comparator.dxt
# Double-click to install in Claude Desktop

Option 2: Development Setup

# Clone repository
git clone https://github.com/oaslananka/mcp-multimodel-comparator.git
cd mcp-multimodel-comparator

# Install dependencies
cd servers/comparator_python
uv sync

# Start development environment
docker-compose -f ../../infra/docker/compose.yaml up -d

Configuration

  1. API Keys: Set environment variables

export OPENAI_API_KEY="your-openai-key"
export GEMINI_API_KEY="your-gemini-key"
# Ollama runs locally, no key needed

  2. Claude Desktop Integration: Add to your config
{
  "mcpServers": {
    "comparator": {
      "command": "uv",
      "args": ["run", "python", "-m", "comparator.server"],
      "cwd": "/path/to/multimodel-comparator/servers/comparator_python",
      "env": {
        "OPENAI_API_KEY": "your-key",
        "GEMINI_API_KEY": "your-key"
      }
    }
  }
}

💡 Usage Examples

Basic Comparison

// In Claude Desktop
compare_models({
  prompt: "Explain quantum computing",
  models: ["gpt-4", "gemini-pro", "llama3"],
  options: {
    temperature: 0.7,
    max_tokens: 500
  }
})

Batch Analysis

run_all({
  prompts: [
    "Summarize this document",
    "Generate creative content", 
    "Solve math problems"
  ],
  models: ["gpt-4", "gemini-pro"],
  report_format: "html"
})

Quality Scoring

score_responses({
  responses: [...], // From previous comparison
  criteria: ["accuracy", "creativity", "coherence"],
  weights: [0.4, 0.3, 0.3]
})
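
Under the hood, weighted criteria reduce to a weighted average. The snippet below is a minimal illustration of that aggregation (the scoring logic shipped in the server may differ):

# Minimal illustration of weighted quality scoring (not the server's exact logic).
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-criterion scores (0-1) into one number using normalized weights."""
    total_weight = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total_weight

# Example: accuracy 0.9, creativity 0.6, coherence 0.8 with weights 0.4/0.3/0.3
print(weighted_score(
    {"accuracy": 0.9, "creativity": 0.6, "coherence": 0.8},
    {"accuracy": 0.4, "creativity": 0.3, "coherence": 0.3},
))  # -> 0.78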

📊 Monitoring & Observability

Dashboards

  • Grafana: http://localhost:3000 (admin/admin)
    • Model Performance Dashboard
    • Cost Analytics Dashboard
    • Usage Patterns Dashboard

Metrics

  • Prometheus: http://localhost:9090
    • Request latency, success rates
    • Cost per model, token usage
    • Cache hit rates, error counts

Example Metrics

# Average response time by model
rate(mcp_comparator_response_time_sum[5m]) / rate(mcp_comparator_response_time_count[5m])

# Cost over the last hour by provider
sum(increase(mcp_comparator_cost_total[1h])) by (provider)

# Cache efficiency
rate(mcp_comparator_cache_hits_total[5m]) / rate(mcp_comparator_cache_requests_total[5m])
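
If you want to see how such series are produced, the snippet below shows one plausible way to define them with the prometheus_client library. This is an assumption for illustration; the actual definitions live in src/comparator/metrics.py and may use different names, labels, or buckets.

# Plausible metric definitions (assumption: prometheus_client is used;
# label names and the scrape port are illustrative).
from prometheus_client import Counter, Histogram, start_http_server

RESPONSE_TIME = Histogram(
    "mcp_comparator_response_time", "Model response time in seconds", ["model"]
)
COST_TOTAL = Counter("mcp_comparator_cost_total", "Accumulated cost in USD", ["provider"])
CACHE_HITS = Counter("mcp_comparator_cache_hits_total", "Cache hits")
CACHE_REQUESTS = Counter("mcp_comparator_cache_requests_total", "Cache lookups")

start_http_server(8000)  # expose /metrics for Prometheus to scrape
RESPONSE_TIME.labels(model="gpt-4").observe(1.23)
COST_TOTAL.labels(provider="openai").inc(0.0042)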

šŸ—ļø Architecture

ā”œā”€ā”€ servers/comparator_python/     # MCP Server Implementation
│   ā”œā”€ā”€ src/comparator/
│   │   ā”œā”€ā”€ server.py             # Main MCP server
│   │   ā”œā”€ā”€ schema.py             # Pydantic data models
│   │   ā”œā”€ā”€ costs.py              # Cost calculation engine
│   │   ā”œā”€ā”€ cache.py              # Redis caching layer
│   │   ā”œā”€ā”€ metrics.py            # Prometheus metrics
│   │   └── reporters/            # Report generators
│   └── pyproject.toml            # Package configuration
ā”œā”€ā”€ runners/                      # Model API integrations
│   ā”œā”€ā”€ openai_runner.py         # OpenAI GPT integration
│   ā”œā”€ā”€ gemini_runner.py         # Google Gemini integration
│   └── ollama_runner.py         # Local Ollama integration
ā”œā”€ā”€ configs/                     # Configuration files
│   ā”œā”€ā”€ pricing.yaml            # Provider pricing data
│   └── models.yaml             # Model specifications
ā”œā”€ā”€ infra/docker/               # Infrastructure
│   ā”œā”€ā”€ compose.yaml           # Full stack deployment
│   └── Dockerfile.server      # MCP server container
ā”œā”€ā”€ metrics/                   # Monitoring configuration
│   ā”œā”€ā”€ prometheus.yml        # Prometheus config
│   └── grafana/             # Grafana dashboards
└── clients/claude-desktop/   # Claude Desktop integration
    ā”œā”€ā”€ extension.json       # MCP extension manifest
    └── pack_dxt.ps1        # DXT packaging script

🔧 Development

Setup Development Environment

# Install development dependencies
cd servers/comparator_python
uv sync --dev

# Run linting
uv run ruff check src/ ../../runners/
uv run mypy src/

# Run tests
uv run pytest ../../tests/ -v

Testing

# Unit tests
uv run pytest tests/test_runners.py -v

# Contract tests (MCP compliance)
uv run pytest tests/test_contract.py -v

# Integration tests with Docker
docker-compose -f infra/docker/compose.yaml up -d
uv run pytest tests/test_integration.py -v

Adding New Models

  1. Create runner in runners/your_provider_runner.py
  2. Implement the BaseRunner interface (see the sketch after this list)
  3. Add pricing to configs/pricing.yaml
  4. Add model config to configs/models.yaml
  5. Update server registration in server.py
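
The exact BaseRunner contract is not reproduced here, but a new runner will typically look something like the skeleton below. Import paths and method names are assumptions; mirror the existing runners (e.g. runners/openai_runner.py) for the real signature.

# Hypothetical runner skeleton (import paths and method names are assumptions;
# check the existing runners for the actual BaseRunner contract).
from runners.base import BaseRunner        # assumed location of the base class
from comparator.schema import RunResult    # assumed location of the result model


class YourProviderRunner(BaseRunner):
    """Adapter that sends a prompt to your provider and normalizes the result."""

    name = "your-provider"

    def run(self, prompt: str, model: str, **options) -> RunResult:
        # 1. Call the provider's API with `prompt`, `model`, and any options.
        # 2. Capture latency, token counts, and the raw text response.
        # 3. Return a normalized RunResult so costs, metrics, and reports work uniformly.
        raise NotImplementedError("wire up your provider SDK here")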

Custom Reporters

from typing import List

from comparator.reporters.base import BaseReporter
from comparator.schema import RunResult  # adjust import to wherever RunResult is defined

class CustomReporter(BaseReporter):
    def generate(self, results: List[RunResult]) -> str:
        # Build and return the report body from the run results
        custom_report_content = f"{len(results)} results compared"  # your custom report logic
        return custom_report_content

📋 API Reference

MCP Tools

run_all

Compares multiple models with given prompts

  • Input: ComparisonRequest with prompts, models, options
  • Output: ComparisonResult with responses and metadata

score_responses

Scores model responses based on quality criteria

  • Input: ScoringRequest with responses and criteria
  • Output: ScoringResult with numerical scores

generate_report

Generates comprehensive comparison reports

  • Input: ReportRequest with results and format
  • Output: ReportResult with generated report
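
The request and result types above are Pydantic models defined in src/comparator/schema.py. A condensed, hypothetical sketch of what ComparisonRequest might contain is shown below; the field names are illustrative, so consult schema.py for the authoritative definitions.

# Hypothetical shape of ComparisonRequest (fields are illustrative only;
# the authoritative models live in src/comparator/schema.py).
from pydantic import BaseModel, Field


class ComparisonOptions(BaseModel):
    temperature: float = 0.7
    max_tokens: int = 500


class ComparisonRequest(BaseModel):
    prompts: list[str]
    models: list[str]
    options: ComparisonOptions = Field(default_factory=ComparisonOptions)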

MCP Resources

config://models

Lists available models and their specifications

config://pricing

Current pricing information for all providers

metrics://prometheus

Real-time metrics in Prometheus format
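
For readers curious how such URIs get wired up: assuming the server is built on the official MCP Python SDK, a resource like config://models can be registered roughly as below. This is a sketch under that assumption; the real registration lives in src/comparator/server.py and may be structured differently.

# Rough sketch of resource registration with the MCP Python SDK's FastMCP helper
# (assumption: comparator.server may not use FastMCP or this layout).
from pathlib import Path
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("comparator")

@mcp.resource("config://models")
def list_models() -> str:
    """Serve the model specification file to MCP clients."""
    return Path("configs/models.yaml").read_text()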

🔒 Security

API Key Management

  • Environment variables only (never in code)
  • Separate keys per environment
  • Regular rotation recommended

Rate Limiting

  • Provider-specific rate limits enforced
  • Exponential backoff for failures (sketched after this list)
  • Circuit breaker pattern implemented
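
As a rough illustration of the retry behaviour described above (the production implementation may tune the numbers differently), exponential backoff with jitter looks like this:

# Illustrative retry helper with exponential backoff and jitter
# (parameters are examples, not the server's actual settings).
import random
import time


def call_with_backoff(call, max_retries: int = 5, base_delay: float = 0.5):
    """Retry `call` on failure, doubling the wait each attempt, plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)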

Data Privacy

  • No prompt/response logging by default
  • Optional anonymous usage metrics
  • Full GDPR compliance mode available

🚀 Deployment

Production Docker Setup

# Production environment
docker-compose -f infra/docker/compose.yaml up -d

# With custom configuration
REDIS_URL=redis://prod-redis:6379 \
PROMETHEUS_URL=http://prod-prometheus:9090 \
docker-compose up -d

Kubernetes Deployment

# See infra/k8s/ for full Kubernetes manifests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-comparator
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcp-comparator
  template:
    metadata:
      labels:
        app: mcp-comparator
    spec:
      containers:
      - name: comparator
        image: mcp-comparator:latest
        env:
        - name: REDIS_URL
          value: "redis://redis-service:6379"

📈 Performance

Benchmarks

  • Latency: <200ms for cached responses
  • Throughput: 100+ concurrent requests
  • Memory: ~100MB base, scales with cache size
  • CPU: Minimal overhead, I/O bound

Scaling Recommendations

  • Horizontal: Multiple server instances behind load balancer
  • Caching: Redis cluster for high availability
  • Database: Separate metrics storage for large deployments

šŸ¤ Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open Pull Request

Development Guidelines

  • Follow PEP 8 style guide
  • Add type hints for all functions
  • Include tests for new features
  • Update documentation

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

Documentation

Community

  • GitHub Issues for bug reports
  • GitHub Discussions for questions
  • Discord community channel

Enterprise Support

Contact us for enterprise support, custom integrations, and SLA options.

🎉 Acknowledgments


Made with ā¤ļø for the AI development community