Multi-Model Comparator MCP Server

Python 3.11+ | MCP Protocol

Enterprise-grade Multi-Model Comparator MCP (Model Context Protocol) server that enables side-by-side comparison of AI models with comprehensive observability, cost tracking, and report generation.

🎯 Features

Core Capabilities

  • Multi-Model Support: OpenAI GPT, Google Gemini, Ollama local models
  • MCP Integration: Full Model Context Protocol compliance for Claude Desktop
  • Comprehensive Metrics: Performance, cost, quality scoring with observability
  • Report Generation: HTML, PDF, JSON reports with detailed analytics
  • Smart Caching: Redis-based response caching for cost optimization
  • Cost Tracking: Real-time cost calculation with provider-specific pricing (see the sketch after this list)
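
The cost math itself is simple. As a rough illustration only (not the project's actual implementation), per-request cost can be derived from token counts and per-1K-token prices such as those kept in configs/pricing.yaml; the dictionary keys and figures below are placeholders.

# Hedged sketch: estimate request cost from token usage and per-1K-token prices.
# The keys ("input_per_1k", "output_per_1k") and the numbers are placeholders,
# not the actual schema or contents of configs/pricing.yaml.
PRICING = {
    "gpt-4": {"input_per_1k": 0.03, "output_per_1k": 0.06},
    "gemini-pro": {"input_per_1k": 0.001, "output_per_1k": 0.002},
    "llama3": {"input_per_1k": 0.0, "output_per_1k": 0.0},  # local Ollama model
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated USD cost of a single model call."""
    prices = PRICING[model]
    return (
        (prompt_tokens / 1000) * prices["input_per_1k"]
        + (completion_tokens / 1000) * prices["output_per_1k"]
    )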

Enterprise Features

  • Observability: Prometheus metrics + OpenTelemetry tracing
  • Monitoring: Grafana dashboards for real-time insights
  • Containerization: Docker Compose with full stack deployment
  • High Availability: Scalable architecture with Redis clustering support
  • Security: API key management and rate limiting

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • Docker & Docker Compose
  • uv package manager
  • Claude Desktop (for MCP integration)

Installation

Option 1: Claude Desktop Extension (Recommended)

# Download and install the DXT package
curl -L https://github.com/oaslananka/mcp-multimodel-comparator/releases/latest/download/multimodel-comparator.dxt -o multimodel-comparator.dxt
# Double-click to install in Claude Desktop

Option 2: Development Setup

# Clone repository
git clone https://github.com/oaslananka/mcp-multimodel-comparator.git
cd mcp-multimodel-comparator

# Install dependencies
cd servers/comparator_python
uv sync

# Start development environment
docker-compose -f ../../infra/docker/compose.yaml up -d

Configuration

  1. API Keys: Set environment variables

export OPENAI_API_KEY="your-openai-key"
export GEMINI_API_KEY="your-gemini-key"
# Ollama runs locally, no key needed

  2. Claude Desktop Integration: Add to your config
{
  "mcpServers": {
    "comparator": {
      "command": "uv",
      "args": ["run", "python", "-m", "comparator.server"],
      "cwd": "/path/to/multimodel-comparator/servers/comparator_python",
      "env": {
        "OPENAI_API_KEY": "your-key",
        "GEMINI_API_KEY": "your-key"
      }
    }
  }
}

💡 Usage Examples

Basic Comparison

// In Claude Desktop
compare_models({
  prompt: "Explain quantum computing",
  models: ["gpt-4", "gemini-pro", "llama3"],
  options: {
    temperature: 0.7,
    max_tokens: 500
  }
})

Batch Analysis

run_all({
  prompts: [
    "Summarize this document",
    "Generate creative content", 
    "Solve math problems"
  ],
  models: ["gpt-4", "gemini-pro"],
  report_format: "html"
})

Quality Scoring

score_responses({
  responses: [...], // From previous comparison
  criteria: ["accuracy", "creativity", "coherence"],
  weights: [0.4, 0.3, 0.3]
})
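
Under the hood, weighted criteria reduce to a weighted average. The snippet below is a minimal illustration of that aggregation (the scoring logic shipped in the server may differ):

# Minimal illustration of weighted quality scoring (not the server's exact logic).
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-criterion scores (0-1) into one number using normalized weights."""
    total_weight = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total_weight

# Example: accuracy 0.9, creativity 0.6, coherence 0.8 with weights 0.4/0.3/0.3
print(weighted_score(
    {"accuracy": 0.9, "creativity": 0.6, "coherence": 0.8},
    {"accuracy": 0.4, "creativity": 0.3, "coherence": 0.3},
))  # -> 0.78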

📊 Monitoring & Observability

Dashboards

  • Grafana: http://localhost:3000 (admin/admin)
    • Model Performance Dashboard
    • Cost Analytics Dashboard
    • Usage Patterns Dashboard

Metrics

  • Prometheus: http://localhost:9090
    • Request latency, success rates
    • Cost per model, token usage
    • Cache hit rates, error counts

Example Metrics

# Average response time by model
rate(mcp_comparator_response_time_sum[5m]) / rate(mcp_comparator_response_time_count[5m])

# Cost over the last hour by provider
sum(increase(mcp_comparator_cost_total[1h])) by (provider)

# Cache efficiency
rate(mcp_comparator_cache_hits_total[5m]) / rate(mcp_comparator_cache_requests_total[5m])
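
If you want to see how such series are produced, the snippet below shows one plausible way to define them with the prometheus_client library. This is an assumption for illustration; the actual definitions live in src/comparator/metrics.py and may use different names, labels, or buckets.

# Plausible metric definitions (assumption: prometheus_client is used;
# label names and the scrape port are illustrative).
from prometheus_client import Counter, Histogram, start_http_server

RESPONSE_TIME = Histogram(
    "mcp_comparator_response_time", "Model response time in seconds", ["model"]
)
COST_TOTAL = Counter("mcp_comparator_cost_total", "Accumulated cost in USD", ["provider"])
CACHE_HITS = Counter("mcp_comparator_cache_hits_total", "Cache hits")
CACHE_REQUESTS = Counter("mcp_comparator_cache_requests_total", "Cache lookups")

start_http_server(8000)  # expose /metrics for Prometheus to scrape
RESPONSE_TIME.labels(model="gpt-4").observe(1.23)
COST_TOTAL.labels(provider="openai").inc(0.0042)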

šŸ—ļø Architecture

ā”œā”€ā”€ servers/comparator_python/     # MCP Server Implementation
│   ā”œā”€ā”€ src/comparator/
│   │   ā”œā”€ā”€ server.py             # Main MCP server
│   │   ā”œā”€ā”€ schema.py             # Pydantic data models
│   │   ā”œā”€ā”€ costs.py              # Cost calculation engine
│   │   ā”œā”€ā”€ cache.py              # Redis caching layer
│   │   ā”œā”€ā”€ metrics.py            # Prometheus metrics
│   │   └── reporters/            # Report generators
│   └── pyproject.toml            # Package configuration
ā”œā”€ā”€ runners/                      # Model API integrations
│   ā”œā”€ā”€ openai_runner.py         # OpenAI GPT integration
│   ā”œā”€ā”€ gemini_runner.py         # Google Gemini integration
│   └── ollama_runner.py         # Local Ollama integration
ā”œā”€ā”€ configs/                     # Configuration files
│   ā”œā”€ā”€ pricing.yaml            # Provider pricing data
│   └── models.yaml             # Model specifications
ā”œā”€ā”€ infra/docker/               # Infrastructure
│   ā”œā”€ā”€ compose.yaml           # Full stack deployment
│   └── Dockerfile.server      # MCP server container
ā”œā”€ā”€ metrics/                   # Monitoring configuration
│   ā”œā”€ā”€ prometheus.yml        # Prometheus config
│   └── grafana/             # Grafana dashboards
└── clients/claude-desktop/   # Claude Desktop integration
    ā”œā”€ā”€ extension.json       # MCP extension manifest
    └── pack_dxt.ps1        # DXT packaging script

🔧 Development

Setup Development Environment

# Install development dependencies
cd servers/comparator_python
uv sync --dev

# Run linting
uv run ruff check src/ ../../runners/
uv run mypy src/

# Run tests
uv run pytest ../../tests/ -v

Testing

# Unit tests
uv run pytest tests/test_runners.py -v

# Contract tests (MCP compliance)
uv run pytest tests/test_contract.py -v

# Integration tests with Docker
docker-compose -f infra/docker/compose.yaml up -d
uv run pytest tests/test_integration.py -v

Adding New Models

  1. Create runner in runners/your_provider_runner.py
  2. Implement the BaseRunner interface (see the sketch after this list)
  3. Add pricing to configs/pricing.yaml
  4. Add model config to configs/models.yaml
  5. Update server registration in server.py
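
The exact BaseRunner contract is not reproduced here, but a new runner will typically look something like the skeleton below. Import paths and method names are assumptions; mirror the existing runners (e.g. runners/openai_runner.py) for the real signature.

# Hypothetical runner skeleton (import paths and method names are assumptions;
# check the existing runners for the actual BaseRunner contract).
from runners.base import BaseRunner        # assumed location of the base class
from comparator.schema import RunResult    # assumed location of the result model


class YourProviderRunner(BaseRunner):
    """Adapter that sends a prompt to your provider and normalizes the result."""

    name = "your-provider"

    def run(self, prompt: str, model: str, **options) -> RunResult:
        # 1. Call the provider's API with `prompt`, `model`, and any options.
        # 2. Capture latency, token counts, and the raw text response.
        # 3. Return a normalized RunResult so costs, metrics, and reports work uniformly.
        raise NotImplementedError("wire up your provider SDK here")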

Custom Reporters

from typing import List

from comparator.reporters.base import BaseReporter
from comparator.schema import RunResult  # adjust import to wherever RunResult is defined

class CustomReporter(BaseReporter):
    def generate(self, results: List[RunResult]) -> str:
        # Build and return the report body from the run results
        custom_report_content = f"{len(results)} results compared"  # your custom report logic
        return custom_report_content

📋 API Reference

MCP Tools

run_all

Compares multiple models with given prompts

  • Input: ComparisonRequest with prompts, models, options
  • Output: ComparisonResult with responses and metadata

score_responses

Scores model responses based on quality criteria

  • Input: ScoringRequest with responses and criteria
  • Output: ScoringResult with numerical scores

generate_report

Generates comprehensive comparison reports

  • Input: ReportRequest with results and format
  • Output: ReportResult with generated report
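
The request and result types above are Pydantic models defined in src/comparator/schema.py. A condensed, hypothetical sketch of what ComparisonRequest might contain is shown below; the field names are illustrative, so consult schema.py for the authoritative definitions.

# Hypothetical shape of ComparisonRequest (fields are illustrative only;
# the authoritative models live in src/comparator/schema.py).
from pydantic import BaseModel, Field


class ComparisonOptions(BaseModel):
    temperature: float = 0.7
    max_tokens: int = 500


class ComparisonRequest(BaseModel):
    prompts: list[str]
    models: list[str]
    options: ComparisonOptions = Field(default_factory=ComparisonOptions)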

MCP Resources

config://models

Lists available models and their specifications

config://pricing

Current pricing information for all providers

metrics://prometheus

Real-time metrics in Prometheus format
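
For readers curious how such URIs get wired up: assuming the server is built on the official MCP Python SDK, a resource like config://models can be registered roughly as below. This is a sketch under that assumption; the real registration lives in src/comparator/server.py and may be structured differently.

# Rough sketch of resource registration with the MCP Python SDK's FastMCP helper
# (assumption: comparator.server may not use FastMCP or this layout).
from pathlib import Path
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("comparator")

@mcp.resource("config://models")
def list_models() -> str:
    """Serve the model specification file to MCP clients."""
    return Path("configs/models.yaml").read_text()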

🔒 Security

API Key Management

  • Environment variables only (never in code)
  • Separate keys per environment
  • Regular rotation recommended

Rate Limiting

  • Provider-specific rate limits enforced
  • Exponential backoff for failures (sketched after this list)
  • Circuit breaker pattern implemented
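
As a rough illustration of the retry behaviour described above (the production implementation may tune the numbers differently), exponential backoff with jitter looks like this:

# Illustrative retry helper with exponential backoff and jitter
# (parameters are examples, not the server's actual settings).
import random
import time


def call_with_backoff(call, max_retries: int = 5, base_delay: float = 0.5):
    """Retry `call` on failure, doubling the wait each attempt, plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)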

Data Privacy

  • No prompt/response logging by default
  • Optional anonymous usage metrics
  • Full GDPR compliance mode available

🚀 Deployment

Production Docker Setup

# Production environment
docker-compose -f infra/docker/compose.yaml up -d

# With custom configuration
REDIS_URL=redis://prod-redis:6379 \
PROMETHEUS_URL=http://prod-prometheus:9090 \
docker-compose up -d

Kubernetes Deployment

# See infra/k8s/ for full Kubernetes manifests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-comparator
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcp-comparator
  template:
    metadata:
      labels:
        app: mcp-comparator
    spec:
      containers:
      - name: comparator
        image: mcp-comparator:latest
        env:
        - name: REDIS_URL
          value: "redis://redis-service:6379"

📈 Performance

Benchmarks

  • Latency: <200ms for cached responses
  • Throughput: 100+ concurrent requests
  • Memory: ~100MB base, scales with cache size
  • CPU: Minimal overhead, I/O bound

Scaling Recommendations

  • Horizontal: Multiple server instances behind load balancer
  • Caching: Redis cluster for high availability
  • Database: Separate metrics storage for large deployments

šŸ¤ Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open Pull Request

Development Guidelines

  • Follow PEP 8 style guide
  • Add type hints for all functions
  • Include tests for new features
  • Update documentation

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

Documentation

Community

  • GitHub Issues for bug reports
  • GitHub Discussions for questions
  • Discord community channel

Enterprise Support

Contact us for enterprise support, custom integrations, and SLA options.

🎉 Acknowledgments


Made with ā¤ļø for the AI development community