Multi-Model Comparator MCP Server
Enterprise-grade Multi-Model Comparator MCP (Model Context Protocol) server that enables side-by-side comparison of AI models with comprehensive observability, cost tracking, and report generation.
Features
Core Capabilities
- Multi-Model Support: OpenAI GPT, Google Gemini, Ollama local models
- MCP Integration: Full Model Context Protocol compliance for Claude Desktop
- Comprehensive Metrics: Performance, cost, quality scoring with observability
- Report Generation: HTML, PDF, JSON reports with detailed analytics
- Smart Caching: Redis-based response caching for cost optimization
- Cost Tracking: Real-time cost calculation with provider-specific pricing
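To illustrate the Smart Caching idea above, the sketch below derives a deterministic key from the model, prompt, and options and consults Redis before paying for a provider call. The key format and the fetch callable are assumptions for illustration; the real logic lives in cache.py.

import hashlib
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cache_key(model: str, prompt: str, options: dict) -> str:
    # Same model + prompt + options always map to the same key
    payload = json.dumps({"model": model, "prompt": prompt, "options": options}, sort_keys=True)
    return "comparator:" + hashlib.sha256(payload.encode()).hexdigest()

def cached_completion(fetch, model: str, prompt: str, options: dict) -> str:
    # fetch: callable that actually queries the provider (e.g. a runner)
    key = cache_key(model, prompt, options)
    cached = r.get(key)
    if cached is not None:
        return cached                      # cache hit: no provider call, no cost
    response = fetch(model, prompt, options)
    r.set(key, response, ex=3600)          # keep the response for one hour
    return response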
Enterprise Features
- Observability: Prometheus metrics + OpenTelemetry tracing
- Monitoring: Grafana dashboards for real-time insights
- Containerization: Docker Compose with full stack deployment
- High Availability: Scalable architecture with Redis clustering support
- Security: API key management and rate limiting
Quick Start
Prerequisites
- Python 3.11+
- Docker & Docker Compose
- uv package manager
- Claude Desktop (for MCP integration)
Installation
Option 1: Claude Desktop Extension (Recommended)
# Download and install the DXT package
curl -L https://github.com/oaslananka/mcp-multimodel-comparator/releases/latest/download/multimodel-comparator.dxt -o multimodel-comparator.dxt
# Double-click to install in Claude Desktop
Option 2: Development Setup
# Clone repository
git clone https://github.com/oaslananka/mcp-multimodel-comparator.git
cd mcp-multimodel-comparator
# Install dependencies
cd servers/comparator_python
uv sync
# Start development environment
docker-compose -f ../../infra/docker/compose.yaml up -d
Configuration
- API Keys: Set environment variables
export OPENAI_API_KEY="your-openai-key"
export GEMINI_API_KEY="your-gemini-key"
# Ollama runs locally, no key needed
- Claude Desktop Integration: Add to your config
{
  "mcpServers": {
    "comparator": {
      "command": "uv",
      "args": ["run", "python", "-m", "comparator.server"],
      "cwd": "/path/to/multimodel-comparator/servers/comparator_python",
      "env": {
        "OPENAI_API_KEY": "your-key",
        "GEMINI_API_KEY": "your-key"
      }
    }
  }
}
Usage Examples
Basic Comparison
// In Claude Desktop
compare_models({
  prompt: "Explain quantum computing",
  models: ["gpt-4", "gemini-pro", "llama3"],
  options: {
    temperature: 0.7,
    max_tokens: 500
  }
})
Batch Analysis
run_all({
  prompts: [
    "Summarize this document",
    "Generate creative content",
    "Solve math problems"
  ],
  models: ["gpt-4", "gemini-pro"],
  report_format: "html"
})
Quality Scoring
score_responses({
  responses: [...], // From previous comparison
  criteria: ["accuracy", "creativity", "coherence"],
  weights: [0.4, 0.3, 0.3]
})
Monitoring & Observability
Dashboards
- Grafana: http://localhost:3000 (admin/admin)
- Model Performance Dashboard
- Cost Analytics Dashboard
- Usage Patterns Dashboard
Metrics
- Prometheus: http://localhost:9090
- Request latency, success rates
- Cost per model, token usage
- Cache hit rates, error counts
Example Metrics
# Average response time by model
rate(mcp_comparator_response_time_sum[5m]) / rate(mcp_comparator_response_time_count[5m])
# Cost per hour by provider
sum(increase(mcp_comparator_cost_total[1h])) by (provider)
# Cache efficiency
rate(mcp_comparator_cache_hits_total[5m]) / rate(mcp_comparator_cache_requests_total[5m])
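The queries above assume counters and histograms exported by the server. A minimal sketch of how such metrics could be defined with prometheus_client follows; the exact names and labels used by metrics.py may differ, and the port is illustrative.

from prometheus_client import Counter, Histogram, start_http_server

# A Histogram exports *_sum and *_count, which the latency query above divides
RESPONSE_TIME = Histogram("mcp_comparator_response_time", "Model response time in seconds", ["model"])
# prometheus_client appends _total to counter names on exposition
COST = Counter("mcp_comparator_cost", "Accumulated cost in USD", ["provider"])
CACHE_HITS = Counter("mcp_comparator_cache_hits", "Cache hits")
CACHE_REQUESTS = Counter("mcp_comparator_cache_requests", "Cache lookups")

def record_call(model: str, provider: str, seconds: float, cost_usd: float, cache_hit: bool) -> None:
    RESPONSE_TIME.labels(model=model).observe(seconds)
    COST.labels(provider=provider).inc(cost_usd)
    CACHE_REQUESTS.inc()
    if cache_hit:
        CACHE_HITS.inc()

start_http_server(9464)  # expose /metrics for Prometheus to scrape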
Architecture
├── servers/comparator_python/   # MCP Server Implementation
│   ├── src/comparator/
│   │   ├── server.py            # Main MCP server
│   │   ├── schema.py            # Pydantic data models
│   │   ├── costs.py             # Cost calculation engine
│   │   ├── cache.py             # Redis caching layer
│   │   ├── metrics.py           # Prometheus metrics
│   │   └── reporters/           # Report generators
│   └── pyproject.toml           # Package configuration
├── runners/                     # Model API integrations
│   ├── openai_runner.py         # OpenAI GPT integration
│   ├── gemini_runner.py         # Google Gemini integration
│   └── ollama_runner.py         # Local Ollama integration
├── configs/                     # Configuration files
│   ├── pricing.yaml             # Provider pricing data
│   └── models.yaml              # Model specifications
├── infra/docker/                # Infrastructure
│   ├── compose.yaml             # Full stack deployment
│   └── Dockerfile.server        # MCP server container
├── metrics/                     # Monitoring configuration
│   ├── prometheus.yml           # Prometheus config
│   └── grafana/                 # Grafana dashboards
└── clients/claude-desktop/      # Claude Desktop integration
    ├── extension.json           # MCP extension manifest
    └── pack_dxt.ps1             # DXT packaging script
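To give a sense of how costs.py and configs/pricing.yaml fit together, the sketch below loads per-1K-token prices and prices a single call. The YAML layout and field names are assumptions; the repository files are the source of truth.

import yaml  # PyYAML

def load_pricing(path: str = "configs/pricing.yaml") -> dict:
    with open(path) as f:
        return yaml.safe_load(f)

def call_cost(pricing: dict, model: str, input_tokens: int, output_tokens: int) -> float:
    # Assumed layout: pricing[model] = {"input_per_1k": 0.01, "output_per_1k": 0.03}
    p = pricing[model]
    return (input_tokens / 1000) * p["input_per_1k"] + (output_tokens / 1000) * p["output_per_1k"]

# Example: 1,200 prompt tokens and 400 completion tokens at the rates above
# cost = 1.2 * 0.01 + 0.4 * 0.03 = 0.024 USD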
Development
Setup Development Environment
# Install development dependencies
cd servers/comparator_python
uv sync --dev
# Run linting
uv run ruff check src/ ../../runners/
uv run mypy src/
# Run tests
uv run pytest ../../tests/ -v
Testing
# Unit tests
uv run pytest tests/test_runners.py -v
# Contract tests (MCP compliance)
uv run pytest tests/test_contract.py -v
# Integration tests with Docker
docker-compose -f infra/docker/compose.yaml up -d
uv run pytest tests/test_integration.py -v
Adding New Models
- Create a runner in runners/your_provider_runner.py (a skeleton is sketched below)
- Implement the BaseRunner interface
- Add pricing to configs/pricing.yaml
- Add model config to configs/models.yaml
- Update server registration in server.py
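As a rough sketch of the first two steps, a new runner might look like the outline below. The exact BaseRunner interface (its import path, method names, and the fields of RunResult) is defined in the repository, so treat the signature here as an assumption for illustration only.

# runners/your_provider_runner.py -- illustrative skeleton; match the real BaseRunner API
from comparator.schema import RunResult   # result model (schema.py); import path assumed
from runners.base import BaseRunner       # base class location assumed

class YourProviderRunner(BaseRunner):
    name = "your-provider"

    def run(self, prompt: str, model: str, **options) -> RunResult:
        # Call your provider's SDK or HTTP API here, then map the reply
        # (text, token counts, latency) into a RunResult for the comparator.
        raise NotImplementedError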
Custom Reporters
from typing import List

from comparator.reporters.base import BaseReporter
from comparator.schema import RunResult  # result model defined in schema.py

class CustomReporter(BaseReporter):
    def generate(self, results: List[RunResult]) -> str:
        # Your custom report logic: turn the run results into a report string
        return "\n".join(str(result) for result in results)
API Reference
MCP Tools
run_all
Compares multiple models with given prompts
- Input: ComparisonRequest with prompts, models, and options
- Output: ComparisonResult with responses and metadata
score_responses
Scores model responses based on quality criteria
- Input: ScoringRequest with responses and criteria
- Output: ScoringResult with numerical scores
generate_report
Generates comprehensive comparison reports
- Input: ReportRequest with results and format
- Output: ReportResult with the generated report
MCP Resources
config://models
Lists available models and their specifications
config://pricing
Current pricing information for all providers
metrics://prometheus
Real-time metrics in Prometheus format
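Since the server builds on FastMCP (see Acknowledgments), tool and resource registration roughly follows the decorator pattern sketched below; the handler names, signatures, and bodies are illustrative, and server.py remains the authoritative definition.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("comparator")

@mcp.tool()
def run_all(prompts: list[str], models: list[str], report_format: str = "json") -> dict:
    """Compare the given models on each prompt and return responses plus metadata."""
    # Dispatch to the per-provider runners here (omitted in this sketch)
    return {"results": []}

@mcp.resource("config://models")
def list_models() -> str:
    """Return available models and their specifications."""
    # In the real server this would be read from configs/models.yaml
    return "models: []"

if __name__ == "__main__":
    mcp.run()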
Security
API Key Management
- Environment variables only (never in code)
- Separate keys per environment
- Regular rotation recommended
Rate Limiting
- Provider-specific rate limits enforced
- Exponential backoff for failures
- Circuit breaker pattern implemented
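A minimal sketch of the exponential-backoff behaviour described above (the real retry and circuit-breaker logic lives in the server code and may differ):

import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a provider call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Wait 1s, 2s, 4s, ... plus jitter so concurrent clients don't retry in lockstep
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))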
Data Privacy
- No prompt/response logging by default
- Optional anonymous usage metrics
- Full GDPR compliance mode available
Deployment
Production Docker Setup
# Production environment
docker-compose -f infra/docker/compose.yaml up -d
# With custom configuration
REDIS_URL=redis://prod-redis:6379 \
PROMETHEUS_URL=http://prod-prometheus:9090 \
docker-compose up -d
Kubernetes Deployment
# See infra/k8s/ for full Kubernetes manifests
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-comparator
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mcp-comparator
  template:
    metadata:
      labels:
        app: mcp-comparator
    spec:
      containers:
        - name: comparator
          image: mcp-comparator:latest
          env:
            - name: REDIS_URL
              value: "redis://redis-service:6379"
Performance
Benchmarks
- Latency: <200ms for cached responses
- Throughput: 100+ concurrent requests
- Memory: ~100MB base, scales with cache size
- CPU: Minimal overhead, I/O bound
Scaling Recommendations
- Horizontal: Multiple server instances behind load balancer
- Caching: Redis cluster for high availability
- Database: Separate metrics storage for large deployments
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Development Guidelines
- Follow PEP 8 style guide
- Add type hints for all functions
- Include tests for new features
- Update documentation
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
Documentation
Community
- GitHub Issues for bug reports
- GitHub Discussions for questions
- Discord community channel
Enterprise Support
Contact us for enterprise support, custom integrations, and SLA options.
Acknowledgments
- Model Context Protocol team
- FastMCP framework
- OpenAI, Google, and Ollama teams for model APIs
- Community contributors and testers
Made with ❤️ for the AI development community