Daniel-DDV/universal-ai-mcp-server
Universal AI MCP Server v3.0
A comprehensive MCP (Model Context Protocol) server that provides intelligent access to multiple AI providers and models including GPT-5, Grok-4, Gemini 2.5, Groq, and local Ollama models - with advanced routing, cost optimization, and 25+ specialized tools.
🆕 NEW in v3.0: Multi-provider support with xAI Grok, Google Gemini, Groq inference, cost tracking, and intelligent model routing across all providers.
🚀 Key Features
🌐 Universal Multi-Provider Support
- Azure OpenAI: GPT-5, GPT-5-Chat, GPT-4o, o3, o1-preview, o1-mini, DALL-E 3
- OpenAI Direct: All OpenAI models with direct API access
- xAI Grok: Grok-4 (with real-time search), Grok-3-Mini, Grok-2-Image
- Google AI: Gemini 2.5 Pro/Flash/Flash-Thinking (1M context)
- Groq: Ultra-fast inference (1,500+ tokens/sec) for Llama and GPT models
- Ollama: Local models for complete privacy (llama3.2, codellama, mistral, qwen2.5)
🧠 Intelligent Model Routing
- Smart Selection: Automatically chooses optimal model based on task complexity, cost, speed, and quality requirements
- Multi-Strategy Routing: Cost-optimized, speed-optimized, quality-first, privacy-first strategies
- Fallback Chain: Automatic failover between providers for reliability
- Real-time Optimization: Learns from usage patterns to improve selections
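The routing strategies above can be sketched as a weighted scoring function over candidate models. This is an illustrative sketch only, not the server's actual implementation — the `ModelProfile` shape and the weight values are assumptions:

```typescript
// Illustrative sketch of multi-strategy routing: score candidate models on
// cost, speed, and quality, with weights chosen per strategy. The ModelProfile
// fields and the weight values are assumptions, not the server's real code.
type Strategy = "cost-optimized" | "speed-optimized" | "quality-first";

interface ModelProfile {
  name: string;
  costPer1M: number;    // blended $ per 1M tokens (lower is better)
  tokensPerSec: number; // throughput (higher is better)
  quality: number;      // 0..1 subjective quality score
}

const WEIGHTS: Record<Strategy, { cost: number; speed: number; quality: number }> = {
  "cost-optimized":  { cost: 0.6, speed: 0.2, quality: 0.2 },
  "speed-optimized": { cost: 0.1, speed: 0.7, quality: 0.2 },
  "quality-first":   { cost: 0.1, speed: 0.1, quality: 0.8 },
};

function pickModel(models: ModelProfile[], strategy: Strategy): ModelProfile {
  const w = WEIGHTS[strategy];
  const maxCost = Math.max(...models.map(m => m.costPer1M));
  const maxSpeed = Math.max(...models.map(m => m.tokensPerSec));
  // Normalize each criterion to 0..1; invert cost so cheaper scores higher.
  const score = (m: ModelProfile) =>
    w.cost * (1 - m.costPer1M / maxCost) +
    w.speed * (m.tokensPerSec / maxSpeed) +
    w.quality * m.quality;
  return models.reduce((best, m) => (score(m) > score(best) ? m : best));
}
```

A privacy-first strategy would work the same way, with a hard filter for local-only models applied before scoring.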
💰 Advanced Cost Management
- Real-time Cost Tracking: Monitor spending across all providers
- Budget Controls: Daily/monthly spending limits with alerts
- Cost Optimization: Automatically route to cost-effective models when appropriate
- Usage Analytics: Detailed reports with optimization recommendations
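Conceptually, per-request cost tracking against a daily budget looks like the sketch below. Prices are expressed as dollars per 1M tokens (input/output), matching the comparison matrix; the `CostTracker` class itself is an assumption for illustration, not the server's real API:

```typescript
// Illustrative sketch of per-request cost tracking with a daily budget cap.
// Pricing is $ per 1M tokens, split into input and output rates; the class
// shape is an assumption, not the server's real API.
interface Pricing { inputPer1M: number; outputPer1M: number }

class CostTracker {
  private spentToday = 0;
  constructor(private dailyBudgetUsd: number) {}

  // Record one request and return its cost in dollars.
  record(p: Pricing, inputTokens: number, outputTokens: number): number {
    const cost =
      (inputTokens / 1_000_000) * p.inputPer1M +
      (outputTokens / 1_000_000) * p.outputPer1M;
    this.spentToday += cost;
    return cost;
  }

  overBudget(): boolean {
    return this.spentToday >= this.dailyBudgetUsd;
  }
}
```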
⚡ Performance & Capabilities
- Ultra-fast Responses: Sub-second first tokens with Groq infrastructure
- Massive Context: Up to 1M tokens with Gemini models
- Real-time Search: Live web search with Grok-4
- Multimodal Support: Image analysis and generation across providers
- Session Management: Persistent conversations with automatic cleanup
🛡️ Privacy & Security
- Privacy Levels: Public, private, and local-only processing options
- Data Retention: Configurable session and data retention policies
- Local Processing: Complete privacy with Ollama integration
- Secure API Handling: Best practices for API key management
📊 Model Comparison Matrix
| Provider | Model | Context | Speed | Cost/1M | Best For | Capabilities |
|---|---|---|---|---|---|---|
| Azure/OpenAI | GPT-5 | 272k | Slow | $10/$30 | Complex reasoning | Text, Code, Reasoning |
| Azure/OpenAI | GPT-5-Chat | 128k | Fast | $5/$15 | Conversations | Text, Multimodal, Functions |
| Azure/OpenAI | GPT-4o | 128k | Fast | $5/$15 | Multimodal | Text, Vision, Functions |
| xAI | Grok-4 | 128k | Medium | $3/$15 | Real-time research | Text, Search, Reasoning |
| xAI | Grok-3-Mini | 32k | Fast | $0.30/$0.50 | Budget tasks | Text, Basic reasoning |
| Google | Gemini 2.5 Pro | 1M | Medium | $2.50/$10 | Large contexts | Multimodal, Reasoning |
| Google | Gemini 2.5 Flash | 1M | Very Fast | $0.075/$0.30 | Speed + quality | Multimodal, Fast |
| Groq | Llama 3.3 70B | 131k | Ultra Fast | $0.59/$0.79 | Real-time chat | Text, Code |
| Groq | GPT-OSS 120B | 32k | Ultra Fast | $1.25/$1.25 | Quality + speed | Text, Reasoning |
| Ollama | Llama 3.2 | 131k | Fast | Free | Private/local | Text, Code, Local |
| Ollama | Code Llama | 16k | Fast | Free | Code generation | Code, Local |
| Ollama | Mistral 7B | 32k | Very Fast | Free | Efficient chat | Text, Multilingual |
| Ollama | Qwen 2.5 | 131k | Medium | Free | Advanced reasoning | Text, Code, Reasoning |
📦 Installation
- Clone the repository:
git clone https://github.com/yourusername/gpt5-mcp-agent.git
cd gpt5-mcp-agent
- Install dependencies:
npm install
- Build the TypeScript code:
npm run build
⚙️ Configuration
The server supports both Azure OpenAI and standard OpenAI API. Configure via environment variables in .env:
Option 1: Azure OpenAI
# Provider Selection
API_PROVIDER=azure
# Azure OpenAI Configuration
AZURE_OPENAI_API_KEY=your_api_key_here
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_VERSION=2025-01-01-preview
# Azure Deployment Names (customize to match your deployments)
AZURE_OPENAI_DEPLOYMENT_GPT5=your-gpt5-deployment-name
AZURE_OPENAI_DEPLOYMENT_GPT5_CHAT=your-gpt5-chat-deployment-name
Option 2: Standard OpenAI API
# Provider Selection
API_PROVIDER=openai
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
# Optional: Custom base URL (leave empty for default)
OPENAI_BASE_URL=
# Optional: Organization ID
OPENAI_ORG_ID=
# OpenAI Model Names
OPENAI_MODEL_GPT5=gpt-5
OPENAI_MODEL_GPT5_CHAT=gpt-5-chat
Option 3: Universal Multi-Provider Setup
# Multiple Providers Enabled
PROVIDERS_ENABLED=azure,xai,google,groq,ollama
# xAI Grok Configuration
XAI_API_KEY=your_xai_api_key_here
XAI_ENABLE_LIVE_SEARCH=true
XAI_SEARCH_DEPTH=deep
# Google AI Configuration
GOOGLE_AI_API_KEY=your_google_api_key_here
GOOGLE_PROJECT_ID=your_project_id
GOOGLE_LOCATION=us-central1
# Groq Configuration
GROQ_API_KEY=your_groq_api_key_here
GROQ_PRIORITY_TIER=paid
# Ollama Local Models
OLLAMA_HOST=http://localhost:11434
OLLAMA_AUTO_INSTALL=true
OLLAMA_DEFAULT_MODEL=llama3.2
Common Settings (All Providers)
# Model Selection
DEFAULT_CHAT_MODEL=gpt-5-chat # For fast conversations
DEFAULT_REASONING_MODEL=gpt-5 # For complex analysis
ENABLE_MODEL_ROUTING=true # Auto-select optimal model
# GPT-5-Chat Settings (Traditional Chat Model)
DEFAULT_TEMPERATURE=0.7 # Controls randomness (0.0-2.0)
DEFAULT_TOP_P=1.0 # Nucleus sampling
DEFAULT_PRESENCE_PENALTY=0 # Encourages new topics (-2.0 to 2.0)
DEFAULT_FREQUENCY_PENALTY=0 # Reduces repetition (-2.0 to 2.0)
# GPT-5 Reasoning Model Settings
DEFAULT_REASONING_EFFORT=medium # minimal, low, medium, high
DEFAULT_VERBOSITY=medium # low, medium, high
# Performance Settings
MAX_COMPLETION_TOKENS=128000 # Max tokens for responses
ENABLE_STREAMING=true # Enable streaming for GPT-5-Chat
SESSION_TIMEOUT_MINUTES=60 # Session cleanup interval
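Settings like these are typically read from the environment with defaults matching the values above. A minimal sketch, assuming a `loadConfig` helper (the variable names match this README; the helper itself is illustrative):

```typescript
// Illustrative sketch of reading the common settings above, with the README's
// defaults. The loadConfig helper is an assumption for illustration; in a real
// server you would pass process.env here.
interface ServerConfig {
  defaultChatModel: string;
  enableModelRouting: boolean;
  defaultTemperature: number;
  sessionTimeoutMinutes: number;
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  return {
    defaultChatModel: env.DEFAULT_CHAT_MODEL ?? "gpt-5-chat",
    enableModelRouting: (env.ENABLE_MODEL_ROUTING ?? "true") === "true",
    defaultTemperature: Number(env.DEFAULT_TEMPERATURE ?? "0.7"),
    sessionTimeoutMinutes: Number(env.SESSION_TIMEOUT_MINUTES ?? "60"),
  };
}
```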
🛠️ Available Tools
Smart Routing
- `gpt5_smart` - Intelligently routes to the best model based on query complexity
GPT-5-Chat Tools (Fast & Creative)
- `gpt5_chat_fast` - Fast conversational AI with full parameter control
- `gpt5_chat_creative` - Optimized for creative writing
GPT-5 Reasoning Tools (Deep Analysis)
- `gpt5_reasoning` - Advanced reasoning for complex problems
- `gpt5_chain_of_thought` - Step-by-step problem solving
Specialized Tools
- `gpt5_code` - Code analysis, review, optimization, debugging
- `gpt5_code_generate` - Generate code from specifications
- `gpt5_design` - UI/UX, architecture, database design
- `gpt5_write` - Generate various types of text content
- `gpt5_brainstorm` - Creative idea generation
- `gpt5_translate` - Multi-language translation
- `gpt5_summarize` - Text summarization
Conversation Management
- `gpt5_conversation` - Multi-turn conversations with context
- `gpt5_list_sessions` - View active sessions
- `gpt5_clear_sessions` - Clear all sessions
Universal Tools (New in v3.0)
- `ai_smart` - Universal smart routing across all providers
- `ai_search` - Real-time web search with multiple providers
- `ai_multimodal` - Image analysis across multiple providers
- `ai_image_generate` - Image generation with provider selection
- `ai_compare_models` - Side-by-side model comparison
- `ai_best_for` - Get model recommendations for specific tasks
- `ai_usage_report` - Comprehensive cost and usage analytics
- `ai_benchmark_models` - Performance analysis across models
- `ai_model_recommendations` - Use case-specific model suggestions
Utility Tools
- `gpt5_explain_routing` - Explain model selection logic
- `gpt5_model_info` - Get model capabilities
💡 Usage Examples
Smart Auto-Routing
{
"tool": "gpt5_smart",
"parameters": {
"message": "Analyze the algorithmic complexity of this sorting algorithm",
"autoRoute": true
}
}
// Automatically routes to GPT-5 for complex analysis
Fast Chat with Temperature Control
{
"tool": "gpt5_chat_fast",
"parameters": {
"message": "Write a creative story about AI",
"temperature": 0.9,
"topP": 0.95,
"presencePenalty": 0.3
}
}
Code Review with Reasoning
{
"tool": "gpt5_code",
"parameters": {
"code": "function quickSort(arr) { ... }",
"task": "review",
"language": "javascript",
"reasoningEffort": "high"
}
}
System Design
{
"tool": "gpt5_design",
"parameters": {
"brief": "Design a scalable microservices architecture for e-commerce",
"type": "architecture",
"depth": "deep",
"constraints": ["AWS cloud", "100k concurrent users", "Sub-second response"]
}
}
Multi-turn Conversation
{
"tool": "gpt5_conversation",
"parameters": {
"message": "Let's discuss machine learning",
"model": "gpt-5-chat",
"temperature": 0.7
}
}
Local Private Models (Ollama)
{
"tool": "ai_smart",
"parameters": {
"message": "Analyze this private code locally",
"forceModel": "llama3.2",
"preferredModel": "llama3.2"
}
}
// Automatically uses local Ollama for complete privacy
Universal Model Comparison
{
"tool": "ai_benchmark_models",
"parameters": {
"testPrompts": ["Explain quantum computing", "Write a Python function"],
"models": [
{"provider": "azure", "model": "gpt-5"},
{"provider": "xai", "model": "grok-4"},
{"provider": "ollama", "model": "llama3.2"}
]
}
}
🎯 Model Selection Strategy
The server uses intelligent routing based on query analysis:
GPT-5-Chat is selected for:
- Quick conversations and Q&A
- Creative writing tasks
- Brainstorming sessions
- General assistance
- Real-time applications
GPT-5 is selected for:
- Complex problem solving
- Code analysis and generation
- Mathematical proofs
- System design
- Multi-step reasoning
- Technical documentation
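A simple way to picture this split is a keyword heuristic run over the query before dispatch. The real router is more sophisticated; this sketch only illustrates the idea, and the hint list is an assumption:

```typescript
// Illustrative keyword heuristic for the chat-vs-reasoning split described
// above. The REASONING_HINTS list is an assumption; the real router analyzes
// queries in more depth.
const REASONING_HINTS = [
  "prove", "complexity", "architecture", "debug", "analyze",
  "algorithm", "design", "step-by-step",
];

function selectModel(query: string): "gpt-5" | "gpt-5-chat" {
  const q = query.toLowerCase();
  // Route to the reasoning model when any complexity hint appears.
  return REASONING_HINTS.some(hint => q.includes(hint)) ? "gpt-5" : "gpt-5-chat";
}
```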
🔧 Parameter Optimization Guide
GPT-5-Chat Temperature Settings
- 0.0-0.3: Technical documentation, factual responses
- 0.4-0.7: Balanced conversations, general assistance
- 0.8-1.2: Creative writing, brainstorming
- 1.3-2.0: Highly creative, experimental content
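The temperature bands above can be encoded as a small helper. The task-type names here are assumptions chosen for illustration; only the temperature ranges come from the guide:

```typescript
// Illustrative mapping from task type to the temperature bands above.
// The TaskType names are assumptions; the band values follow the guide.
type TaskType = "factual" | "general" | "creative" | "experimental";

function chatTemperature(task: TaskType): number {
  switch (task) {
    case "factual": return 0.2;      // 0.0-0.3: technical, factual
    case "general": return 0.7;      // 0.4-0.7: balanced conversation
    case "creative": return 1.0;     // 0.8-1.2: creative writing
    case "experimental": return 1.5; // 1.3-2.0: highly creative
  }
}
```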
GPT-5 Reasoning Effort Levels
- minimal: Quick analysis, simple problems
- low: Standard analysis with basic reasoning
- medium: Thorough analysis with detailed reasoning
- high: Exhaustive analysis with comprehensive reasoning
🚀 Running the Server
Windows (Batch File)
start.bat
Manual Start
npm start
Development Mode
npm run dev
🔌 Claude Code Integration
Add to your Claude Code configuration:
Windows (%APPDATA%\Claude\claude_desktop_config.json):
{
"mcpServers": {
"gpt5-agent": {
"command": "node",
"args": ["C:\\path\\to\\gpt5-mcp-agent\\dist\\index.js"]
}
}
}
macOS/Linux (~/.config/Claude/claude_desktop_config.json):
{
"mcpServers": {
"gpt5-agent": {
"command": "node",
"args": ["/path/to/gpt5-mcp-agent/dist/index.js"]
}
}
}
Replace the path with your actual installation directory. After adding the configuration, restart Claude Code completely.
📊 Performance Characteristics
| Use Case | Recommended Model | Settings | Expected Latency |
|---|---|---|---|
| Quick chat | GPT-5-Chat | temp: 0.7 | 1-3 seconds |
| Creative writing | GPT-5-Chat | temp: 0.9-1.2 | 1-3 seconds |
| Code generation | GPT-5 | effort: high | 5-15 seconds |
| Complex analysis | GPT-5 | effort: high, verbosity: high | 10-30 seconds |
| Simple Q&A | GPT-5-Chat | temp: 0.3 | 1-2 seconds |
🐛 Troubleshooting
Common Issues
- "max_tokens is too large" error
  - Fixed in v2.0: GPT-5-Chat max output is now correctly limited to 16384 tokens
  - The server automatically enforces model-specific token limits
  - Override with the `maxTokens` parameter if needed (it will be capped at the model limit)
- "Unsupported parameter" errors
  - GPT-5 reasoning models don't support `temperature`
  - GPT-5-Chat doesn't support `reasoning_effort`
  - The server automatically handles parameter routing
- Slow responses
  - Reasoning models have higher latency
  - Reduce `reasoning_effort` for faster responses
  - Use fast models (Groq, GPT-4o) for real-time needs
- Model selection issues
  - Check `gpt5_explain_routing` to understand selection logic
  - Use the `forceModel` parameter to override auto-routing
  - Adjust `ENABLE_MODEL_ROUTING` in `.env`
- Session management
  - Sessions auto-expire after 60 minutes
  - Use `gpt5_clear_sessions` to reset
  - Check `SESSION_TIMEOUT_MINUTES` in `.env`
🔒 Security Notes
- API keys stored in `.env` (excluded from version control)
- Supports both Azure and OpenAI API providers
- Automatic session cleanup prevents memory leaks
- Error messages sanitized to avoid exposing sensitive data
- All requests validated with Zod schemas
📈 Advanced Features
Streaming Support
GPT-5-Chat supports streaming for real-time responses. Enable it by passing the `stream: true` parameter.
Smart Context Management
Sessions automatically manage conversation history, keeping the most relevant messages within token limits.
Model Fallback
If a model fails, the system can automatically fallback to alternative models.
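A fallback chain amounts to trying each provider call in order and returning the first success. A minimal sketch, assuming a generic `ModelCall` function type (the real server wires this to its provider clients):

```typescript
// Illustrative sketch of a provider fallback chain: try each call in order,
// return the first success, rethrow the last failure if all fail. The
// ModelCall type is an assumption for illustration.
type ModelCall = () => Promise<string>;

async function withFallback(chain: ModelCall[]): Promise<string> {
  let lastError: unknown;
  for (const call of chain) {
    try {
      return await call();
    } catch (err) {
      lastError = err; // remember the failure and try the next provider
    }
  }
  throw lastError ?? new Error("empty fallback chain");
}
```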
🛠️ Development
Project Structure
gpt5-agent/
├── src/
│ ├── index.ts # Main server with all tools
│ ├── gpt5-client.ts # Enhanced client with dual model support
│ ├── model-router.ts # Intelligent model selection logic
│ └── types.ts # TypeScript definitions
├── dist/ # Compiled JavaScript
├── .env # Configuration
└── package.json
Building
npm run build
Testing
# Test build
debug.bat
# Test server startup
test-server.bat
📝 License
MIT License
🤝 Contributing
Contributions welcome! Please ensure:
- TypeScript types are properly defined
- Error handling is comprehensive
- Documentation is updated
- Code follows existing patterns
🎉 Version History
v3.0.0 (Current)
- Universal Multi-Provider Support: Azure, OpenAI, xAI, Google, Groq, Ollama
- 6 AI Providers: 20+ models with intelligent routing
- Advanced Routing Algorithms: Multi-criteria decision analysis, context-aware selection
- Complete Privacy Options: Local Ollama models with auto-installation
- Cost Optimization: Real-time tracking, budget controls, usage analytics
- Model Comparison Tools: Performance benchmarking and recommendations
- 25+ Specialized Tools: Universal tools + full backward compatibility
- Enhanced Capabilities: Real-time search, image generation, multimodal support
v2.0.0
- Full dual-model support (GPT-5 + GPT-5-Chat)
- Intelligent model routing
- Comprehensive parameter support
- 17 specialized tools
- Streaming support
- Enhanced error handling
v1.0.0
- Initial release with GPT-5 support
- Basic conversation tools
- Session management