Daniel-DDV/universal-ai-mcp-server

Universal AI MCP Server v3.0

A comprehensive MCP (Model Context Protocol) server that provides intelligent access to multiple AI providers and models including GPT-5, Grok-4, Gemini 2.5, Groq, and local Ollama models - with advanced routing, cost optimization, and 25+ specialized tools.

🆕 NEW in v3.0: Multi-provider support with xAI Grok, Google Gemini, Groq inference, cost tracking, and intelligent model routing across all providers.

🚀 Key Features

🌐 Universal Multi-Provider Support

  • Azure OpenAI: GPT-5, GPT-5-Chat, GPT-4o, o3, o1-preview, o1-mini, DALL-E 3
  • OpenAI Direct: All OpenAI models with direct API access
  • xAI Grok: Grok-4 (with real-time search), Grok-3-Mini, Grok-2-Image
  • Google AI: Gemini 2.5 Pro/Flash/Flash-Thinking (1M context)
  • Groq: Ultra-fast inference (1,500+ tokens/sec) for Llama and GPT-OSS models
  • Ollama: Local models for complete privacy (llama3.2, codellama, mistral, qwen2.5)

🧠 Intelligent Model Routing

  • Smart Selection: Automatically chooses optimal model based on task complexity, cost, speed, and quality requirements
  • Multi-Strategy Routing: Cost-optimized, speed-optimized, quality-first, privacy-first strategies
  • Fallback Chain: Automatic failover between providers for reliability
  • Real-time Optimization: Learns from usage patterns to improve selections
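The multi-strategy routing above can be sketched as weighted multi-criteria scoring. The model profiles, weights, and scoring fields below are illustrative assumptions, not the server's actual routing code:

```typescript
type Strategy = "cost" | "speed" | "quality" | "privacy";

interface ModelProfile {
  name: string;
  provider: string;
  cost: number;     // relative cost, 0 (free) to 1 (most expensive)
  speed: number;    // relative speed, 0 (slow) to 1 (fastest)
  quality: number;  // relative quality, 0 to 1
  local: boolean;   // runs on-device via Ollama
}

// Hypothetical profiles for a few of the models in the comparison matrix.
const MODELS: ModelProfile[] = [
  { name: "gpt-5",            provider: "azure",  cost: 1.0,  speed: 0.2, quality: 1.0,  local: false },
  { name: "gemini-2.5-flash", provider: "google", cost: 0.05, speed: 0.9, quality: 0.7,  local: false },
  { name: "llama-3.3-70b",    provider: "groq",   cost: 0.1,  speed: 1.0, quality: 0.65, local: false },
  { name: "llama3.2",         provider: "ollama", cost: 0.0,  speed: 0.6, quality: 0.5,  local: true  },
];

// Each strategy weights the criteria differently; privacy-first hard-filters
// to local models before scoring rather than merely preferring them.
function pickModel(strategy: Strategy): ModelProfile {
  const pool = strategy === "privacy" ? MODELS.filter(m => m.local) : MODELS;
  const weights = {
    cost:    { cost: -0.7, speed: 0.1, quality: 0.2 },
    speed:   { cost: -0.1, speed: 0.7, quality: 0.2 },
    quality: { cost: -0.0, speed: 0.1, quality: 0.9 },
    privacy: { cost: -0.3, speed: 0.3, quality: 0.4 },
  }[strategy];
  const score = (p: ModelProfile) =>
    weights.cost * p.cost + weights.speed * p.speed + weights.quality * p.quality;
  return pool.reduce((best, m) => (score(m) > score(best) ? m : best));
}
```

A fallback chain falls out of the same scoring: sort the pool by score and try models in descending order.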

💰 Advanced Cost Management

  • Real-time Cost Tracking: Monitor spending across all providers
  • Budget Controls: Daily/monthly spending limits with alerts
  • Cost Optimization: Automatically route to cost-effective models when appropriate
  • Usage Analytics: Detailed reports with optimization recommendations
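A minimal sketch of how per-provider cost tracking with a daily cap might work; the class name, USD-per-million-token accounting, and the idea that the router consults `overBudget()` before dispatching are assumptions for illustration:

```typescript
// Tracks spend per provider for the current day against a hard limit.
class CostTracker {
  private spentToday = new Map<string, number>();

  constructor(private dailyLimitUsd: number) {}

  // Record a completed request: input/output tokens priced per million.
  record(provider: string, inTokens: number, outTokens: number,
         inPricePerM: number, outPricePerM: number): number {
    const cost = (inTokens / 1e6) * inPricePerM + (outTokens / 1e6) * outPricePerM;
    this.spentToday.set(provider, (this.spentToday.get(provider) ?? 0) + cost);
    return cost;
  }

  totalToday(): number {
    let sum = 0;
    for (const v of this.spentToday.values()) sum += v;
    return sum;
  }

  // The routing layer would check this before choosing a paid provider,
  // falling back to free local models once the budget is exhausted.
  overBudget(): boolean {
    return this.totalToday() >= this.dailyLimitUsd;
  }
}
```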

⚡ Performance & Capabilities

  • Ultra-fast Responses: Sub-second first tokens with Groq infrastructure
  • Massive Context: Up to 1M tokens with Gemini models
  • Real-time Search: Live web search with Grok-4
  • Multimodal Support: Image analysis and generation across providers
  • Session Management: Persistent conversations with automatic cleanup

🛡️ Privacy & Security

  • Privacy Levels: Public, private, and local-only processing options
  • Data Retention: Configurable session and data retention policies
  • Local Processing: Complete privacy with Ollama integration
  • Secure API Handling: Best practices for API key management

📊 Model Comparison Matrix

| Provider | Model | Context | Speed | Cost/1M (in/out) | Best For | Capabilities |
|---|---|---|---|---|---|---|
| Azure/OpenAI | GPT-5 | 272k | Slow | $10/$30 | Complex reasoning | Text, Code, Reasoning |
| Azure/OpenAI | GPT-5-Chat | 128k | Fast | $5/$15 | Conversations | Text, Multimodal, Functions |
| Azure/OpenAI | GPT-4o | 128k | Fast | $5/$15 | Multimodal | Text, Vision, Functions |
| xAI | Grok-4 | 128k | Medium | $3/$15 | Real-time research | Text, Search, Reasoning |
| xAI | Grok-3-Mini | 32k | Fast | $0.30/$0.50 | Budget tasks | Text, Basic reasoning |
| Google | Gemini 2.5 Pro | 1M | Medium | $2.50/$10 | Large contexts | Multimodal, Reasoning |
| Google | Gemini 2.5 Flash | 1M | Very Fast | $0.075/$0.30 | Speed + quality | Multimodal, Fast |
| Groq | Llama 3.3 70B | 131k | Ultra Fast | $0.59/$0.79 | Real-time chat | Text, Code |
| Groq | GPT-OSS 120B | 32k | Ultra Fast | $1.25/$1.25 | Quality + speed | Text, Reasoning |
| Ollama | Llama 3.2 | 131k | Fast | Free | Private/local | Text, Code, Local |
| Ollama | Code Llama | 16k | Fast | Free | Code generation | Code, Local |
| Ollama | Mistral 7B | 32k | Very Fast | Free | Efficient chat | Text, Multilingual |
| Ollama | Qwen 2.5 | 131k | Medium | Free | Advanced reasoning | Text, Code, Reasoning |

📦 Installation

  1. Clone the repository:
git clone https://github.com/yourusername/gpt5-mcp-agent.git
cd gpt5-mcp-agent
  2. Install dependencies:
npm install
  3. Build the TypeScript code:
npm run build

⚙️ Configuration

The server supports both Azure OpenAI and standard OpenAI API. Configure via environment variables in .env:

Option 1: Azure OpenAI

# Provider Selection
API_PROVIDER=azure

# Azure OpenAI Configuration
AZURE_OPENAI_API_KEY=your_api_key_here
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_VERSION=2025-01-01-preview

# Azure Deployment Names (customize to match your deployments)
AZURE_OPENAI_DEPLOYMENT_GPT5=your-gpt5-deployment-name
AZURE_OPENAI_DEPLOYMENT_GPT5_CHAT=your-gpt5-chat-deployment-name

Option 2: Standard OpenAI API

# Provider Selection
API_PROVIDER=openai

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
# Optional: Custom base URL (leave empty for default)
OPENAI_BASE_URL=
# Optional: Organization ID
OPENAI_ORG_ID=

# OpenAI Model Names
OPENAI_MODEL_GPT5=gpt-5
OPENAI_MODEL_GPT5_CHAT=gpt-5-chat

Option 3: Universal Multi-Provider Setup

# Multiple Providers Enabled
PROVIDERS_ENABLED=azure,xai,google,groq,ollama

# xAI Grok Configuration
XAI_API_KEY=your_xai_api_key_here
XAI_ENABLE_LIVE_SEARCH=true
XAI_SEARCH_DEPTH=deep

# Google AI Configuration  
GOOGLE_AI_API_KEY=your_google_api_key_here
GOOGLE_PROJECT_ID=your_project_id
GOOGLE_LOCATION=us-central1

# Groq Configuration
GROQ_API_KEY=your_groq_api_key_here
GROQ_PRIORITY_TIER=paid

# Ollama Local Models
OLLAMA_HOST=http://localhost:11434
OLLAMA_AUTO_INSTALL=true
OLLAMA_DEFAULT_MODEL=llama3.2

Common Settings (All Providers)


# Model Selection
DEFAULT_CHAT_MODEL=gpt-5-chat          # For fast conversations
DEFAULT_REASONING_MODEL=gpt-5          # For complex analysis
ENABLE_MODEL_ROUTING=true              # Auto-select optimal model

# GPT-5-Chat Settings (Traditional Chat Model)
DEFAULT_TEMPERATURE=0.7                # Controls randomness (0.0-2.0)
DEFAULT_TOP_P=1.0                      # Nucleus sampling
DEFAULT_PRESENCE_PENALTY=0             # Encourages new topics (-2.0 to 2.0)
DEFAULT_FREQUENCY_PENALTY=0            # Reduces repetition (-2.0 to 2.0)

# GPT-5 Reasoning Model Settings
DEFAULT_REASONING_EFFORT=medium        # minimal, low, medium, high
DEFAULT_VERBOSITY=medium               # low, medium, high

# Performance Settings
MAX_COMPLETION_TOKENS=128000          # Max tokens for responses
ENABLE_STREAMING=true                 # Enable streaming for GPT-5-Chat
SESSION_TIMEOUT_MINUTES=60            # Session cleanup interval

🛠️ Available Tools

Smart Routing

  • gpt5_smart - Intelligently routes to the best model based on query complexity

GPT-5-Chat Tools (Fast & Creative)

  • gpt5_chat_fast - Fast conversational AI with full parameter control
  • gpt5_chat_creative - Optimized for creative writing

GPT-5 Reasoning Tools (Deep Analysis)

  • gpt5_reasoning - Advanced reasoning for complex problems
  • gpt5_chain_of_thought - Step-by-step problem solving

Specialized Tools

  • gpt5_code - Code analysis, review, optimization, debugging
  • gpt5_code_generate - Generate code from specifications
  • gpt5_design - UI/UX, architecture, database design
  • gpt5_write - Generate various types of text content
  • gpt5_brainstorm - Creative idea generation
  • gpt5_translate - Multi-language translation
  • gpt5_summarize - Text summarization

Conversation Management

  • gpt5_conversation - Multi-turn conversations with context
  • gpt5_list_sessions - View active sessions
  • gpt5_clear_sessions - Clear all sessions

Universal Tools (New in v3.0)

  • ai_smart - Universal smart routing across all providers
  • ai_search - Real-time web search with multiple providers
  • ai_multimodal - Image analysis across multiple providers
  • ai_image_generate - Image generation with provider selection
  • ai_compare_models - Side-by-side model comparison
  • ai_best_for - Get model recommendations for specific tasks
  • ai_usage_report - Comprehensive cost and usage analytics
  • ai_benchmark_models - Performance analysis across models
  • ai_model_recommendations - Use case-specific model suggestions

Utility Tools

  • gpt5_explain_routing - Explain model selection logic
  • gpt5_model_info - Get model capabilities

💡 Usage Examples

Smart Auto-Routing

{
  "tool": "gpt5_smart",
  "parameters": {
    "message": "Analyze the algorithmic complexity of this sorting algorithm",
    "autoRoute": true
  }
}
// Automatically routes to GPT-5 for complex analysis

Fast Chat with Temperature Control

{
  "tool": "gpt5_chat_fast",
  "parameters": {
    "message": "Write a creative story about AI",
    "temperature": 0.9,
    "topP": 0.95,
    "presencePenalty": 0.3
  }
}

Code Review with Reasoning

{
  "tool": "gpt5_code",
  "parameters": {
    "code": "function quickSort(arr) { ... }",
    "task": "review",
    "language": "javascript",
    "reasoningEffort": "high"
  }
}

System Design

{
  "tool": "gpt5_design",
  "parameters": {
    "brief": "Design a scalable microservices architecture for e-commerce",
    "type": "architecture",
    "depth": "deep",
    "constraints": ["AWS cloud", "100k concurrent users", "Sub-second response"]
  }
}

Multi-turn Conversation

{
  "tool": "gpt5_conversation",
  "parameters": {
    "message": "Let's discuss machine learning",
    "model": "gpt-5-chat",
    "temperature": 0.7
  }
}

Local Private Models (Ollama)

{
  "tool": "ai_smart",
  "parameters": {
    "message": "Analyze this private code locally",
    "forceModel": "llama3.2"
  }
}
// Forces the local Ollama model so the code never leaves your machine

Universal Model Comparison

{
  "tool": "ai_benchmark_models",
  "parameters": {
    "testPrompts": ["Explain quantum computing", "Write a Python function"],
    "models": [
      {"provider": "azure", "model": "gpt-5"},
      {"provider": "xai", "model": "grok-4"},
      {"provider": "ollama", "model": "llama3.2"}
    ]
  }
}

🎯 Model Selection Strategy

The server uses intelligent routing based on query analysis:

GPT-5-Chat is selected for:

  • Quick conversations and Q&A
  • Creative writing tasks
  • Brainstorming sessions
  • General assistance
  • Real-time applications

GPT-5 is selected for:

  • Complex problem solving
  • Code analysis and generation
  • Mathematical proofs
  • System design
  • Multi-step reasoning
  • Technical documentation
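One way the split above could be implemented is a lightweight complexity heuristic over the incoming message. The keyword list and length threshold here are illustrative guesses, not the server's actual routing rules:

```typescript
// Cue words that suggest deep analysis rather than casual conversation.
const REASONING_HINTS = [
  "prove", "analyze", "design", "architecture", "algorithm",
  "complexity", "debug", "optimize", "step-by-step",
];

// Route to the reasoning model when the prompt contains reasoning cues
// or is long enough to imply a multi-step task; otherwise use fast chat.
function selectModel(message: string): "gpt-5" | "gpt-5-chat" {
  const text = message.toLowerCase();
  const hits = REASONING_HINTS.filter(h => text.includes(h)).length;
  return hits >= 1 || message.length > 2000 ? "gpt-5" : "gpt-5-chat";
}
```

The `gpt5_explain_routing` tool described below exposes this kind of decision so you can see why a model was chosen.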

🔧 Parameter Optimization Guide

GPT-5-Chat Temperature Settings

  • 0.0-0.3: Technical documentation, factual responses
  • 0.4-0.7: Balanced conversations, general assistance
  • 0.8-1.2: Creative writing, brainstorming
  • 1.3-2.0: Highly creative, experimental content

GPT-5 Reasoning Effort Levels

  • minimal: Quick analysis, simple problems
  • low: Standard analysis with basic reasoning
  • medium: Thorough analysis with detailed reasoning
  • high: Exhaustive analysis with comprehensive reasoning

🚀 Running the Server

Windows (Batch File)

start.bat

Manual Start

npm start

Development Mode

npm run dev

🔌 Claude Desktop Integration

Add to your Claude Desktop configuration:

Windows (%APPDATA%\Claude\claude_desktop_config.json):

{
  "mcpServers": {
    "gpt5-agent": {
      "command": "node",
      "args": ["C:\\path\\to\\gpt5-mcp-agent\\dist\\index.js"]
    }
  }
}

macOS/Linux (~/.config/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "gpt5-agent": {
      "command": "node",
      "args": ["/path/to/gpt5-mcp-agent/dist/index.js"]
    }
  }
}

Replace the path with your actual installation directory. After adding the configuration, restart Claude Desktop completely.

📊 Performance Characteristics

| Use Case | Recommended Model | Settings | Expected Latency |
|---|---|---|---|
| Quick chat | GPT-5-Chat | temp: 0.7 | 1-3 seconds |
| Creative writing | GPT-5-Chat | temp: 0.9-1.2 | 1-3 seconds |
| Code generation | GPT-5 | effort: high | 5-15 seconds |
| Complex analysis | GPT-5 | effort: high, verbosity: high | 10-30 seconds |
| Simple Q&A | GPT-5-Chat | temp: 0.3 | 1-2 seconds |

🐛 Troubleshooting

Common Issues

  1. "max_tokens is too large" error

    • Fixed in v2.0: GPT-5-Chat max output is now correctly limited to 16384 tokens
    • The server automatically enforces model-specific token limits
    • Override with maxTokens parameter if needed (will be capped at model limit)
  2. "Unsupported parameter" errors

    • GPT-5 reasoning models don't support temperature
    • GPT-5-Chat doesn't support reasoning_effort
    • The server automatically handles parameter routing
  3. Slow responses

    • Reasoning models have higher latency
    • Reduce reasoning_effort for faster responses
    • Use fast models (Groq, GPT-4o) for real-time needs
  4. Model selection issues

    • Check gpt5_explain_routing to understand selection logic
    • Use forceModel parameter to override auto-routing
    • Adjust ENABLE_MODEL_ROUTING in .env
  5. Session management

    • Sessions auto-expire after 60 minutes
    • Use gpt5_clear_sessions to reset
    • Check SESSION_TIMEOUT_MINUTES in .env
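The parameter handling in items 1-2 can be sketched as a sanitization pass before each request: clamp token limits to the model's cap and strip parameters the target model rejects. The field names and per-model limits here are illustrative assumptions:

```typescript
interface RequestParams {
  model: string;
  maxTokens?: number;
  temperature?: number;
  reasoningEffort?: "minimal" | "low" | "medium" | "high";
}

// Assumed per-model output caps (GPT-5-Chat's 16384 cap is from item 1 above).
const MAX_OUTPUT: Record<string, number> = { "gpt-5": 128_000, "gpt-5-chat": 16_384 };
const REASONING_MODELS = new Set(["gpt-5"]);

function sanitize(p: RequestParams): RequestParams {
  const out = { ...p };
  const cap = MAX_OUTPUT[p.model];
  // Clamp any user-supplied maxTokens to the model's hard limit.
  if (cap !== undefined) out.maxTokens = Math.min(out.maxTokens ?? cap, cap);
  if (REASONING_MODELS.has(p.model)) {
    delete out.temperature;     // reasoning models reject sampling params
  } else {
    delete out.reasoningEffort; // chat models reject reasoning_effort
  }
  return out;
}
```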

🔒 Security Notes

  • API keys stored in .env (excluded from version control)
  • Supports both Azure and OpenAI API providers
  • Automatic session cleanup prevents memory leaks
  • Error messages sanitized to avoid exposing sensitive data
  • All requests validated with Zod schemas

📈 Advanced Features

Streaming Support

GPT-5-Chat supports streaming for real-time responses. Enable with stream: true parameter.
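On the consuming side, a streamed response is typically an async sequence of text deltas. This sketch assumes an `AsyncIterable<string>` interface, which is a stand-in for whatever the client actually yields:

```typescript
// Accumulate streamed deltas into the final reply; in a real UI each
// delta would be forwarded to the renderer as it arrives.
async function collectStream(chunks: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const delta of chunks) {
    text += delta;
  }
  return text;
}
```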

Smart Context Management

Sessions automatically manage conversation history, keeping the most relevant messages within token limits.
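A minimal sketch of that trimming: drop the oldest turns first while always preserving the system prompt. The 4-characters-per-token estimate is a rough assumption, not the server's actual tokenizer:

```typescript
interface Turn { role: "system" | "user" | "assistant"; content: string }

// Crude token estimate; a real implementation would use a tokenizer.
const estimateTokens = (s: string) => Math.ceil(s.length / 4);

function trimHistory(history: Turn[], budget: number): Turn[] {
  const system = history.filter(t => t.role === "system");
  const rest = history.filter(t => t.role !== "system");
  let used = system.reduce((n, t) => n + estimateTokens(t.content), 0);
  const kept: Turn[] = [];
  // Walk from newest to oldest so the most recent context survives.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content);
    if (used + cost > budget) break;
    used += cost;
    kept.unshift(rest[i]);
  }
  return [...system, ...kept];
}
```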

Model Fallback

If a model fails, the system can automatically fall back to alternative models.
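A fallback chain is essentially a try-in-order loop over providers. The `CallFn` signature below is a stand-in for the real client call, not the server's actual API:

```typescript
type CallFn = (model: string, prompt: string) => Promise<string>;

// Try each model in order; return the first success, or throw with the
// last failure once the whole chain is exhausted.
async function withFallback(
  chain: string[], prompt: string, call: CallFn,
): Promise<{ model: string; reply: string }> {
  let lastError: unknown;
  for (const model of chain) {
    try {
      return { model, reply: await call(model, prompt) };
    } catch (err) {
      lastError = err; // remember why this provider failed, then move on
    }
  }
  throw new Error(`all models in chain failed: ${String(lastError)}`);
}
```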

🛠️ Development

Project Structure

gpt5-agent/
├── src/
│   ├── index.ts           # Main server with all tools
│   ├── gpt5-client.ts     # Enhanced client with dual model support
│   ├── model-router.ts    # Intelligent model selection logic
│   └── types.ts           # TypeScript definitions
├── dist/                  # Compiled JavaScript
├── .env                   # Configuration
└── package.json

Building

npm run build

Testing

# Test build
debug.bat

# Test server startup
test-server.bat

📝 License

MIT License

🤝 Contributing

Contributions welcome! Please ensure:

  • TypeScript types are properly defined
  • Error handling is comprehensive
  • Documentation is updated
  • Code follows existing patterns

🎉 Version History

v3.0.0 (Current)

  • Universal Multi-Provider Support: Azure, OpenAI, xAI, Google, Groq, Ollama
  • 6 AI Providers: 20+ models with intelligent routing
  • Advanced Routing Algorithms: Multi-criteria decision analysis, context-aware selection
  • Complete Privacy Options: Local Ollama models with auto-installation
  • Cost Optimization: Real-time tracking, budget controls, usage analytics
  • Model Comparison Tools: Performance benchmarking and recommendations
  • 25+ Specialized Tools: Universal tools + full backward compatibility
  • Enhanced Capabilities: Real-time search, image generation, multimodal support

v2.0.0

  • Full dual-model support (GPT-5 + GPT-5-Chat)
  • Intelligent model routing
  • Comprehensive parameter support
  • 17 specialized tools
  • Streaming support
  • Enhanced error handling

v1.0.0

  • Initial release with GPT-5 support
  • Basic conversation tools
  • Session management