Daniel-DDV/universal-ai-mcp-server
Universal AI MCP Server v3.0
A comprehensive MCP (Model Context Protocol) server that provides intelligent access to multiple AI providers and models including GPT-5, Grok-4, Gemini 2.5, Groq, and local Ollama models - with advanced routing, cost optimization, and 25+ specialized tools.
🆕 NEW in v3.0: Multi-provider support with xAI Grok, Google Gemini, Groq inference, cost tracking, and intelligent model routing across all providers.
🚀 Key Features
🌐 Universal Multi-Provider Support
- Azure OpenAI: GPT-5, GPT-5-Chat, GPT-4o, o3, o1-preview, o1-mini, DALL-E 3
- OpenAI Direct: All OpenAI models with direct API access
- xAI Grok: Grok-4 (with real-time search), Grok-3-Mini, Grok-2-Image
- Google AI: Gemini 2.5 Pro/Flash/Flash-Thinking (1M context)
- Groq: Ultra-fast inference (1,500+ tokens/sec) for Llama and GPT models
- Ollama: Local models for complete privacy (llama3.2, codellama, mistral, qwen2.5)
🧠 Intelligent Model Routing
- Smart Selection: Automatically chooses optimal model based on task complexity, cost, speed, and quality requirements
- Multi-Strategy Routing: Cost-optimized, speed-optimized, quality-first, privacy-first strategies
- Fallback Chain: Automatic failover between providers for reliability
- Real-time Optimization: Learns from usage patterns to improve selections
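The routing strategies above can be sketched as a weighted scoring function over candidate models. This is an illustrative sketch only, not the server's actual implementation — the `ModelProfile` shape and the weight values are assumptions:

```typescript
// Illustrative sketch of multi-strategy routing: score candidate models on
// cost, speed, and quality, with weights chosen per strategy. The ModelProfile
// fields and the weight values are assumptions, not the server's real code.
type Strategy = "cost-optimized" | "speed-optimized" | "quality-first";

interface ModelProfile {
  name: string;
  costPer1M: number;    // blended $ per 1M tokens (lower is better)
  tokensPerSec: number; // throughput (higher is better)
  quality: number;      // 0..1 subjective quality score
}

const WEIGHTS: Record<Strategy, { cost: number; speed: number; quality: number }> = {
  "cost-optimized":  { cost: 0.6, speed: 0.2, quality: 0.2 },
  "speed-optimized": { cost: 0.1, speed: 0.7, quality: 0.2 },
  "quality-first":   { cost: 0.1, speed: 0.1, quality: 0.8 },
};

function pickModel(models: ModelProfile[], strategy: Strategy): ModelProfile {
  const w = WEIGHTS[strategy];
  const maxCost = Math.max(...models.map(m => m.costPer1M));
  const maxSpeed = Math.max(...models.map(m => m.tokensPerSec));
  // Normalize each criterion to 0..1; invert cost so cheaper scores higher.
  const score = (m: ModelProfile) =>
    w.cost * (1 - m.costPer1M / maxCost) +
    w.speed * (m.tokensPerSec / maxSpeed) +
    w.quality * m.quality;
  return models.reduce((best, m) => (score(m) > score(best) ? m : best));
}
```

A privacy-first strategy would work the same way, with a hard filter for local-only models applied before scoring.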
💰 Advanced Cost Management
- Real-time Cost Tracking: Monitor spending across all providers
- Budget Controls: Daily/monthly spending limits with alerts
- Cost Optimization: Automatically route to cost-effective models when appropriate
- Usage Analytics: Detailed reports with optimization recommendations
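Conceptually, per-request cost tracking against a daily budget looks like the sketch below. Prices are expressed as dollars per 1M tokens (input/output), matching the comparison matrix; the `CostTracker` class itself is an assumption for illustration, not the server's real API:

```typescript
// Illustrative sketch of per-request cost tracking with a daily budget cap.
// Pricing is $ per 1M tokens, split into input and output rates; the class
// shape is an assumption, not the server's real API.
interface Pricing { inputPer1M: number; outputPer1M: number }

class CostTracker {
  private spentToday = 0;
  constructor(private dailyBudgetUsd: number) {}

  // Record one request and return its cost in dollars.
  record(p: Pricing, inputTokens: number, outputTokens: number): number {
    const cost =
      (inputTokens / 1_000_000) * p.inputPer1M +
      (outputTokens / 1_000_000) * p.outputPer1M;
    this.spentToday += cost;
    return cost;
  }

  overBudget(): boolean {
    return this.spentToday >= this.dailyBudgetUsd;
  }
}
```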
⚡ Performance & Capabilities
- Ultra-fast Responses: Sub-second first tokens with Groq infrastructure
- Massive Context: Up to 1M tokens with Gemini models
- Real-time Search: Live web search with Grok-4
- Multimodal Support: Image analysis and generation across providers
- Session Management: Persistent conversations with automatic cleanup
🛡️ Privacy & Security
- Privacy Levels: Public, private, and local-only processing options
- Data Retention: Configurable session and data retention policies
- Local Processing: Complete privacy with Ollama integration
- Secure API Handling: Best practices for API key management
📊 Model Comparison Matrix
| Provider | Model | Context | Speed | Cost/1M | Best For | Capabilities |
|---|---|---|---|---|---|---|
| Azure/OpenAI | GPT-5 | 272k | Slow | $10/$30 | Complex reasoning | Text, Code, Reasoning |
| Azure/OpenAI | GPT-5-Chat | 128k | Fast | $5/$15 | Conversations | Text, Multimodal, Functions |
| Azure/OpenAI | GPT-4o | 128k | Fast | $5/$15 | Multimodal | Text, Vision, Functions |
| xAI | Grok-4 | 128k | Medium | $3/$15 | Real-time research | Text, Search, Reasoning |
| xAI | Grok-3-Mini | 32k | Fast | $0.30/$0.50 | Budget tasks | Text, Basic reasoning |
| Google | Gemini 2.5 Pro | 1M | Medium | $2.50/$10 | Large contexts | Multimodal, Reasoning |
| Google | Gemini 2.5 Flash | 1M | Very Fast | $0.075/$0.30 | Speed + quality | Multimodal, Fast |
| Groq | Llama 3.3 70B | 131k | Ultra Fast | $0.59/$0.79 | Real-time chat | Text, Code |
| Groq | GPT-OSS 120B | 32k | Ultra Fast | $1.25/$1.25 | Quality + speed | Text, Reasoning |
| Ollama | Llama 3.2 | 131k | Fast | Free | Private/local | Text, Code, Local |
| Ollama | Code Llama | 16k | Fast | Free | Code generation | Code, Local |
| Ollama | Mistral 7B | 32k | Very Fast | Free | Efficient chat | Text, Multilingual |
| Ollama | Qwen 2.5 | 131k | Medium | Free | Advanced reasoning | Text, Code, Reasoning |
📦 Installation
- Clone the repository:
git clone https://github.com/yourusername/gpt5-mcp-agent.git
cd gpt5-mcp-agent
- Install dependencies:
npm install
- Build the TypeScript code:
npm run build
⚙️ Configuration
The server supports both Azure OpenAI and standard OpenAI API. Configure via environment variables in .env:
Option 1: Azure OpenAI
# Provider Selection
API_PROVIDER=azure
# Azure OpenAI Configuration
AZURE_OPENAI_API_KEY=your_api_key_here
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_VERSION=2025-01-01-preview
# Azure Deployment Names (customize to match your deployments)
AZURE_OPENAI_DEPLOYMENT_GPT5=your-gpt5-deployment-name
AZURE_OPENAI_DEPLOYMENT_GPT5_CHAT=your-gpt5-chat-deployment-name
Option 2: Standard OpenAI API
# Provider Selection
API_PROVIDER=openai
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
# Optional: Custom base URL (leave empty for default)
OPENAI_BASE_URL=
# Optional: Organization ID
OPENAI_ORG_ID=
# OpenAI Model Names
OPENAI_MODEL_GPT5=gpt-5
OPENAI_MODEL_GPT5_CHAT=gpt-5-chat
Option 3: Universal Multi-Provider Setup
# Multiple Providers Enabled
PROVIDERS_ENABLED=azure,xai,google,groq,ollama
# xAI Grok Configuration
XAI_API_KEY=your_xai_api_key_here
XAI_ENABLE_LIVE_SEARCH=true
XAI_SEARCH_DEPTH=deep
# Google AI Configuration
GOOGLE_AI_API_KEY=your_google_api_key_here
GOOGLE_PROJECT_ID=your_project_id
GOOGLE_LOCATION=us-central1
# Groq Configuration
GROQ_API_KEY=your_groq_api_key_here
GROQ_PRIORITY_TIER=paid
# Ollama Local Models
OLLAMA_HOST=http://localhost:11434
OLLAMA_AUTO_INSTALL=true
OLLAMA_DEFAULT_MODEL=llama3.2
Common Settings (All Providers)
# Model Selection
DEFAULT_CHAT_MODEL=gpt-5-chat # For fast conversations
DEFAULT_REASONING_MODEL=gpt-5 # For complex analysis
ENABLE_MODEL_ROUTING=true # Auto-select optimal model
# GPT-5-Chat Settings (Traditional Chat Model)
DEFAULT_TEMPERATURE=0.7 # Controls randomness (0.0-2.0)
DEFAULT_TOP_P=1.0 # Nucleus sampling
DEFAULT_PRESENCE_PENALTY=0 # Encourages new topics (-2.0 to 2.0)
DEFAULT_FREQUENCY_PENALTY=0 # Reduces repetition (-2.0 to 2.0)
# GPT-5 Reasoning Model Settings
DEFAULT_REASONING_EFFORT=medium # minimal, low, medium, high
DEFAULT_VERBOSITY=medium # low, medium, high
# Performance Settings
MAX_COMPLETION_TOKENS=128000 # Max tokens for responses
ENABLE_STREAMING=true # Enable streaming for GPT-5-Chat
SESSION_TIMEOUT_MINUTES=60 # Session cleanup interval
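Settings like these are typically read from the environment with defaults matching the values above. A minimal sketch, assuming a `loadConfig` helper (the variable names match this README; the helper itself is illustrative):

```typescript
// Illustrative sketch of reading the common settings above, with the README's
// defaults. The loadConfig helper is an assumption for illustration; in a real
// server you would pass process.env here.
interface ServerConfig {
  defaultChatModel: string;
  enableModelRouting: boolean;
  defaultTemperature: number;
  sessionTimeoutMinutes: number;
}

function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  return {
    defaultChatModel: env.DEFAULT_CHAT_MODEL ?? "gpt-5-chat",
    enableModelRouting: (env.ENABLE_MODEL_ROUTING ?? "true") === "true",
    defaultTemperature: Number(env.DEFAULT_TEMPERATURE ?? "0.7"),
    sessionTimeoutMinutes: Number(env.SESSION_TIMEOUT_MINUTES ?? "60"),
  };
}
```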
🛠️ Available Tools
Smart Routing
- `gpt5_smart` - Intelligently routes to the best model based on query complexity
GPT-5-Chat Tools (Fast & Creative)
- `gpt5_chat_fast` - Fast conversational AI with full parameter control
- `gpt5_chat_creative` - Optimized for creative writing
GPT-5 Reasoning Tools (Deep Analysis)
- `gpt5_reasoning` - Advanced reasoning for complex problems
- `gpt5_chain_of_thought` - Step-by-step problem solving
Specialized Tools
- `gpt5_code` - Code analysis, review, optimization, debugging
- `gpt5_code_generate` - Generate code from specifications
- `gpt5_design` - UI/UX, architecture, database design
- `gpt5_write` - Generate various types of text content
- `gpt5_brainstorm` - Creative idea generation
- `gpt5_translate` - Multi-language translation
- `gpt5_summarize` - Text summarization
Conversation Management
- `gpt5_conversation` - Multi-turn conversations with context
- `gpt5_list_sessions` - View active sessions
- `gpt5_clear_sessions` - Clear all sessions
Universal Tools (New in v3.0)
- `ai_smart` - Universal smart routing across all providers
- `ai_search` - Real-time web search with multiple providers
- `ai_multimodal` - Image analysis across multiple providers
- `ai_image_generate` - Image generation with provider selection
- `ai_compare_models` - Side-by-side model comparison
- `ai_best_for` - Get model recommendations for specific tasks
- `ai_usage_report` - Comprehensive cost and usage analytics
- `ai_benchmark_models` - Performance analysis across models
- `ai_model_recommendations` - Use case-specific model suggestions
Utility Tools
- `gpt5_explain_routing` - Explain model selection logic
- `gpt5_model_info` - Get model capabilities
💡 Usage Examples
Smart Auto-Routing
{
"tool": "gpt5_smart",
"parameters": {
"message": "Analyze the algorithmic complexity of this sorting algorithm",
"autoRoute": true
}
}
// Automatically routes to GPT-5 for complex analysis
Fast Chat with Temperature Control
{
"tool": "gpt5_chat_fast",
"parameters": {
"message": "Write a creative story about AI",
"temperature": 0.9,
"topP": 0.95,
"presencePenalty": 0.3
}
}
Code Review with Reasoning
{
"tool": "gpt5_code",
"parameters": {
"code": "function quickSort(arr) { ... }",
"task": "review",
"language": "javascript",
"reasoningEffort": "high"
}
}
System Design
{
"tool": "gpt5_design",
"parameters": {
"brief": "Design a scalable microservices architecture for e-commerce",
"type": "architecture",
"depth": "deep",
"constraints": ["AWS cloud", "100k concurrent users", "Sub-second response"]
}
}
Multi-turn Conversation
{
"tool": "gpt5_conversation",
"parameters": {
"message": "Let's discuss machine learning",
"model": "gpt-5-chat",
"temperature": 0.7
}
}
Local Private Models (Ollama)
{
"tool": "ai_smart",
"parameters": {
"message": "Analyze this private code locally",
"forceModel": "llama3.2",
"preferredModel": "llama3.2"
}
}
// Automatically uses local Ollama for complete privacy
Universal Model Comparison
{
"tool": "ai_benchmark_models",
"parameters": {
"testPrompts": ["Explain quantum computing", "Write a Python function"],
"models": [
{"provider": "azure", "model": "gpt-5"},
{"provider": "xai", "model": "grok-4"},
{"provider": "ollama", "model": "llama3.2"}
]
}
}
🎯 Model Selection Strategy
The server uses intelligent routing based on query analysis:
GPT-5-Chat is selected for:
- Quick conversations and Q&A
- Creative writing tasks
- Brainstorming sessions
- General assistance
- Real-time applications
GPT-5 is selected for:
- Complex problem solving
- Code analysis and generation
- Mathematical proofs
- System design
- Multi-step reasoning
- Technical documentation
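A simple way to picture this split is a keyword heuristic run over the query before dispatch. The real router is more sophisticated; this sketch only illustrates the idea, and the hint list is an assumption:

```typescript
// Illustrative keyword heuristic for the chat-vs-reasoning split described
// above. The REASONING_HINTS list is an assumption; the real router analyzes
// queries in more depth.
const REASONING_HINTS = [
  "prove", "complexity", "architecture", "debug", "analyze",
  "algorithm", "design", "step-by-step",
];

function selectModel(query: string): "gpt-5" | "gpt-5-chat" {
  const q = query.toLowerCase();
  // Route to the reasoning model when any complexity hint appears.
  return REASONING_HINTS.some(hint => q.includes(hint)) ? "gpt-5" : "gpt-5-chat";
}
```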
🔧 Parameter Optimization Guide
GPT-5-Chat Temperature Settings
- 0.0-0.3: Technical documentation, factual responses
- 0.4-0.7: Balanced conversations, general assistance
- 0.8-1.2: Creative writing, brainstorming
- 1.3-2.0: Highly creative, experimental content
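The temperature bands above can be encoded as a small helper. The task-type names here are assumptions chosen for illustration; only the temperature ranges come from the guide:

```typescript
// Illustrative mapping from task type to the temperature bands above.
// The TaskType names are assumptions; the band values follow the guide.
type TaskType = "factual" | "general" | "creative" | "experimental";

function chatTemperature(task: TaskType): number {
  switch (task) {
    case "factual": return 0.2;      // 0.0-0.3: technical, factual
    case "general": return 0.7;      // 0.4-0.7: balanced conversation
    case "creative": return 1.0;     // 0.8-1.2: creative writing
    case "experimental": return 1.5; // 1.3-2.0: highly creative
  }
}
```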
GPT-5 Reasoning Effort Levels
- minimal: Quick analysis, simple problems
- low: Standard analysis with basic reasoning
- medium: Thorough analysis with detailed reasoning
- high: Exhaustive analysis with comprehensive reasoning
🚀 Running the Server
Windows (Batch File)
start.bat
Manual Start
npm start
Development Mode
npm run dev
🔌 Claude Code Integration
Add to your Claude Code configuration:
Windows (%APPDATA%\Claude\claude_desktop_config.json):
{
"mcpServers": {
"gpt5-agent": {
"command": "node",
"args": ["C:\\path\\to\\gpt5-mcp-agent\\dist\\index.js"]
}
}
}
macOS/Linux (~/.config/Claude/claude_desktop_config.json):
{
"mcpServers": {
"gpt5-agent": {
"command": "node",
"args": ["/path/to/gpt5-mcp-agent/dist/index.js"]
}
}
}
Replace the path with your actual installation directory. After adding the configuration, restart Claude Code completely.
📊 Performance Characteristics
| Use Case | Recommended Model | Settings | Expected Latency |
|---|---|---|---|
| Quick chat | GPT-5-Chat | temp: 0.7 | 1-3 seconds |
| Creative writing | GPT-5-Chat | temp: 0.9-1.2 | 1-3 seconds |
| Code generation | GPT-5 | effort: high | 5-15 seconds |
| Complex analysis | GPT-5 | effort: high, verbosity: high | 10-30 seconds |
| Simple Q&A | GPT-5-Chat | temp: 0.3 | 1-2 seconds |
🐛 Troubleshooting
Common Issues
- "max_tokens is too large" error
  - Fixed in v2.0: GPT-5-Chat max output is now correctly limited to 16384 tokens
  - The server automatically enforces model-specific token limits
  - Override with the `maxTokens` parameter if needed (it will be capped at the model limit)
- "Unsupported parameter" errors
  - GPT-5 reasoning models don't support `temperature`
  - GPT-5-Chat doesn't support `reasoning_effort`
  - The server automatically handles parameter routing
- Slow responses
  - Reasoning models have higher latency
  - Reduce `reasoning_effort` for faster responses
  - Use fast models (Groq, GPT-4o) for real-time needs
- Model selection issues
  - Check `gpt5_explain_routing` to understand selection logic
  - Use the `forceModel` parameter to override auto-routing
  - Adjust `ENABLE_MODEL_ROUTING` in `.env`
- Session management
  - Sessions auto-expire after 60 minutes
  - Use `gpt5_clear_sessions` to reset
  - Check `SESSION_TIMEOUT_MINUTES` in `.env`
🔒 Security Notes
- API keys stored in `.env` (excluded from version control)
- Supports both Azure and OpenAI API providers
- Automatic session cleanup prevents memory leaks
- Error messages sanitized to avoid exposing sensitive data
- All requests validated with Zod schemas
📈 Advanced Features
Streaming Support
GPT-5-Chat supports streaming for real-time responses. Enable it by passing the `stream: true` parameter.
Smart Context Management
Sessions automatically manage conversation history, keeping the most relevant messages within token limits.
Model Fallback
If a model fails, the system can automatically fallback to alternative models.
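A fallback chain amounts to trying each provider call in order and returning the first success. A minimal sketch, assuming a generic `ModelCall` function type (the real server wires this to its provider clients):

```typescript
// Illustrative sketch of a provider fallback chain: try each call in order,
// return the first success, rethrow the last failure if all fail. The
// ModelCall type is an assumption for illustration.
type ModelCall = () => Promise<string>;

async function withFallback(chain: ModelCall[]): Promise<string> {
  let lastError: unknown;
  for (const call of chain) {
    try {
      return await call();
    } catch (err) {
      lastError = err; // remember the failure and try the next provider
    }
  }
  throw lastError ?? new Error("empty fallback chain");
}
```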
🛠️ Development
Project Structure
gpt5-agent/
├── src/
│ ├── index.ts # Main server with all tools
│ ├── gpt5-client.ts # Enhanced client with dual model support
│ ├── model-router.ts # Intelligent model selection logic
│ └── types.ts # TypeScript definitions
├── dist/ # Compiled JavaScript
├── .env # Configuration
└── package.json
Building
npm run build
Testing
# Test build
debug.bat
# Test server startup
test-server.bat
📝 License
MIT License
🤝 Contributing
Contributions welcome! Please ensure:
- TypeScript types are properly defined
- Error handling is comprehensive
- Documentation is updated
- Code follows existing patterns
🎉 Version History
v3.0.0 (Current)
- Universal Multi-Provider Support: Azure, OpenAI, xAI, Google, Groq, Ollama
- 6 AI Providers: 20+ models with intelligent routing
- Advanced Routing Algorithms: Multi-criteria decision analysis, context-aware selection
- Complete Privacy Options: Local Ollama models with auto-installation
- Cost Optimization: Real-time tracking, budget controls, usage analytics
- Model Comparison Tools: Performance benchmarking and recommendations
- 25+ Specialized Tools: Universal tools + full backward compatibility
- Enhanced Capabilities: Real-time search, image generation, multimodal support
v2.0.0
- Full dual-model support (GPT-5 + GPT-5-Chat)
- Intelligent model routing
- Comprehensive parameter support
- 17 specialized tools
- Streaming support
- Enhanced error handling
v1.0.0
- Initial release with GPT-5 support
- Basic conversation tools
- Session management