Ollama MCP Proxy
A comprehensive Model Context Protocol (MCP) proxy server that bridges MCP clients with Ollama's local language models, providing advanced features like RAG integration, context management, caching, and production-ready security.
🌟 Features
Core Functionality
- MCP Protocol Implementation: Full server-side MCP support with tools, resources, and prompts
- Ollama Integration: Seamless connection to local Ollama language models
- Multiple Transport Methods: HTTP with Server-Sent Events (SSE) and WebSocket support
- Advanced Context Management: Session-based isolation with conversation branching and merging
Advanced AI Capabilities
- RAG Integration: Vector-based document retrieval with FAISS and sentence transformers
- Knowledge Base Connectivity: Integration with external knowledge sources
- Advanced Summarization: Context window management with intelligent summarization
- Multi-Model Support: Dynamic model discovery and switching
Performance & Production Features
- Intelligent Caching: Multi-tier caching with Redis and local fallback
- Circuit Breaker Pattern: Fault tolerance with automatic recovery
- Rate Limiting: Configurable request throttling and protection
- Streaming Optimization: Efficient real-time response streaming
Security & Authentication
- OAuth 2.0 Support: Comprehensive authentication and authorization
- Role-Based Access Control (RBAC): Granular permission management
- Data Encryption: At-rest encryption for sensitive conversation data
- Security Headers: Production-ready security configuration
Developer Experience
- Comprehensive Testing: Unit, integration, and load testing suites
- Development Tools: Hot reload, profiling, and debugging support
- Structured Logging: JSON-formatted logs with correlation IDs
- Configuration Management: Environment-based configuration with validation
🏗️ Architecture
┌─────────────────┐      ┌──────────────────────┐      ┌─────────────────┐
│   MCP Client    │      │                      │      │  Ollama Server  │
│  (Claude, etc.) │◄────►│   Ollama MCP Proxy   │◄────►│   (Local AI)    │
└─────────────────┘      └──────────────────────┘      └─────────────────┘
                                     │
                                     ▼
                          ┌──────────────────┐
                          │  Configuration   │
                          │    & Storage     │
                          └──────────────────┘
Key Components
- OllamaMCPServer: Main MCP server implementation with tool and resource handlers
- OllamaClient: Robust HTTP client with retry logic and circuit breaker
- ContextManager: Sophisticated session management with branching and search
- RAG Integration: Vector-based document retrieval and knowledge augmentation
- Security Framework: Authentication, authorization, and data protection
- Cache System: Multi-level caching with intelligent warming and invalidation
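As a rough illustration of how these components fit together, the sketch below follows a single tool call through the cache, context manager, and Ollama client. The method names (get, set, append_message, generate) are illustrative assumptions, not the project's actual API.
# Illustrative request flow only; method names are assumptions, not the real API.
async def handle_completion(session_id: str, prompt: str, model: str,
                            cache, context_manager, ollama_client) -> str:
    # Record the incoming prompt in the per-session context.
    context_manager.append_message(session_id, role="user", content=prompt)

    # Check the multi-tier cache before hitting Ollama.
    cache_key = f"{model}:{hash(prompt)}"
    cached = await cache.get(cache_key)
    if cached is not None:
        return cached

    # Delegate to the Ollama client (retry and circuit-breaker logic live there).
    response = await ollama_client.generate(model=model, prompt=prompt)

    await cache.set(cache_key, response)
    context_manager.append_message(session_id, role="assistant", content=response)
    return response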
🚀 Quick Start
Prerequisites
- Python 3.8 or higher
- Ollama installed and running locally
- Redis (optional, for distributed caching)
Installation
- Clone the repository:
git clone https://github.com/ollama-mcp-proxy/ollama-mcp-proxy.git
cd ollama-mcp-proxy
- Create and activate virtual environment:
python -m venv venv
# Windows
venv\Scripts\activate
# Unix/macOS
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Start Ollama (if not already running):
ollama serve
- Run the MCP proxy:
python -m ollama_mcp_proxy.server --config config/development.json
Or using the CLI:
ollama-mcp-proxy --config config/development.json
📝 Configuration
The proxy uses JSON configuration files for different environments:
- config/development.json - Development settings with debug mode
- config/production.json - Production-ready configuration
Key Configuration Sections
{
"ollama": {
"host": "localhost",
"port": 11434,
"timeout": 30,
"max_retries": 3
},
"mcp": {
"port": 8000,
"transport": "http",
"auth_enabled": false
},
"cache": {
"enabled": true,
"type": "hybrid",
"redis": {
"enabled": true,
"host": "localhost",
"port": 6379
}
},
"rag": {
"enabled": false,
"vector_store": "faiss",
"embedding_model": "all-MiniLM-L6-v2"
}
}
Environment Variables
- OLLAMA_HOST - Ollama server host (default: localhost:11434)
- MCP_PROXY_PORT - MCP proxy port (default: 8000)
- OLLAMA_MCP_CONFIG - Path to configuration file
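A minimal sketch of how environment variables can override the JSON configuration file; this is an illustration of the precedence described above, not the proxy's actual loader.
import json
import os

# Illustrative config loading with environment overrides (not the real loader).
def load_config() -> dict:
    path = os.environ.get("OLLAMA_MCP_CONFIG", "config/development.json")
    with open(path) as f:
        config = json.load(f)

    # OLLAMA_HOST may be given as "host:port".
    if "OLLAMA_HOST" in os.environ:
        host, _, port = os.environ["OLLAMA_HOST"].partition(":")
        config["ollama"]["host"] = host
        if port:
            config["ollama"]["port"] = int(port)

    if "MCP_PROXY_PORT" in os.environ:
        config["mcp"]["port"] = int(os.environ["MCP_PROXY_PORT"])

    return config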
🔧 Claude Desktop Integration
Add to your Claude Desktop MCP configuration:
{
"mcpServers": {
"ollama-proxy": {
"command": "python",
"args": ["-m", "ollama_mcp_proxy"],
"env": {
"OLLAMA_HOST": "localhost:11434",
"MCP_PROXY_PORT": "8000"
}
}
}
}
🛠️ Available Tools
The proxy exposes several MCP tools:
Text Completion
{
"name": "ollama_completion",
"arguments": {
"prompt": "Explain quantum computing",
"model": "llama2",
"temperature": 0.7,
"max_tokens": 500
}
}
Code Completion
{
"name": "code_completion",
"arguments": {
"code": "def factorial(n):",
"language": "python",
"model": "codellama"
}
}
Tool Chaining
{
"name": "tool_chain",
"arguments": {
"tools": [
{"tool": "research", "args": {"topic": "AI ethics"}},
{"tool": "summarize", "args": {"input": "{{previous}}"}}
]
}
}
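For programmatic use outside Claude Desktop, the same tools can be invoked with the official MCP Python SDK. The snippet below is a minimal sketch that assumes the proxy can be launched as a stdio MCP server (as in the Claude Desktop configuration above); adjust the command and transport to your setup.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the proxy as a stdio MCP server (transport settings are assumptions).
    server = StdioServerParameters(command="python", args=["-m", "ollama_mcp_proxy"])

    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Call the text completion tool exposed by the proxy.
            result = await session.call_tool(
                "ollama_completion",
                arguments={"prompt": "Explain quantum computing", "model": "llama2"},
            )
            print(result.content)

asyncio.run(main())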
📚 Resources
MCP resources provide access to:
- Model Information: /models/{model_name} - Model capabilities and metadata
- System Status: /system/status - Health and performance metrics
- Configuration: /config/current - Current configuration settings
- Session Info: /sessions/{session_id} - Session context and history
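Resources can be read the same way as tools are called; a minimal sketch inside an initialized ClientSession (see the tool-call example above), where the exact resource URI form is an assumption.
# Inside an initialized ClientSession (see the tool-call sketch above).
resources = await session.list_resources()              # discover what the proxy exposes
status = await session.read_resource("/system/status")  # URI form is an assumption
print(status.contents)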
🧪 Development
Setup Development Environment
- Install development dependencies:
pip install -e ".[dev]"
- Set up pre-commit hooks:
pre-commit install
- Run in development mode:
python -m ollama_mcp_proxy.server --config config/development.json --debug
Code Quality Tools
- Black: Code formatting
- isort: Import sorting
- flake8: Linting
- mypy: Type checking
# Format code
black src/ tests/
# Sort imports
isort src/ tests/
# Run linting
flake8 src/ tests/
# Type checking
mypy src/
🧪 Testing
The project includes comprehensive testing with pytest:
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=ollama_mcp_proxy --cov-report=html
# Run specific test categories
pytest -m unit # Unit tests only
pytest -m integration # Integration tests only
pytest -m load # Load tests only
# Run specific test file
pytest tests/test_auth.py -v
Test Categories
- Unit Tests: Individual component testing with mocked dependencies
- Integration Tests: End-to-end testing with real Ollama integration
- Load Tests: Performance and concurrency testing
Test Configuration
Tests use comprehensive fixtures defined in conftest.py:
- Mock Ollama client with predictable responses
- Sample test data and configurations
- Error scenario simulation
- Async testing support
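As an illustration of what a unit test against the mocked client might look like (the fixture name and client method here are hypothetical, not the project's actual conftest.py contents):
import pytest

# Hypothetical test sketch; the fixture name and client API are assumptions.
@pytest.mark.unit
@pytest.mark.asyncio
async def test_completion_uses_mock_ollama(mock_ollama_client):
    result = await mock_ollama_client.generate(model="llama2", prompt="hello")
    assert result is not None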
📊 Performance Features
Caching Strategy
- Response Caching: Intelligent caching with TTL-based expiration
- Model Output Caching: Ollama response caching for repeated queries
- Cache Warming: Proactive cache population for popular models
- Distributed Caching: Redis integration for multi-instance deployments
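The hybrid (local + Redis) idea with TTL-based expiration, in miniature; class and method names are illustrative and not the proxy's internals.
import json
import time
from typing import Any, Dict, Optional, Tuple

import redis.asyncio as redis

# Illustrative hybrid cache: check the local dict first, then Redis. Not the proxy's actual code.
class HybridCache:
    def __init__(self, redis_url: str = "redis://localhost:6379", ttl: int = 300):
        self._local: Dict[str, Tuple[float, Any]] = {}
        self._redis = redis.from_url(redis_url)
        self._ttl = ttl

    async def get(self, key: str) -> Optional[Any]:
        entry = self._local.get(key)
        if entry and entry[0] > time.time():
            return entry[1]
        raw = await self._redis.get(key)
        return json.loads(raw) if raw else None

    async def set(self, key: str, value: Any) -> None:
        self._local[key] = (time.time() + self._ttl, value)
        await self._redis.set(key, json.dumps(value), ex=self._ttl)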
Memory Management
- Context Window Sliding: Automatic context truncation for long conversations
- Memory Pressure Handling: Automatic cleanup when memory limits are reached
- Session Compression: Zlib compression for inactive sessions
- Garbage Collection: Efficient cleanup of expired sessions
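A simplified view of context-window sliding: the sketch below only truncates history against a rough token budget, whereas the proxy also summarizes what it drops. Names and the token estimate are illustrative.
from typing import Dict, List

# Simplified sliding-window truncation; the real proxy also summarizes dropped history.
def slide_context(messages: List[Dict[str, str]], max_tokens: int = 4096) -> List[Dict[str, str]]:
    kept: List[Dict[str, str]] = []
    budget = max_tokens
    # Walk backwards so the most recent messages are kept.
    for message in reversed(messages):
        cost = len(message["content"]) // 4 + 1   # rough token estimate
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return list(reversed(kept))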
Circuit Breaker
- Fault Tolerance: Automatic failure detection and recovery
- Exponential Backoff: Intelligent retry strategies
- Health Monitoring: Continuous health checking of dependencies
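The circuit breaker pattern in a few lines (illustrative only; the thresholds and states in the actual OllamaClient may differ):
import time
from typing import Optional

# Minimal circuit breaker sketch: open after repeated failures, probe again after a cooldown.
class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: allow a probe request once the cooldown has elapsed.
        return time.time() - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.time()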
🔒 Security Features
Authentication & Authorization
- API Key Authentication: Secure key-based access control
- OAuth 2.0 Integration: Industry-standard authentication
- Role-Based Access Control: Granular permission management
- JWT Token Support: Stateless authentication with JSON Web Tokens
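Stateless JWT authentication typically looks like the following sketch using PyJWT; the proxy's actual claims, roles, and key handling are not shown here.
import datetime

import jwt  # PyJWT

SECRET = "change-me"  # in production, load from secure configuration

# Issue a short-lived token carrying the user's role for RBAC checks.
token = jwt.encode(
    {
        "sub": "alice",
        "role": "user",
        "exp": datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1),
    },
    SECRET,
    algorithm="HS256",
)

# Verify and decode on each request; raises jwt.InvalidTokenError on failure.
claims = jwt.decode(token, SECRET, algorithms=["HS256"])
print(claims["role"])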
Data Protection
- Encryption at Rest: AES encryption for stored conversation data
- Request Sanitization: Input validation and sanitization
- Audit Logging: Comprehensive security event logging
- Security Headers: CORS, CSP, and other security headers
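Encryption at rest, in miniature: this sketch uses Fernet from the cryptography package (an AES-based scheme); the proxy's actual cipher choice and key management are not shown.
from cryptography.fernet import Fernet

# Generate (or load) a symmetric key, then encrypt conversation data before storage.
key = Fernet.generate_key()          # persist this securely, e.g. in a secrets manager
fernet = Fernet(key)

ciphertext = fernet.encrypt(b'{"role": "user", "content": "sensitive message"}')
plaintext = fernet.decrypt(ciphertext)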
Rate Limiting
- Per-User Limits: Individual user rate limiting
- Global Limits: System-wide protection against abuse
- Sliding Window: Advanced rate limiting algorithms
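A per-user sliding-window limiter can be sketched as follows (illustrative; the proxy's actual limits and storage backend may differ):
import time
from collections import defaultdict, deque
from typing import Deque, Dict

# Illustrative sliding-window rate limiter keyed by user ID.
class SlidingWindowLimiter:
    def __init__(self, max_requests: int = 60, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = time.time()
        hits = self._hits[user_id]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True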
🚀 Production Deployment
Docker Deployment
# Build Docker image
docker build -t ollama-mcp-proxy .
# Run with Docker Compose
docker-compose up -d
Environment Configuration
# Production environment variables
export OLLAMA_MCP_CONFIG=/app/config/production.json
export REDIS_URL=redis://localhost:6379
export LOG_LEVEL=INFO
Monitoring
- Health Endpoints: /health and /metrics endpoints
- Structured Logging: JSON logs with correlation IDs
- Performance Metrics: Request/response time tracking
- Error Rate Monitoring: Comprehensive error tracking
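A simple readiness probe against the health endpoint might look like this (sketch; assumes the proxy listens on localhost:8000 and that /health returns JSON):
import json
import urllib.request

# Simple readiness probe; host, port, and response format are assumptions.
with urllib.request.urlopen("http://localhost:8000/health", timeout=5) as resp:
    health = json.load(resp)
print(health)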
🤝 Contributing
We welcome contributions! Please see our contributing guidelines for details.
Development Workflow
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run the test suite
- Submit a pull request
Code Style
- Follow PEP 8 style guidelines
- Use type hints throughout
- Write comprehensive docstrings
- Maintain test coverage above 90%
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Ollama for providing the local language model API
- Model Context Protocol for the protocol specification
- Anthropic for MCP development and Claude integration
- All contributors who help make this project better
📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Read the Docs
Built with ❤️ for the MCP community