Ollama MCP Proxy
A comprehensive Model Context Protocol (MCP) proxy server that bridges MCP clients with Ollama's local language models, providing advanced features like RAG integration, context management, caching, and production-ready security.
🌟 Features
Core Functionality
- MCP Protocol Implementation: Full server-side MCP support with tools, resources, and prompts
- Ollama Integration: Seamless connection to local Ollama language models
- Multiple Transport Methods: HTTP with Server-Sent Events (SSE) and WebSocket support
- Advanced Context Management: Session-based isolation with conversation branching and merging
Advanced AI Capabilities
- RAG Integration: Vector-based document retrieval with FAISS and sentence transformers
- Knowledge Base Connectivity: Integration with external knowledge sources
- Advanced Summarization: Context window management with intelligent summarization
- Multi-Model Support: Dynamic model discovery and switching
Performance & Production Features
- Intelligent Caching: Multi-tier caching with Redis and local fallback
- Circuit Breaker Pattern: Fault tolerance with automatic recovery
- Rate Limiting: Configurable request throttling and protection
- Streaming Optimization: Efficient real-time response streaming
Security & Authentication
- OAuth 2.0 Support: Comprehensive authentication and authorization
- Role-Based Access Control (RBAC): Granular permission management
- Data Encryption: At-rest encryption for sensitive conversation data
- Security Headers: Production-ready security configuration
Developer Experience
- Comprehensive Testing: Unit, integration, and load testing suites
- Development Tools: Hot reload, profiling, and debugging support
- Structured Logging: JSON-formatted logs with correlation IDs
- Configuration Management: Environment-based configuration with validation
🏗️ Architecture
┌─────────────────┐      ┌──────────────────────┐      ┌─────────────────┐
│   MCP Client    │      │                      │      │  Ollama Server  │
│  (Claude, etc.) │◄────►│   Ollama MCP Proxy   │◄────►│   (Local AI)    │
└─────────────────┘      └──────────────────────┘      └─────────────────┘
                                     │
                                     ▼
                          ┌──────────────────┐
                          │  Configuration   │
                          │    & Storage     │
                          └──────────────────┘
Key Components
- OllamaMCPServer: Main MCP server implementation with tool and resource handlers
- OllamaClient: Robust HTTP client with retry logic and circuit breaker
- ContextManager: Sophisticated session management with branching and search
- RAG Integration: Vector-based document retrieval and knowledge augmentation
- Security Framework: Authentication, authorization, and data protection
- Cache System: Multi-level caching with intelligent warming and invalidation
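As a rough illustration of how these components fit together, the sketch below follows a single tool call through the cache, context manager, and Ollama client. The method names (get, set, append_message, generate) are illustrative assumptions, not the project's actual API.
# Illustrative request flow only; method names are assumptions, not the real API.
async def handle_completion(session_id: str, prompt: str, model: str,
                            cache, context_manager, ollama_client) -> str:
    # Record the incoming prompt in the per-session context.
    context_manager.append_message(session_id, role="user", content=prompt)

    # Check the multi-tier cache before hitting Ollama.
    cache_key = f"{model}:{hash(prompt)}"
    cached = await cache.get(cache_key)
    if cached is not None:
        return cached

    # Delegate to the Ollama client (retry and circuit-breaker logic live there).
    response = await ollama_client.generate(model=model, prompt=prompt)

    await cache.set(cache_key, response)
    context_manager.append_message(session_id, role="assistant", content=response)
    return response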
🚀 Quick Start
Prerequisites
- Python 3.8 or higher
- Ollama installed and running locally
- Redis (optional, for distributed caching)
Installation
- Clone the repository:
git clone https://github.com/ollama-mcp-proxy/ollama-mcp-proxy.git
cd ollama-mcp-proxy
- Create and activate virtual environment:
python -m venv venv
# Windows
venv\Scripts\activate
# Unix/macOS
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Start Ollama (if not already running):
ollama serve
- Run the MCP proxy:
python -m ollama_mcp_proxy.server --config config/development.json
Or using the CLI:
ollama-mcp-proxy --config config/development.json
📝 Configuration
The proxy uses JSON configuration files for different environments:
- config/development.json - Development settings with debug mode
- config/production.json - Production-ready configuration
Key Configuration Sections
{
"ollama": {
"host": "localhost",
"port": 11434,
"timeout": 30,
"max_retries": 3
},
"mcp": {
"port": 8000,
"transport": "http",
"auth_enabled": false
},
"cache": {
"enabled": true,
"type": "hybrid",
"redis": {
"enabled": true,
"host": "localhost",
"port": 6379
}
},
"rag": {
"enabled": false,
"vector_store": "faiss",
"embedding_model": "all-MiniLM-L6-v2"
}
}
Environment Variables
- OLLAMA_HOST - Ollama server host (default: localhost:11434)
- MCP_PROXY_PORT - MCP proxy port (default: 8000)
- OLLAMA_MCP_CONFIG - Path to configuration file
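A minimal sketch of how environment variables can override the JSON configuration file; this is an illustration of the precedence described above, not the proxy's actual loader.
import json
import os

# Illustrative config loading with environment overrides (not the real loader).
def load_config() -> dict:
    path = os.environ.get("OLLAMA_MCP_CONFIG", "config/development.json")
    with open(path) as f:
        config = json.load(f)

    # OLLAMA_HOST may be given as "host:port".
    if "OLLAMA_HOST" in os.environ:
        host, _, port = os.environ["OLLAMA_HOST"].partition(":")
        config["ollama"]["host"] = host
        if port:
            config["ollama"]["port"] = int(port)

    if "MCP_PROXY_PORT" in os.environ:
        config["mcp"]["port"] = int(os.environ["MCP_PROXY_PORT"])

    return config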
🔧 Claude Desktop Integration
Add to your Claude Desktop MCP configuration:
{
"mcpServers": {
"ollama-proxy": {
"command": "python",
"args": ["-m", "ollama_mcp_proxy"],
"env": {
"OLLAMA_HOST": "localhost:11434",
"MCP_PROXY_PORT": "8000"
}
}
}
}
🛠️ Available Tools
The proxy exposes several MCP tools:
Text Completion
{
"name": "ollama_completion",
"arguments": {
"prompt": "Explain quantum computing",
"model": "llama2",
"temperature": 0.7,
"max_tokens": 500
}
}
Code Completion
{
"name": "code_completion",
"arguments": {
"code": "def factorial(n):",
"language": "python",
"model": "codellama"
}
}
Tool Chaining
{
"name": "tool_chain",
"arguments": {
"tools": [
{"tool": "research", "args": {"topic": "AI ethics"}},
{"tool": "summarize", "args": {"input": "{{previous}}"}}
]
}
}
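For programmatic use outside Claude Desktop, the same tools can be invoked with the official MCP Python SDK. The snippet below is a minimal sketch that assumes the proxy can be launched as a stdio MCP server (as in the Claude Desktop configuration above); adjust the command and transport to your setup.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the proxy as a stdio MCP server (transport settings are assumptions).
    server = StdioServerParameters(command="python", args=["-m", "ollama_mcp_proxy"])

    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Call the text completion tool exposed by the proxy.
            result = await session.call_tool(
                "ollama_completion",
                arguments={"prompt": "Explain quantum computing", "model": "llama2"},
            )
            print(result.content)

asyncio.run(main())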
📚 Resources
MCP resources provide access to:
- Model Information: /models/{model_name} - Model capabilities and metadata
- System Status: /system/status - Health and performance metrics
- Configuration: /config/current - Current configuration settings
- Session Info: /sessions/{session_id} - Session context and history
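Resources can be read the same way as tools are called; a minimal sketch inside an initialized ClientSession (see the tool-call example above), where the exact resource URI form is an assumption.
# Inside an initialized ClientSession (see the tool-call sketch above).
resources = await session.list_resources()              # discover what the proxy exposes
status = await session.read_resource("/system/status")  # URI form is an assumption
print(status.contents)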
🧪 Development
Setup Development Environment
- Install development dependencies:
pip install -e ".[dev]"
- Set up pre-commit hooks:
pre-commit install
- Run in development mode:
python -m ollama_mcp_proxy.server --config config/development.json --debug
Code Quality Tools
- Black: Code formatting
- isort: Import sorting
- flake8: Linting
- mypy: Type checking
# Format code
black src/ tests/
# Sort imports
isort src/ tests/
# Run linting
flake8 src/ tests/
# Type checking
mypy src/
🧪 Testing
The project includes comprehensive testing with pytest:
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=ollama_mcp_proxy --cov-report=html
# Run specific test categories
pytest -m unit # Unit tests only
pytest -m integration # Integration tests only
pytest -m load # Load tests only
# Run specific test file
pytest tests/test_auth.py -v
Test Categories
- Unit Tests: Individual component testing with mocked dependencies
- Integration Tests: End-to-end testing with real Ollama integration
- Load Tests: Performance and concurrency testing
Test Configuration
Tests use comprehensive fixtures defined in conftest.py:
- Mock Ollama client with predictable responses
- Sample test data and configurations
- Error scenario simulation
- Async testing support
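As an illustration of what a unit test against the mocked client might look like (the fixture name and client method here are hypothetical, not the project's actual conftest.py contents):
import pytest

# Hypothetical test sketch; the fixture name and client API are assumptions.
@pytest.mark.unit
@pytest.mark.asyncio
async def test_completion_uses_mock_ollama(mock_ollama_client):
    result = await mock_ollama_client.generate(model="llama2", prompt="hello")
    assert result is not None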
📊 Performance Features
Caching Strategy
- Response Caching: Intelligent caching with TTL-based expiration
- Model Output Caching: Ollama response caching for repeated queries
- Cache Warming: Proactive cache population for popular models
- Distributed Caching: Redis integration for multi-instance deployments
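The hybrid (local + Redis) idea with TTL-based expiration, in miniature; class and method names are illustrative and not the proxy's internals.
import json
import time
from typing import Any, Dict, Optional, Tuple

import redis.asyncio as redis

# Illustrative hybrid cache: check the local dict first, then Redis. Not the proxy's actual code.
class HybridCache:
    def __init__(self, redis_url: str = "redis://localhost:6379", ttl: int = 300):
        self._local: Dict[str, Tuple[float, Any]] = {}
        self._redis = redis.from_url(redis_url)
        self._ttl = ttl

    async def get(self, key: str) -> Optional[Any]:
        entry = self._local.get(key)
        if entry and entry[0] > time.time():
            return entry[1]
        raw = await self._redis.get(key)
        return json.loads(raw) if raw else None

    async def set(self, key: str, value: Any) -> None:
        self._local[key] = (time.time() + self._ttl, value)
        await self._redis.set(key, json.dumps(value), ex=self._ttl)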
Memory Management
- Context Window Sliding: Automatic context truncation for long conversations
- Memory Pressure Handling: Automatic cleanup when memory limits are reached
- Session Compression: Zlib compression for inactive sessions
- Garbage Collection: Efficient cleanup of expired sessions
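A simplified view of context-window sliding: the sketch below only truncates history against a rough token budget, whereas the proxy also summarizes what it drops. Names and the token estimate are illustrative.
from typing import Dict, List

# Simplified sliding-window truncation; the real proxy also summarizes dropped history.
def slide_context(messages: List[Dict[str, str]], max_tokens: int = 4096) -> List[Dict[str, str]]:
    kept: List[Dict[str, str]] = []
    budget = max_tokens
    # Walk backwards so the most recent messages are kept.
    for message in reversed(messages):
        cost = len(message["content"]) // 4 + 1   # rough token estimate
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return list(reversed(kept))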
Circuit Breaker
- Fault Tolerance: Automatic failure detection and recovery
- Exponential Backoff: Intelligent retry strategies
- Health Monitoring: Continuous health checking of dependencies
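The circuit breaker pattern in a few lines (illustrative only; the thresholds and states in the actual OllamaClient may differ):
import time
from typing import Optional

# Minimal circuit breaker sketch: open after repeated failures, probe again after a cooldown.
class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: allow a probe request once the cooldown has elapsed.
        return time.time() - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.time()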
🔒 Security Features
Authentication & Authorization
- API Key Authentication: Secure key-based access control
- OAuth 2.0 Integration: Industry-standard authentication
- Role-Based Access Control: Granular permission management
- JWT Token Support: Stateless authentication with JSON Web Tokens
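Stateless JWT authentication typically looks like the following sketch using PyJWT; the proxy's actual claims, roles, and key handling are not shown here.
import datetime

import jwt  # PyJWT

SECRET = "change-me"  # in production, load from secure configuration

# Issue a short-lived token carrying the user's role for RBAC checks.
token = jwt.encode(
    {
        "sub": "alice",
        "role": "user",
        "exp": datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1),
    },
    SECRET,
    algorithm="HS256",
)

# Verify and decode on each request; raises jwt.InvalidTokenError on failure.
claims = jwt.decode(token, SECRET, algorithms=["HS256"])
print(claims["role"])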
Data Protection
- Encryption at Rest: AES encryption for stored conversation data
- Request Sanitization: Input validation and sanitization
- Audit Logging: Comprehensive security event logging
- Security Headers: CORS, CSP, and other security headers
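Encryption at rest, in miniature: this sketch uses Fernet from the cryptography package (an AES-based scheme); the proxy's actual cipher choice and key management are not shown.
from cryptography.fernet import Fernet

# Generate (or load) a symmetric key, then encrypt conversation data before storage.
key = Fernet.generate_key()          # persist this securely, e.g. in a secrets manager
fernet = Fernet(key)

ciphertext = fernet.encrypt(b'{"role": "user", "content": "sensitive message"}')
plaintext = fernet.decrypt(ciphertext)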
Rate Limiting
- Per-User Limits: Individual user rate limiting
- Global Limits: System-wide protection against abuse
- Sliding Window: Advanced rate limiting algorithms
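A per-user sliding-window limiter can be sketched as follows (illustrative; the proxy's actual limits and storage backend may differ):
import time
from collections import defaultdict, deque
from typing import Deque, Dict

# Illustrative sliding-window rate limiter keyed by user ID.
class SlidingWindowLimiter:
    def __init__(self, max_requests: int = 60, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str) -> bool:
        now = time.time()
        hits = self._hits[user_id]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True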
🚀 Production Deployment
Docker Deployment
# Build Docker image
docker build -t ollama-mcp-proxy .
# Run with Docker Compose
docker-compose up -d
Environment Configuration
# Production environment variables
export OLLAMA_MCP_CONFIG=/app/config/production.json
export REDIS_URL=redis://localhost:6379
export LOG_LEVEL=INFO
Monitoring
- Health Endpoints: /health and /metrics endpoints
- Structured Logging: JSON logs with correlation IDs
- Performance Metrics: Request/response time tracking
- Error Rate Monitoring: Comprehensive error tracking
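A simple readiness probe against the health endpoint might look like this (sketch; assumes the proxy listens on localhost:8000 and that /health returns JSON):
import json
import urllib.request

# Simple readiness probe; host, port, and response format are assumptions.
with urllib.request.urlopen("http://localhost:8000/health", timeout=5) as resp:
    health = json.load(resp)
print(health)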
🤝 Contributing
We welcome contributions! Please see our contributing guidelines for details.
Development Workflow
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run the test suite
- Submit a pull request
Code Style
- Follow PEP 8 style guidelines
- Use type hints throughout
- Write comprehensive docstrings
- Maintain test coverage above 90%
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Ollama for providing the local language model API
- Model Context Protocol for the protocol specification
- Anthropic for MCP development and Claude integration
- All contributors who help make this project better
📞 Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Read the Docs
Built with ❤️ for the MCP community