
MCP Ollama Server

A Model Context Protocol (MCP) server that provides direct access to Ollama models for AI inference.

Features

  • 🚀 Direct Model Access: Generate responses and chat with any Ollama model
  • 💬 Chat Support: Maintain conversation context with chat endpoints
  • 📋 Model Management: List, pull, delete, and get info about models
  • 🔢 Embeddings: Generate text embeddings for semantic search
  • 🔧 Full Control: Configure temperature, max tokens, and system prompts
  • ✅ Status Checking: Automatic Ollama availability detection

Installation

  1. Prerequisites:

    • Ollama installed and running
    • Node.js 18+ installed
  2. Install the MCP server:

    cd /Users/bard/Code/mcp-ollama   # adjust to where you cloned the repository
    npm install
    npm run build
    
  3. Add to Claude Desktop config: Edit ~/Library/Application Support/Claude/claude_desktop_config.json, updating the path in "args" to match your install location:

    {
      "mcpServers": {
        "ollama": {
          "command": "node",
          "args": ["/Users/bard/Code/mcp-ollama/dist/index.js"],
          "env": {
            "OLLAMA_BASE_URL": "http://localhost:11434"
          }
        }
      }
    }
    
  4. Restart Claude Desktop
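
To confirm that Ollama is reachable before using the server, here is a minimal sketch using Node's built-in fetch (Node 18+). It assumes the default endpoint; the filename check-ollama.js is just an example:

// check-ollama.js - quick sanity check that the Ollama API is reachable
const baseUrl = process.env.OLLAMA_BASE_URL || "http://localhost:11434";

fetch(`${baseUrl}/api/tags`)
  .then((res) => res.json())
  .then((data) => {
    const names = (data.models || []).map((m) => m.name);
    console.log(`Ollama is up. Installed models: ${names.join(", ") || "none"}`);
  })
  .catch(() => {
    console.error(`Could not reach Ollama at ${baseUrl} - is "ollama serve" running?`);
  });

Run it with node check-ollama.js; if it fails, start Ollama before continuing.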

Usage

Generate Text

// Simple generation
ollama_generate({
  prompt: "What is the meaning of life?"
})

// With system prompt and parameters
ollama_generate({
  model: "llama3.2",
  prompt: "Write a haiku about coding",
  system: "You are a creative poet",
  temperature: 0.9,
  max_tokens: 100
})

Chat Conversations

// Multi-turn conversation
ollama_chat({
  model: "llama3.2",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: "What is Python?" },
    { role: "assistant", content: "Python is a high-level programming language..." },
    { role: "user", content: "What makes it good for beginners?" }
  ]
})

Model Management

// List available models
ollama_list()

// Pull a new model
ollama_pull({ model: "mistral" })

// Get model information
ollama_info({ model: "llama3.2" })

// Delete a model
ollama_delete({ model: "old-model" })

Generate Embeddings

// Generate embeddings for semantic search
ollama_embeddings({
  model: "nomic-embed-text",
  prompt: "The quick brown fox jumps over the lazy dog"
})
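
For semantic search, embedding vectors are usually compared with cosine similarity. A minimal, self-contained sketch in plain JavaScript (independent of this server's exact return format; it works on any two numeric vectors of equal length):

// Cosine similarity between two embedding vectors of equal length.
// Scores near 1 indicate semantically similar texts.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Example: rank candidate documents against a query embedding, highest score first
// (docs is assumed to be an array of { text, vector } objects):
// docs.sort((x, y) => cosineSimilarity(queryVec, y.vector) - cosineSimilarity(queryVec, x.vector));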

Available Models

Popular models you can use:

  • llama3.2 - Fast, efficient general-purpose model
  • deepseek-r1 - Advanced reasoning model
  • mistral - Efficient 7B parameter model
  • gemma:2b - Google's small efficient model
  • phi3:mini - Microsoft's compact model
  • nomic-embed-text - For generating embeddings

Pull any model with:

ollama pull <model-name>

Configuration

Environment Variables

  • OLLAMA_BASE_URL: Ollama API endpoint (default: http://localhost:11434)

Tool Parameters

ollama_generate
  • model: Model to use (default: "llama3.2")
  • prompt: Input prompt (required)
  • system: System prompt (optional)
  • temperature: Sampling temperature 0-1 (default: 0.7)
  • max_tokens: Maximum tokens to generate (default: 2048)
  • stream: Stream responses (default: false)

ollama_chat
  • model: Model to use (default: "llama3.2")
  • messages: Array of chat messages (required)
  • temperature: Sampling temperature 0-1 (default: 0.7)
  • max_tokens: Maximum tokens to generate (default: 2048)
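
For reference, a fully parameterized ollama_chat call might look like this sketch (the values simply mirror the defaults listed above):

ollama_chat({
  model: "llama3.2",
  messages: [
    { role: "system", content: "You are a concise assistant" },
    { role: "user", content: "Summarize what MCP is in one sentence" }
  ],
  temperature: 0.7,   // sampling temperature, 0-1
  max_tokens: 2048    // upper bound on generated tokens
})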

Troubleshooting

Ollama not running

If you see "❌ Ollama is not running", start Ollama:

ollama serve

No models available

Pull a model first:

ollama pull llama3.2

Different Ollama port

If Ollama runs on a different port, update the config:

{
  "env": {
    "OLLAMA_BASE_URL": "http://localhost:YOUR_PORT"
  }
}

Differences from ELVIS

This MCP server provides direct, synchronous access to Ollama models, unlike ELVIS, which uses a delegation/queue pattern. Benefits:

  • Immediate responses: No waiting for task completion
  • Simpler API: Direct function calls instead of task management
  • Native chat support: Built-in conversation handling
  • Model management: Pull, delete, and inspect models
  • Embeddings support: Generate embeddings for RAG applications

Development

Running in development:

npm run dev

Building:

npm run build

Testing:

# Test generate
curl -X POST http://localhost:11434/api/generate \
  -d '{"model": "llama3.2", "prompt": "Hello"}'

# Test chat
curl -X POST http://localhost:11434/api/chat \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hello"}]}'
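
# Test embeddings (optional sketch; assumes the nomic-embed-text model has been pulled)
curl -X POST http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "Hello"}'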

License

MIT