
GLM MCP Server

An MCP server that integrates the Z.ai GLM 4.6 API with the Claude Code CLI, letting Claude Code leverage GLM models for parallel processing, task offloading, and cross-validation of outputs.

Overview

The GLM MCP Server enables Claude Code to:

  • Offload tasks: Use GLM-4.6 or GLM-4.5-Air for subtasks and agent work
  • Parallel processing: Run GLM alongside Sonnet to compare outputs
  • Cost optimization: Reduce Sonnet token consumption by delegating to GLM
  • Enhanced debugging: Get different perspectives on problems from multiple models

Features

  • GLM-4.6 Chat: 357B MoE flagship model for reasoning, coding, and agentic tasks
  • GLM-4.5-Air: 106B lighter model for faster responses (when available)
  • Reasoning Mode: Extended thinking for complex problems
  • Streaming Support: Real-time response generation
  • Model Auto-detection: Query available GLM models dynamically

Prerequisites

  • Z.ai API Key: Get yours from https://z.ai/manage-apikey/apikey-list
  • Python 3.10+: For running the MCP server
  • Claude Code CLI: Version 2.0+ with MCP support
  • uv: Fast Python package installer (installed in step 1 below)

Setup

1. Install uv (if not already installed)

curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

2. Clone or Download This Project

# Clone (or copy) the project, then change into its directory;
# the absolute paths later in this README assume this location
cd /home/adam/projects/glm

3. Install Dependencies

uv venv
uv pip install -e ".[dev]"

4. Configure Z.ai API Key

# Option 1: Set environment variable (recommended)
export ZAI_API_KEY="your_api_key_here"

# Option 2: Create .env file
cp .env.example .env
# Edit .env and add your API key
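
If you go the .env route, the file only needs the key itself (the variable name comes from .env.example; the value is a placeholder):

ZAI_API_KEY=your_api_key_here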

5. Add to Claude Code (User Scope)

claude mcp add --scope user --transport stdio glm-server \
  --env ZAI_API_KEY="${ZAI_API_KEY}" -- \
  /home/adam/projects/glm/.venv/bin/python \
  /home/adam/projects/glm/src/glm_server/server.py

6. Verify Installation

claude mcp list

You should see:

glm-server: /home/adam/projects/glm/.venv/bin/python ... - ✓ Connected

Usage in Claude Code

Once configured, Claude Code will automatically have access to GLM tools. You can also explicitly request GLM usage:

Example Prompts

  1. Offload a specific task to GLM:

    Use the glm_chat tool to analyze this code and suggest optimizations
    
  2. Compare outputs:

    Check this implementation yourself, and also ask GLM for a second opinion using glm_chat
    
  3. Use reasoning mode for complex problems:

    Use glm_reasoning to solve this algorithm problem step-by-step
    
  4. Parallel processing:

    Simultaneously (in parallel):
    1. You analyze the bug in module A
    2. Use glm_chat to analyze the bug in module B
    

Available MCP Tools

1. glm_chat

Standard chat completions for general queries, coding, and reasoning.

Parameters:

  • prompt (required): The question or task
  • model (optional): "glm-4.6" (default) or "glm-4.5-air"
  • temperature (optional): 0-1, default 0.7
  • max_tokens (optional): Default 4096

Example:

Use glm_chat with prompt "Explain async/await in Python" and model "glm-4.6"
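
For orientation, below is a minimal sketch of how a tool like glm_chat can be wired up. It assumes the FastMCP helper from the Python MCP SDK and an OpenAI-compatible client pointed at the Z.ai endpoint used elsewhere in this README; the real src/glm_server/server.py may be structured differently.

# Sketch only: assumes the `mcp` Python SDK (FastMCP) and `openai` packages.
import os
from mcp.server.fastmcp import FastMCP
from openai import OpenAI

mcp = FastMCP("glm-server")
client = OpenAI(
    api_key=os.environ["ZAI_API_KEY"],
    base_url="https://api.z.ai/api/paas/v4",  # same endpoint as the curl check in Troubleshooting
)

@mcp.tool()
def glm_chat(prompt: str, model: str = "glm-4.6",
             temperature: float = 0.7, max_tokens: int = 4096) -> str:
    """Send a single-turn prompt to a GLM model and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    mcp.run()  # stdio transport by default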

2. glm_reasoning

Extended thinking mode for complex problems requiring step-by-step analysis.

Parameters:

  • prompt (required): The complex problem to solve
  • model (optional): "glm-4.6" (default) or "glm-4.5-air"
  • temperature (optional): 0-1, default 0.7
  • max_tokens (optional): Default 8192

Example:

Use glm_reasoning to analyze the time complexity of this recursive algorithm

3. glm_stream

Streaming responses for real-time output (useful for long responses).

Parameters:

  • prompt (required): The prompt to send
  • model (optional): "glm-4.6" (default) or "glm-4.5-air"
  • temperature (optional): 0-1, default 0.7
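
The streaming call itself likely follows the standard OpenAI-compatible pattern sketched below (continuing the client from the glm_chat sketch above; prompt stands in for the tool's argument). Because an MCP tool result is delivered in one piece, the chunks are accumulated before being returned.

# Sketch of the streaming call behind glm_stream; chunks are accumulated
# because the MCP tool result is returned once the stream finishes.
stream = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.7,
    stream=True,
)
text = "".join(chunk.choices[0].delta.content or "" for chunk in stream)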

4. list_glm_models

Query Z.ai API for available models and capabilities.

No parameters required.

Available MCP Resources

glm://models

Returns JSON with available GLM models and their specifications.

glm://config

Returns current server configuration and feature status.
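
Continuing the FastMCP sketch above, resources like these are typically registered with the @mcp.resource decorator; the payload below is illustrative, not the server's actual output:

# Sketch: expose glm://models as a JSON resource (fields are illustrative).
import json

@mcp.resource("glm://models")
def models_resource() -> str:
    """Return the available GLM models as a JSON string."""
    return json.dumps({"models": ["glm-4.6", "glm-4.5-air"]})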

Testing

Run Smoke Tests

.venv/bin/python -m pytest tests/test_smoke.py -v

Run Integration Tests (requires API key)

export ZAI_API_KEY="your_api_key"
.venv/bin/python -m pytest tests/test_smoke.py -v

Use Cases

1. Parallel Task Processing

Let Sonnet work on one part of a problem while GLM handles another:

I have two bugs to fix. You fix the authentication issue,
and use glm_chat to investigate the database connection problem.

2. Offloading Agent Tasks

Use GLM-4.5-Air (when available) as a faster alternative to Haiku for subtasks:

Use glm_chat to generate 10 test cases for this function

3. Cross-Validation

Get a second opinion from GLM on critical decisions:

Review my database schema design, then use glm_reasoning
to validate the approach and catch any issues I might have missed

4. Cost Optimization

Delegate simpler tasks to GLM to save Sonnet tokens:

Use glm_chat to write the boilerplate code for these 5 CRUD endpoints

Troubleshooting

Server Not Connected

# Check MCP server status
claude mcp list

# Restart Claude Code or manually restart the server
# The server will auto-restart when you start a new conversation

API Key Not Working

# Verify your API key is set
echo $ZAI_API_KEY

# Test the API key directly
curl -H "Authorization: Bearer $ZAI_API_KEY" \
  https://api.z.ai/api/paas/v4/models

Import Errors

# Reinstall dependencies
cd /home/adam/projects/glm
uv pip install -e ".[dev]"

Architecture

Claude Code CLI
    ↓ (stdio MCP)
GLM MCP Server (Python)
    ↓ (OpenAI-compatible SDK)
Z.ai GLM API
    → GLM-4.6 (357B MoE)
    → GLM-4.5-Air (106B MoE)
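
The lower two layers can be exercised without Claude Code: any OpenAI-compatible client pointed at the Z.ai base URL (the same endpoint as the troubleshooting curl below) reaches the same models. A minimal sketch:

# Quick end-to-end check of the Z.ai layer, independent of MCP.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["ZAI_API_KEY"],
                base_url="https://api.z.ai/api/paas/v4")

print([m.id for m in client.models.list().data])   # what list_glm_models queries
reply = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(reply.choices[0].message.content)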

Development

Project Structure

glm/
├── src/
│   └── glm_server/
│       ├── __init__.py
│       └── server.py          # Main MCP server implementation
├── tests/
│   └── test_smoke.py          # Smoke tests with mocks
├── pyproject.toml             # Python project configuration
├── README.md                  # This file
├── .env.example               # Environment variable template
└── .gitignore                 # Git ignore patterns

Running the Server Standalone

# The server uses stdio transport, so it's meant to be called by Claude Code
# For testing, you can use the MCP inspector (if installed)
python src/glm_server/server.py
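
For example, with the MCP Inspector installed (it is not a dependency of this project), an invocation along these lines drives the server interactively over stdio:

npx @modelcontextprotocol/inspector .venv/bin/python src/glm_server/server.py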

Models

GLM-4.6

  • Parameters: 357B MoE (32B active)
  • Context: 200K tokens
  • Max Output: 128K tokens
  • Best for: Reasoning, coding, agentic tasks, tool use

GLM-4.5-Air

  • Parameters: 106B MoE (12B active)
  • Context: 128K tokens
  • Best for: Faster responses, general chat, lighter tasks

Contributing

  1. Run tests before committing:

    .venv/bin/python -m pytest tests/ -v
    
  2. Follow the existing code style

  3. Update tests for new features

License

MIT
