GLM MCP Server
MCP server that integrates the Z.ai GLM 4.6 API with the Claude Code CLI, allowing Claude Code to leverage GLM models for parallel processing, task offloading, and cross-validation of outputs.
Overview
The GLM MCP Server enables Claude Code to:
- Offload tasks: Use GLM-4.6 or GLM-4.5-Air for subtasks and agent work
- Parallel processing: Run GLM alongside Sonnet to compare outputs
- Cost optimization: Reduce Sonnet token consumption by delegating to GLM
- Enhanced debugging: Get different perspectives on problems from multiple models
Features
- GLM-4.6 Chat: 357B MoE flagship model for reasoning, coding, and agentic tasks
- GLM-4.5-Air: 106B lighter model for faster responses (when available)
- Reasoning Mode: Extended thinking for complex problems
- Streaming Support: Real-time response generation
- Model Auto-detection: Query available GLM models dynamically
Prerequisites
- Z.ai API Key: Get yours from https://z.ai/manage-apikey/apikey-list
- Python 3.10+: For running the MCP server
- Claude Code CLI: Version 2.0+ with MCP support
- uv: Fast Python package installer (installed during setup)
Setup
1. Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
2. Clone or Download This Project
cd /home/adam/projects/glm
3. Install Dependencies
uv venv
uv pip install -e ".[dev]"
4. Configure Z.ai API Key
# Option 1: Set environment variable (recommended)
export ZAI_API_KEY="your_api_key_here"
# Option 2: Create .env file
cp .env.example .env
# Edit .env and add your API key
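A minimal .env file only needs the key itself, for example:
ZAI_API_KEY=your_api_key_here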
5. Add to Claude Code (User Scope)
claude mcp add --scope user --transport stdio glm-server \
--env ZAI_API_KEY="${ZAI_API_KEY}" -- \
/home/adam/projects/glm/.venv/bin/python \
/home/adam/projects/glm/src/glm_server/server.py
6. Verify Installation
claude mcp list
You should see:
glm-server: /home/adam/projects/glm/.venv/bin/python ... - ✓ Connected
Usage in Claude Code
Once configured, Claude Code will automatically have access to GLM tools. You can also explicitly request GLM usage:
Example Prompts
- Offload a specific task to GLM:
  Use the glm_chat tool to analyze this code and suggest optimizations
- Compare outputs:
  Check this implementation yourself, and also ask GLM for a second opinion using glm_chat
- Use reasoning mode for complex problems:
  Use glm_reasoning to solve this algorithm problem step-by-step
- Parallel processing:
  Simultaneously (in parallel):
  1. You analyze the bug in module A
  2. Use glm_chat to analyze the bug in module B
Available MCP Tools
1. glm_chat
Standard chat completions for general queries, coding, and reasoning.
Parameters:
- prompt (required): The question or task
- model (optional): "glm-4.6" (default) or "glm-4.5-air"
- temperature (optional): 0-1, default 0.7
- max_tokens (optional): Default 4096
Example:
Use glm_chat with prompt "Explain async/await in Python" and model "glm-4.6"
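Under the hood, a call like this maps onto an OpenAI-compatible chat completion against the Z.ai endpoint. Below is a minimal sketch of the request shape (the base URL is the one used in the Troubleshooting section; this is illustrative, not the exact server implementation):
import os
from openai import OpenAI

# Sketch of the request glm_chat issues; not the actual server code.
client = OpenAI(
    api_key=os.environ["ZAI_API_KEY"],
    base_url="https://api.z.ai/api/paas/v4",  # OpenAI-compatible Z.ai endpoint
)
response = client.chat.completions.create(
    model="glm-4.6",  # or "glm-4.5-air"
    messages=[{"role": "user", "content": "Explain async/await in Python"}],
    temperature=0.7,
    max_tokens=4096,
)
print(response.choices[0].message.content)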
2. glm_reasoning
Extended thinking mode for complex problems requiring step-by-step analysis.
Parameters:
- prompt (required): The complex problem to solve
- model (optional): "glm-4.6" (default) or "glm-4.5-air"
- temperature (optional): 0-1, default 0.7
- max_tokens (optional): Default 8192
Example:
Use glm_reasoning to analyze the time complexity of this recursive algorithm
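Presumably this wraps the same chat completion with extended thinking enabled and a larger output budget. A sketch under that assumption; the exact field used to enable thinking (the extra_body value below) is an assumption, so verify it against the Z.ai API docs:
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["ZAI_API_KEY"], base_url="https://api.z.ai/api/paas/v4")
response = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Analyze the time complexity of this recursive algorithm"}],
    max_tokens=8192,  # larger default budget for step-by-step reasoning
    # Assumption: extended thinking is toggled via a provider-specific extra field.
    extra_body={"thinking": {"type": "enabled"}},
)
print(response.choices[0].message.content)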
3. glm_stream
Streaming responses for real-time output (useful for long responses).
Parameters:
- prompt (required): The prompt to send
- model (optional): "glm-4.6" (default) or "glm-4.5-air"
- temperature (optional): 0-1, default 0.7
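A streaming variant is the same chat completion with stream=True. This sketch prints deltas as they arrive (again illustrative, not the server's exact code):
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["ZAI_API_KEY"], base_url="https://api.z.ai/api/paas/v4")
stream = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Walk through how Python generators work"}],
    temperature=0.7,
    stream=True,  # tokens arrive incrementally instead of as one final message
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)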
4. list_glm_models
Query Z.ai API for available models and capabilities.
No parameters required
Available MCP Resources
glm://models
Returns JSON with available GLM models and their specifications.
glm://config
Returns current server configuration and feature status.
Testing
Run Smoke Tests
.venv/bin/python -m pytest tests/test_smoke.py -v
Run Integration Tests (requires API key)
export ZAI_API_KEY="your_api_key"
.venv/bin/python -m pytest tests/test_smoke.py -v
Use Cases
1. Parallel Task Processing
Let Sonnet work on one part of a problem while GLM handles another:
I have two bugs to fix. You fix the authentication issue,
and use glm_chat to investigate the database connection problem.
2. Offloading Agent Tasks
Use GLM-4.5-Air (when available) as a faster alternative to Haiku for subtasks:
Use glm_chat to generate 10 test cases for this function
3. Cross-Validation
Get a second opinion from GLM on critical decisions:
Review my database schema design, then use glm_reasoning
to validate the approach and catch any issues I might have missed
4. Cost Optimization
Delegate simpler tasks to GLM to save Sonnet tokens:
Use glm_chat to write the boilerplate code for these 5 CRUD endpoints
Troubleshooting
Server Not Connected
# Check MCP server status
claude mcp list
# Restart Claude Code or manually restart the server
# The server will auto-restart when you start a new conversation
API Key Not Working
# Verify your API key is set
echo $ZAI_API_KEY
# Test the API key directly
curl -H "Authorization: Bearer $ZAI_API_KEY" \
https://api.z.ai/api/paas/v4/models
Import Errors
# Reinstall dependencies
cd /home/adam/projects/glm
uv pip install -e ".[dev]"
Architecture
Claude Code CLI
↓ (stdio MCP)
GLM MCP Server (Python)
↓ (OpenAI-compatible SDK)
Z.ai GLM API
→ GLM-4.6 (357B MoE)
→ GLM-4.5-Air (106B MoE)
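For orientation, a stdio server of this shape can be written with the Python MCP SDK's FastMCP helper. The following is a minimal sketch that mirrors the tool and resource names in this README; the actual server.py implementation may differ:
import json
import os

from openai import OpenAI
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("glm-server")
client = OpenAI(api_key=os.environ["ZAI_API_KEY"], base_url="https://api.z.ai/api/paas/v4")

@mcp.tool()
def glm_chat(prompt: str, model: str = "glm-4.6", temperature: float = 0.7, max_tokens: int = 4096) -> str:
    """Standard chat completion against the Z.ai GLM API."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content

@mcp.resource("glm://models")
def list_models() -> str:
    """Return the available GLM model IDs as JSON."""
    return json.dumps([m.id for m in client.models.list()])

if __name__ == "__main__":
    mcp.run(transport="stdio")  # Claude Code connects over stdio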
Development
Project Structure
glm/
├── src/
│ └── glm_server/
│ ├── __init__.py
│ └── server.py # Main MCP server implementation
├── tests/
│ └── test_smoke.py # Smoke tests with mocks
├── pyproject.toml # Python project configuration
├── README.md # This file
├── .env.example # Environment variable template
└── .gitignore # Git ignore patterns
Running the Server Standalone
# The server uses stdio transport, so it's meant to be called by Claude Code
# For testing, you can use the MCP inspector (if installed)
python src/glm_server/server.py
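For example, with Node.js available, the MCP Inspector can launch and exercise the server over stdio:
npx @modelcontextprotocol/inspector .venv/bin/python src/glm_server/server.py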
Models
GLM-4.6
- Parameters: 357B MoE (32B active)
- Context: 200K tokens
- Max Output: 128K tokens
- Best for: Reasoning, coding, agentic tasks, tool use
GLM-4.5-Air
- Parameters: 106B MoE (12B active)
- Context: 128K tokens
- Best for: Faster responses, general chat, lighter tasks
Contributing
- Run tests before committing:
  .venv/bin/python -m pytest tests/ -v
- Follow the existing code style
- Update tests for new features
License
MIT
Resources
- Z.ai Platform: https://z.ai
- Z.ai API Docs: https://docs.z.ai
- MCP Documentation: https://docs.claude.com/en/docs/claude-code/mcp
- Claude Code: https://code.claude.com