GLM MCP Server
MCP server that integrates the Z.ai GLM 4.6 API with the Claude Code CLI, allowing Claude Code to leverage GLM models for parallel processing, task offloading, and cross-validation of outputs.
Overview
The GLM MCP Server enables Claude Code to:
- Offload tasks: Use GLM-4.6 or GLM-4.5-Air for subtasks and agent work
- Parallel processing: Run GLM alongside Sonnet to compare outputs
- Cost optimization: Reduce Sonnet token consumption by delegating to GLM
- Enhanced debugging: Get different perspectives on problems from multiple models
Features
- GLM-4.6 Chat: 357B MoE flagship model for reasoning, coding, and agentic tasks
- GLM-4.5-Air: 106B lighter model for faster responses (when available)
- Reasoning Mode: Extended thinking for complex problems
- Streaming Support: Real-time response generation
- Model Auto-detection: Query available GLM models dynamically
Prerequisites
- Z.ai API Key: Get yours from https://z.ai/manage-apikey/apikey-list
- Python 3.10+: For running the MCP server
- Claude Code CLI: Version 2.0+ with MCP support
- uv: Fast Python package installer (installed during setup)
Setup
1. Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
2. Clone or Download This Project
cd /home/adam/projects/glm
3. Install Dependencies
uv venv
uv pip install -e ".[dev]"
4. Configure Z.ai API Key
# Option 1: Set environment variable (recommended)
export ZAI_API_KEY="your_api_key_here"
# Option 2: Create .env file
cp .env.example .env
# Edit .env and add your API key
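A minimal .env file only needs the key itself, for example:
ZAI_API_KEY=your_api_key_here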
5. Add to Claude Code (User Scope)
claude mcp add --scope user --transport stdio glm-server \
--env ZAI_API_KEY="${ZAI_API_KEY}" -- \
/home/adam/projects/glm/.venv/bin/python \
/home/adam/projects/glm/src/glm_server/server.py
6. Verify Installation
claude mcp list
You should see:
glm-server: /home/adam/projects/glm/.venv/bin/python ... - ✓ Connected
Usage in Claude Code
Once configured, Claude Code will automatically have access to GLM tools. You can also explicitly request GLM usage:
Example Prompts
- Offload a specific task to GLM:
  Use the glm_chat tool to analyze this code and suggest optimizations
- Compare outputs:
  Check this implementation yourself, and also ask GLM for a second opinion using glm_chat
- Use reasoning mode for complex problems:
  Use glm_reasoning to solve this algorithm problem step-by-step
- Parallel processing:
  Simultaneously (in parallel):
  1. You analyze the bug in module A
  2. Use glm_chat to analyze the bug in module B
Available MCP Tools
1. glm_chat
Standard chat completions for general queries, coding, and reasoning.
Parameters:
- prompt (required): The question or task
- model (optional): "glm-4.6" (default) or "glm-4.5-air"
- temperature (optional): 0-1, default 0.7
- max_tokens (optional): Default 4096
Example:
Use glm_chat with prompt "Explain async/await in Python" and model "glm-4.6"
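Under the hood, a call like this maps onto an OpenAI-compatible chat completion against the Z.ai endpoint. Below is a minimal sketch of the request shape (the base URL is the one used in the Troubleshooting section; this is illustrative, not the exact server implementation):
import os
from openai import OpenAI

# Sketch of the request glm_chat issues; not the actual server code.
client = OpenAI(
    api_key=os.environ["ZAI_API_KEY"],
    base_url="https://api.z.ai/api/paas/v4",  # OpenAI-compatible Z.ai endpoint
)
response = client.chat.completions.create(
    model="glm-4.6",  # or "glm-4.5-air"
    messages=[{"role": "user", "content": "Explain async/await in Python"}],
    temperature=0.7,
    max_tokens=4096,
)
print(response.choices[0].message.content)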
2. glm_reasoning
Extended thinking mode for complex problems requiring step-by-step analysis.
Parameters:
- prompt (required): The complex problem to solve
- model (optional): "glm-4.6" (default) or "glm-4.5-air"
- temperature (optional): 0-1, default 0.7
- max_tokens (optional): Default 8192
Example:
Use glm_reasoning to analyze the time complexity of this recursive algorithm
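Presumably this wraps the same chat completion with extended thinking enabled and a larger output budget. A sketch under that assumption; the exact field used to enable thinking (the extra_body value below) is an assumption, so verify it against the Z.ai API docs:
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["ZAI_API_KEY"], base_url="https://api.z.ai/api/paas/v4")
response = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Analyze the time complexity of this recursive algorithm"}],
    max_tokens=8192,  # larger default budget for step-by-step reasoning
    # Assumption: extended thinking is toggled via a provider-specific extra field.
    extra_body={"thinking": {"type": "enabled"}},
)
print(response.choices[0].message.content)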
3. glm_stream
Streaming responses for real-time output (useful for long responses).
Parameters:
- prompt (required): The prompt to send
- model (optional): "glm-4.6" (default) or "glm-4.5-air"
- temperature (optional): 0-1, default 0.7
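A streaming variant is the same chat completion with stream=True. This sketch prints deltas as they arrive (again illustrative, not the server's exact code):
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["ZAI_API_KEY"], base_url="https://api.z.ai/api/paas/v4")
stream = client.chat.completions.create(
    model="glm-4.6",
    messages=[{"role": "user", "content": "Walk through how Python generators work"}],
    temperature=0.7,
    stream=True,  # tokens arrive incrementally instead of as one final message
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)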
4. list_glm_models
Query Z.ai API for available models and capabilities.
No parameters required
Available MCP Resources
glm://models
Returns JSON with available GLM models and their specifications.
glm://config
Returns current server configuration and feature status.
Testing
Run Smoke Tests
.venv/bin/python -m pytest tests/test_smoke.py -v
Run Integration Tests (requires API key)
export ZAI_API_KEY="your_api_key"
.venv/bin/python -m pytest tests/test_smoke.py -v
Use Cases
1. Parallel Task Processing
Let Sonnet work on one part of a problem while GLM handles another:
I have two bugs to fix. You fix the authentication issue,
and use glm_chat to investigate the database connection problem.
2. Offloading Agent Tasks
Use GLM-4.5-Air (when available) as a faster alternative to Haiku for subtasks:
Use glm_chat to generate 10 test cases for this function
3. Cross-Validation
Get a second opinion from GLM on critical decisions:
Review my database schema design, then use glm_reasoning
to validate the approach and catch any issues I might have missed
4. Cost Optimization
Delegate simpler tasks to GLM to save Sonnet tokens:
Use glm_chat to write the boilerplate code for these 5 CRUD endpoints
Troubleshooting
Server Not Connected
# Check MCP server status
claude mcp list
# Restart Claude Code or manually restart the server
# The server will auto-restart when you start a new conversation
API Key Not Working
# Verify your API key is set
echo $ZAI_API_KEY
# Test the API key directly
curl -H "Authorization: Bearer $ZAI_API_KEY" \
https://api.z.ai/api/paas/v4/models
Import Errors
# Reinstall dependencies
cd /home/adam/projects/glm
uv pip install -e ".[dev]"
Architecture
Claude Code CLI
↓ (stdio MCP)
GLM MCP Server (Python)
↓ (OpenAI-compatible SDK)
Z.ai GLM API
→ GLM-4.6 (357B MoE)
→ GLM-4.5-Air (106B MoE)
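For orientation, a stdio server of this shape can be written with the Python MCP SDK's FastMCP helper. The following is a minimal sketch that mirrors the tool and resource names in this README; the actual server.py implementation may differ:
import json
import os

from openai import OpenAI
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("glm-server")
client = OpenAI(api_key=os.environ["ZAI_API_KEY"], base_url="https://api.z.ai/api/paas/v4")

@mcp.tool()
def glm_chat(prompt: str, model: str = "glm-4.6", temperature: float = 0.7, max_tokens: int = 4096) -> str:
    """Standard chat completion against the Z.ai GLM API."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content

@mcp.resource("glm://models")
def list_models() -> str:
    """Return the available GLM model IDs as JSON."""
    return json.dumps([m.id for m in client.models.list()])

if __name__ == "__main__":
    mcp.run(transport="stdio")  # Claude Code connects over stdio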
Development
Project Structure
glm/
├── src/
│ └── glm_server/
│ ├── __init__.py
│ └── server.py # Main MCP server implementation
├── tests/
│ └── test_smoke.py # Smoke tests with mocks
├── pyproject.toml # Python project configuration
├── README.md # This file
├── .env.example # Environment variable template
└── .gitignore # Git ignore patterns
Running the Server Standalone
# The server uses stdio transport, so it's meant to be called by Claude Code
# For testing, you can use the MCP inspector (if installed)
python src/glm_server/server.py
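For example, with Node.js available, the MCP Inspector can launch and exercise the server over stdio:
npx @modelcontextprotocol/inspector .venv/bin/python src/glm_server/server.py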
Models
GLM-4.6
- Parameters: 357B MoE (32B active)
- Context: 200K tokens
- Max Output: 128K tokens
- Best for: Reasoning, coding, agentic tasks, tool use
GLM-4.5-Air
- Parameters: 106B MoE (12B active)
- Context: 128K tokens
- Best for: Faster responses, general chat, lighter tasks
Contributing
- Run tests before committing:
  .venv/bin/python -m pytest tests/ -v
- Follow the existing code style
- Update tests for new features
License
MIT
Resources
- Z.ai Platform: https://z.ai
- Z.ai API Docs: https://docs.z.ai
- MCP Documentation: https://docs.claude.com/en/docs/claude-code/mcp
- Claude Code: https://code.claude.com