Technology RadarTechnology Radar
Trial

GLM-5 is Zhipu AI's (Z.ai) open-source 744B-parameter model released February 13, 2026 — the highest-scoring open-weight model on SWE-bench Verified at time of release (77.8%), trained entirely on Huawei Ascend chips, MIT-licensed, and available via API at significantly lower cost than frontier proprietary models. A post-training upgrade — GLM-5.1 — shipped on April 7, 2026 with the same base architecture and claimed SOTA on SWE-Bench Pro.

Architecture Deep Dive → GLM-5 Architecture Breakdown — 744B sparse MoE design (40B active per token), Slime async RL training framework, Huawei Ascend infrastructure, and benchmark context for interpreting its SWE-bench score.

April 2026 Update — GLM-5.1 Post-Training Release

Z.ai released GLM-5.1 on April 7, 2026 as a post-training refinement of GLM-5 — no new pre-training, same 754B-parameter MoE base (40B active per token), 202,752 token context window, MIT licence, weights on Hugging Face (zai-org/GLM-5.1). The upgrade targets agentic coding and long-horizon task execution.

Reported benchmark gains (per VentureBeat coverage and MarkTechPost):

Benchmark GLM-5.1 GPT-5.4 Claude Opus 4.6 Gemini 3.1 Pro
SWE-Bench Pro 58.4 57.7 57.3 54.2
Terminal-Bench 2.0 (Terminus-2) 63.5
Terminal-Bench 2.0 (Claude Code harness) 66.5
KernelBench Level 3 (speedup vs PyTorch) 3.6× 4.2× (leader)
AIME 2026 95.3 93.3
GPQA-Diamond 86.2 92.8 ~91%
Humanity's Last Exam 31.0 (52.3 w/ tools)
CyberGym 68.7
MCP-Atlas 71.8
T3-Bench 70.6

A key technical characteristic is GLM-5.1's staircase optimization pattern — periods of incremental tuning within a fixed strategy, punctuated by structural shifts that unlock step-change performance gains. In Z.ai's VectorDBBench test, the model ran 655 iterations and 6,000+ tool calls to reach 21,500 queries/second — approximately 6× the ceiling achievable in a single 50-turn session.

The headline claim is 8-hour autonomous execution without degradation — Z.ai reports the model can sustain a proactive experiment/analyse/optimise loop across hundreds of iterations. Independent long-horizon verification is still pending; treat the 8-hour figure as a vendor claim until an independent eval replicates it. SWE-Bench Pro is a newer, harder benchmark than SWE-bench Verified and gaming risk (contamination, overfitting) is the usual caveat for any top-of-leaderboard claim weeks after release.

Why It's in Trial

GLM-5 is the most significant open-source model release since DeepSeek V3. It closes the gap with frontier proprietary models substantially:

  • SWE-bench Verified: 77.8% — the highest score among open-weight models at time of release, though SWE-bench is a single benchmark (Python bug-fixing) and does not capture the full range of real-world coding tasks
  • MIT License — the most permissive open-source licence in AI; commercial use, modification, and redistribution are unrestricted
  • Fully self-hostable — weights available on Hugging Face; runs on vLLM and SGLang
  • Trained without NVIDIA GPUs — entirely on Huawei Ascend chips, making it strategically important for organizations with hardware constraints or geopolitical considerations

It sits in Trial rather than Adopt because: the ecosystem around the model (tooling, evals, community integrations) is less mature than GPT or Claude, and inference requirements for a 744B model are substantial for self-hosting.

Architecture

GLM-5 uses a Mixture of Experts (MoE) design:

  • 744B total parameters, but only 40B active per token — dramatically reducing inference cost vs. a dense 744B model
  • Trained on 28.5 trillion tokens with Huawei's MindSpore framework
  • Post-trained using "Slime" — an asynchronous RL infrastructure (open-sourced at THUDM/slime on GitHub)
  • 205K token context window

Benchmark Performance

Benchmark GLM-5 Claude Opus 4.6 GPT-5.4
SWE-bench Verified 77.8% 74% 74.9%
AIME 2026 I 92.7 93.3
GPQA Diamond 86.0% ~91% 92.8%
Terminal-Bench 2.0 56.2 59.3
BrowseComp 75.9

Cost Advantage

Provider Input Output
GLM-5 via API $1.00/M $3.20/M
GLM-5.1 via API $1.40/M $4.40/M
Claude Opus 4.6 $5/M $25/M
GPT-5.4 Standard $2.50/M $15/M

GLM-5 is roughly 5× cheaper on input and ~8× cheaper on output than Claude Opus 4.6. GLM-5.1 is ~3.6× cheaper on input and ~5.7× cheaper on output. On SWE-bench Verified GLM-5 scores higher than Opus, but Opus leads on GPQA Diamond and KernelBench (see tables above). Cost comparisons also don't account for ecosystem maturity, tooling support, or self-hosting infrastructure costs for a 754B model.

What "Pony Alpha" Was

Before the official release, GLM-5 circulated on OpenRouter under the codename "Pony Alpha" — a stealth model that attracted attention by topping coding benchmarks. The GLM-5 release confirmed Zhipu AI was behind it.

Key Characteristics

Property Value
Total parameters 744B (MoE)
Active parameters 40B per token
Context window 205,000 tokens
License MIT
Trained on Huawei Ascend chips
Provider Zhipu AI (Z.ai)
Release date February 13, 2026
Weights Hugging Face: zai-org/GLM-5 (349K downloads, 1,874 likes)

Further Reading