GLM-5 is Zhipu AI's (Z.ai) open-source 744B-parameter model released February 13, 2026 — the highest-scoring open-weight model on SWE-bench Verified at time of release (77.8%), trained entirely on Huawei Ascend chips, MIT-licensed, and available via API at significantly lower cost than frontier proprietary models. A post-training upgrade — GLM-5.1 — shipped on April 7, 2026 with the same base architecture and claimed SOTA on SWE-Bench Pro.
Architecture Deep Dive → GLM-5 Architecture Breakdown — 744B sparse MoE design (40B active per token), Slime async RL training framework, Huawei Ascend infrastructure, and benchmark context for interpreting its SWE-bench score.
April 2026 Update — GLM-5.1 Post-Training Release
Z.ai released GLM-5.1 on April 7, 2026 as a post-training refinement of GLM-5 — no new pre-training, same 754B-parameter MoE base (40B active per token), 202,752 token context window, MIT licence, weights on Hugging Face (zai-org/GLM-5.1). The upgrade targets agentic coding and long-horizon task execution.
Reported benchmark gains (per VentureBeat coverage and MarkTechPost):
| Benchmark | GLM-5.1 | GPT-5.4 | Claude Opus 4.6 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-Bench Pro | 58.4 | 57.7 | 57.3 | 54.2 |
| Terminal-Bench 2.0 (Terminus-2) | 63.5 | — | — | — |
| Terminal-Bench 2.0 (Claude Code harness) | 66.5 | — | — | — |
| KernelBench Level 3 (speedup vs PyTorch) | 3.6× | — | 4.2× (leader) | — |
| AIME 2026 | 95.3 | — | 93.3 | — |
| GPQA-Diamond | 86.2 | 92.8 | ~91% | — |
| Humanity's Last Exam | 31.0 (52.3 w/ tools) | — | — | — |
| CyberGym | 68.7 | — | — | — |
| MCP-Atlas | 71.8 | — | — | — |
| T3-Bench | 70.6 | — | — | — |
A key technical characteristic is GLM-5.1's staircase optimization pattern — periods of incremental tuning within a fixed strategy, punctuated by structural shifts that unlock step-change performance gains. In Z.ai's VectorDBBench test, the model ran 655 iterations and 6,000+ tool calls to reach 21,500 queries/second — approximately 6× the ceiling achievable in a single 50-turn session.
The headline claim is 8-hour autonomous execution without degradation — Z.ai reports the model can sustain a proactive experiment/analyse/optimise loop across hundreds of iterations. Independent long-horizon verification is still pending; treat the 8-hour figure as a vendor claim until an independent eval replicates it. SWE-Bench Pro is a newer, harder benchmark than SWE-bench Verified and gaming risk (contamination, overfitting) is the usual caveat for any top-of-leaderboard claim weeks after release.
Why It's in Trial
GLM-5 is the most significant open-source model release since DeepSeek V3. It closes the gap with frontier proprietary models substantially:
- SWE-bench Verified: 77.8% — the highest score among open-weight models at time of release, though SWE-bench is a single benchmark (Python bug-fixing) and does not capture the full range of real-world coding tasks
- MIT License — the most permissive open-source licence in AI; commercial use, modification, and redistribution are unrestricted
- Fully self-hostable — weights available on Hugging Face; runs on vLLM and SGLang
- Trained without NVIDIA GPUs — entirely on Huawei Ascend chips, making it strategically important for organizations with hardware constraints or geopolitical considerations
It sits in Trial rather than Adopt because: the ecosystem around the model (tooling, evals, community integrations) is less mature than GPT or Claude, and inference requirements for a 744B model are substantial for self-hosting.
Architecture
GLM-5 uses a Mixture of Experts (MoE) design:
- 744B total parameters, but only 40B active per token — dramatically reducing inference cost vs. a dense 744B model
- Trained on 28.5 trillion tokens with Huawei's MindSpore framework
- Post-trained using "Slime" — an asynchronous RL infrastructure (open-sourced at
THUDM/slimeon GitHub) - 205K token context window
Benchmark Performance
| Benchmark | GLM-5 | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|---|
| SWE-bench Verified | 77.8% | 74% | 74.9% |
| AIME 2026 I | 92.7 | 93.3 | — |
| GPQA Diamond | 86.0% | ~91% | 92.8% |
| Terminal-Bench 2.0 | 56.2 | 59.3 | — |
| BrowseComp | 75.9 | — | — |
Cost Advantage
| Provider | Input | Output |
|---|---|---|
| GLM-5 via API | $1.00/M | $3.20/M |
| GLM-5.1 via API | $1.40/M | $4.40/M |
| Claude Opus 4.6 | $5/M | $25/M |
| GPT-5.4 Standard | $2.50/M | $15/M |
GLM-5 is roughly 5× cheaper on input and ~8× cheaper on output than Claude Opus 4.6. GLM-5.1 is ~3.6× cheaper on input and ~5.7× cheaper on output. On SWE-bench Verified GLM-5 scores higher than Opus, but Opus leads on GPQA Diamond and KernelBench (see tables above). Cost comparisons also don't account for ecosystem maturity, tooling support, or self-hosting infrastructure costs for a 754B model.
What "Pony Alpha" Was
Before the official release, GLM-5 circulated on OpenRouter under the codename "Pony Alpha" — a stealth model that attracted attention by topping coding benchmarks. The GLM-5 release confirmed Zhipu AI was behind it.
Key Characteristics
| Property | Value |
|---|---|
| Total parameters | 744B (MoE) |
| Active parameters | 40B per token |
| Context window | 205,000 tokens |
| License | MIT |
| Trained on | Huawei Ascend chips |
| Provider | Zhipu AI (Z.ai) |
| Release date | February 13, 2026 |
| Weights | Hugging Face: zai-org/GLM-5 (349K downloads, 1,874 likes) |
Further Reading
- GLM-5 GitHub (zai-org)
- GLM-5 paper: "From Vibe Coding to Agentic Engineering" (arXiv)
- VentureBeat: GLM-5 achieves record low hallucination rate
- NVIDIA NIM model card for GLM-5
- GLM-5 Architecture Breakdown — full deep dive into MoE architecture, Slime RL training, and Huawei Ascend training stack