GLM-5 is Zhipu AI's (Z.ai) open-source 744B-parameter model released February 13, 2026 — the highest-scoring open-weight model on SWE-bench Verified at time of release (77.8%), trained entirely on Huawei Ascend chips, MIT-licensed, and available via API at significantly lower cost than frontier proprietary models.
Architecture Deep Dive → GLM-5 Architecture Breakdown — 744B sparse MoE design (40B active per token), Slime async RL training framework, Huawei Ascend infrastructure, and benchmark context for interpreting its SWE-bench score.
Why It's in Trial
GLM-5 is the most significant open-source model release since DeepSeek V3. It closes the gap with frontier proprietary models substantially:
- SWE-bench Verified: 77.8% — the highest score among open-weight models at time of release, though SWE-bench is a single benchmark (Python bug-fixing) and does not capture the full range of real-world coding tasks
- MIT License — the most permissive open-source license in AI; commercial use, modification, and redistribution are unrestricted
- Fully self-hostable — weights available on Hugging Face; runs on vLLM and SGLang
- Trained without NVIDIA GPUs — entirely on Huawei Ascend chips, making it strategically important for organizations with hardware constraints or geopolitical considerations
It sits in Trial rather than Adopt for two reasons: the ecosystem around the model (tooling, evals, community integrations) is less mature than those around GPT or Claude, and the inference requirements of a 744B model make self-hosting a substantial undertaking.
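Trying the model before committing to self-hosting is low-friction, since OpenRouter exposes an OpenAI-compatible chat-completions endpoint. A minimal sketch using only the standard library; the model id `zai/glm-5` is an assumption, so check OpenRouter's live model list before use:

```python
import json
from urllib import request

# Hypothetical model id on OpenRouter -- verify against the live model list.
MODEL_ID = "zai/glm-5"

def build_chat_request(prompt: str, api_key: str) -> request.Request:
    """Build an OpenAI-compatible chat-completions request for OpenRouter."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Write a binary search in Python.", api_key="sk-...")
# Send with request.urlopen(req) once a real key is supplied.
print(req.full_url)
```

The same request shape works against any OpenAI-compatible gateway, which is also how a self-hosted vLLM or SGLang deployment would be queried.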
Architecture
GLM-5 uses a Mixture of Experts (MoE) design:
- 744B total parameters, but only 40B active per token — dramatically reducing inference cost vs. a dense 744B model
- Trained on 28.5 trillion tokens with Huawei's MindSpore framework
- Post-trained using "Slime" — an asynchronous RL infrastructure (open-sourced at THUDM/slime on GitHub)
- 205K token context window
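The cost implication of the sparse design can be sketched with back-of-the-envelope arithmetic. The FLOPs figures below use the standard ~2 FLOPs-per-active-parameter-per-token approximation and are illustrative, not measured numbers:

```python
# Back-of-the-envelope arithmetic for GLM-5's sparse MoE design.
TOTAL_PARAMS = 744e9   # all experts combined
ACTIVE_PARAMS = 40e9   # parameters actually exercised per token

# Fraction of the network used on each forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Standard approximation: ~2 FLOPs per active parameter per token.
flops_moe = 2 * ACTIVE_PARAMS
flops_dense = 2 * TOTAL_PARAMS  # a hypothetical dense 744B model

speedup = flops_dense / flops_moe

print(f"active fraction: {active_fraction:.1%}")        # 5.4%
print(f"compute reduction vs dense: {speedup:.1f}x")    # 18.6x
```

Note that memory does not shrink the same way: all 744B weights must still be resident to serve the model, which is why self-hosting remains demanding despite the low per-token compute.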
Benchmark Performance
| Benchmark | GLM-5 | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|---|
| SWE-bench Verified | 77.8% | 74% | 74.9% |
| AIME 2026 I | 92.7 | 93.3 | — |
| GPQA Diamond | 86.0% | ~91% | 92.8% |
| Terminal-Bench 2.0 | 56.2 | 59.3 | — |
| BrowseComp | 75.9 | — | — |
Cost Advantage
| Provider | Input | Output |
|---|---|---|
| GLM-5 via API (OpenRouter) | ~$0.80/M | ~$3.20/M |
| Claude Opus 4.6 | $5/M | $25/M |
| GPT-5.4 Standard | $2.50/M | $15/M |
GLM-5 is roughly 6× cheaper on input and 8× cheaper on output than Claude Opus 4.6 via API. On SWE-bench Verified it scores higher than Opus, but trails it on GPQA Diamond and Terminal-Bench (see table above). Cost comparisons also don't account for ecosystem maturity, tooling support, or self-hosting infrastructure costs for a 744B model.
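Using the list prices from the table above, the per-task difference is easy to quantify. A sketch with a hypothetical agentic-coding workload of 50K input and 5K output tokens:

```python
# Per-million-token list prices from the table above: (input, output) in USD.
PRICES = {
    "GLM-5 (OpenRouter)": (0.80, 3.20),
    "Claude Opus 4.6":    (5.00, 25.00),
    "GPT-5.4 Standard":   (2.50, 15.00),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one task at the listed per-million-token prices."""
    inp, out = PRICES[model]
    return (input_tokens / 1e6) * inp + (output_tokens / 1e6) * out

# Hypothetical workload: 50K tokens in, 5K tokens out per task.
for model in PRICES:
    print(f"{model}: ${task_cost(model, 50_000, 5_000):.3f}")
```

At this input/output mix the blended gap to Opus works out to roughly 6.7×; heavier output skews it further toward the 8× end.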
What "Pony Alpha" Was
Before the official release, GLM-5 circulated on OpenRouter under the codename "Pony Alpha" — a stealth model that attracted attention by topping coding benchmarks. The GLM-5 release confirmed Zhipu AI was behind it.
Key Characteristics
| Property | Value |
|---|---|
| Total parameters | 744B (MoE) |
| Active parameters | 40B per token |
| Context window | 205,000 tokens |
| License | MIT |
| Trained on | Huawei Ascend chips |
| Provider | Zhipu AI (Z.ai) |
| Release date | February 13, 2026 |
| Weights | Hugging Face: zai-org/GLM-5 (349K downloads, 1,874 likes) |
Further Reading
- GLM-5 GitHub (zai-org)
- GLM-5 paper: "From Vibe Coding to Agentic Engineering" (arXiv)
- VentureBeat: GLM-5 achieves record low hallucination rate
- NVIDIA NIM model card for GLM-5
- GLM-5 Architecture Breakdown — full deep dive into MoE architecture, Slime RL training, and Huawei Ascend training stack