Technology Radar
Trial

GLM-5 is Zhipu AI's (Z.ai) open-source 744B-parameter model released February 13, 2026 — the highest-scoring open-weight model on SWE-bench Verified at time of release (77.8%), trained entirely on Huawei Ascend chips, MIT-licensed, and available via API at significantly lower cost than frontier proprietary models.

Architecture Deep Dive → GLM-5 Architecture Breakdown — 744B sparse MoE design (40B active per token), Slime async RL training framework, Huawei Ascend infrastructure, and benchmark context for interpreting its SWE-bench score.

Why It's in Trial

GLM-5 is the most significant open-source model release since DeepSeek V3. It closes the gap with frontier proprietary models substantially:

  • SWE-bench Verified: 77.8% — the highest score among open-weight models at time of release, though SWE-bench is a single benchmark (Python bug-fixing) and does not capture the full range of real-world coding tasks
  • MIT License — among the most permissive open-source licenses in AI; commercial use, modification, and redistribution are unrestricted
  • Fully self-hostable — weights available on Hugging Face; runs on vLLM and SGLang
  • Trained without NVIDIA GPUs — entirely on Huawei Ascend chips, making it strategically important for organizations with hardware constraints or geopolitical considerations

It sits in Trial rather than Adopt for two reasons: the ecosystem around the model (tooling, evals, community integrations) is less mature than those around GPT or Claude, and the inference requirements of a 744B model make self-hosting a substantial undertaking.

Architecture

GLM-5 uses a Mixture of Experts (MoE) design:

  • 744B total parameters, but only 40B active per token — dramatically reducing inference cost vs. a dense 744B model
  • Trained on 28.5 trillion tokens with Huawei's MindSpore framework
  • Post-trained using "Slime" — an asynchronous RL infrastructure (open-sourced at THUDM/slime on GitHub)
  • 205K token context window
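The inference-cost saving of the sparse design can be sketched with a back-of-the-envelope compute estimate. The ~2 FLOPs per active parameter per token rule of thumb is an assumption for illustration (it ignores attention, KV-cache, and routing overhead), not a figure from this article:

```python
# Rough per-token forward-pass compute for GLM-5's MoE design vs. a
# hypothetical dense model of the same total size, using the common
# ~2 FLOPs per active parameter per generated token rule of thumb.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs to generate one token."""
    return 2 * active_params

TOTAL_PARAMS = 744e9   # total parameters (MoE)
ACTIVE_PARAMS = 40e9   # parameters active per token

moe_flops = flops_per_token(ACTIVE_PARAMS)
dense_flops = flops_per_token(TOTAL_PARAMS)

print(f"MoE:   {moe_flops:.2e} FLOPs/token")
print(f"Dense: {dense_flops:.2e} FLOPs/token")
print(f"Compute reduction: {dense_flops / moe_flops:.1f}x")  # 18.6x
```

Under this approximation, activating only 40B of 744B parameters cuts per-token compute by 744/40 ≈ 18.6×, which is the mechanism behind "dramatically reducing inference cost" above. Memory is a different story: all 744B parameters must still be resident to serve requests.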

Benchmark Performance

Benchmark            GLM-5   Claude Opus 4.6   GPT-5.4
SWE-bench Verified   77.8%   74%               74.9%
AIME 2026 I          92.7    93.3              n/a
GPQA Diamond         86.0%   ~91%              92.8%
Terminal-Bench 2.0   56.2    59.3              n/a
BrowseComp           75.9    n/a               n/a

Cost Advantage

Provider                     Input ($/M tokens)   Output ($/M tokens)
GLM-5 via API (OpenRouter)   ~$0.80               ~$3.20
Claude Opus 4.6              $5.00                $25.00
GPT-5.4 Standard             $2.50                $15.00

GLM-5 is roughly 6× cheaper on input and 8× cheaper on output than Claude Opus 4.6 via API. On SWE-bench Verified it scores higher than Opus, but trails it on GPQA Diamond and Terminal-Bench (see table above). Cost comparisons also don't account for ecosystem maturity, tooling support, or self-hosting infrastructure costs for a 744B model.

What "Pony Alpha" Was

Before the official release, GLM-5 circulated on OpenRouter under the codename "Pony Alpha" — a stealth model that attracted attention by topping coding benchmarks. The GLM-5 release confirmed Zhipu AI was behind it.

Key Characteristics

Property            Value
Total parameters    744B (MoE)
Active parameters   40B per token
Context window      205,000 tokens
License             MIT
Trained on          Huawei Ascend chips
Provider            Zhipu AI (Z.ai)
Release date        February 13, 2026
Weights             Hugging Face: zai-org/GLM-5 (349K downloads, 1,874 likes)

Further Reading