GLM-5

open-source llm coding agentic self-hosted

Jun 2026

Trial

GLM-5 is Zhipu AI's (Z.ai) open-source 744B-parameter model released February 13, 2026 — the highest-scoring open-weight model on SWE-bench Verified at time of release (77.8%), trained entirely on Huawei Ascend chips, MIT-licensed, and available via API at significantly lower cost than frontier proprietary models. A post-training upgrade — GLM-5.1 — shipped on April 7, 2026 with the same base architecture and claimed SOTA on SWE-Bench Pro.

Architecture Deep Dive → GLM-5 Architecture Breakdown — 744B sparse MoE design (40B active per token), Slime async RL training framework, Huawei Ascend infrastructure, and benchmark context for interpreting its SWE-bench score.

April 2026 Update — GLM-5.1 Post-Training Release

Z.ai released GLM-5.1 on April 7, 2026 as a post-training refinement of GLM-5 — no new pre-training, same 754B-parameter MoE base (40B active per token), 202,752 token context window, MIT licence, weights on Hugging Face (zai-org/GLM-5.1). The upgrade targets agentic coding and long-horizon task execution.

Reported benchmark gains (per VentureBeat coverage and MarkTechPost):

Benchmark	GLM-5.1	GPT-5.4	Claude Opus 4.6	Gemini 3.1 Pro
SWE-Bench Pro	58.4	57.7	57.3	54.2
Terminal-Bench 2.0 (Terminus-2)	63.5	—	—	—
Terminal-Bench 2.0 (Claude Code harness)	66.5	—	—	—
KernelBench Level 3 (speedup vs PyTorch)	3.6×	—	4.2× (leader)	—
AIME 2026	95.3	—	93.3	—
GPQA-Diamond	86.2	92.8	~91%	—
Humanity's Last Exam	31.0 (52.3 w/ tools)	—	—	—
CyberGym	68.7	—	—	—
MCP-Atlas	71.8	—	—	—
T3-Bench	70.6	—	—	—

A key technical characteristic is GLM-5.1's staircase optimization pattern — periods of incremental tuning within a fixed strategy, punctuated by structural shifts that unlock step-change performance gains. In Z.ai's VectorDBBench test, the model ran 655 iterations and 6,000+ tool calls to reach 21,500 queries/second — approximately 6× the ceiling achievable in a single 50-turn session.

The headline claim is 8-hour autonomous execution without degradation — Z.ai reports the model can sustain a proactive experiment/analyse/optimise loop across hundreds of iterations. Independent long-horizon verification is still pending; treat the 8-hour figure as a vendor claim until an independent eval replicates it. SWE-Bench Pro is a newer, harder benchmark than SWE-bench Verified and gaming risk (contamination, overfitting) is the usual caveat for any top-of-leaderboard claim weeks after release.

Why It's in Trial

GLM-5 is the most significant open-source model release since DeepSeek V3. It closes the gap with frontier proprietary models substantially:

SWE-bench Verified: 77.8% — the highest score among open-weight models at time of release, though SWE-bench is a single benchmark (Python bug-fixing) and does not capture the full range of real-world coding tasks
MIT License — the most permissive open-source licence in AI; commercial use, modification, and redistribution are unrestricted
Fully self-hostable — weights available on Hugging Face; runs on vLLM and SGLang
Trained without NVIDIA GPUs — entirely on Huawei Ascend chips, making it strategically important for organizations with hardware constraints or geopolitical considerations

It sits in Trial rather than Adopt because: the ecosystem around the model (tooling, evals, community integrations) is less mature than GPT or Claude, and inference requirements for a 744B model are substantial for self-hosting.

Architecture

GLM-5 uses a Mixture of Experts (MoE) design:

744B total parameters, but only 40B active per token — dramatically reducing inference cost vs. a dense 744B model
Trained on 28.5 trillion tokens with Huawei's MindSpore framework
Post-trained using "Slime" — an asynchronous RL infrastructure (open-sourced at THUDM/slime on GitHub)
205K token context window

Benchmark Performance

Benchmark	GLM-5	Claude Opus 4.6	GPT-5.4
SWE-bench Verified	77.8%	74%	74.9%
AIME 2026 I	92.7	93.3	—
GPQA Diamond	86.0%	~91%	92.8%
Terminal-Bench 2.0	56.2	59.3	—
BrowseComp	75.9	—	—

Cost Advantage

Provider	Input	Output
GLM-5 via API	$1.00/M	$3.20/M
GLM-5.1 via API	$1.40/M	$4.40/M
Claude Opus 4.6	$5/M	$25/M
GPT-5.4 Standard	$2.50/M	$15/M

GLM-5 is roughly 5× cheaper on input and ~8× cheaper on output than Claude Opus 4.6. GLM-5.1 is ~3.6× cheaper on input and ~5.7× cheaper on output. On SWE-bench Verified GLM-5 scores higher than Opus, but Opus leads on GPQA Diamond and KernelBench (see tables above). Cost comparisons also don't account for ecosystem maturity, tooling support, or self-hosting infrastructure costs for a 754B model.

What "Pony Alpha" Was

Before the official release, GLM-5 circulated on OpenRouter under the codename "Pony Alpha" — a stealth model that attracted attention by topping coding benchmarks. The GLM-5 release confirmed Zhipu AI was behind it.

Key Characteristics

Property	Value
Total parameters	744B (MoE)
Active parameters	40B per token
Context window	205,000 tokens
License	MIT
Trained on	Huawei Ascend chips
Provider	Zhipu AI (Z.ai)
Release date	February 13, 2026
Weights	Hugging Face: zai-org/GLM-5 (349K downloads, 1,874 likes)