GLM-5 is Zhipu AI's (Z.ai) open-source 744B-parameter model released February 13, 2026 — the highest-scoring open-weight model on SWE-bench Verified at time of release (77.8%), trained entirely on Huawei Ascend chips, MIT-licensed, and available via API at significantly lower cost than frontier proprietary models.
Architecture Deep Dive → GLM-5 Architecture Breakdown — 744B sparse MoE design (40B active per token), Slime async RL training framework, Huawei Ascend infrastructure, and benchmark context for interpreting its SWE-bench score.
Why It's in Trial
GLM-5 is the most significant open-source model release since DeepSeek V3. It closes the gap with frontier proprietary models substantially:
- SWE-bench Verified: 77.8% — the highest score among open-weight models at time of release, though SWE-bench is a single benchmark (Python bug-fixing) and does not capture the full range of real-world coding tasks
- MIT License — the most permissive open-source license in AI; commercial use, modification, and redistribution are unrestricted
- Fully self-hostable — weights available on Hugging Face; runs on vLLM and SGLang
- Trained without NVIDIA GPUs — entirely on Huawei Ascend chips, making it strategically important for organizations with hardware constraints or geopolitical considerations
It sits in Trial rather than Adopt for two reasons: the ecosystem around the model (tooling, evals, community integrations) is less mature than those around GPT or Claude, and the inference requirements of a 744B model make self-hosting a substantial undertaking.
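Trying the model before committing to self-hosting is low-friction, since OpenRouter exposes an OpenAI-compatible chat-completions endpoint. A minimal sketch using only the standard library; the model id `zai/glm-5` is an assumption, so check OpenRouter's live model list before use:

```python
import json
from urllib import request

# Hypothetical model id on OpenRouter -- verify against the live model list.
MODEL_ID = "zai/glm-5"

def build_chat_request(prompt: str, api_key: str) -> request.Request:
    """Build an OpenAI-compatible chat-completions request for OpenRouter."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Write a binary search in Python.", api_key="sk-...")
# Send with request.urlopen(req) once a real key is supplied.
print(req.full_url)
```

The same request shape works against any OpenAI-compatible gateway, which is also how a self-hosted vLLM or SGLang deployment would be queried.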
Architecture
GLM-5 uses a Mixture of Experts (MoE) design:
- 744B total parameters, but only 40B active per token — dramatically reducing inference cost vs. a dense 744B model
- Trained on 28.5 trillion tokens with Huawei's MindSpore framework
- Post-trained using "Slime" — an asynchronous RL infrastructure (open-sourced at THUDM/slime on GitHub)
- 205K token context window
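The cost implication of the sparse design can be sketched with back-of-the-envelope arithmetic. The FLOPs figures below use the standard ~2 FLOPs-per-active-parameter-per-token approximation and are illustrative, not measured numbers:

```python
# Back-of-the-envelope arithmetic for GLM-5's sparse MoE design.
TOTAL_PARAMS = 744e9   # all experts combined
ACTIVE_PARAMS = 40e9   # parameters actually exercised per token

# Fraction of the network used on each forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Standard approximation: ~2 FLOPs per active parameter per token.
flops_moe = 2 * ACTIVE_PARAMS
flops_dense = 2 * TOTAL_PARAMS  # a hypothetical dense 744B model

speedup = flops_dense / flops_moe

print(f"active fraction: {active_fraction:.1%}")        # 5.4%
print(f"compute reduction vs dense: {speedup:.1f}x")    # 18.6x
```

Note that memory does not shrink the same way: all 744B weights must still be resident to serve the model, which is why self-hosting remains demanding despite the low per-token compute.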
Benchmark Performance
| Benchmark | GLM-5 | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|---|
| SWE-bench Verified | 77.8% | 74% | 74.9% |
| AIME 2026 I | 92.7 | 93.3 | — |
| GPQA Diamond | 86.0% | ~91% | 92.8% |
| Terminal-Bench 2.0 | 56.2 | 59.3 | — |
| BrowseComp | 75.9 | — | — |
Cost Advantage
| Provider | Input | Output |
|---|---|---|
| GLM-5 via API (OpenRouter) | ~$0.80/M | ~$3.20/M |
| Claude Opus 4.6 | $5/M | $25/M |
| GPT-5.4 Standard | $2.50/M | $15/M |
GLM-5 is roughly 6× cheaper on input and 8× cheaper on output than Claude Opus 4.6 via API. On SWE-bench Verified it scores higher than Opus, but trails it on GPQA Diamond and Terminal-Bench (see table above). Cost comparisons also don't account for ecosystem maturity, tooling support, or self-hosting infrastructure costs for a 744B model.
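Using the list prices from the table above, the per-task difference is easy to quantify. A sketch with a hypothetical agentic-coding workload of 50K input and 5K output tokens:

```python
# Per-million-token list prices from the table above: (input, output) in USD.
PRICES = {
    "GLM-5 (OpenRouter)": (0.80, 3.20),
    "Claude Opus 4.6":    (5.00, 25.00),
    "GPT-5.4 Standard":   (2.50, 15.00),
}

def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one task at the listed per-million-token prices."""
    inp, out = PRICES[model]
    return (input_tokens / 1e6) * inp + (output_tokens / 1e6) * out

# Hypothetical workload: 50K tokens in, 5K tokens out per task.
for model in PRICES:
    print(f"{model}: ${task_cost(model, 50_000, 5_000):.3f}")
```

At this input/output mix the blended gap to Opus works out to roughly 6.7×; heavier output skews it further toward the 8× end.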
What "Pony Alpha" Was
Before the official release, GLM-5 circulated on OpenRouter under the codename "Pony Alpha" — a stealth model that attracted attention by topping coding benchmarks. The GLM-5 release confirmed Zhipu AI was behind it.
Key Characteristics
| Property | Value |
|---|---|
| Total parameters | 744B (MoE) |
| Active parameters | 40B per token |
| Context window | 205,000 tokens |
| License | MIT |
| Trained on | Huawei Ascend chips |
| Provider | Zhipu AI (Z.ai) |
| Release date | February 13, 2026 |
| Weights | Hugging Face: zai-org/GLM-5 (349K downloads, 1,874 likes) |
Further Reading
- GLM-5 GitHub (zai-org)
- GLM-5 paper: "From Vibe Coding to Agentic Engineering" (arXiv)
- VentureBeat: GLM-5 achieves record low hallucination rate
- NVIDIA NIM model card for GLM-5
- GLM-5 Architecture Breakdown — full deep dive into MoE architecture, Slime RL training, and Huawei Ascend training stack