Technology Radar
Assess

Full deep dive: GPT-5.4 Architecture Breakdown

OpenAI's unified frontier model that consolidates separate coding (GPT-5.3-Codex) and reasoning (GPT-5.2) lines into a single inference pipeline. First mainline model with native computer use, context compaction, and parameterized reasoning effort (none → xhigh).

Why It Matters

GPT-5.4 represents a philosophical shift: instead of maintaining separate specialized models, OpenAI collapsed everything into one set of weights with a reasoning.effort parameter that controls compute allocation per request. This is the opposite of the "many specialized models" trend.

Key Architecture Decisions

  • Unified weights: coding + reasoning in one model — not an ensemble or router between models
  • Reasoning effort: 5 levels (none → xhigh) control chain-of-thought depth per request
  • Context compaction: natively trained to summarize/prune its own context, enabling multi-hour agent loops
  • CoT transfer: reasoning tokens pass between turns via previous_response_id, avoiding re-reasoning
  • Computer use: native screen interaction — 75.0% on OSWorld (above the 72.4% human baseline)
  • Tool search: finds relevant tools across large ecosystems without sacrificing accuracy
  • Context window: 1.05M input / 128K output tokens

When to Use

  • Use GPT-5.4 when you need breadth across coding, reasoning, computer use, and tool orchestration in a single model
  • Use Claude Opus 4.6 for pure software engineering tasks — it still holds the SWE-bench crown at 80.9%
  • Use GPT-5.4 mini/nano for latency-sensitive workloads that don't need frontier reasoning depth
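The selection guidance above can be expressed as a small routing table. A sketch under the stated assumptions — model identifier strings are taken from the bullets, and the routing logic itself is illustrative:

```python
# Sketch of the model-selection guidance as a routing function.
# Model names come from the bullets above; the logic is illustrative.

def pick_model(task: str, latency_sensitive: bool = False) -> str:
    """Route a task category to a model per the guidance above."""
    if latency_sensitive:
        # Frontier reasoning depth not needed; prefer the small variant.
        return "gpt-5.4-mini"
    if task == "software_engineering":
        # Pure SWE tasks: SWE-bench leader at 80.9%.
        return "claude-opus-4.6"
    # Breadth: coding + reasoning + computer use + tool orchestration.
    return "gpt-5.4"
```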

Why It's in Assess

The unified inference pipeline is architecturally significant — it simplifies model selection and enables novel agent patterns (dynamic reasoning depth per step). But OpenAI hasn't published internal architecture details (MoE config, parameter counts), so the "how" remains partially opaque. Worth studying the reasoning effort system and context compaction for anyone building long-running agents.
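Context compaction is the piece most worth internalizing for agent builders. GPT-5.4 does it natively, but a harness often tracks its own budget anyway; a minimal client-side analogue, assuming a crude character-based token heuristic and with the summarization step stubbed as truncation (a real implementation would call a summarizer):

```python
# Client-side analogue of context compaction for a long-running agent
# loop. Token counting and summarization are deliberately stubbed.

def count_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return max(1, len(text) // 4)

def compact(history: list[str], budget: int) -> list[str]:
    """If history exceeds the token budget, fold the oldest turns into
    a single summary entry, keeping the most recent turns verbatim."""
    total = sum(count_tokens(t) for t in history)
    if total <= budget:
        return history
    keep = history[-2:]                    # recent turns stay verbatim
    old = " ".join(history[:-2])
    summary = "[summary] " + old[:budget]  # stand-in for a real summarize call
    return [summary] + keep
```

Run between agent steps, this keeps the working context bounded, which is exactly what makes the multi-hour loops described above feasible.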

Key Characteristics

  • Company: OpenAI
  • Release: March 13, 2026
  • Context: 1.05M input / 128K output tokens
  • Key innovations: unified inference pipeline, context compaction, parameterized reasoning, native computer use
  • SWE-bench Pro: 57.7% (xhigh mode)
  • OSWorld: 75.0% (above human baseline)
  • Token efficiency: 47% fewer tokens vs GPT-5.2 on complex tasks
  • Sources: OpenAI Blog, API Guide