GPT-5.5

frontier llm coding agentic reasoning multimodal

Jun 2026

Trial

GPT-5.5, released April 23, 2026, is the first fully retrained OpenAI base model since GPT-4.5. It is a ground-up architecture rebuild with native unified multimodal support (text, image, audio, video), co-designed with NVIDIA's GB200/GB300 NVL72 rack-scale systems, and it posts stronger agentic benchmark performance than its predecessor, GPT-5.4.

Why It's in Trial

GPT-5.5 enters the radar in Trial because most of its benchmark scores are self-reported by OpenAI. For SWE-bench Verified, three independent trackers (vals.ai, llm-stats.com, BenchLM) have converged on a secondary-source figure of 82.6%, which is lower than Claude Opus 4.7's 87.6% on the same benchmark. The SWE-bench Pro score of 58.6%, confirmed via Scale AI's leaderboard context, is also lower than Claude Opus 4.7's 64.3%. Together these results suggest specialization: GPT-5.5 leads on agentic terminal tasks (Terminal-Bench 2.0: 82.7%), while Opus 4.7 leads on complex real-world coding. Move to Adopt once the SWE-bench Verified score is independently reproduced and LMSYS Arena rankings are confirmed.

Key upgrade signals from GPT-5.4:

Architecture rebuilt from scratch — not an incremental fine-tune; the first full pretraining overhaul since GPT-4.5
Unified multimodal pipeline — text, images, audio, and video processed end-to-end in a single model (GPT-5.4 had modality-specific sub-systems)
NVIDIA silicon co-design — optimised for GB200 and GB300 NVL72 at the hardware level; expected to benefit latency and cost at scale
Higher token efficiency — OpenAI reports fewer tokens needed for equivalent quality vs. GPT-5.4, partially offsetting the 2× price increase on the standard tier
Stronger agentic task completion — OSWorld-Verified (computer use) up to 78.7%; Tau2-bench Telecom at 98.0%
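
The token-efficiency trade-off in the list above can be made concrete with a small calculation. This is a minimal sketch rather than an OpenAI formula: `relative_cost` and `break_even_token_ratio` are hypothetical helper names, and it assumes cost scales linearly with token count and that the 2× price ratio applies equally to input and output tokens.

```python
def relative_cost(price_ratio: float, token_ratio: float) -> float:
    """Cost of the newer model relative to the older one for the same task.

    price_ratio: new price per token / old price per token (2.0 for a 2x increase).
    token_ratio: tokens the new model uses / tokens the old model uses.
    Assumes cost is linear in tokens (an assumption, not a published rule).
    """
    if price_ratio <= 0 or token_ratio <= 0:
        raise ValueError("ratios must be positive")
    return price_ratio * token_ratio


def break_even_token_ratio(price_ratio: float) -> float:
    """Token ratio at which the newer model costs exactly the same as the older one."""
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    return 1.0 / price_ratio
```

Under these assumptions, a 2× price increase is fully offset only if GPT-5.5 needs half as many tokens as GPT-5.4 for the same task. Any smaller reduction offsets the increase only partially, which matches the wording of the bullet.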

Performance (Self-Reported)

Benchmark	Score	Notes
Terminal-Bench 2.0	82.7%	Agentic terminal task completion
FrontierMath (Lvl 1–3)	51.7%	Graduate-level mathematics
FrontierMath (Lvl 4)	35.4%	Olympiad/research-level maths
GDPval	84.9%	Complex knowledge work across 44 occupations
OSWorld-Verified	78.7%	Real computer environment operation
Tau2-bench Telecom	98.0%	Complex customer-service agent workflows
SWE-bench Verified	82.6%	Secondary sources only; 3 independent trackers converge; primary not yet confirmed
SWE-bench Pro	58.6%	Lower than Claude Opus 4.7 (64.3%)

All scores are from OpenAI's April 23 announcement except SWE-bench Verified and SWE-bench Pro. SWE-bench Verified is reported by secondary sources only: three independent trackers (vals.ai, llm-stats.com, and BenchLM) now converge on 82.6%, while tokenmix.ai's 88.7% figure remains an outlier, likely from a different evaluation harness or an unverified self-report. Independent reproduction is still pending. The SWE-bench Pro score is sourced from Scale AI leaderboard context.

GPT-5.5 Instant (May 5, 2026)

OpenAI released GPT-5.5 Instant on May 5, 2026, replacing GPT-5.3 Instant as the default ChatGPT model for all users. It is available in the API as `chat-latest`. Key improvements over GPT-5.3 Instant:

52.5% fewer hallucinated claims on high-stakes prompts (medicine, law, finance)
30.2% fewer words and 29.2% fewer lines — more concise, practical tone with fewer unsolicited emoji
Enhanced personalization from past chats, files, and connected Gmail (rolling out to Plus/Pro first)

GPT-5.5 Instant is a separate, smaller model tuned for speed and conversation — distinct from the flagship GPT-5.5 covered in this entry. Paid users retain access to GPT-5.3 Instant for three months before it is retired.

Sources: OpenAI announcement, TechCrunch

Pricing

Tier	Input	Cached Input	Output
Standard API	$5.00/M	$0.50/M	$30.00/M

GPT-5.5 Pro pricing not yet confirmed in accessible sources.
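
To see how the Standard API rates in the table combine for a single request, the sketch below estimates cost from token counts. It is illustrative only: `estimate_cost_usd` is a hypothetical helper, and it assumes cached input tokens are a subset of the input tokens and are billed at the cached rate instead of the regular input rate.

```python
# Standard API rates from the pricing table, in USD per million tokens.
PRICE_PER_M = {"input": 5.00, "cached_input": 0.50, "output": 30.00}


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cached_input_tokens: int = 0) -> float:
    """Estimate the cost of one request at Standard API rates.

    Assumption: cached_input_tokens is the part of input_tokens that was
    served from cache and is billed at the cached rate only.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed input tokens")

    uncached = input_tokens - cached_input_tokens
    return (
        uncached * PRICE_PER_M["input"]
        + cached_input_tokens * PRICE_PER_M["cached_input"]
        + output_tokens * PRICE_PER_M["output"]
    ) / 1_000_000
```

Because output tokens cost six times as much as uncached input tokens at these rates, output length tends to dominate the bill for generation-heavy workloads.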

Context Window

1M tokens via API (same as GPT-5.4)
400K tokens in Codex (reduced vs. GPT-5.4's full 1M in Codex)
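
The two limits above matter most when deciding where a large prompt can run. The following sketch checks a request against each limit. It is a rough illustration: `fits_context` and the `CONTEXT_LIMITS` keys are hypothetical names, and it assumes the window is shared between prompt tokens and the reserved output budget.

```python
# Context limits from this section, in tokens.
CONTEXT_LIMITS = {"api": 1_000_000, "codex": 400_000}


def fits_context(prompt_tokens: int, max_output_tokens: int, surface: str) -> bool:
    """Return True if prompt plus reserved output fits the surface's window.

    Assumption: prompt and output tokens share one context window.
    """
    if surface not in CONTEXT_LIMITS:
        raise ValueError(f"unknown surface: {surface!r}")
    return prompt_tokens + max_output_tokens <= CONTEXT_LIMITS[surface]
```

For example, a 500K-token repository snapshot with a 20K-token output budget would pass this check for the API limit but not for the Codex limit.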

Relationship to GPT-5.4

GPT-5.5 supersedes GPT-5.4 as OpenAI's flagship model. GPT-5.4 remains available and is the safer choice for workloads where:

Budget is constrained (GPT-5.4 costs half as much at the standard tier)
Independent benchmark validation is required before adoption
Codex integrations depend on full 1M-token context (GPT-5.4 has 1M in Codex; GPT-5.5 caps at 400K)

Key Characteristics

Property	Value
License	Proprietary
Provider	OpenAI
Context window	1,000,000 tokens (API) / 400,000 tokens (Codex)
Pricing	$5.00/M input, $30.00/M output
API model ID	`gpt-5.5`
Release date	April 23, 2026
Modalities	Text, image, audio, video

GPT-5.5 Instant (May 5, 2026)

OpenAI released GPT-5.5 Instant on May 5, 2026, as the new default model for ChatGPT, replacing GPT-5.3 Instant. It is an inference-optimised variant of GPT-5.5 targeting lower-latency chat use cases.

Property	Value
ChatGPT default	Yes (replaces GPT-5.3 Instant from May 5)
API alias	`chat-latest`
Context window	400K tokens
Pricing	$5.00/M input, $30.00/M output (batch/flex: $2.50/$15.00)

Key quality improvements over GPT-5.3 Instant: 52.5% fewer hallucinated claims on high-stakes prompts (medicine, law, finance) and 37.3% fewer inaccurate claims on user-flagged challenging conversations. The `chat-latest` alias routes to whichever Instant model currently powers ChatGPT; the previous model remains addressable for paying API customers for 90 days after each swap.

Sources: OpenAI — GPT-5.5 Instant, TechCrunch
