Full deep dive: GPT-OSS Architecture Breakdown
OpenAI's first open-weight release since GPT-2: two MoE models, gpt-oss-120b (117B total parameters, 5.1B active) and gpt-oss-20b (21B total, 3.6B active), under the Apache 2.0 license. The larger model achieves near-parity with o4-mini on core reasoning benchmarks while fitting on a single H100; the smaller runs on consumer hardware.
Architecture
Both models are decoder-only Transformers with Mixture-of-Experts (MoE):
| Property | gpt-oss-120b | gpt-oss-20b |
|---|---|---|
| Total parameters | 117B | 21B |
| Active per token | 5.1B | 3.6B |
| Context length | 128K | 128K |
| Quantization | MXFP4 (4-bit) | MXFP4 (4-bit) |
| Min hardware | Single H100 (80GB) | 16GB consumer GPU |
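The hardware rows follow directly from the 4-bit weight format. A quick back-of-envelope check (ignoring MXFP4's per-block scale overhead and activation/KV-cache memory):

```python
def weight_gib(params_billion: float, bits: int = 4) -> float:
    """Approximate weight memory in GiB at the given bits per parameter."""
    return params_billion * 1e9 * bits / 8 / 2**30

print(f"gpt-oss-120b: {weight_gib(117):.1f} GiB")  # ~54.5 GiB -> fits one 80 GB H100
print(f"gpt-oss-20b:  {weight_gib(21):.1f} GiB")   # ~9.8 GiB  -> fits a 16 GB GPU
```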
Key architectural choices:
- Alternating dense + sparse attention — locally banded sparse patterns similar to GPT-3
- Grouped multi-query attention (group size 8) — shrinks the KV cache and inference memory footprint
- RoPE positional encoding — standard for modern long-context models
- MXFP4 quantization — MoE weights stored at roughly 4 bits per parameter, keeping memory use low with little quality loss
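The gap between 117B total and 5.1B active parameters comes from top-k expert routing: a router scores every expert for each token, but only the top-k expert MLPs actually execute. A minimal numpy sketch of token-choice top-k routing — the expert count, top-k value, and toy tanh expert here are illustrative assumptions, not gpt-oss's actual configuration:

```python
import numpy as np

def moe_forward(x, gate_w, experts_w, top_k=4):
    """Token-choice top-k MoE layer (illustrative sketch)."""
    logits = x @ gate_w                                 # (tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]       # indices of the k best experts
    gates = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(gates - gates.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)               # softmax over the chosen k only
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                         # only k experts run per token:
        for k in range(top_k):                          # compute scales with active,
            e = top[t, k]                               # not total, parameters
            out[t] += gates[t, k] * np.tanh(x[t] @ experts_w[e])  # toy expert "MLP"
    return out
```

Production kernels batch tokens by expert instead of looping per token, but the routing math is the same.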
Training
Pre-trained on a mostly English, text-only dataset focused on STEM, coding, and general knowledge, then post-trained with reinforcement learning using techniques drawn from o3 and other frontier systems. Tokenizer: o200k_harmony (open-sourced, a superset of the o4-mini/GPT-4o tokenizer).
Performance
- gpt-oss-120b matches or exceeds o4-mini on Codeforces, MMLU, HLE, and TauBench
- gpt-oss-20b delivers similar results to o3-mini on common benchmarks
- Both support agentic workflows: tool use, adjustable reasoning effort, full chain-of-thought, Structured Outputs
Deployment
Optimized implementations available for PyTorch + Triton, Metal (Apple Silicon), vLLM, llama.cpp, and ollama. Compatible with OpenAI's Responses API, so existing OpenAI client code can target a self-hosted endpoint with minimal changes.
Why It's in Assess
GPT-OSS is architecturally interesting as OpenAI's first public look at their MoE approach. The extreme efficiency (5.1B active from 117B total) combined with consumer-hardware deployment makes it the most accessible frontier-class open model. However, it's text-only (no vision), and the real question is whether open-weight models from a company that profits from closed models will receive ongoing investment. Assess the MoE efficiency patterns and the MXFP4 quantization approach — both are directly transferable.
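On the quantization half of that assessment: MXFP4 stores weights in 4-bit E2M1 format (magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6} plus a sign bit) with one shared power-of-two scale per 32-element block. A rough round-trip sketch of the numerics, assuming spec-style scale selection with saturation (details simplified):

```python
import numpy as np

# Signed E2M1 grid: 4-bit floats with 2 exponent bits and 1 mantissa bit.
_MAGS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_VALUES = np.concatenate([_MAGS, -_MAGS[1:]])

def mxfp4_roundtrip(x: np.ndarray, block: int = 32) -> np.ndarray:
    """Quantize x to MXFP4 (shared power-of-two scale per block) and back."""
    x = x.reshape(-1, block)
    amax = np.abs(x).max(axis=1, keepdims=True)
    # Shared scale: align the block's largest exponent with E2M1's emax of 2.
    exp = np.floor(np.log2(np.maximum(amax, 2.0**-126))) - 2
    scale = 2.0 ** exp
    scaled = np.clip(x / scale, -6.0, 6.0)               # saturate to E2M1 range
    idx = np.abs(scaled[..., None] - FP4_VALUES).argmin(axis=-1)
    return (FP4_VALUES[idx] * scale).reshape(-1)         # nearest grid point, rescaled
```

Real kernels pack two FP4 codes per byte and store the shared scales separately; this sketch only shows where the compression (and the rounding error) comes from.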
Key Characteristics
| Property | Value |
|---|---|
| Company | OpenAI |
| Models | gpt-oss-120b, gpt-oss-20b |
| Architecture | Sparse MoE with alternating dense/sparse attention |
| License | Apache 2.0 |
| Key innovation | Frontier-class reasoning on consumer hardware via extreme sparsity + MXFP4 |
| Sources | OpenAI Blog, GitHub, Model Card |