Technology Radar

OpenAI GPT-OSS

openai · moe · open-source
Assess

Full deep dive: GPT-OSS Architecture Breakdown

OpenAI's first open-weight release since GPT-2: two MoE models, gpt-oss-120b (117B total, 5.1B active) and gpt-oss-20b (21B total, 3.6B active), released under the Apache 2.0 license. The larger model reaches near-parity with o4-mini on reasoning benchmarks while fitting on a single H100; the smaller one runs on consumer hardware.

Architecture

Both models are decoder-only Transformers with Mixture-of-Experts (MoE):

| Property | gpt-oss-120b | gpt-oss-20b |
| --- | --- | --- |
| Total parameters | 117B | 21B |
| Active per token | 5.1B | 3.6B |
| Context length | 128K | 128K |
| Quantization | MXFP4 (4-bit) | MXFP4 (4-bit) |
| Min hardware | Single H100 (80GB) | 16GB consumer GPU |

Key architectural choices:

  • Alternating dense and locally banded sparse attention, a pattern similar to GPT-3
  • Grouped multi-query attention (group size 8), reducing the KV-cache memory footprint
  • RoPE positional encoding, standard for modern long-context models
  • MXFP4 quantization: 4-bit weights that keep resource usage low while maintaining quality
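The sparsity arithmetic above (only a few billion parameters active out of 117B) comes from top-k expert routing in each MoE layer. A toy sketch of that routing follows; the token count, dimensions, expert count, and k here are illustrative, not the published gpt-oss configuration.

```python
import numpy as np

def topk_route(hidden, router_w, k=4):
    """Toy MoE router: score all experts per token, keep only the top-k.

    Shapes and expert count are illustrative, not the gpt-oss config.
    """
    logits = hidden @ router_w                      # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -k:]       # indices of the k best experts
    # Softmax weights renormalized over only the selected experts
    sel = np.take_along_axis(logits, top, axis=-1)
    weights = np.exp(sel - sel.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return top, weights

rng = np.random.default_rng(0)
tokens, d_model, n_experts = 3, 8, 16
experts, weights = topk_route(rng.normal(size=(tokens, d_model)),
                              rng.normal(size=(d_model, n_experts)))
# Each token runs through only k of n_experts expert MLPs; the rest are
# skipped entirely, which is how 117B total parameters can yield only
# ~5.1B active parameters per token.
```

In a real MoE layer the per-token output is the weighted sum of the selected experts' MLP outputs; the router weights above are exactly those mixing coefficients.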

Training

Post-trained with reinforcement learning using techniques from o3 and OpenAI's other frontier systems. Pre-training used a mostly English, text-only dataset focused on STEM, coding, and general knowledge. Tokenizer: o200k_harmony (open-sourced; a superset of the o4-mini/GPT-4o tokenizer).

Performance

  • gpt-oss-120b matches or exceeds o4-mini on Codeforces, MMLU, HLE, and TauBench
  • gpt-oss-20b delivers similar results to o3-mini on common benchmarks
  • Both support agentic workflows: tool use, adjustable reasoning effort, full chain-of-thought, Structured Outputs

Deployment

Optimized implementations are available for PyTorch + Triton, Metal (Apple Silicon), vLLM, llama.cpp, and Ollama. The models are compatible with OpenAI's Responses API, and because common serving stacks expose OpenAI-compatible endpoints, existing client wrappers generally work unchanged.
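A minimal sketch of targeting a locally served model through an OpenAI-compatible chat endpoint, using only the standard library. The base URL, port, and model name are assumptions that depend on your serving stack (vLLM's OpenAI-compatible server defaults to port 8000; Ollama listens on 11434).

```python
import json
import urllib.request

# Assumed local endpoint; adjust host/port for your serving stack.
BASE_URL = "http://localhost:8000/v1"

def build_chat_request(prompt, model="gpt-oss-20b"):
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Summarize mixture-of-experts routing in one sentence.")
# Sending is left to the caller, e.g.:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

The same request shape works against any OpenAI-compatible server, which is what makes existing client wrappers reusable here.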

Why It's in Assess

GPT-OSS is architecturally interesting as OpenAI's first public look at their MoE approach. The extreme efficiency (5.1B active from 117B total) combined with consumer-hardware deployment makes it the most accessible frontier-class open model. However, it's text-only (no vision), and the real question is whether open-weight models from a company that profits from closed models will receive ongoing investment. Assess the MoE efficiency patterns and the MXFP4 quantization approach — both are directly transferable.
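For assessing the quantization approach, a toy sketch of the microscaling idea behind MXFP4 may help: each element is stored as a 4-bit FP4 (E2M1) code, and each block of 32 elements shares a power-of-two scale. The block handling and rounding below are deliberately simplified and are not OpenAI's kernels.

```python
import numpy as np

# FP4 (E2M1) representable magnitudes; MXFP4 stores one such 4-bit code per
# element plus a shared power-of-two scale per 32-element block.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_roundtrip(x, block=32):
    """Toy quantize -> dequantize; assumes x.size is a multiple of `block`."""
    x = x.reshape(-1, block)
    # Shared per-block power-of-two scale so the block max fits within FP4 range
    amax = np.abs(x).max(axis=1, keepdims=True)
    amax = np.where(amax == 0, 1.0, amax)
    scale = 2.0 ** np.ceil(np.log2(amax / FP4_GRID[-1]))
    scaled = x / scale
    # Round each element to the nearest representable signed FP4 value
    idx = np.abs(scaled[..., None] - np.sign(scaled)[..., None] * FP4_GRID).argmin(-1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return (q * scale).reshape(-1)

w = np.random.default_rng(0).normal(size=64).astype(np.float32)
w_hat = mxfp4_roundtrip(w)
# 4-bit codes plus one shared scale per 32 weights is roughly 4.25 bits/weight,
# versus 16 or 32 bits for unquantized storage.
```

The transferable pattern is the blockwise shared scale: it keeps per-element storage at 4 bits while letting each small block of weights use a dynamic range suited to its own magnitudes.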

Key Characteristics

| Property | Value |
| --- | --- |
| Company | OpenAI |
| Models | gpt-oss-120b, gpt-oss-20b |
| Architecture | Sparse MoE with alternating dense/sparse attention |
| License | Apache 2.0 |
| Key innovation | Frontier-class reasoning on consumer hardware via extreme sparsity + MXFP4 |
| Sources | OpenAI Blog, GitHub, Model Card |