Full deep dive: GPT-OSS Architecture Breakdown
OpenAI's first open-weight release since GPT-2: two MoE models, gpt-oss-120b (117B total parameters, 5.1B active) and gpt-oss-20b (21B total, 3.6B active), under the Apache 2.0 license. The larger model achieves near-parity with o4-mini on core reasoning benchmarks while fitting on a single H100; the smaller runs on consumer hardware.
Architecture
Both models are decoder-only Transformers with Mixture-of-Experts (MoE):
| Property | gpt-oss-120b | gpt-oss-20b |
|---|---|---|
| Total parameters | 117B | 21B |
| Active per token | 5.1B | 3.6B |
| Context length | 128K | 128K |
| Quantization | MXFP4 (4-bit) | MXFP4 (4-bit) |
| Min hardware | Single H100 (80GB) | 16GB consumer GPU |
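The hardware rows follow directly from the 4-bit weight format. A quick back-of-envelope check (ignoring MXFP4's per-block scale overhead and activation/KV-cache memory):

```python
def weight_gib(params_billion: float, bits: int = 4) -> float:
    """Approximate weight memory in GiB at the given bits per parameter."""
    return params_billion * 1e9 * bits / 8 / 2**30

print(f"gpt-oss-120b: {weight_gib(117):.1f} GiB")  # ~54.5 GiB -> fits one 80 GB H100
print(f"gpt-oss-20b:  {weight_gib(21):.1f} GiB")   # ~9.8 GiB  -> fits a 16 GB GPU
```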
Key architectural choices:
- Alternating dense + sparse attention — locally banded sparse patterns similar to GPT-3
- Grouped multi-query attention (group size 8) — shrinks the KV cache and inference memory footprint
- RoPE positional encoding — standard for modern long-context models
- MXFP4 quantization — MoE weights stored at roughly 4 bits per parameter, keeping memory use low with little quality loss
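The gap between 117B total and 5.1B active parameters comes from top-k expert routing: a router scores every expert for each token, but only the top-k expert MLPs actually execute. A minimal numpy sketch of token-choice top-k routing — the expert count, top-k value, and toy tanh expert here are illustrative assumptions, not gpt-oss's actual configuration:

```python
import numpy as np

def moe_forward(x, gate_w, experts_w, top_k=4):
    """Token-choice top-k MoE layer (illustrative sketch)."""
    logits = x @ gate_w                                 # (tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]       # indices of the k best experts
    gates = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(gates - gates.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)               # softmax over the chosen k only
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                         # only k experts run per token:
        for k in range(top_k):                          # compute scales with active,
            e = top[t, k]                               # not total, parameters
            out[t] += gates[t, k] * np.tanh(x[t] @ experts_w[e])  # toy expert "MLP"
    return out
```

Production kernels batch tokens by expert instead of looping per token, but the routing math is the same.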
Training
Pre-trained on a mostly English, text-only dataset focused on STEM, coding, and general knowledge, then post-trained with reinforcement learning using techniques drawn from o3 and other frontier systems. Tokenizer: o200k_harmony (open-sourced, a superset of the o4-mini/GPT-4o tokenizer).
Performance
- gpt-oss-120b matches or exceeds o4-mini on Codeforces, MMLU, HLE, and TauBench
- gpt-oss-20b delivers similar results to o3-mini on common benchmarks
- Both support agentic workflows: tool use, adjustable reasoning effort, full chain-of-thought, Structured Outputs
Deployment
Optimized implementations available for PyTorch + Triton, Metal (Apple Silicon), vLLM, llama.cpp, and ollama. Compatible with OpenAI's Responses API, so existing OpenAI client code can target a self-hosted endpoint with minimal changes.
Why It's in Assess
GPT-OSS is architecturally interesting as OpenAI's first public look at their MoE approach. The extreme efficiency (5.1B active from 117B total) combined with consumer-hardware deployment makes it the most accessible frontier-class open model. However, it's text-only (no vision), and the real question is whether open-weight models from a company that profits from closed models will receive ongoing investment. Assess the MoE efficiency patterns and the MXFP4 quantization approach — both are directly transferable.
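On the quantization half of that assessment: MXFP4 stores weights in 4-bit E2M1 format (magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6} plus a sign bit) with one shared power-of-two scale per 32-element block. A rough round-trip sketch of the numerics, assuming spec-style scale selection with saturation (details simplified):

```python
import numpy as np

# Signed E2M1 grid: 4-bit floats with 2 exponent bits and 1 mantissa bit.
_MAGS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_VALUES = np.concatenate([_MAGS, -_MAGS[1:]])

def mxfp4_roundtrip(x: np.ndarray, block: int = 32) -> np.ndarray:
    """Quantize x to MXFP4 (shared power-of-two scale per block) and back."""
    x = x.reshape(-1, block)
    amax = np.abs(x).max(axis=1, keepdims=True)
    # Shared scale: align the block's largest exponent with E2M1's emax of 2.
    exp = np.floor(np.log2(np.maximum(amax, 2.0**-126))) - 2
    scale = 2.0 ** exp
    scaled = np.clip(x / scale, -6.0, 6.0)               # saturate to E2M1 range
    idx = np.abs(scaled[..., None] - FP4_VALUES).argmin(axis=-1)
    return (FP4_VALUES[idx] * scale).reshape(-1)         # nearest grid point, rescaled
```

Real kernels pack two FP4 codes per byte and store the shared scales separately; this sketch only shows where the compression (and the rounding error) comes from.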
Key Characteristics
| Property | Value |
|---|---|
| Company | OpenAI |
| Models | gpt-oss-120b, gpt-oss-20b |
| Architecture | Sparse MoE with alternating dense/sparse attention |
| License | Apache 2.0 |
| Key innovation | Frontier-class reasoning on consumer hardware via extreme sparsity + MXFP4 |
| Sources | OpenAI Blog, GitHub, Model Card |