Technology Radar
Trial

Llama 4 is Meta's latest open-weight model family, released April 2025. The Maverick variant (400B MoE, 17B active) offers frontier-competitive performance at dramatically lower cost — the strongest case yet for open-weight models in production.

Why It's in Trial

Llama 4 represents a significant leap for open-weight models:

  • Llama 4 Scout: 17B active / 109B total (16 experts), 10M context window — the longest of any open-weight model
  • Llama 4 Maverick: 17B active / 400B total (128 experts), 1M context — the performance leader
  • Llama 4 Behemoth: 288B active / ~2T total — announced but not publicly released
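
The "active / total" split above is what mixture-of-experts routing buys: a small router picks a few experts per token, so only a fraction of the total parameters run on any forward pass. A toy sketch of top-k routing (illustrative only — the expert count matches Maverick's, but the router and the one-expert-per-token assumption are simplifications, not Meta's implementation):

```python
# Toy top-k mixture-of-experts routing: only the selected experts'
# parameters are exercised per token, which is how a model can be
# 400B total yet ~17B active. Illustrative sketch, not Meta's code.
import random

NUM_EXPERTS = 128   # Maverick has 128 experts
TOP_K = 1           # assume one routed expert per token (a simplification)

def route(token_scores: list[float], k: int = TOP_K) -> list[int]:
    """Return the indices of the k highest-scoring experts."""
    return sorted(range(len(token_scores)), key=lambda i: -token_scores[i])[:k]

# Fake router scores for one token over all experts.
random.seed(0)
scores = [random.random() for _ in range(NUM_EXPERTS)]
chosen = route(scores)
print(f"token routed to expert(s) {chosen}; "
      f"{len(chosen)}/{NUM_EXPERTS} experts active for this token")
```

Because the router's compute is tiny relative to an expert's, per-token FLOPs scale with the active parameters, not the total — which is why Maverick's inference cost tracks a 17B-class model rather than a 400B one.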

Maverick beats GPT-4o on MMMU (73.4% vs 69.1%) at roughly one-ninth the cost. Inference is fast, too: 700+ tokens/sec on optimized providers.

Why Not Adopt?

  • SWE-bench scores don't lead — proprietary models (Claude, GPT-5.4, Grok 4.2) still dominate on coding-specific benchmarks
  • Controversy around Meta using unreleased "experimental chat" versions for some benchmark submissions
  • Ecosystem maturity: not yet a first-class option in Cursor, Claude Code, or other major tools
  • Behemoth (the largest variant) hasn't shipped publicly

When to Use Llama 4

| Variant | Best for |
|---|---|
| Scout (10M context) | Massive document analysis, entire-codebase reasoning |
| Maverick (400B MoE) | General coding, self-hosted production inference |
| Via API (Together, Groq, Fireworks) | Cost-sensitive inference without GPU management |
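
The hosted providers listed above all expose OpenAI-compatible chat endpoints, so trying Maverick is a standard HTTP call. A minimal sketch using only the standard library — the base URL, API key, and exact model string are placeholders/assumptions; check your provider's model catalog for the real identifier:

```python
# Sketch: querying Llama 4 Maverick through an OpenAI-compatible
# chat-completions endpoint. Model string and URL are assumptions --
# substitute your provider's actual values.
import json
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "meta-llama/Llama-4-Maverick-17B-128E-Instruct") -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }

def send(payload: dict, base_url: str, api_key: str) -> dict:
    """POST the payload to <base_url>/chat/completions (not executed here)."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("Summarize the Llama 4 model family in one sentence.")
print(payload["model"])
```

Because the request shape is identical across Together, Groq, and Fireworks, switching providers is a matter of changing `base_url` and the model string — useful for comparing the per-token pricing and throughput figures cited above.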

Key Characteristics

| Property | Value |
|---|---|
| Release | April 5, 2025 |
| Architecture | Mixture of Experts (MoE) |
| Maverick size | 17B active / 400B total |
| Context window | 1M (Maverick), 10M (Scout) |
| License | Llama 4 Community License |
| Pricing (Maverick API) | ~$0.20/M input, ~$0.70/M output |
| Provider | Meta (open weights) |
| HF Adoption (Maverick) | 530K downloads, 471 likes |
| HF Adoption (Scout) | 251K downloads, 1,252 likes |
| Website | llama.meta.com |
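
Given the approximate per-million-token rates above, per-request cost is simple arithmetic. A quick estimator (rates are the table's approximations and vary by provider; the 50K/2K token counts are just an example workload):

```python
# Estimate Maverick API cost from the approximate rates listed above
# (~$0.20/M input, ~$0.70/M output; actual pricing is provider-dependent).
INPUT_RATE = 0.20 / 1_000_000   # USD per input token
OUTPUT_RATE = 0.70 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the rates above."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example workload: a 50K-token prompt with a 2K-token answer.
cost = request_cost(50_000, 2_000)
print(f"${cost:.4f}")  # → $0.0114
```

At these rates even long-context workloads stay around a cent per request, which is the cost argument behind the "roughly one-ninth of GPT-4o" claim earlier in this entry.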