Trial
Llama 4 is Meta's latest open-weight model family, released April 2025. The Maverick variant (400B total parameters, 17B active, MoE) offers frontier-competitive performance at dramatically lower cost, making the strongest case yet for open-weight models in production.
Why It's in Trial
Llama 4 represents a significant leap for open-weight models:
- Llama 4 Scout: 17B active / 109B total (16 experts), 10M context window — the longest of any open-weight model
- Llama 4 Maverick: 17B active / 400B total (128 experts), 1M context — the performance leader
- Llama 4 Behemoth: 288B active / ~2T total — announced but not publicly released
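The "17B active / 400B total" split comes from Mixture-of-Experts routing: each token is sent to only a few of the model's experts, so per-token compute scales with the active parameters, not the total. The toy sketch below illustrates top-k routing with made-up sizes; real Llama 4 routing differs in its details (expert count, shared experts, router design).

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts = 8   # Maverick has 128; tiny here for readability
top_k = 2       # experts activated per token
d = 4           # toy hidden dimension

# Each expert is a small feed-forward weight matrix; the router scores them.
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts))

def moe_forward(x):
    """Route token vector x to its top_k experts and mix their outputs."""
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]   # indices of the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()               # softmax over the chosen experts only
    # Only top_k of n_experts matrices are ever multiplied for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen)), chosen

x = rng.standard_normal(d)
y, used = moe_forward(x)
```

Here only `top_k / n_experts` of the expert weights do work per token, which is why a 400B-parameter MoE can run with the per-token compute of a ~17B dense model (all weights must still fit in memory, however).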
Maverick beats GPT-4o on MMMU (73.4% vs 69.1%) at roughly one ninth the cost, and inference is fast: optimized providers report 700+ tokens/sec.
Why Not Adopt?
- SWE-bench scores don't lead — proprietary models (Claude, GPT-5.4, Grok 4.2) still dominate on coding-specific benchmarks
- Controversy around Meta submitting an unreleased, chat-optimized "experimental" Maverick variant to LMArena rather than the public weights
- Ecosystem maturity: not yet a first-class option in Cursor, Claude Code, or other major tools
- Behemoth (the largest variant) hasn't shipped publicly
When to Use Llama 4
| Option | Best for |
|---|---|
| Scout (10M context) | Massive document analysis, entire-codebase reasoning |
| Maverick (400B MoE) | General coding, self-hosted production inference |
| Via API (Together, Groq, Fireworks) | Cost-sensitive inference without GPU management |
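For the API route, Together, Groq, and Fireworks all expose OpenAI-compatible chat-completions endpoints. The sketch below builds such a request with only the standard library; the endpoint URL and model id are assumptions for illustration, so check your provider's documentation for the exact values.

```python
import json
import urllib.request

# Assumed values -- verify against your provider's docs before use.
API_URL = "https://api.together.xyz/v1/chat/completions"
MODEL_ID = "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"

def build_request(prompt, api_key, max_tokens=512):
    """Build an OpenAI-compatible chat-completions request for Maverick."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To actually send (requires a real key, and incurs cost):
# with urllib.request.urlopen(build_request("Hello", "YOUR_KEY")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint shape is OpenAI-compatible, existing OpenAI SDK clients can usually be pointed at these providers by changing only the base URL and model id.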
Key Characteristics
| Property | Value |
|---|---|
| Release | April 5, 2025 |
| Architecture | Mixture of Experts (MoE) |
| Maverick size | 17B active / 400B total |
| Context window | 1M (Maverick), 10M (Scout) |
| License | Llama 4 Community License |
| Pricing (Maverick API) | ~$0.20 per 1M input tokens, ~$0.70 per 1M output tokens |
| Provider | Meta (open weights) |
| HF Adoption (Maverick) | 530K downloads, 471 likes |
| HF Adoption (Scout) | 251K downloads, 1,252 likes |
| Website | llama.meta.com |
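The pricing row translates directly into a back-of-envelope cost estimate per request. The rates below are the table's approximate figures; actual pricing varies by provider.

```python
# Approximate Maverick API rates from the table above (illustrative only).
INPUT_RATE = 0.20 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.70 / 1_000_000  # dollars per output token

def request_cost(input_tokens, output_tokens):
    """Estimated dollar cost of a single Maverick API request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 50K-token prompt with a 2K-token completion:
cost = request_cost(50_000, 2_000)
# → $0.0114
```

At these rates even long-context workloads stay in fractions of a cent per thousand tokens, which is the basis for the cost comparison against proprietary models above.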