Trial
Llama 4 is Meta's latest open-weight model family, released April 2025. The Maverick variant (400B total parameters, 17B active, MoE) offers frontier-competitive performance at dramatically lower cost, making the strongest case yet for open-weight models in production.
Why It's in Trial
Llama 4 represents a significant leap for open-weight models:
- Llama 4 Scout: 17B active / 109B total (16 experts), 10M context window — the longest of any open-weight model
- Llama 4 Maverick: 17B active / 400B total (128 experts), 1M context — the performance leader
- Llama 4 Behemoth: 288B active / ~2T total — announced but not publicly released
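The "17B active / 400B total" split comes from Mixture-of-Experts routing: each token is sent to only a few of the model's experts, so per-token compute scales with the active parameters, not the total. The toy sketch below illustrates top-k routing with made-up sizes; real Llama 4 routing differs in its details (expert count, shared experts, router design).

```python
import numpy as np

rng = np.random.default_rng(0)

n_experts = 8   # Maverick has 128; tiny here for readability
top_k = 2       # experts activated per token
d = 4           # toy hidden dimension

# Each expert is a small feed-forward weight matrix; the router scores them.
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts))

def moe_forward(x):
    """Route token vector x to its top_k experts and mix their outputs."""
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]   # indices of the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()               # softmax over the chosen experts only
    # Only top_k of n_experts matrices are ever multiplied for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen)), chosen

x = rng.standard_normal(d)
y, used = moe_forward(x)
```

Here only `top_k / n_experts` of the expert weights do work per token, which is why a 400B-parameter MoE can run with the per-token compute of a ~17B dense model (all weights must still fit in memory, however).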
Maverick beats GPT-4o on MMMU (73.4% vs 69.1%) at roughly one ninth the cost, and inference is fast: optimized providers report 700+ tokens/sec.
Why Not Adopt?
- SWE-bench scores don't lead — proprietary models (Claude, GPT-5.4, Grok 4.2) still dominate on coding-specific benchmarks
- Controversy around Meta submitting an unreleased, chat-optimized "experimental" Maverick variant to LMArena rather than the public weights
- Ecosystem maturity: not yet a first-class option in Cursor, Claude Code, or other major tools
- Behemoth (the largest variant) hasn't shipped publicly
When to Use Llama 4
| Option | Best for |
|---|---|
| Scout (10M context) | Massive document analysis, entire-codebase reasoning |
| Maverick (400B MoE) | General coding, self-hosted production inference |
| Via API (Together, Groq, Fireworks) | Cost-sensitive inference without GPU management |
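For the API route, Together, Groq, and Fireworks all expose OpenAI-compatible chat-completions endpoints. The sketch below builds such a request with only the standard library; the endpoint URL and model id are assumptions for illustration, so check your provider's documentation for the exact values.

```python
import json
import urllib.request

# Assumed values -- verify against your provider's docs before use.
API_URL = "https://api.together.xyz/v1/chat/completions"
MODEL_ID = "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8"

def build_request(prompt, api_key, max_tokens=512):
    """Build an OpenAI-compatible chat-completions request for Maverick."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To actually send (requires a real key, and incurs cost):
# with urllib.request.urlopen(build_request("Hello", "YOUR_KEY")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the endpoint shape is OpenAI-compatible, existing OpenAI SDK clients can usually be pointed at these providers by changing only the base URL and model id.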
Key Characteristics
| Property | Value |
|---|---|
| Release | April 5, 2025 |
| Architecture | Mixture of Experts (MoE) |
| Maverick size | 17B active / 400B total |
| Context window | 1M (Maverick), 10M (Scout) |
| License | Llama 4 Community License |
| Pricing (Maverick API) | ~$0.20 per 1M input tokens, ~$0.70 per 1M output tokens |
| Provider | Meta (open weights) |
| HF Adoption (Maverick) | 530K downloads, 471 likes |
| HF Adoption (Scout) | 251K downloads, 1,252 likes |
| Website | llama.meta.com |
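The pricing row translates directly into a back-of-envelope cost estimate per request. The rates below are the table's approximate figures; actual pricing varies by provider.

```python
# Approximate Maverick API rates from the table above (illustrative only).
INPUT_RATE = 0.20 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.70 / 1_000_000  # dollars per output token

def request_cost(input_tokens, output_tokens):
    """Estimated dollar cost of a single Maverick API request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g. a 50K-token prompt with a 2K-token completion:
cost = request_cost(50_000, 2_000)
# → $0.0114
```

At these rates even long-context workloads stay in fractions of a cent per thousand tokens, which is the basis for the cost comparison against proprietary models above.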