Meta Muse Spark

frontier llm multimodal agentic reasoning

Jun 2026

Trial

Muse Spark, launched April 8, 2026 by Meta Superintelligence Labs (MSL), is Meta's first proprietary closed-weights frontier model: a ground-up rebuild (not a Llama iteration) that places Meta 4th on the Artificial Analysis Intelligence Index v4.0 with a score of 52, behind Gemini 3.1 Pro and GPT-5.4 (both 57) and Claude Opus 4.6 (53). It is also reported to lead the frontier models with published scores on HealthBench Hard (42.8% vs GPT-5.4's 40.1%), although those figures have not been confirmed from a primary source.

Why It's in Trial

Muse Spark is Meta's re-entry into the closed-model frontier after a multi-year bet on open weights (Llama 1-4). Trial rather than Assess because:

It is generally available today through Meta AI (meta.ai and the Meta AI app), with Facebook, Instagram, WhatsApp, and Messenger rolling out over the following weeks
Meta Superintelligence Labs is a well-resourced org with direct access to Meta's data moat (social + multimodal)
Independent benchmarks place it firmly in the frontier tier — not just vendor claims
HealthBench Hard performance, if confirmed, is a meaningful signal for enterprise and research use cases (medical, scientific reasoning)

Trial rather than Adopt because the model is very new (April 2026), has no track record in production agentic workflows, and the shift away from open weights is a notable strategic reversal that introduces long-term supply risk.

What's New About It

Muse Spark (internally codenamed "Avocado") is a ground-up rebuild:

New architecture: Not derived from Llama 4; new model family, new infrastructure, new data pipelines
Multi-agent orchestration native: Designed from the start to coordinate multiple agents reasoning in parallel, synthesising their outputs into a single coherent response
Multimodal-first: Built to integrate visual information across domains — STEM diagrams, entity recognition, spatial localization — as a first-class capability, not a retrofit
Closed weights: Meta's first frontier model not released as open weights. Meta says it "hopes to open-source future versions", a notable hedge from a company that built its AI reputation on open releases
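The parallel-agent pattern described in the list above can be illustrated with a generic fan-out/fan-in sketch. This is not Meta's implementation, which has not been published; `run_agent`, `synthesise`, `orchestrate`, and the agent names are all hypothetical, and the agents return canned strings instead of calling a model.

```python
import asyncio


async def run_agent(name: str, question: str) -> str:
    # Hypothetical stand-in for one reasoning agent; a real agent would call a model.
    await asyncio.sleep(0)  # yield control so the agents can interleave
    return f"{name}: draft answer to {question!r}"


def synthesise(drafts: list[str]) -> str:
    # Hypothetical merge step: a real system would reconcile the drafts;
    # here they are simply sorted and joined.
    return "\n".join(sorted(drafts))


async def orchestrate(question: str, agent_names: list[str]) -> str:
    # Fan out: run every agent concurrently. Fan in: gather the drafts and merge them.
    drafts = await asyncio.gather(*(run_agent(n, question) for n in agent_names))
    return synthesise(list(drafts))


if __name__ == "__main__":
    print(asyncio.run(orchestrate("What causes tides?", ["planner", "critic", "solver"])))
```

The only point of the sketch is the shape: independent agents run concurrently, and a single synthesis step produces one response.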

Benchmark Performance

Benchmark	Muse Spark	GPT-5.4	Gemini 3.1 Pro	Claude Opus 4.6
Artificial Analysis Intelligence Index v4.0	52	57	57	53
HealthBench Hard	42.8%	40.1%	20.6%	—

HealthBench Hard is a clinical and biomedical reasoning benchmark. Muse Spark's 42.8% is a modest 2.7-point lead over GPT-5.4 (40.1%) and roughly double Gemini 3.1 Pro's 20.6%; no Claude Opus 4.6 score is listed. Meta's social-network-scale health data may contribute to this result, but that is speculation. Note: the Artificial Analysis article references HLE (Humanity's Last Exam, 39.9%) rather than HealthBench Hard; the HealthBench Hard figures above have not been confirmed from a primary source.
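As a quick check on the comparisons drawn from the benchmark table, the short sketch below recomputes Muse Spark's index rank and its HealthBench Hard leads. The scores are copied from the table (with the caveat that the HealthBench Hard figures are unconfirmed); the helper names `rank` and `lead_over` are illustrative only.

```python
# Scores copied from the benchmark table; None marks an unreported score.
index_v4 = {"Muse Spark": 52, "GPT-5.4": 57, "Gemini 3.1 Pro": 57, "Claude Opus 4.6": 53}
healthbench_hard = {
    "Muse Spark": 42.8,
    "GPT-5.4": 40.1,
    "Gemini 3.1 Pro": 20.6,
    "Claude Opus 4.6": None,
}


def rank(scores, model):
    # Competition ranking: 1 + the number of models with a strictly higher score.
    return 1 + sum(1 for s in scores.values() if s is not None and s > scores[model])


def lead_over(scores, model, other):
    # Lead in percentage points, rounded to one decimal to avoid float noise.
    return round(scores[model] - scores[other], 1)


print(rank(index_v4, "Muse Spark"))                                  # 4
print(lead_over(healthbench_hard, "Muse Spark", "GPT-5.4"))         # 2.7
print(lead_over(healthbench_hard, "Muse Spark", "Gemini 3.1 Pro"))  # 22.2
```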

No SWE-bench Verified score has been reported; the initial benchmark disclosure focused on intelligence index and medical reasoning rather than coding.

The Open-Weights Reversal

Meta's Llama series (Llama 1 through Llama 4, the last released in April 2025) established Meta as the default open-weights frontier lab. Muse Spark breaks from this entirely:

No weights download
No commercial open-weight license
No API access outside the Meta AI app ecosystem (no direct API for developers at launch)

This matters for teams that built on Llama: future Meta frontier-tier models may not be open. Llama 4 remains the last confirmed open-weight Meta frontier release.

Access Today

Muse Spark is accessible through consumer surfaces, not developer APIs:

meta.ai — web interface
Meta AI app — mobile
Facebook, Instagram, WhatsApp, Messenger — rolling out over weeks

No API for direct developer access has been announced. Teams that want Muse Spark in their applications must use Meta AI's integration surfaces or wait for Meta to announce one.

Key Characteristics

Property	Value
Provider	Meta (Meta Superintelligence Labs)
License	Proprietary
Pricing	Free through Meta AI surfaces; no API pricing announced
Context window	Not publicly disclosed
Parameters	Not publicly disclosed
Architecture	New (not Llama family; internal codename "Avocado")
Status	GA — Meta AI consumer surfaces; no developer API yet
Website	meta.ai