Technology Radar
Trial

Muse Spark, launched April 8, 2026 by Meta Superintelligence Labs (MSL), is Meta's first proprietary closed-weights frontier model. It is a ground-up rebuild rather than a Llama iteration, and it places Meta 4th on the Artificial Analysis Intelligence Index v4.0 with a score of 52, behind Gemini 3.1 Pro and GPT-5.4 (both 57) and Claude Opus 4.6 (53). It is also reported to lead the compared frontier models on HealthBench Hard (42.8% vs. GPT-5.4's 40.1%), although those figures have not been confirmed from a primary source (see Benchmark Performance).

Why It's in Trial

Muse Spark marks Meta's move into the closed-model frontier after a multi-year bet on open weights (Llama 1-4). Trial rather than Assess because:

  • It is available today on meta.ai and in the Meta AI app, with rollout to Facebook, Instagram, WhatsApp, and Messenger under way
  • Meta Superintelligence Labs is a well-resourced org with direct access to Meta's data moat (social + multimodal)
  • Independent benchmarks place it firmly in the frontier tier — not just vendor claims
  • HealthBench Hard performance is a meaningful signal for enterprise and research use cases (medical, scientific reasoning)

Trial rather than Adopt because the model is very new (April 2026), has no track record in production agentic workflows, and the shift away from open weights is a notable strategic reversal that introduces long-term supply risk.

What's New About It

Muse Spark (internally codenamed "Avocado") is a ground-up rebuild:

  • New architecture: Not derived from Llama 4; new model family, new infrastructure, new data pipelines
  • Native multi-agent orchestration: Designed from the start to coordinate multiple agents reasoning in parallel, synthesising their outputs into a single coherent response
  • Multimodal-first: Built to integrate visual information across domains — STEM diagrams, entity recognition, spatial localization — as a first-class capability, not a retrofit
  • Closed weights: First Meta frontier model not released as open weights. Meta says it "hopes to open-source future versions", a notable hedge from a company that built its AI reputation on open releases

Benchmark Performance

| Benchmark | Muse Spark | GPT-5.4 | Gemini 3.1 Pro | Claude Opus 4.6 |
|---|---|---|---|---|
| Artificial Analysis Intelligence Index v4.0 | 52 | 57 | 57 | 53 |
| HealthBench Hard | 42.8% | 40.1% | 20.6% | not listed |

HealthBench Hard is a clinical and biomedical reasoning benchmark. Muse Spark's 42.8% is 2.7 percentage points above GPT-5.4 (40.1%) and 22.2 points above Gemini 3.1 Pro (20.6%). This is likely where Meta's social-network-scale health data gives the model an edge. Note: the Artificial Analysis article references HLE (Humanity's Last Exam, 39.9%) rather than HealthBench Hard; the HealthBench Hard figures above have not been confirmed from a primary source.
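The ranking and margins quoted in this entry can be rechecked with a few lines of Python. This is a minimal sketch, not an official scoring script: the dictionaries copy the figures from the table above, the 20.6% HealthBench Hard value is assumed to belong to Gemini 3.1 Pro (as the accompanying text implies), and the Claude Opus 4.6 value is left as None because none is listed. The function names are illustrative only.

```python
# Figures copied from the benchmark table in this entry.
# None marks a value the table does not list (an assumption, not a zero score).
INDEX_V4 = {
    "Muse Spark": 52,
    "GPT-5.4": 57,
    "Gemini 3.1 Pro": 57,
    "Claude Opus 4.6": 53,
}
HEALTHBENCH_HARD = {
    "Muse Spark": 42.8,
    "GPT-5.4": 40.1,
    "Gemini 3.1 Pro": 20.6,  # assumed column assignment
    "Claude Opus 4.6": None,
}


def competition_rank(scores, model):
    """Rank where tied models share a place: 1 + count of strictly higher scores."""
    own = scores[model]
    return 1 + sum(1 for s in scores.values() if s is not None and s > own)


def margins(scores, model):
    """Percentage-point lead of `model` over every other model with a listed score."""
    own = scores[model]
    return {
        name: round(own - s, 1)
        for name, s in scores.items()
        if name != model and s is not None
    }


if __name__ == "__main__":
    print(competition_rank(INDEX_V4, "Muse Spark"))
    print(margins(HEALTHBENCH_HARD, "Muse Spark"))
```

With these inputs the rank comes out as 4 and the HealthBench Hard margins as 2.7 and 22.2 points, matching the figures in the text. Because the HealthBench Hard numbers are unconfirmed, the margins are only as reliable as the inputs.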

No SWE-bench Verified score has been reported; the initial benchmark disclosure focused on intelligence index and medical reasoning rather than coding.

The Open-Weights Reversal

Meta's Llama series (Llama 1 through Llama 4, the latter released in April 2025) established Meta as the default open-weights frontier lab. Muse Spark breaks from this entirely:

  • No weights download
  • No commercial open-weight license
  • No API access outside the Meta AI app ecosystem (no direct API for developers at launch)

This matters for teams that built on Llama: future Meta frontier-tier models may not be open. Llama 4 remains the last confirmed open-weight Meta frontier release.

Access Today

Muse Spark is accessible through consumer surfaces, not developer APIs:

  • meta.ai — web interface
  • Meta AI app — mobile
  • Facebook, Instagram, WhatsApp, Messenger — rolling out over weeks

There is no announced API for direct developer access. Teams that want Muse Spark in their applications must use Meta AI's integration surfaces or wait for an announced API.

Key Characteristics

| Property | Value |
|---|---|
| Provider | Meta (Meta Superintelligence Labs) |
| License | Proprietary |
| Pricing | Free through Meta AI surfaces; no API pricing announced |
| Context window | Not publicly disclosed |
| Parameters | Not publicly disclosed |
| Architecture | New (not Llama family; internal codename "Avocado") |
| Status | GA on Meta AI consumer surfaces; no developer API yet |
| Website | meta.ai |

Further Reading