Technology Radar
Assess

DeepSeek V4 is a preview-stage open-weight model family released April 24, 2026 under MIT license — featuring a new Manifold-Constrained Hyper-Connections (mHC) architecture, two tiers (V4-Pro at 1.6T/49B active parameters and V4-Flash at 284B/13B active), and a 1M token context window across both.

Why It's in Assess

V4 remains in Assess because it is preview-stage and the benchmark scores below are self-reported in DeepSeek's technical report (released April 24, 2026). No independent third-party evaluation has been published yet:

  • Self-reported benchmark scores (from DeepSeek technical report, unverified):
    • SWE-bench Verified: 80.6% (vs. V3.2's ~49%, an extraordinary claimed jump)
    • GPQA Diamond: 81.0%
    • AIME 2025: 87.5%
    • MMLU: 88.4%
  • Architecture change: mHC (Manifold-Constrained Hyper-Connections) is new in V4, so its behaviour in production is not yet established. The technical report was published on Hugging Face alongside the model weights on April 24, 2026.
  • Preview qualifier: The API announcement page labels both models as preview. Weight stability and API compatibility may still shift before GA.
  • MIT license confirmed: Weights already on Hugging Face (uploaded April 22, 2026).
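To put the SWE-bench Verified claim in perspective, the short sketch below computes the absolute gain and the reduction in unresolved tasks implied by the two figures quoted above. It uses only the self-reported 80.6% for V4 and the approximate ~49% for V3.2, so the result carries the same uncertainty as those inputs. `benchmark_delta` is a hypothetical helper name.

```python
def benchmark_delta(old_pct: float, new_pct: float) -> dict:
    """Compare two pass rates given as percentages (0-100)."""
    gain_points = new_pct - old_pct                     # absolute gain in percentage points
    old_fail, new_fail = 100 - old_pct, 100 - new_pct   # share of tasks left unresolved
    fail_reduction = (old_fail - new_fail) / old_fail   # relative drop in unresolved tasks
    return {
        "gain_points": round(gain_points, 1),
        "relative_gain": round(new_pct / old_pct, 2),
        "unresolved_reduction": round(fail_reduction, 2),
    }

# Self-reported V4 figure vs. the approximate V3.2 figure.
print(benchmark_delta(49.0, 80.6))
# → {'gain_points': 31.6, 'relative_gain': 1.64, 'unresolved_reduction': 0.62}
```

Read this way, the claim implies V4 leaves roughly 62% fewer SWE-bench Verified tasks unresolved than V3.2. A change that large is the reason independent replication is a precondition for moving to Trial.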

Move to Trial once: independent SWE-bench / coding benchmark results confirm the self-reported scores AND the preview label drops.

Model Variants

Model | Total params | Active params | Context | Input price (USD per 1M tokens) | Output price (USD per 1M tokens)
--- | --- | --- | --- | --- | ---
V4-Pro | 1.6T (MoE) | 49B per token | 1M tokens | $1.74 | $3.48
V4-Flash | 284B (MoE) | 13B per token | 1M tokens | $0.14 | $0.28

V4-Flash is priced comparably to V3.x Flash variants. That makes it a solid budget option if its benchmark performance holds up under independent testing. V4-Pro sits at a premium that makes sense only if its scores justify the extra cost over V3.2.
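One quick way to compare the two tiers is to price a concrete workload. The sketch below applies the per-million-token list prices from the table above. The workload size (50M input and 10M output tokens per month) is a hypothetical example, and the estimate ignores any discounts. `PRICES` and `monthly_cost` are hypothetical names.

```python
# Per-million-token list prices (USD) from the Model Variants table.
PRICES = {
    "V4-Pro": {"input": 1.74, "output": 3.48},
    "V4-Flash": {"input": 0.14, "output": 0.28},
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a given token volume at list prices."""
    p = PRICES[model]
    cost = input_tokens / 1_000_000 * p["input"] + output_tokens / 1_000_000 * p["output"]
    return round(cost, 2)

# Hypothetical workload: 50M input tokens and 10M output tokens per month.
for model in PRICES:
    print(model, monthly_cost(model, 50_000_000, 10_000_000))
# → V4-Pro 121.8
# → V4-Flash 9.8
```

Both input and output prices differ by the same factor (1.74 / 0.14 ≈ 12.4), so V4-Pro costs about 12× as much as V4-Flash for any token mix.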

Architecture: mHC

The Manifold-Constrained Hyper-Connections (mHC) architecture is the key differentiator from V3.x. Based on the technical report and model cards:

  • Hyper-Connections extend the standard residual connection with learned multi-path routing across layers
  • Manifold constraint limits the connection space, reducing memory and compute overhead vs. unconstrained variants
  • FP8/FP4 mixed precision throughout (consistent with V3.x)
  • Trained on 16,000 Hopper-era GPUs at a reported total compute cost of $5.6M
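The first two bullets can be made concrete with a toy hyper-connection step. This is a minimal sketch under stated assumptions, not the report's formulation:

- The residual stream is widened into n parallel streams.
- A weight vector `h_pre` reads those streams into the layer input, and `h_post` writes the layer output back.
- An n×n matrix `h_res` mixes the streams with each other.

For the "manifold constraint", we assume that `h_res` is projected onto doubly stochastic matrices (every row and column sums to 1) using Sinkhorn-Knopp normalisation. All function names and this choice of constraint are illustrative assumptions, so check the technical report for the actual definition.

```python
import math

def sinkhorn(logits, iters=50):
    """Approximately project a square matrix onto the doubly stochastic matrices
    (rows and columns sum to 1) by alternating row and column normalisation.
    Assumed constraint for illustration, not taken from the V4 report."""
    m = [[math.exp(v) for v in row] for row in logits]  # make all entries positive
    n = len(m)
    for _ in range(iters):
        for i in range(n):                  # normalise each row
            s = sum(m[i])
            m[i] = [v / s for v in m[i]]
        for j in range(n):                  # normalise each column
            s = sum(m[r][j] for r in range(n))
            for r in range(n):
                m[r][j] /= s
    return m

def hyper_connection_step(streams, h_pre, h_post, h_res, layer):
    """One toy hyper-connection step over n residual streams of width d.

    streams: n lists of length d; h_pre, h_post: length-n weights;
    h_res: n x n stream-mixing matrix; layer: maps a length-d list to a length-d list.
    """
    n, d = len(streams), len(streams[0])
    # Read: combine the n streams into a single layer input.
    x = [sum(h_pre[i] * streams[i][k] for i in range(n)) for k in range(d)]
    y = layer(x)
    # Mix the streams with h_res, then write the layer output back with h_post.
    return [
        [sum(h_res[i][j] * streams[j][k] for j in range(n)) + h_post[i] * y[k] for k in range(d)]
        for i in range(n)
    ]

h_res = sinkhorn([[0.5, -0.2, 0.1], [0.0, 0.3, -0.4], [0.2, 0.1, 0.0]])
streams = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
zero_layer = lambda v: [0.0] * len(v)  # isolates the residual-mixing path
out = hyper_connection_step(streams, [0.4, 0.3, 0.3], [1.0, 0.0, 0.0], h_res, zero_layer)
print([round(sum(col), 6) for col in zip(*out)])  # per-feature total across streams
# → [4.5, 1.5]
```

Under this assumed constraint, two properties follow for the residual path:

- Columns summing to 1 preserve the per-feature total across streams.
- Rows summing to 1 make each new stream a weighted average of the old ones.

Together these keep repeated mixing across many layers from amplifying the residual signal.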

A dedicated arXiv preprint for the mHC architecture may follow — watch arxiv.org/search/?searchtype=author&query=deepseek.

Relationship to DeepSeek V3.2

V4 is the successor to V3.2. V3.2 remains recommended for production workloads until V4 completes the preview phase. Key differences:

Property | V3.2 | V4-Pro
--- | --- | ---
Architecture | Standard MoE | MoE + mHC (new)
Total parameters | 685B | 1.6T
Active parameters | 40B | 49B
Context window | 128K tokens | 1M tokens
API input price (USD per 1M tokens) | ~$0.07 | $1.74
Status | GA | Preview

The 1M context window is the most immediately compelling upgrade — V3.2's 128K context is a hard ceiling for long-codebase and document analysis tasks.
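Before switching models, it helps to check whether a codebase actually needs the larger window. The sketch below walks a directory and estimates its size in tokens with a rough 4-characters-per-token heuristic. That ratio is an assumption: DeepSeek's tokenizer will produce different counts, especially for code. The function names, the file-suffix filter, and the output reserve are hypothetical, and 128K is treated as 128,000 tokens.

```python
import os

CHARS_PER_TOKEN = 4  # rough heuristic (assumption), not DeepSeek's tokenizer
WINDOWS = {"V3.2": 128_000, "V4": 1_000_000}  # context windows from the table above

def estimate_tokens(root: str, suffixes=(".py", ".ts", ".go", ".md")) -> int:
    """Estimate the token count of all files under root with a matching suffix."""
    chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    chars += len(f.read())
    return chars // CHARS_PER_TOKEN

def fits(tokens: int, reserve_for_output: int = 8_000) -> dict:
    """Report which context windows can hold the input plus a reserved output budget."""
    return {model: tokens + reserve_for_output <= size for model, size in WINDOWS.items()}

print(fits(300_000))  # a hypothetical mid-sized repository
# → {'V3.2': False, 'V4': True}
```

In practice you would call `fits(estimate_tokens("path/to/repo"))`. Treat a result near either limit as inconclusive and recount with the real tokenizer.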

Data Sovereignty

Standard DeepSeek cautions apply. DeepSeek is a Chinese company, so evaluate carefully before adoption if you operate under US export controls, ITAR, or strict data-localisation requirements. The MIT-licensed weights can be self-hosted, which removes the need to send data to DeepSeek's API.

Key Characteristics

Property | Value
--- | ---
Provider | DeepSeek
License | MIT
Pricing (USD per 1M tokens, input / output) | V4-Pro: $1.74 / $3.48; V4-Flash: $0.14 / $0.28
Context window | 1,000,000 tokens
Parameters | V4-Pro: 1.6T total (49B active); V4-Flash: 284B total (13B active)
Architecture | MoE + Manifold-Constrained Hyper-Connections (mHC)
Status | Preview (released April 24, 2026)
GitHub | deepseek-ai
Website | platform.deepseek.com

Further Reading