DeepSeek V4

Jun 2026

Assess

DeepSeek V4 is a preview-stage open-weight model family released April 24, 2026 under the MIT license. It introduces a new Manifold-Constrained Hyper-Connections (mHC) architecture and ships in two tiers, V4-Pro (1.6T total / 49B active parameters) and V4-Flash (284B total / 13B active parameters), both with a 1M-token context window.

Why It's in Assess

V4 remains in Assess because it is preview-stage and the benchmark scores below are self-reported in DeepSeek's technical report (released April 24, 2026); no independent third-party evaluation has been published yet:

- Self-reported benchmark scores (from DeepSeek technical report, unverified):
  - SWE-bench Verified: 80.6% (vs. V3.2's ~49%, an extraordinary claimed jump)
  - GPQA Diamond: 81.0%
  - AIME 2025: 87.5%
  - MMLU: 88.4%
- Architecture change: mHC (Manifold-Constrained Hyper-Connections) is a new design. The technical paper was published April 24, 2026 on Hugging Face alongside the model weights.
- Preview qualifier: The API announcement page labels both models as preview. Weight stability and API compatibility may still shift before GA.
- MIT license confirmed: The weights are already on Hugging Face (uploaded April 22, 2026).

Move to Trial once both conditions hold: independent SWE-bench or other coding benchmark results confirm the self-reported scores, and the preview label is dropped.

Model Variants

Model	Total Params	Active Params	Context	Input pricing	Output pricing
V4-Pro	1.6T (MoE)	49B per token	1M tokens	$1.74/M	$3.48/M
V4-Flash	284B (MoE)	13B per token	1M tokens	$0.14/M	$0.28/M

V4-Flash is priced comparably to V3.x Flash variants, which makes it a solid budget option if performance holds. V4-Pro's $1.74/M input price is roughly 25x V3.2's listed ~$0.07/M, a premium that makes sense only if independently verified benchmark scores justify it over V3.2.
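
To make the price gap between the two tiers concrete, the sketch below estimates API cost from the list prices in the table above. The workload size (500M input tokens, 50M output tokens) and the function name `estimate_cost` are hypothetical, and the calculation ignores any cache or batch discounts, which the source does not describe.

```python
# List prices in USD per 1M tokens, copied from the Model Variants table.
PRICES = {
    "V4-Pro": {"input": 1.74, "output": 3.48},
    "V4-Flash": {"input": 0.14, "output": 0.28},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated API cost in USD, using list prices only."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly workload: 500M input tokens, 50M output tokens.
    for model in PRICES:
        cost = estimate_cost(model, 500_000_000, 50_000_000)
        print(f"{model}: ${cost:,.2f}")
    # V4-Pro: $1,044.00
    # V4-Flash: $84.00
```

At these list prices the same workload costs about 12x more on V4-Pro than on V4-Flash, so the tier choice should follow from measured quality on your own tasks.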

Architecture: mHC

The Manifold-Constrained Hyper-Connections (mHC) architecture is the key differentiator from V3.x. The technical report was published April 24, 2026 on Hugging Face alongside the model weights. Based on the report and model cards:

- Hyper-Connections extend the standard residual connection with learned multi-path routing across layers
- Manifold constraint limits the connection space, reducing memory and compute overhead vs. unconstrained variants
- FP8/FP4 mixed precision throughout (consistent with V3.x)
- Trained on 16,000 Hopper-era GPUs at a reported total compute cost of $5.6M
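
The first two points can be illustrated with a toy sketch. It is not DeepSeek's implementation: it assumes the residual stream is widened to n parallel streams with learned read, write, and mixing weights, and it assumes the manifold constraint takes the form of a doubly stochastic mixing matrix obtained by Sinkhorn normalization. Both the form of the constraint and every name here (`sinkhorn`, `hyper_connection_layer`, `read_w`, `write_w`, `mix`) are illustrative assumptions, written in pure Python for readability rather than speed.

```python
import math


def sinkhorn(logits, iters=50):
    """Map a square matrix of logits to an approximately doubly stochastic matrix."""
    n = len(logits)
    m = [[math.exp(v) for v in row] for row in logits]  # strictly positive entries
    for _ in range(iters):
        m = [[v / sum(row) for v in row] for row in m]  # make each row sum to 1
        col = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col[j] for j in range(n)] for i in range(n)]  # columns sum to 1
    return m


def hyper_connection_layer(streams, layer_fn, read_w, write_w, mix):
    """Apply one layer to n parallel residual streams (each a list of d floats)."""
    n, d = len(streams), len(streams[0])
    # Read: the layer input is a weighted sum of the streams.
    x = [sum(read_w[i] * streams[i][k] for i in range(n)) for k in range(d)]
    y = layer_fn(x)
    # Mix: streams exchange information through the constrained matrix.
    mixed = [[sum(mix[i][j] * streams[j][k] for j in range(n)) for k in range(d)]
             for i in range(n)]
    # Write: the layer output is added back to every stream with its own weight.
    return [[mixed[i][k] + write_w[i] * y[k] for k in range(d)] for i in range(n)]


if __name__ == "__main__":
    mix = sinkhorn([[0.5, -0.2, 0.1], [0.0, 0.3, -0.4], [-0.1, 0.2, 0.6]])
    streams = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    out = hyper_connection_layer(
        streams, lambda x: [0.0 for _ in x], [1 / 3] * 3, [1.0] * 3, mix
    )
    # With a zero layer output, the per-feature sum across streams is preserved.
    print([sum(col) for col in zip(*out)])  # approximately [9.0, 12.0]
```

With a single stream and `mix = [[1.0]]`, the function reduces to the ordinary residual update x + f(x). Under the doubly stochastic assumption, mixing redistributes signal across streams without amplifying it, which is one plausible reading of "limits the connection space"; the technical report remains the authority on what V4 actually does.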

A dedicated arXiv preprint for the mHC architecture may follow — watch arxiv.org/search/?searchtype=author&query=deepseek.

Relationship to DeepSeek V3.2

V4 is the successor to V3.2. V3.2 remains recommended for production workloads until V4 completes the preview phase. Key differences:

Property	V3.2	V4-Pro
Architecture	Standard MoE	mHC (new)
Total parameters	685B	1.6T
Active parameters	40B	49B
Context window	128K	1M
API input price	~$0.07/M	$1.74/M
Status	GA	Preview
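
The multiples implied by the table above can be checked with a few lines of arithmetic. The figures are copied from the table; the dictionary keys and helper names are hypothetical, and the context ratio assumes decimal units (128K = 128,000 and 1M = 1,000,000 tokens).

```python
# Figures copied from the V3.2 vs. V4-Pro comparison table.
V32 = {"total_b": 685, "active_b": 40, "context_k": 128, "input_usd_per_m": 0.07}
V4_PRO = {"total_b": 1600, "active_b": 49, "context_k": 1000, "input_usd_per_m": 1.74}


def active_fraction(model: dict) -> float:
    """Share of total parameters that are active per token."""
    return model["active_b"] / model["total_b"]


def ratio(key: str) -> float:
    """V4-Pro value divided by the V3.2 value for one table row."""
    return V4_PRO[key] / V32[key]


if __name__ == "__main__":
    print(f"Active fraction: V3.2 {active_fraction(V32):.1%}, "
          f"V4-Pro {active_fraction(V4_PRO):.1%}")      # 5.8% vs. 3.1%
    print(f"Total parameters: {ratio('total_b'):.1f}x")       # 2.3x
    print(f"Context window: {ratio('context_k'):.1f}x")       # 7.8x
    print(f"Input price: {ratio('input_usd_per_m'):.1f}x")    # 24.9x
```

On these numbers V4-Pro is sparser per token than V3.2 (about 3.1% of parameters active versus about 5.8%), while its listed input price is roughly 25x higher.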

The 1M context window is the most immediately compelling upgrade — V3.2's 128K context is a hard ceiling for long-codebase and document analysis tasks.
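
A quick way to judge whether the larger window matters for a given codebase is to estimate its token count. The sketch below is a rough approximation only: the 4-characters-per-token heuristic, the 8,000-token output reserve, the file extensions, and all function names are assumptions, and a real tokenizer can give noticeably different counts.

```python
import math
import os

CHARS_PER_TOKEN = 4  # rough heuristic (assumption); real tokenizers vary
CONTEXT_WINDOWS = {"V3.2": 128_000, "V4": 1_000_000}  # from the table, decimal units


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string from its character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_repo_tokens(root: str, extensions=(".py", ".md")) -> int:
    """Sum the approximate token counts of matching files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    total += estimate_tokens(handle.read())
    return total


def fits(total_tokens: int, context_window: int, reserve_for_output: int = 8_000) -> bool:
    """Check whether the input plus a reserved output budget fits in the window."""
    return total_tokens + reserve_for_output <= context_window


if __name__ == "__main__":
    tokens = estimate_repo_tokens(".")
    for model, window in CONTEXT_WINDOWS.items():
        print(f"{model}: ~{tokens:,} tokens, fits = {fits(tokens, window)}")
```

A repository that lands between the two limits is the case where V4's window changes what is possible in a single request; anything well under 128K gains nothing from it.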

Data Sovereignty

Standard DeepSeek cautions apply: DeepSeek is a Chinese company, so evaluate hosted API use carefully if you are subject to US export controls, ITAR, or strict data localisation requirements. The MIT-licensed weights can be self-hosted, which eliminates API data exposure.

Key Characteristics

Property	Value
Provider	DeepSeek
License	MIT
Pricing	V4-Pro: $1.74 input / $3.48 output per 1M tokens; V4-Flash: $0.14 input / $0.28 output per 1M tokens
Context window	1,000,000 tokens
Parameters	V4-Pro: 1.6T total (49B active); V4-Flash: 284B total (13B active)
Architecture	MoE + Manifold-Constrained Hyper-Connections (mHC)
Status	Preview (April 24, 2026)
GitHub	deepseek-ai
Website	platform.deepseek.com