DeepSeek V3.1 Terminus, released September 2025, is the stability- and agent-optimized refinement of DeepSeek's V3 line -- combining hybrid thinking/non-thinking modes with dramatically improved agentic tool use, surpassing both DeepSeek V3 and R1 by over 40% on SWE-bench and Terminal-bench.
Why It's in Trial
V3.1 Terminus represents a significant leap over its predecessors and earns Trial for several reasons:
- Hybrid reasoning modes: Supports both "thinking" mode (deep chain-of-thought reasoning) and "non-thinking" mode (fast chat with function calling and FIM completion) -- effectively combining the strengths of R1 and V3 in one model
- Agentic tool use: Clear improvements over V3.1 on tool-calling, code-agent, and search-agent tasks, with a SimpleQA score of 96.8
- SWE-bench and Terminal-bench: Surpasses V3 and R1 by over 40% on these agentic coding benchmarks
- Language stability: Resolves the Chinese/English mixing and abnormal-character issues that plagued earlier versions
- MIT license: Same permissive licensing as all DeepSeek open-weight models
- Same architecture, better training: Maintains the V3 architecture (671B MoE, 37B active) so existing deployment infrastructure works unchanged
Architecture
| Property | Value |
| --- | --- |
| Architecture | Mixture of Experts (MoE) |
| Total parameters | 671B |
| Active parameters | 37B per token |
| Context window | 128,000 tokens |
| Precision | FP8 microscaling |
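To put the MoE numbers above in perspective, here is a back-of-envelope sizing sketch. It assumes roughly 1 byte per parameter at FP8 and ignores microscaling metadata, activations, and KV cache, so treat the figures as lower bounds rather than deployment requirements:

```python
# Rough sizing for DeepSeek V3.1 Terminus, using the figures from the table above.
total_params = 671e9   # total MoE parameters
active_params = 37e9   # parameters activated per token

# Only a small fraction of the weights fire on any single forward pass,
# which is why per-token compute is closer to a 37B dense model than a
# 671B one.
active_fraction = active_params / total_params
print(f"Active fraction per token: {active_fraction:.1%}")

# Weight storage at FP8 (~1 byte/parameter). The full model must still be
# resident in GPU memory even though only 37B params run per token.
weights_gb = total_params / 1e9
print(f"Approx. FP8 weight footprint: {weights_gb:.0f} GB")
```

This is why the active-parameter count drives latency while the total-parameter count drives the GPU memory bill: serving still requires the full ~671 GB of weights (plus overhead) across the cluster.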
The model supports two operational modes:
- Chat mode: Function calling, Fill-in-the-Middle (FIM) completion, JSON output
- Reasoner mode: Deep contextual reasoning (no function calling or FIM)
Relationship to Other DeepSeek Models
| Model | Focus | Status |
| --- | --- | --- |
| DeepSeek V3 | General-purpose | Superseded by V3.1 Terminus |
| DeepSeek R1 | Reasoning-focused | Still relevant for transparent CoT; V3.1 Terminus subsumes most use cases |
| DeepSeek V3.1 Terminus | Agent + stability | Current recommended model |
| DeepSeek V3.2-Exp | Sparse attention | Experimental; builds on Terminus with DeepSeek Sparse Attention (DSA) |
Cautions
The same data sovereignty considerations from the DeepSeek V3 entry apply:
- Chinese company -- evaluate carefully under US export controls, ITAR, or strict data localisation requirements
- Open weights can be self-hosted to mitigate cloud API concerns (requires A100/H100-class GPUs)
- Review the license for large-scale commercial deployment