Kimi K2.6, released April 20, 2026 by Moonshot AI, is the latest in the K2 family of 1-trillion-parameter open-weight MoE models (32B active). K2.6 scores 80.2% on SWE-bench Verified and 58.6% on SWE-bench Pro — ahead of GPT-5.4 (57.7%) and Claude Opus 4.6 (53.4%) on the latter benchmark. Its Agent Swarm now scales to 300 parallel sub-agents executing 4,000+ coordinated steps, up from K2.5's 100-agent limit.
Why It's in Trial
The K2 family has earned attention through strong benchmarks, architectural innovation, and rapid iteration:
- 80.2% SWE-bench Verified (K2.6) and 58.6% SWE-bench Pro — the latter ahead of GPT-5.4 (57.7%) and Claude Opus 4.6 (53.4%)
- 1T total parameters, 32B active: a large MoE architecture with 384 routed experts, of which 8 are selected per token alongside 1 always-active shared expert, so inference touches only a small fraction of the weights
- Open weights (Modified MIT License): base and instruct variants are available for download and fine-tuning. Commercial products with more than 100M monthly active users or more than $20M in monthly revenue must prominently display "Kimi K2.6" in their user interface
- Agent Swarm (K2.6): coordinates up to 300 parallel sub-agents over 4,000+ coordinated steps, chaining 4,000+ tool calls in continuous runs of 12+ hours
- Native ACP support: Kimi CLI is listed in the Agent Client Protocol registry (see ACP entry)
- K2 Thinking variant: AIME 2025 99.1%, HMMT 2025 95.1% — near-perfect math reasoning
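The sparse activation in the MoE bullet above (8 of 384 routed experts plus 1 shared expert per token) can be illustrated with a minimal pure-Python sketch of top-k routing for a single token. This is not Moonshot's implementation. The toy scalar experts, the random router logits, and the softmax over the selected scores are all assumptions chosen for illustration; the real router's scoring function may differ. What it does show is why the active-parameter count is so much smaller than the total: each token pays for the shared expert plus 8 routed experts, and the remaining 376 are never evaluated.

```python
import math
import random

NUM_EXPERTS = 384  # routed experts per MoE layer, as stated in the entry
TOP_K = 8          # routed experts activated per token

def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, gate_weight) pairs for the top_k highest-scoring experts.

    ASSUMPTION: gate weights are a softmax over the selected logits only (illustrative);
    K2's actual scoring function may differ.
    """
    if len(router_logits) < top_k:
        raise ValueError("need at least top_k router logits")
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:top_k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, router_logits, experts, shared_expert):
    """The shared expert always runs; only the top_k routed experts are evaluated."""
    out = shared_expert(x)
    for idx, weight in route_token(router_logits):
        out += weight * experts[idx](x)
    return out

# Toy setup: each "expert" is a scalar function that logs its index when called.
calls = []

def make_expert(i):
    def expert(x):
        calls.append(i)
        return x * (1 + i / NUM_EXPERTS)
    return expert

experts = [make_expert(i) for i in range(NUM_EXPERTS)]
shared = make_expert(-1)  # -1 marks the shared expert in the call log

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
y = moe_forward(2.0, logits, experts, shared)
print(f"experts evaluated: {len(calls)} of {NUM_EXPERTS + 1}")
```

The call log contains the shared expert plus exactly 8 routed experts, which is the per-token work that the 32B active-parameter figure reflects.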
K2 vs K2.5 vs K2.6
| Property | Kimi K2 | Kimi K2.5 | Kimi K2.6 |
|---|---|---|---|
| Release | Mid-2025 | January 2026 | April 20, 2026 |
| Modality | Text only | Native multimodal | Native multimodal (image + video in, text out) |
| Training data | 15.5T text tokens | 15T mixed visual + text | — |
| Context | 256K | 256K | 256K |
| Key innovation | MuonClip Optimizer at scale | Agent Swarm (100 agents) | Agent Swarm scaled to 300 agents, 4K steps |
| SWE-bench Verified | 71.6% | — | 80.2% |
| SWE-bench Pro | — | — | 58.6% |
| HLE (text, w/ tools) | 44.9% (K2 Thinking) | 51.8% | — |
| BrowseComp | 60.2% (K2 Thinking) | 74.9% (78.4% with Swarm) | — |
Agent Swarm
K2.6's Agent Swarm scales to 300 sub-agents executing across 4,000+ coordinated steps, a 3x increase over K2.5's 100-agent limit. The model can chain more than 4,000 tool calls and run continuously for over 12 hours on coding tasks in languages such as Rust, Go, and Python. On tasks that require wide information gathering, K2.5 reported:
- BrowseComp: 78.4% (vs 60.6% in standard single-agent mode)
- Wide Search: 79.0% (vs 72.7% in standard mode)
- Cost: 76% lower than Claude Opus for equivalent Humanity's Last Exam performance
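The swarm pattern (many sub-agents running in parallel under a global cap, with their results gathered at the end) can be sketched with Python's asyncio. This is a generic fan-out/fan-in pattern, not Moonshot's orchestrator: `run_subagent` is a hypothetical stand-in for a sub-agent's model-and-tool loop, and the cap of 300 simply mirrors the K2.6 limit quoted above.

```python
import asyncio

MAX_PARALLEL = 300  # cap on concurrent sub-agents (mirrors the K2.6 limit quoted above)

async def run_subagent(task_id, query):
    """HYPOTHETICAL sub-agent: a real one would call the model and tools in a loop."""
    await asyncio.sleep(0.01)  # stand-in for model and tool latency
    return f"result {task_id} for {query!r}"

async def swarm(queries, max_parallel=MAX_PARALLEL):
    """Fan out one sub-agent per query, never exceeding max_parallel at once, then fan in."""
    sem = asyncio.Semaphore(max_parallel)
    in_flight = 0
    peak = 0  # highest number of sub-agents observed running at the same time

    async def guarded(task_id, query):
        nonlocal in_flight, peak
        async with sem:  # blocks while max_parallel sub-agents are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await run_subagent(task_id, query)
            finally:
                in_flight -= 1

    # gather preserves input order, so results line up with queries
    results = await asyncio.gather(*(guarded(i, q) for i, q in enumerate(queries)))
    return results, peak

results, peak = asyncio.run(swarm([f"subtopic {n}" for n in range(1000)]))
print(len(results), peak)
```

The semaphore is what turns an unbounded fan-out into a bounded swarm; a production orchestrator would also need per-agent timeouts, retries, and a step budget to keep 12-hour runs under control.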
Architecture
K2 uses a Mixture of Experts design with 61 layers:
- 1T total parameters, 32B active per token
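- 384 routed experts per MoE layer, with 8 selected per token plus 1 shared expert; Multi-head Latent Attention (MLA)
- Pre-trained on 15.5 trillion tokens with no reported training instability (no loss spikes)
- Uses the MuonClip optimizer: Muon combined with QK-Clip, which rescales an attention head's query and key projection weights whenever its maximum attention logit exceeds a threshold, keeping training stable at trillion-parameter scale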
- INT4 quantization support for efficient deployment
- Designed specifically for tool use, reasoning, and autonomous problem-solving
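The stability mechanism inside MuonClip, QK-Clip, can be sketched for a single attention head. This minimal pure-Python version assumes standard scaled dot-product logits, a threshold `tau` of 100, and an even split of the rescaling between query and key weights (`alpha = 0.5`). The threshold value is an assumption here, the Muon update itself is omitted, causal masking is ignored, and the MLA-specific handling of shared rotary key components is not modelled.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def max_logit(Wq, Wk, xs):
    """Largest pre-softmax logit q_i . k_j / sqrt(d) over all token pairs for one head."""
    qs = [matvec(Wq, x) for x in xs]
    ks = [matvec(Wk, x) for x in xs]
    d = len(qs[0])
    return max(sum(a * b for a, b in zip(q, k)) for q in qs for k in ks) / math.sqrt(d)

def qk_clip(Wq, Wk, xs, tau=100.0, alpha=0.5):
    """Rescale one head's query/key weights so its maximum logit is at most tau.

    Scaling Wq by gamma**alpha and Wk by gamma**(1 - alpha) multiplies every
    logit by gamma, so a head whose max logit exceeds tau ends up exactly at tau.
    """
    s_max = max_logit(Wq, Wk, xs)
    if s_max <= tau:
        return Wq, Wk, s_max  # head is within bounds: weights untouched
    gamma = tau / s_max  # 0 < gamma < 1
    new_Wq = [[w * gamma ** alpha for w in row] for row in Wq]
    new_Wk = [[w * gamma ** (1 - alpha) for w in row] for row in Wk]
    return new_Wq, new_Wk, s_max

# Toy head whose logits have grown too large
Wq = [[3.0, 0.0], [0.0, 3.0]]
Wk = [[4.0, 0.0], [0.0, 4.0]]
xs = [[5.0, 0.0], [0.0, 5.0], [3.0, 4.0]]
Wq2, Wk2, before = qk_clip(Wq, Wk, xs)
print(f"max logit before: {before:.1f}, after: {max_logit(Wq2, Wk2, xs):.1f}")
```

Because the clip only rescales weights after an optimizer step, it bounds logit growth without changing the optimizer update itself, which is why it can be bolted onto Muon.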
Cautions
- Same data sovereignty considerations as other Chinese-origin models (Moonshot AI is Beijing-based)
- Agent Swarm cost and duration claims should be independently verified — benchmark comparisons depend heavily on evaluation settings
- Limited Western ecosystem presence compared to DeepSeek or Llama
- K2.5 was evaluated at temperature=1.0 and top-p=0.95; these sampling settings may favour the model, and scores under other settings could differ
- K2.6's SWE-bench Pro lead over GPT-5.4 is narrow (58.6% vs 57.7%, a 0.9-point gap) and likely within normal benchmark variance
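To see why a 0.9-point gap is hard to interpret, a normal-approximation confidence interval for the difference of two pass rates helps. The task count `N_TASKS = 731` is an assumption for illustration (substitute the size of the set actually evaluated). The formula treats the two models' results as independent samples and ignores run-to-run sampling variance; because both models run on the same tasks, a paired analysis would usually give a somewhat tighter interval.

```python
import math

def diff_ci(p1, p2, n1, n2, z=1.96):
    """Approximate 95% CI for p1 - p2 using the normal approximation to the binomial."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d, se, (d - z * se, d + z * se)

N_TASKS = 731  # ASSUMPTION: illustrative task count; use the real evaluated set size
d, se, (lo, hi) = diff_ci(0.586, 0.577, N_TASKS, N_TASKS)
print(f"gap {d * 100:.1f} pp, SE {se * 100:.2f} pp, 95% CI [{lo * 100:.1f}, {hi * 100:.1f}] pp")
```

With these inputs the standard error is about 2.6 points, and the interval (roughly -4.2 to +6.0 points) comfortably includes zero.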
Key Characteristics
| Property | Value |
|---|---|
| Total parameters | 1T (MoE) |
| Active parameters | 32B per token |
| Context window | 256,000 tokens |
| License | Modified MIT |
| Provider | Moonshot AI (Beijing) |
| K2 weights | Hugging Face: moonshotai/Kimi-K2-Instruct |
| K2.5 weights | Hugging Face: moonshotai/Kimi-K2.5 |
| K2.6 weights | Hugging Face: moonshotai/Kimi-K2.6 |
| Website | kimi.com |
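The parameter counts in the table translate into rough hardware budgets. This back-of-the-envelope sketch assumes weights-only storage at a uniform precision and the common approximation of about 2 FLOPs per active parameter per generated token. Real deployments also need KV cache and activation memory, and quantised checkpoints often keep some layers at higher precision, so actual requirements are higher.

```python
TOTAL_PARAMS = 1.0e12  # 1T total parameters (from the table)
ACTIVE_PARAMS = 32e9   # 32B active parameters per token (from the table)

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_memory_gb(params, dtype):
    """Weights-only footprint in decimal GB; excludes KV cache, activations,
    and any layers a real checkpoint keeps at higher precision."""
    return params * BYTES_PER_PARAM[dtype] / 1e9

def forward_gflops_per_token(active_params):
    """Rule of thumb: ~2 FLOPs per active parameter per token (ignores attention over context)."""
    return 2 * active_params / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(TOTAL_PARAMS, dtype):,.0f} GB of weights")
print(f"~{forward_gflops_per_token(ACTIVE_PARAMS):.0f} GFLOPs per token")
print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

At INT4 the weights alone come to roughly 500 GB, so even a quantised deployment typically spans multiple accelerators, while per-token compute (about 64 GFLOPs) tracks the 32B active parameters rather than the full 1T.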