Technology Radar
Trial

Kimi K2.6, released April 20, 2026 by Moonshot AI, is the latest in the K2 family of 1-trillion-parameter open-weight MoE models (32B active). K2.6 scores 80.2% on SWE-bench Verified and 58.6% on SWE-bench Pro, where it leads GPT-5.4 (57.7%) and Claude Opus 4.6 (53.4%). Its Agent Swarm now scales to 300 parallel sub-agents executing 4,000+ coordinated steps, up from K2.5's 100-agent limit.

Why It's in Trial

The K2 family has earned attention through strong benchmarks, architectural innovation, and rapid iteration:

  • 80.2% SWE-bench Verified (K2.6) and 58.6% SWE-bench Pro — the latter ahead of GPT-5.4 (57.7%) and Claude Opus 4.6 (53.4%)
  • 1T total parameters, 32B active: Massive MoE architecture with 384 experts, of which 8 routed experts plus 1 shared expert are active per token, giving efficient inference through sparse activation
  • Open weights (Modified MIT License): Base and instruct variants are available for download and fine-tuning. Commercial products or services with more than 100M monthly active users or more than $20M in monthly revenue must display a visible "Kimi K2.6" credit
  • Agent Swarm (K2.6): Coordinates up to 300 sub-agents across 4,000+ coordinated steps, chaining 4,000+ tool calls over continuous runs of more than 12 hours
  • Native ACP support: Kimi CLI is listed in the Agent Client Protocol registry (see ACP entry)
  • K2 Thinking variant: AIME 2025 99.1%, HMMT 2025 95.1% — near-perfect math reasoning
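The attribution condition in the license bullet is an either/or threshold. A minimal sketch of the check, assuming "more than" means strictly greater than each limit and that revenue is measured in US dollars; the function and constant names are hypothetical, and the license text itself remains authoritative:

```python
# Hypothetical helper encoding the Modified MIT attribution thresholds described above.
MAU_THRESHOLD = 100_000_000                 # monthly active users
MONTHLY_REVENUE_THRESHOLD_USD = 20_000_000  # monthly revenue in USD (assumed currency)


def attribution_required(monthly_active_users: int, monthly_revenue_usd: float) -> bool:
    """Return True if a commercial product crosses either threshold (strictly greater than)."""
    return (monthly_active_users > MAU_THRESHOLD
            or monthly_revenue_usd > MONTHLY_REVENUE_THRESHOLD_USD)


print(attribution_required(50_000_000, 25_000_000))  # True: revenue threshold crossed
print(attribution_required(50_000_000, 5_000_000))   # False: neither threshold crossed
```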

K2 vs K2.5 vs K2.6

Property Kimi K2 Kimi K2.5 Kimi K2.6
Release Mid-2025 January 2026 April 20, 2026
Modality Text only Native multimodal Native multimodal (image + video in, text out)
Training data 15.5T text tokens 15T mixed visual + text
Context 256K 256K 256K
Key innovation MuonClip Optimizer at scale Agent Swarm (100 agents) Agent Swarm scaled to 300 agents, 4K steps
SWE-bench Verified 71.6% 80.2%
SWE-bench Pro 58.6%
HLE (text, w/ tools) 44.9% (K2 Thinking) 51.8%
BrowseComp 60.2% (K2 Thinking) 74.9% (78.4% with Swarm)

Agent Swarm

K2.6's Agent Swarm scales to 300 sub-agents executing across 4,000+ coordinated steps, a 3x increase over K2.5's 100-agent limit. The model can chain more than 4,000 tool calls and run continuously for more than 12 hours on coding work in languages such as Rust, Go, and Python. The published results on tasks requiring wide information gathering are still K2.5 baselines:

  • BrowseComp: 78.4% (vs 60.6% for standard single-agent)
  • Wide Search: 79.0% (vs 72.7% standard)
  • Cost: 76% lower than Claude Opus for equivalent Humanity's Last Exam performance
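The swarm figures imply a fan-out/fan-in pattern: a coordinator dispatches many sub-tasks but caps how many run at once. The sketch below is a hypothetical illustration of that pattern with `asyncio`, not Moonshot AI's implementation; `run_sub_agent` is a placeholder, and the cap of 300 simply mirrors the stated K2.6 limit:

```python
import asyncio

MAX_SUB_AGENTS = 300  # mirrors K2.6's stated sub-agent ceiling; illustrative only


async def run_sub_agent(task_id: int) -> str:
    """Hypothetical placeholder for a sub-agent; a real one would call the model and its tools."""
    await asyncio.sleep(0)  # yield to the event loop, standing in for network I/O
    return f"result-{task_id}"


async def orchestrate(task_ids: list[int], max_parallel: int = MAX_SUB_AGENTS) -> tuple[list[str], int]:
    """Fan tasks out with at most max_parallel in flight; return results in input order and peak concurrency."""
    semaphore = asyncio.Semaphore(max_parallel)
    in_flight = 0
    peak = 0

    async def bounded(task_id: int) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # blocks once max_parallel sub-agents are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await run_sub_agent(task_id)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(bounded(t) for t in task_ids))  # fan-in, order preserved
    return list(results), peak


results, peak = asyncio.run(orchestrate(list(range(1000))))
print(len(results), peak)  # 1000 300
```

The semaphore is what turns "1,000 pending tasks" into "at most 300 concurrent sub-agents"; a real orchestrator would also need retries, result merging, and a step budget, none of which are shown here.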

Architecture

K2 uses a Mixture of Experts design with 61 layers:

  • 1T total parameters, 32B active per token
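  • 384 routed experts per MoE layer (8 selected per token) plus 1 shared expert that every token passes through; Multi-head Latent Attention (MLA)
  • Trained on 15.5 trillion tokens with no loss spikes reported during pre-training
  • Uses the MuonClip optimizer: the Muon optimizer combined with QK-Clip, which rescales query and key projection weights whenever an attention head's maximum logit exceeds a threshold, keeping training stable at trillion-parameter scale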
  • INT4 quantization support for efficient deployment
  • Designed specifically for tool use, reasoning, and autonomous problem-solving
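Sparse activation comes from per-token routing: a router scores all routed experts, only the top 8 run, and the shared expert always runs. A minimal stdlib sketch, assuming toy experts that just scale their input and a softmax over the selected logits as the gate (a simplification; K2's actual router scoring function may differ). All names are hypothetical:

```python
import math
import random

NUM_EXPERTS = 384  # routed experts per MoE layer, as stated above
TOP_K = 8          # routed experts activated per token, as stated above


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, k=TOP_K):
    """Select the k highest-scoring experts and normalize their gate weights to sum to 1."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in top])
    return list(zip(top, gates))


def moe_forward(x, experts, shared_expert, router_logits):
    """Shared expert output plus the gate-weighted sum of the k selected routed experts."""
    out = shared_expert(x)
    for idx, gate in route(router_logits):
        y = experts[idx](x)  # only the selected experts are ever evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


def make_expert(scale):
    """Toy stand-in for an expert FFN: scales its input by a constant."""
    return lambda x: [scale * v for v in x]


random.seed(0)
experts = [make_expert(i / NUM_EXPERTS) for i in range(NUM_EXPERTS)]
shared_expert = make_expert(1.0)
token = [1.0, 2.0, 3.0, 4.0]
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route(logits)
output = moe_forward(token, experts, shared_expert, logits)
print(len(selected), round(sum(g for _, g in selected), 6))  # 8 1.0
```

Only the shared expert and 8 of the 384 routed experts run for each token, which is how a 1T-parameter model can activate only about 32B parameters (3.2%) per token.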
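QK-Clip, the stability half of MuonClip, runs after each optimizer step and shrinks a head's query and key weights if that head's largest attention logit grew past a threshold. A minimal single-head sketch, assuming standard multi-head attention rather than MLA (where the rescaling is applied per component), weights stored as nested lists, and an illustrative threshold of 100; the helper names are hypothetical:

```python
import math

TAU = 100.0  # max-logit threshold; illustrative value


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def max_attention_logit(W_q, W_k, inputs):
    """Largest scaled attention logit q_i . k_j / sqrt(d) over all token pairs for one head."""
    qs = [matvec(W_q, x) for x in inputs]
    ks = [matvec(W_k, x) for x in inputs]
    d = len(qs[0])
    return max(sum(a * b for a, b in zip(q, k)) for q in qs for k in ks) / math.sqrt(d)


def qk_clip(W_q, W_k, inputs, tau=TAU):
    """If the head's max logit exceeds tau, scale W_q and W_k by sqrt(tau / s_max) each,
    so every logit shrinks by tau / s_max and the new max equals tau."""
    s_max = max_attention_logit(W_q, W_k, inputs)
    if s_max <= tau:
        return W_q, W_k  # head is within bounds; leave weights untouched
    factor = math.sqrt(tau / s_max)

    def scale(W):
        return [[w * factor for w in row] for row in W]

    return scale(W_q), scale(W_k)


W = [[10.0, 0.0], [0.0, 10.0]]
batch = [[1.0, 1.0], [2.0, 0.0]]
before = max_attention_logit(W, W, batch)
W_q2, W_k2 = qk_clip(W, W, batch)
after = max_attention_logit(W_q2, W_k2, batch)
print(round(before, 2), round(after, 2))  # 282.84 100.0
```

Splitting the correction as a square root across both matrices keeps queries and keys on a similar scale while bringing the product back to the threshold.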

Cautions

  • Same data sovereignty considerations as other Chinese-origin models (Moonshot AI is Beijing-based)
  • Agent Swarm cost and duration claims should be independently verified — benchmark comparisons depend heavily on evaluation settings
  • Limited Western ecosystem presence compared to DeepSeek or Llama
  • K2.5 evaluation used temperature=1.0, top-p=0.95 — settings that may favour the model
  • K2.6's SWE-bench Pro lead over GPT-5.4 is narrow (58.6% vs 57.7%): a 0.9-point gap that falls within typical benchmark run-to-run variance
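The sampling settings in the K2.5 caution combine temperature scaling with top-p (nucleus) sampling. A generic stdlib sketch of that decoding step, assuming a plain list of logits; it illustrates the technique, not Moonshot AI's decoding code, and `sample_top_p` is a hypothetical name:

```python
import math
import random


def sample_top_p(logits, temperature=1.0, top_p=0.95, rng=random):
    """Temperature-scaled nucleus sampling: keep the smallest set of most-probable tokens
    whose cumulative probability reaches top_p, then sample from that set."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    weights = [probs[i] for i in nucleus]  # random.choices renormalizes the weights
    return rng.choices(nucleus, weights=weights, k=1)[0], nucleus


rng = random.Random(42)
token, nucleus = sample_top_p([2.0, 1.0, 0.1, -3.0], temperature=1.0, top_p=0.95, rng=rng)
print(nucleus)  # [0, 1, 2]
```

At temperature 1.0 the distribution is left unsharpened, so several candidate tokens stay in the nucleus; lowering the temperature concentrates probability on the top token and shrinks the nucleus.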
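The noise argument for the SWE-bench Pro gap can be checked with a back-of-the-envelope binomial standard error. A sketch, assuming an evaluation set of about 731 tasks (an assumed size for the public split; substitute the split actually used) and treating the two models' results as independent samples, which is a simplification since both ran on the same tasks:

```python
import math


def two_proportion_z(p1, p2, n):
    """z-score and standard error for the difference of two pass rates on n tasks each,
    treating the runs as independent binomial samples."""
    se = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    return (p1 - p2) / se, se


N_TASKS = 731  # assumed benchmark size; hypothetical for this sketch
z, se = two_proportion_z(0.586, 0.577, N_TASKS)
print(f"gap = 0.9 pts, SE = {se * 100:.1f} pts, z = {z:.2f}")  # gap = 0.9 pts, SE = 2.6 pts, z = 0.35
```

A gap of roughly a third of a standard error is well inside what sampling variation alone could produce; a paired test on per-task outcomes would give a tighter estimate.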

Key Characteristics

Property Value
Total parameters 1T (MoE)
Active parameters 32B per token
Context window 256,000 tokens
License Modified MIT
Provider Moonshot AI (Beijing)
K2 weights Hugging Face: moonshotai/Kimi-K2-Instruct
K2.5 weights Hugging Face: moonshotai/Kimi-K2.5
K2.6 weights Hugging Face: moonshotai/Kimi-K2.6
Website kimi.com

Further Reading