Kimi K2 / K2.5 / K2.6

Jun 2026

Trial

Kimi K2.6, released April 20, 2026 by Moonshot AI, is the latest in the K2 family of 1-trillion-parameter open-weight MoE models (32B active). K2.6 scores 80.2% on SWE-bench Verified and 58.6% on SWE-bench Pro — ahead of GPT-5.4 (57.7%) and Claude Opus 4.6 (53.4%) on the latter benchmark. Its Agent Swarm now scales to 300 parallel sub-agents executing 4,000+ coordinated steps, up from K2.5's 100-agent limit.

Why It's in Trial

The K2 family has earned attention through strong benchmarks, architectural innovation, and rapid iteration:

80.2% SWE-bench Verified (K2.6) and 58.6% SWE-bench Pro — the latter ahead of GPT-5.4 (57.7%) and Claude Opus 4.6 (53.4%)
1T total parameters, 32B active: MoE architecture with 384 experts, of which 8 routed experts plus 1 shared expert are active per token, so inference cost tracks the 32B active parameters rather than the full 1T
Open weights (Modified MIT License): Base and instruct variants available for download and fine-tuning. Commercial products with more than 100M monthly active users or more than $20M in monthly revenue must display a visible "Kimi K2.6" credit
Agent Swarm (K2.6): Coordinates up to 300 parallel sub-agents across 4,000+ coordinated steps, and can chain 4,000+ tool calls in continuous runs of 12+ hours
Native ACP support: Kimi CLI is listed in the Agent Client Protocol registry (see ACP entry)
K2 Thinking variant: AIME 2025 99.1%, HMMT 2025 95.1%, both near the ceiling for these competition math benchmarks
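
The attribution threshold in the license bullet above can be read as a simple either-or check. The sketch below is illustrative only: the function and constant names are hypothetical, and treating "above" as a strict comparison is an assumption. The license text itself remains the authority.

```python
# Thresholds as summarized in this entry (assumed strict "greater than").
MAU_THRESHOLD = 100_000_000              # monthly active users
MONTHLY_REVENUE_THRESHOLD_USD = 20_000_000


def requires_visible_credit(monthly_active_users: int,
                            monthly_revenue_usd: float) -> bool:
    """Return True if either threshold is exceeded (hypothetical helper)."""
    return (monthly_active_users > MAU_THRESHOLD
            or monthly_revenue_usd > MONTHLY_REVENUE_THRESHOLD_USD)


# A product with 150M MAU crosses the user threshold regardless of revenue.
print(requires_visible_credit(150_000_000, 0.0))
```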

K2 vs K2.5 vs K2.6

Property	Kimi K2	Kimi K2.5	Kimi K2.6
Release	Mid-2025	January 2026	April 20, 2026
Modality	Text only	Native multimodal	Native multimodal (image + video in, text out)
Training data	15.5T text tokens	15T mixed visual + text	—
Context	256K	256K	256K
Key innovation	MuonClip Optimizer at scale	Agent Swarm (100 agents)	Agent Swarm scaled to 300 agents, 4K steps
SWE-bench Verified	71.6%	—	80.2%
SWE-bench Pro	—	—	58.6%
HLE (text, w/ tools)	44.9% (K2 Thinking)	51.8%	—
BrowseComp	60.2% (K2 Thinking)	74.9% (78.4% with Swarm)	—

Agent Swarm

K2.6's Agent Swarm scales to 300 sub-agents working across 4,000+ coordinated steps, a 3x increase over K2.5's 100-agent limit. The model can chain more than 4,000 tool calls and run continuously for more than 12 hours on coding tasks in languages such as Rust, Go, and Python. The swarm-versus-single-agent figures below are K2.5 baselines, measured on tasks that require wide information gathering:

BrowseComp: 78.4% (vs 60.6% for a standard single agent)
Wide Search: 79.0% (vs 72.7% for a standard single agent)
Cost: 76% lower than Claude Opus for equivalent Humanity's Last Exam performance
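
The fan-out pattern behind a swarm with a fixed agent cap can be sketched with a bounded-concurrency scheduler. This is a generic illustration, not Moonshot's implementation: `run_sub_agent` is a hypothetical stand-in for a real model-plus-tools call, and only the cap of 300 comes from the figures above.

```python
import asyncio

MAX_SUB_AGENTS = 300  # concurrency cap quoted for K2.6 in this entry


async def run_sub_agent(task: str) -> str:
    """Hypothetical sub-agent; a real one would call a model and tools."""
    await asyncio.sleep(0)  # yield to the event loop, standing in for I/O
    return f"done: {task}"


async def run_swarm(tasks: list[str],
                    max_parallel: int = MAX_SUB_AGENTS) -> list[str]:
    """Fan tasks out, never running more than max_parallel at once."""
    gate = asyncio.Semaphore(max_parallel)

    async def guarded(task: str) -> str:
        async with gate:  # blocks while max_parallel agents are running
            return await run_sub_agent(task)

    # gather returns results in the same order as the input tasks
    return await asyncio.gather(*(guarded(t) for t in tasks))


if __name__ == "__main__":
    results = asyncio.run(run_swarm([f"subtask-{i}" for i in range(1000)]))
    print(len(results))
```

A semaphore keeps the orchestrator simple: queued tasks wait for a free slot instead of being batched, so slow sub-agents do not hold up unrelated work.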

Architecture

K2 uses a Mixture of Experts design with 61 layers:

1T total parameters, 32B active per token
384 experts per MoE layer (8 routed per token + 1 shared), with MLA (Multi-head Latent Attention)
Trained on 15.5 trillion tokens with no reported training instability
Uses the MuonClip optimizer: Muon combined with QK-Clip, which rescales the query and key projections to keep attention logits bounded during large-scale training
INT4 quantization support for efficient deployment
Designed specifically for tool use, reasoning, and autonomous problem-solving
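
The figures above imply some useful back-of-the-envelope numbers. The sketch below derives the active-parameter fraction and a rough weights-only memory estimate. It assumes every weight is stored at the stated bit width and ignores KV cache, activations, and quantization metadata, so real deployments will differ.

```python
# Figures taken from the architecture list in this entry.
TOTAL_PARAMS = 1_000_000_000_000   # 1T total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32B active per token
TOTAL_EXPERTS = 384
ROUTED_PER_TOKEN = 8


def active_fraction(active: int, total: int) -> float:
    """Share of parameters that participate in one token's forward pass."""
    return active / total


def weight_memory_gb(params: int, bits_per_weight: int) -> float:
    """Weights-only footprint in decimal GB (ASSUMPTION: uniform bit width)."""
    return params * bits_per_weight / 8 / 1e9


print(f"active parameters: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
print(f"routed experts used: {ROUTED_PER_TOKEN / TOTAL_EXPERTS:.1%}")
print(f"16-bit weights: {weight_memory_gb(TOTAL_PARAMS, 16):.0f} GB")
print(f"INT4 weights:   {weight_memory_gb(TOTAL_PARAMS, 4):.0f} GB")
```

Under these assumptions, INT4 cuts the weight footprint to a quarter of a 16-bit copy, but the full 1T parameters must still be resident even though only 32B are used per token.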

Cautions

Same data sovereignty considerations as other Chinese-origin models (Moonshot AI is Beijing-based)
Agent Swarm cost and duration claims should be independently verified — benchmark comparisons depend heavily on evaluation settings
Limited Western ecosystem presence compared to DeepSeek or Llama
K2.5 evaluation used temperature=1.0, top-p=0.95 — settings that may favour the model
K2.6's SWE-bench Pro lead over GPT-5.4 is narrow (58.6% vs 57.7%); a 0.9-point gap is plausibly within run-to-run benchmark variance
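
To see why a 0.9-point gap is hard to separate from noise, the sketch below compares it with the binomial standard error of each score. The task count `N_TASKS` is a placeholder assumption, since the benchmark's size is not stated here, and the two runs are treated as independent samples. A paired comparison on identical tasks could give a tighter estimate.

```python
import math


def standard_error(score: float, n_tasks: int) -> float:
    """Binomial standard error of a pass rate over n_tasks independent tasks."""
    return math.sqrt(score * (1.0 - score) / n_tasks)


def gap_in_standard_errors(a: float, b: float, n_tasks: int) -> float:
    """Score gap divided by the standard error of the difference.

    ASSUMPTION: the two scores come from independent runs of equal size.
    """
    se_diff = math.sqrt(standard_error(a, n_tasks) ** 2
                        + standard_error(b, n_tasks) ** 2)
    return abs(a - b) / se_diff


N_TASKS = 500  # ASSUMPTION: illustrative task count, not the real benchmark size

k26, gpt54 = 0.586, 0.577  # SWE-bench Pro scores quoted in this entry
print(f"standard error per score: {standard_error(k26, N_TASKS):.3f}")
print(f"gap in standard errors:   {gap_in_standard_errors(k26, gpt54, N_TASKS):.2f}")
```

With a few hundred tasks the gap comes out well under one standard error, which is why the ranking between the two models should not be read as settled.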

Key Characteristics

Property	Value
Total parameters	1T (MoE)
Active parameters	32B per token
Context window	256,000 tokens
License	Modified MIT
Provider	Moonshot AI (Beijing)
K2 weights	Hugging Face: moonshotai/Kimi-K2-Instruct
K2.5 weights	Hugging Face: moonshotai/Kimi-K2.5
K2.6 weights	Hugging Face: moonshotai/Kimi-K2.6
Website	kimi.com