Technology Radar
Trial

Kimi K2, released in mid-2025 by Moonshot AI, is a 1-trillion-parameter open-weight mixture-of-experts (MoE) model with 32B active parameters that scores 71.6% on SWE-bench Verified. Its January 2026 successor, K2.5, adds native multimodal capabilities and an "Agent Swarm" system that coordinates up to 100 parallel agents, achieving 50.2% on Humanity's Last Exam at 76% lower cost than Claude Opus.

Why It's in Trial

Kimi K2 and K2.5 have earned attention through strong benchmarks and architectural innovation:

  • 71.6% SWE-bench Verified (K2, with parallel test-time compute) -- competitive with frontier proprietary models
  • 1T total parameters, 32B active: Massive MoE architecture with efficient inference via sparse activation
  • Open weights (Modified MIT License): Both base and instruct variants available for download and fine-tuning
  • Agent Swarm (K2.5): Coordinates up to 100 specialised AI agents working simultaneously, cutting execution time by a factor of 4.5 while achieving 78.4% on BrowseComp
  • Native ACP support: Kimi CLI is listed in the Agent Client Protocol registry (see ACP entry)
  • K2 Thinking variant: AIME 2025 99.1%, HMMT 2025 95.1% -- near-perfect math reasoning

K2 vs K2.5

Property               Kimi K2                        Kimi K2.5
Release                Mid-2025                       January 2026
Modality               Text only                      Native multimodal (vision + text)
Training data          15.5T text tokens              15T mixed visual + text tokens
Context                256K                           256K
Key innovation         MuonClip Optimizer at scale    Agent Swarm (100 parallel agents)
SWE-bench Verified     71.6%                          not reported
HLE (text, w/ tools)   44.9% (K2 Thinking)            51.8%
BrowseComp             60.2% (K2 Thinking)            74.9% (78.4% with Agent Swarm)

Agent Swarm

K2.5's most distinctive feature is Agent Swarm -- a system that spawns up to 100 specialised agents working in parallel on a single task. On tasks requiring wide information gathering:

  • BrowseComp: 78.4% (vs 60.6% for standard single-agent)
  • Wide Search: 79.0% (vs 72.7% standard)
  • Cost: 76% lower than Claude Opus for equivalent Humanity's Last Exam performance
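The fan-out-and-gather pattern behind a swarm like this can be sketched in a few lines. This is a minimal illustrative sketch, not Moonshot's implementation: the `agent` coroutine is a hypothetical stand-in for a model or tool call, and the shard-per-agent decomposition is an assumption about how wide-search tasks are split.

```python
import asyncio

async def agent(task: str, shard: int) -> str:
    # Hypothetical worker: each agent handles one shard of the task.
    # In a real swarm this would be a model/tool call (e.g. a web search).
    await asyncio.sleep(0)  # stand-in for I/O-bound work
    return f"{task}:shard{shard}"

async def swarm(task: str, n_agents: int) -> list[str]:
    # Fan out up to n_agents sub-tasks in parallel and gather the results.
    return await asyncio.gather(*(agent(task, i) for i in range(n_agents)))

results = asyncio.run(swarm("browse", 4))
```

Because the agents run concurrently, wall-clock time is bounded by the slowest shard rather than the sum of all shards, which is where the reported speed-up on wide information-gathering tasks comes from.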

Architecture

K2 uses a Mixture of Experts design with 61 layers:

  • 1T total parameters, 32B active per token
  • Trained on 15.5 trillion tokens with zero training instability (no loss spikes reported)
  • Uses the MuonClip Optimizer -- the Muon optimizer applied at unprecedented scale with novel stability techniques
  • Designed specifically for tool use, reasoning, and autonomous problem-solving
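The sparse-activation idea above can be made concrete with a toy router. This is a generic top-k MoE routing sketch, assuming softmax-normalised gates over the selected experts; the expert count and k here are illustrative, not K2's actual configuration.

```python
import numpy as np

def top_k_route(logits: np.ndarray, k: int = 2):
    # Select the k highest-scoring experts for this token and
    # renormalise their gate weights with a softmax over just those k.
    idx = np.argsort(logits)[::-1][:k]
    gates = np.exp(logits[idx] - logits[idx].max())
    return idx, gates / gates.sum()

rng = np.random.default_rng(0)
router_logits = rng.normal(size=8)          # router scores for 8 toy experts
experts, gates = top_k_route(router_logits)  # only these experts run
```

Only the selected experts' parameters participate in the forward pass for that token, which is how a 1T-parameter network can run with 32B active parameters per token.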

Cautions

  • Same data sovereignty considerations as other Chinese-origin models (Moonshot AI is Beijing-based)
  • Agent Swarm cost claims should be independently verified -- benchmark comparisons depend heavily on evaluation settings
  • Limited Western ecosystem presence compared to DeepSeek or Llama
  • K2.5 evaluation used temperature=1.0, top-p=0.95 -- settings that may favour the model

Key Characteristics

Property            Value
Total parameters    1T (MoE)
Active parameters   32B per token
Context window      256,000 tokens
License             Modified MIT
Provider            Moonshot AI (Beijing)
K2 weights          Hugging Face: moonshotai/Kimi-K2-Instruct
K2.5 weights        Hugging Face: moonshotai/Kimi-K2.5
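The figures in the table imply useful back-of-envelope numbers for anyone planning to self-host the open weights. This sketch assumes weights-only storage at common precisions and ignores KV cache and activation memory, so treat it as a lower bound:

```python
total_params  = 1.0e12   # 1T total parameters (MoE)
active_params = 32.0e9   # 32B active per token

# Fraction of the network that fires per token under sparse activation.
active_fraction = active_params / total_params   # 0.032, i.e. 3.2%

# Rough weight-storage footprint (weights only; excludes KV cache/activations).
gib = 1024 ** 3
fp16_total_gib = total_params * 2 / gib   # ~1863 GiB at 2 bytes/param
int8_total_gib = total_params * 1 / gib   # ~931 GiB at 1 byte/param
```

In other words, although per-token compute resembles a 32B dense model, serving still requires holding all 1T parameters in memory, which keeps self-hosting in multi-node territory even with aggressive quantisation.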

Further Reading