Trial
Kimi K2, released mid-2025 by Moonshot AI, is a 1-trillion-parameter open-weight MoE model (32B active) that scores 71.6% on SWE-bench Verified. Its January 2026 successor, K2.5, adds native multimodal capabilities and an "Agent Swarm" system that coordinates up to 100 parallel agents -- achieving 50.2% on Humanity's Last Exam at 76% lower cost than Claude Opus.
Why It's in Trial
Kimi K2 and K2.5 have earned attention through strong benchmarks and architectural innovation:
- 71.6% SWE-bench Verified (K2, with parallel test-time compute) -- competitive with frontier proprietary models
- 1T total parameters, 32B active: Massive MoE architecture with efficient inference via sparse activation
- Open weights (Modified MIT License): Both base and instruct variants available for download and fine-tuning
- Agent Swarm (K2.5): Coordinates up to 100 specialised AI agents working simultaneously, reducing execution time by a factor of 4.5 while scoring 78.4% on BrowseComp
- Native ACP support: Kimi CLI is listed in the Agent Client Protocol registry (see ACP entry)
- K2 Thinking variant: AIME 2025 99.1%, HMMT 2025 95.1% -- near-perfect math reasoning
K2 vs K2.5
| Property | Kimi K2 | Kimi K2.5 |
|---|---|---|
| Release | Mid-2025 | January 2026 |
| Modality | Text only | Native multimodal (vision + text) |
| Training data | 15.5T text tokens | 15T mixed visual + text tokens |
| Context | 256K | 256K |
| Key innovation | MuonClip Optimizer at scale | Agent Swarm (100 parallel agents) |
| SWE-bench Verified | 71.6% | — |
| HLE (text, w/ tools) | 44.9% (K2 Thinking) | 51.8% |
| BrowseComp | 60.2% (K2 Thinking) | 74.9% (78.4% with Agent Swarm) |
Agent Swarm
K2.5's most distinctive feature is Agent Swarm -- a system that spawns up to 100 specialised agents working in parallel on a single task. On tasks requiring wide information gathering, the reported gains are substantial:
- BrowseComp: 78.4% (vs 60.6% for standard single-agent)
- Wide Search: 79.0% (vs 72.7% standard)
- Cost: 76% lower than Claude Opus for equivalent Humanity's Last Exam performance
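The swarm pattern itself is a classic fan-out/fan-in: a coordinator decomposes the task, parallel agents each work a sub-task, and the results are synthesised. A minimal illustrative sketch in plain Python (stdlib `concurrent.futures` only -- this is not Moonshot's implementation or API, and `run_agent` is a hypothetical stand-in for a real model call):

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(sub_task: str) -> str:
    # Stand-in for a real model call: in a real swarm, each agent would
    # browse/search independently and return its findings for this sub-task.
    return f"findings for {sub_task!r}"

def swarm(task: str, sub_tasks: list[str], max_agents: int = 100) -> str:
    # Fan out: one agent per sub-task, capped at the swarm size limit.
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        findings = list(pool.map(run_agent, sub_tasks[:max_agents]))
    # Fan in: a real coordinator would synthesise these with another model
    # call; here we simply join them.
    return "\n".join(findings)

print(swarm("survey MoE inference costs", ["papers", "blogs", "benchmarks"]))
```

The speed-up comes from the fan-out step: sub-tasks that would otherwise run sequentially in one agent's context proceed concurrently, which is why the biggest gains show up on wide-search benchmarks like BrowseComp.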
Architecture
K2 uses a Mixture of Experts design with 61 layers:
- 1T total parameters, 32B active per token
- Trained on 15.5 trillion tokens with no reported loss spikes or training instability
- Uses the MuonClip Optimizer -- the Muon optimizer applied at unprecedented scale with novel stability techniques
- Designed specifically for tool use, reasoning, and autonomous problem-solving
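The "32B active of 1T total" figure follows from top-k expert routing: for each token, a router scores every expert and only the k best are executed, so the vast majority of parameters are skipped on any given forward pass. A toy sketch in plain Python (the expert count and k below are illustrative, not K2's actual configuration):

```python
def top_k_route(router_logits: list[float], k: int = 8) -> list[int]:
    # Return the indices of the k highest-scoring experts for one token.
    # Every other expert is skipped entirely -- this sparse selection is
    # what keeps only a small fraction (e.g. ~32B of 1T parameters)
    # active per token despite the huge total parameter count.
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    return ranked[:k]

# Illustrative: 384 routed experts, 8 chosen per token.
logits = [((i * 37) % 101) / 101 for i in range(384)]  # fake router scores
chosen = top_k_route(logits, k=8)
print(len(chosen))  # 8 -- only these experts run for this token
```

In a real MoE layer the chosen experts' outputs are also weighted by a softmax over their router scores and summed; the sketch shows only the selection step, which is where the inference-cost savings come from.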
Cautions
- Same data sovereignty considerations as other Chinese-origin models (Moonshot AI is Beijing-based)
- Agent Swarm cost claims should be independently verified -- benchmark comparisons depend heavily on evaluation settings
- Limited Western ecosystem presence compared to DeepSeek or Llama
- K2.5 evaluation used temperature=1.0, top-p=0.95 -- settings that may favour the model
Key Characteristics
| Property | Value |
|---|---|
| Total parameters | 1T (MoE) |
| Active parameters | 32B per token |
| Context window | 256,000 tokens |
| License | Modified MIT |
| Provider | Moonshot AI (Beijing) |
| K2 weights | Hugging Face: moonshotai/Kimi-K2-Instruct |
| K2.5 weights | Hugging Face: moonshotai/Kimi-K2.5 |