Technology Radar
Trial

Kimi K2, released in mid-2025 by Moonshot AI, is a 1-trillion-parameter open-weight mixture-of-experts (MoE) model with 32B active parameters that scores 71.6% on SWE-bench Verified. Its January 2026 successor, K2.5, adds native multimodal capabilities and an "Agent Swarm" system that coordinates up to 100 parallel agents, achieving 50.2% on Humanity's Last Exam at 76% lower cost than Claude Opus.

Why It's in Trial

Kimi K2 and K2.5 have earned attention through strong benchmarks and architectural innovation:

  • 71.6% SWE-bench Verified (K2, with parallel test-time compute) -- competitive with frontier proprietary models
  • 1T total parameters, 32B active: Massive MoE architecture with efficient inference via sparse activation
  • Open weights (Modified MIT License): Both base and instruct variants available for download and fine-tuning
  • Agent Swarm (K2.5): Coordinates up to 100 specialised AI agents working simultaneously, cutting execution time by a factor of 4.5 while achieving 78.4% on BrowseComp
  • Native ACP support: Kimi CLI is listed in the Agent Client Protocol registry (see ACP entry)
  • K2 Thinking variant: AIME 2025 99.1%, HMMT 2025 95.1% -- near-perfect math reasoning

K2 vs K2.5

Property               Kimi K2                        Kimi K2.5
Release                Mid-2025                       January 2026
Modality               Text only                      Native multimodal (vision + text)
Training data          15.5T text tokens              15T mixed visual + text tokens
Context                256K                           256K
Key innovation         MuonClip Optimizer at scale    Agent Swarm (100 parallel agents)
SWE-bench Verified     71.6%                          not reported
HLE (text, w/ tools)   44.9% (K2 Thinking)            51.8%
BrowseComp             60.2% (K2 Thinking)            74.9% (78.4% with Agent Swarm)

Agent Swarm

K2.5's most distinctive feature is Agent Swarm -- a system that spawns up to 100 specialised agents working in parallel on a single task. On tasks requiring wide information gathering:

  • BrowseComp: 78.4% (vs 60.6% for standard single-agent)
  • Wide Search: 79.0% (vs 72.7% standard)
  • Cost: 76% lower than Claude Opus for equivalent Humanity's Last Exam performance
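The fan-out-and-gather pattern behind a swarm like this can be sketched in a few lines. This is a minimal illustrative sketch, not Moonshot's implementation: the `agent` coroutine is a hypothetical stand-in for a model or tool call, and the shard-per-agent decomposition is an assumption about how wide-search tasks are split.

```python
import asyncio

async def agent(task: str, shard: int) -> str:
    # Hypothetical worker: each agent handles one shard of the task.
    # In a real swarm this would be a model/tool call (e.g. a web search).
    await asyncio.sleep(0)  # stand-in for I/O-bound work
    return f"{task}:shard{shard}"

async def swarm(task: str, n_agents: int) -> list[str]:
    # Fan out up to n_agents sub-tasks in parallel and gather the results.
    return await asyncio.gather(*(agent(task, i) for i in range(n_agents)))

results = asyncio.run(swarm("browse", 4))
```

Because the agents run concurrently, wall-clock time is bounded by the slowest shard rather than the sum of all shards, which is where the reported speed-up on wide information-gathering tasks comes from.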

Architecture

K2 uses a Mixture of Experts design with 61 layers:

  • 1T total parameters, 32B active per token
  • Trained on 15.5 trillion tokens with zero training instability (no loss spikes reported)
  • Uses the MuonClip Optimizer -- the Muon optimizer applied at unprecedented scale with novel stability techniques
  • Designed specifically for tool use, reasoning, and autonomous problem-solving
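The sparse-activation idea above can be made concrete with a toy router. This is a generic top-k MoE routing sketch, assuming softmax-normalised gates over the selected experts; the expert count and k here are illustrative, not K2's actual configuration.

```python
import numpy as np

def top_k_route(logits: np.ndarray, k: int = 2):
    # Select the k highest-scoring experts for this token and
    # renormalise their gate weights with a softmax over just those k.
    idx = np.argsort(logits)[::-1][:k]
    gates = np.exp(logits[idx] - logits[idx].max())
    return idx, gates / gates.sum()

rng = np.random.default_rng(0)
router_logits = rng.normal(size=8)          # router scores for 8 toy experts
experts, gates = top_k_route(router_logits)  # only these experts run
```

Only the selected experts' parameters participate in the forward pass for that token, which is how a 1T-parameter network can run with 32B active parameters per token.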

Cautions

  • Same data sovereignty considerations as other Chinese-origin models (Moonshot AI is Beijing-based)
  • Agent Swarm cost claims should be independently verified -- benchmark comparisons depend heavily on evaluation settings
  • Limited Western ecosystem presence compared to DeepSeek or Llama
  • K2.5 evaluation used temperature=1.0, top-p=0.95 -- settings that may favour the model

Key Characteristics

Property            Value
Total parameters    1T (MoE)
Active parameters   32B per token
Context window      256,000 tokens
License             Modified MIT
Provider            Moonshot AI (Beijing)
K2 weights          Hugging Face: moonshotai/Kimi-K2-Instruct
K2.5 weights        Hugging Face: moonshotai/Kimi-K2.5
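The figures in the table imply useful back-of-envelope numbers for anyone planning to self-host the open weights. This sketch assumes weights-only storage at common precisions and ignores KV cache and activation memory, so treat it as a lower bound:

```python
total_params  = 1.0e12   # 1T total parameters (MoE)
active_params = 32.0e9   # 32B active per token

# Fraction of the network that fires per token under sparse activation.
active_fraction = active_params / total_params   # 0.032, i.e. 3.2%

# Rough weight-storage footprint (weights only; excludes KV cache/activations).
gib = 1024 ** 3
fp16_total_gib = total_params * 2 / gib   # ~1863 GiB at 2 bytes/param
int8_total_gib = total_params * 1 / gib   # ~931 GiB at 1 byte/param
```

In other words, although per-token compute resembles a 32B dense model, serving still requires holding all 1T parameters in memory, which keeps self-hosting in multi-node territory even with aggressive quantisation.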

Further Reading