Technology Radar

Voyage Code 3

coding, open-source
Trial

Voyage Code 3 is a code embedding model from Voyage AI optimized for code retrieval. It outperforms OpenAI's text-embedding-3-large and CodeSage-large by an average of 13.8% and 16.8% respectively across 32 code retrieval benchmarks, and supports multiple output dimensions and quantized output types for cost-efficient deployment.

Why It's in Trial

  • Best-in-class code retrieval: Consistently outperforms alternatives across a broad benchmark suite — not a single cherry-picked task but 32 diverse code retrieval datasets.
  • Production-ready flexibility: Supports Matryoshka embeddings at 2048, 1024, 512, or 256 dimensions and quantized output types (int8, uint8, binary, ubinary), enabling 4x to 32x storage reduction relative to 32-bit floats with minimal quality loss.
  • Wide adoption: Since the predecessor, voyage-code-2, launched in January 2024, Voyage's code models have seen rapid adoption among coding assistant and agent startups for RAG-based code retrieval.
  • 32K context length: Long enough to embed entire files, large code blocks, or multi-file context — important for agentic retrieval where queries span significant code context.
  • Not yet the default: While it leads on benchmarks, it hasn't displaced OpenAI embeddings as the default in most frameworks and tutorials. Hence Trial: use it on real projects, but expect ecosystem integration to still be maturing.
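
The storage reduction mentioned above follows directly from the number of bits stored per value. The following is a minimal sketch of that arithmetic, not a measurement: the helper names (`bytes_per_vector`, `index_size_gb`) are hypothetical, and the calculation ignores any index overhead a vector database adds on top of the raw vectors.

```python
# Bits stored per embedding value for each output type.
# "float" is the unquantized 32-bit baseline; the quantized names come from the entry above.
BITS_PER_VALUE = {"float": 32, "int8": 8, "uint8": 8, "binary": 1, "ubinary": 1}


def bytes_per_vector(dimensions: int, dtype: str) -> int:
    """Raw storage for one embedding, in bytes."""
    return dimensions * BITS_PER_VALUE[dtype] // 8


def index_size_gb(num_vectors: int, dimensions: int, dtype: str) -> float:
    """Raw storage for a whole corpus, in gigabytes (10^9 bytes)."""
    return num_vectors * bytes_per_vector(dimensions, dtype) / 1e9


baseline = bytes_per_vector(1024, "float")
for dtype in ("float", "int8", "binary"):
    size = bytes_per_vector(1024, dtype)
    print(f"{dtype:>6}: {size} bytes/vector, {baseline // size}x smaller than float")
```

Under these assumptions, ten million code chunks stored as 256-dimensional binary vectors need about 0.32 GB of raw vector storage, compared with about 41 GB at 1024 dimensions in 32-bit float.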

Key Capabilities

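  • Matryoshka learning: The model is trained once so that the leading dimensions of its 2048-dimensional embedding remain useful on their own. You can shorten embeddings to 1024, 512, or 256 dimensions with graceful quality degradation and no retraining, as long as queries and documents use the same size.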
  • Quantized embeddings: int8 (4x savings) or binary (32x savings) compared to 32-bit float, enabling cost-effective large-scale code search.
  • AWS SageMaker deployment: Available for private deployment in your VPC — 90ms latency per query, ~$0.22/M tokens on ml.g6.xlarge.
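
To make the first two capabilities concrete, here is a small sketch of what shortening and binarizing an embedding can look like on the client side. It is an illustration under stated assumptions, not Voyage's implementation: the function names are hypothetical, the sign-based bit packing is a generic scheme that may differ from the encoding the service uses for its binary and ubinary types, and the toy vector stands in for a real embedding.

```python
import math


def truncate_and_normalize(embedding, dimensions):
    """Keep the leading `dimensions` values and rescale them to unit length."""
    head = list(embedding[:dimensions])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head
    return [x / norm for x in head]


def pack_sign_bits(embedding):
    """Store one bit per value (1 if value >= 0), eight values per byte, first value in the high bit."""
    if len(embedding) % 8 != 0:
        raise ValueError("embedding length must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(embedding), 8):
        byte = 0
        for value in embedding[start:start + 8]:
            byte = (byte << 1) | (1 if value >= 0 else 0)
        packed.append(byte)
    return bytes(packed)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits; a cheap dissimilarity measure for packed binary vectors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Toy 16-dimensional vector standing in for a real embedding.
full = [0.5, -0.25, 0.125, -0.75, 0.3, 0.1, -0.2, 0.4,
        -0.1, 0.2, -0.3, 0.05, 0.6, -0.45, 0.15, -0.05]
short = truncate_and_normalize(full, 8)
print(len(short), pack_sign_bits(short).hex())
```

Rescaling after truncation matters when the search index compares vectors by dot product; with cosine similarity the rescaling is implicit.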

Key Characteristics

| Property | Value |
| --- | --- |
| Developer | Voyage AI |
| Model | voyage-code-3 |
| Type | Code embedding (not generative) |
| Released | December 2024 |
| Context length | 32,768 tokens |
| Default dimensions | 1,024 |
| Max dimensions | 2,048 |
| Quantization | int8, uint8, binary, ubinary |
| Benchmark | +13.8% vs OpenAI text-embedding-3-large (32 datasets) |
| Deployment | Voyage API, AWS SageMaker |
| Pricing | API-based (see voyageai.com) |
| Hugging Face | voyageai/voyage-code-3 |
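
The properties above map onto a handful of request options. The sketch below only builds and validates a request body and sends nothing over the network. The field names (`input`, `input_type`, `output_dimension`, `output_dtype`), the `"float"` value for unquantized output, and the `"query"`/`"document"` input types are assumptions about Voyage's embeddings API and should be checked against the current API reference. `build_embedding_request` is a hypothetical helper, not part of any client library.

```python
import json

# Values taken from the table above; "float" is assumed to name the unquantized default.
ALLOWED_DIMENSIONS = {256, 512, 1024, 2048}
ALLOWED_DTYPES = {"float", "int8", "uint8", "binary", "ubinary"}


def build_embedding_request(texts, input_type="document",
                            output_dimension=1024, output_dtype="float"):
    """Return a JSON-serializable request body for voyage-code-3 (assumed field names)."""
    if output_dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unsupported dimension: {output_dimension}")
    if output_dtype not in ALLOWED_DTYPES:
        raise ValueError(f"unsupported output type: {output_dtype}")
    if input_type not in ("query", "document"):
        raise ValueError(f"unsupported input type: {input_type}")
    return {
        "model": "voyage-code-3",
        "input": list(texts),
        "input_type": input_type,
        "output_dimension": output_dimension,
        "output_dtype": output_dtype,
    }


payload = build_embedding_request(
    ["def add(a, b):\n    return a + b"],
    output_dimension=256,
    output_dtype="int8",
)
print(json.dumps(payload, indent=2))
```

Validating the dimension and output type locally catches configuration mistakes before an index is built with mismatched query and document settings.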