Technology Radar

Ollama

inference
Adopt

Ollama is the standard tool for running large language models locally on your own hardware: one command downloads and runs a model, and the local server exposes an OpenAI-compatible API. It's the fastest path from "I want a local model" to "I have a working local model."

Buy vs Build

Ollama sits on the "build" side because you run it yourself, but it abstracts away most of the setup complexity. In effort it is closer to "buy": ollama pull llama3.3 downloads the model weights and their configuration in one step. There's no commercial hosted version.

Why It's in Adopt

Ollama is the de facto standard for local model development in 2026:

  • One command to run a model: ollama run llama3.3 downloads the model if it is not already present and opens an interactive chat
  • Always-on API server: ollama serve runs a local API at http://localhost:11434, with OpenAI-compatible endpoints under /v1, so most OpenAI client code works in development after changing the base URL
  • Tool calling: Function/tool calling works with supported models (Llama 3.1+, Mistral, Qwen 2.5)
  • MCP integration: Works with Model Context Protocol tools, enabling agentic workflows on local models
  • Cross-platform: Mac (Apple Silicon optimised), Linux, Windows
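The tool-calling bullet above can be made concrete with a request body. The following is a minimal sketch, assuming the OpenAI-style `tools` schema on the `/v1/chat/completions` endpoint and the OpenAI response shape (arguments delivered as a JSON string). The `get_weather` tool and its parameters are hypothetical, and nothing here is sent over the network.

```python
import json

# Hypothetical tool definition in the OpenAI-style function schema.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative name, not part of Ollama
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(model: str, prompt: str) -> dict:
    """Return a chat-completions body that offers one tool to the model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
    }


def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Pull (function name, parsed arguments) pairs out of a response body.

    Assumes the OpenAI response shape, where arguments arrive as a JSON string.
    """
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls
```

The caller would POST the body from `build_tool_request` to the local server, run whichever function `extract_tool_calls` reports, and send the result back as a follow-up message.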

Why Engineering Managers Care

Cost control during development: Running tests against paid cloud APIs burns through credits quickly. Ollama lets developers use local models for the roughly 90% of work where cloud quality isn't needed, reserving cloud credits for production testing.
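One way to keep cloud credits out of day-to-day development is to pick the API base URL from configuration and default to the local server. The sketch below is an assumption about how a team might wire this up, not an Ollama feature: the `LLM_BACKEND` environment variable is hypothetical, and the cloud URL is OpenAI's public API base.

```python
import os

LOCAL_BASE_URL = "http://localhost:11434/v1"  # Ollama's OpenAI-compatible endpoints
CLOUD_BASE_URL = "https://api.openai.com/v1"


def resolve_base_url(env=None) -> str:
    """Return the local endpoint unless LLM_BACKEND is explicitly 'cloud'.

    LLM_BACKEND is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    backend = env.get("LLM_BACKEND", "local").strip().lower()
    if backend not in {"local", "cloud"}:
        raise ValueError(f"unknown LLM_BACKEND: {backend!r}")
    return CLOUD_BASE_URL if backend == "cloud" else LOCAL_BASE_URL
```

Defaulting to local means a developer has to opt in to spending credits, for example by setting the variable only in the production test pipeline.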

Data privacy: Source code, proprietary documents, and customer data never leave your network. Relevant for regulated industries or when working with sensitive IP.
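The privacy guarantee only holds while requests actually target the local server. As a hedged sketch, a small guard like the one below could run before sensitive data is sent. The function name is hypothetical, and the check is deliberately stricter than "your network": it accepts only `localhost` and loopback IP addresses.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_endpoint(url: str) -> bool:
    """Return True only if the URL's host is localhost or a loopback IP."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname may resolve off-machine, so treat it as non-local.
        return False
```

A team that runs Ollama on a shared internal host would need to widen this check to its own address range.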

Offline capability: Agents and tools work without internet access.

Performance on Apple Silicon

On M-series Macs with 64GB of RAM, Ollama runs Llama 3.3 70B at 15-25 tokens/second, which is fast enough for interactive use. Smaller 8B and 14B models run at 60+ tokens/second.
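To translate those throughput figures into waiting time, divide the response length by the generation rate. The sketch below does that arithmetic for a hypothetical 500-token response. It ignores prompt processing and model load time, so real latency will be somewhat higher.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Estimate generation time from output length and throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Hypothetical 500-token response at the rates quoted above.
slow_70b = generation_seconds(500, 15)   # about 33.3 seconds
fast_70b = generation_seconds(500, 25)   # 20.0 seconds
small = generation_seconds(500, 60)      # about 8.3 seconds
```

At these rates a 70B answer of that length takes roughly 20 to 33 seconds, while a smaller model returns it in under 10.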

Getting Started

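# Install (macOS)
brew install ollama

# Start the server in the background (not needed if the Ollama desktop app is already running)
brew services start ollama

# Download and run a model
ollama run llama3.3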

# Or use the API from code
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.3", "messages": [{"role": "user", "content": "Hello"}]}'
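The same request can be issued from Python with only the standard library. This is a minimal sketch that mirrors the curl call above. The helper names are hypothetical, and `chat` assumes a running Ollama server with the model already downloaded and an OpenAI-style response body.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the POST request that the curl example sends."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text.

    Requires a running Ollama server; assumes the OpenAI response shape.
    """
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With the server running, `chat("llama3.3", "Hello")` would return the model's reply as a string. Building the request separately from sending it keeps the payload easy to inspect or unit test without a server.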

Key Characteristics

Property    Value
License     MIT
Provider    Ollama Inc.
GitHub      ollama/ollama
Website     ollama.com