Ollama
Ollama is the standard tool for running large language models locally on your own hardware: a single command downloads a model and starts an OpenAI-compatible API server. It's the fastest path from "I want a local model" to "I have a working local model."
Buy vs Build
Ollama is a "build" tool (you run it yourself), but it abstracts away all the complexity, so in effort it's closer to "buy": `ollama pull llama3.3` downloads and configures everything. There's no commercial hosted version.
Why It's in Adopt
Ollama is the de facto standard for local model development in 2026:
- One command to run any model: `ollama run llama3.3` downloads the model and starts an interactive chat
- Always-on API server: `ollama serve` runs a local OpenAI-compatible API at `http://localhost:11434`, a drop-in replacement for the OpenAI API in development
- Tool calling: Full support for function/tool calling in supported models (Llama 3.1+, Mistral, Qwen 2.5)
- MCP integration: Works with Model Context Protocol tools, enabling agentic workflows on local models
- Cross-platform: Mac (Apple Silicon optimised), Linux, Windows
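Because the local server speaks the OpenAI wire format, any OpenAI-style client works by pointing it at `localhost:11434`. A minimal stdlib-only sketch (the helper names are ours, not part of Ollama; it assumes `ollama serve` is running and `llama3.3` has been pulled):

```python
import json
import urllib.request

# Ollama's OpenAI-compatible chat endpoint on the default port.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str, model: str = "llama3.3") -> str:
    """Send a chat request to the local Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same request shape works with the official OpenAI client libraries by overriding the base URL, which is what makes local/cloud switching in development a one-line config change.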
Why Engineering Managers Care
Cost control during development: Developers burning OpenAI credits running tests against real APIs is expensive. Ollama lets developers use local models for the 90% of work where cloud quality isn't needed — reserving cloud credits for production testing.
Data privacy: Source code, proprietary documents, and customer data never leave your network. Relevant for regulated industries or when working with sensitive IP.
Offline capability: Agents and tools work without internet access.
Performance on Apple Silicon
On M-series Macs, Ollama runs Llama 3.3 70B at 15-25 tokens/second with 64GB RAM — fast enough for interactive use. The 8B and 14B models run at 60+ tokens/second.
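You can measure throughput on your own hardware: Ollama's native `/api/generate` endpoint reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds spent generating them) in its final response object. A small sketch of the arithmetic (the sample numbers below are illustrative, not a benchmark):

```python
def tokens_per_second(response: dict) -> float:
    """Compute decode throughput from an Ollama /api/generate response.

    eval_count is the number of generated tokens; eval_duration is the
    time spent generating them, in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)

# Illustrative figures: 200 tokens generated in 10 seconds.
stats = {"eval_count": 200, "eval_duration": 10_000_000_000}
print(tokens_per_second(stats))  # 20.0
```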
Getting Started
```bash
# Install (macOS)
brew install ollama

# Download and run a model
ollama run llama3.3

# Or use the API from code
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.3", "messages": [{"role": "user", "content": "Hello"}]}'
```
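For the tool calling mentioned above, the OpenAI-compatible endpoint accepts the standard `tools` array in the request body. A sketch of the shape (the `get_weather` function is a hypothetical example, not an Ollama built-in):

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema,
# which Ollama's /v1/chat/completions endpoint also accepts.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request_body = {
    "model": "llama3.3",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [get_weather_tool],
}
print(json.dumps(request_body, indent=2))
```

When the model decides to call the tool, the response carries a `tool_calls` entry instead of plain text, and your code executes the function and sends the result back as a `tool` role message, the same loop as with the cloud OpenAI API.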
Key Characteristics
| Property | Value |
|---|---|
| Platforms | macOS, Linux, Windows |
| API format | OpenAI-compatible |
| Model format | GGUF (quantised models) |
| License | MIT |
| Provider | Ollama Inc. |