Technology Radar

Lemonade Server

inference
This item has not been updated in the last three editions of the Radar. If it appeared in one of the more recent editions, there is a good chance it remains pertinent. If it dates back further, its relevance may have diminished and our current assessment could differ. Unfortunately, we don't have the capacity to consistently revisit items from past Radar editions.
Assess

Lemonade Server is AMD's open-source local LLM runtime — and the only option that uses AMD's Neural Processing Unit (NPU) for AI inference. If your organisation's hardware fleet runs AMD Ryzen AI 300-series processors, Lemonade offers inference performance and power efficiency not available through Ollama or LM Studio.

Buy vs Build

Lemonade sits on the "build" side: it is open source and runs on your own hardware. The "buy" angle is that AMD develops and maintains it, which gives it long-term hardware support.

Why It's in Assess

For most teams, Ollama or LM Studio cover local inference needs. Lemonade becomes relevant in specific scenarios:

  1. Your hardware is AMD: Lemonade is the only tool that activates the NPU in Ryzen AI 300-series processors. This delivers 2-3x better tokens-per-watt than CPU-only inference on the same chip.
  2. Energy efficiency matters: NPU inference uses significantly less power than GPU inference — relevant for devices running on battery or datacenters optimising power draw.
  3. You need a single tool for GPU + NPU: Lemonade handles both, whereas Ollama on AMD hardware under Windows is limited to GPU-only inference.

What Is an NPU?

For readers unfamiliar with the term: Modern AMD (and Intel, Apple) processors include a Neural Processing Unit — a dedicated chip designed specifically for running AI models efficiently. Think of it like having a small, low-power GPU on the same chip as your CPU. The NPU is optimised for the specific type of math AI models use, so it runs them faster and with less energy than a general-purpose CPU would.

OpenAI-Compatible API

Like Ollama, Lemonade runs an API at http://localhost:8000/api/v1 that is compatible with the OpenAI API format — so any code targeting OpenAI can point at Lemonade with a single line change.
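A minimal sketch of that one-line change, using only the Python standard library. The endpoint URL comes from the text above; the model identifier is hypothetical — query your own install for the names it actually serves.

```python
import json
import urllib.request

# Endpoint from the Lemonade docs cited above.
LEMONADE_URL = "http://localhost:8000/api/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-format chat completion request aimed at Lemonade."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        LEMONADE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it requires a running Lemonade Server; the model name here is
# an assumption, not a documented identifier:
# with urllib.request.urlopen(build_chat_request("Llama-3.2-3B-Instruct", "Hi")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Code written against the official OpenAI client works the same way — swap the base URL to `http://localhost:8000/api/v1` and leave everything else untouched.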

Supported Models

Llama 4, Llama 3.x, DeepSeek, Qwen, Gemma, Phi, and most GGUF-format models.
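To see which of these models your install actually serves, the OpenAI-compatible convention exposes a models listing. A sketch, assuming Lemonade follows the standard `GET /models` shape (the response parsing below matches the OpenAI format; the exact IDs returned depend on what you have downloaded):

```python
import json
import urllib.request

# Assumed endpoint, following the OpenAI API convention for model listings.
MODELS_URL = "http://localhost:8000/api/v1/models"

def list_model_ids(raw_json: str) -> list[str]:
    """Extract model identifiers from an OpenAI-format /models response."""
    return [m["id"] for m in json.loads(raw_json)["data"]]

# With a Lemonade Server running locally:
# with urllib.request.urlopen(MODELS_URL) as resp:
#     print(list_model_ids(resp.read().decode("utf-8")))
```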

Key Characteristics

| Property   | Value                                         |
| ---------- | --------------------------------------------- |
| Platform   | Windows (primary), Linux and macOS (beta)     |
| Hardware   | AMD Ryzen AI 300 (NPU), AMD Radeon (GPU), CPU |
| API format | OpenAI-compatible                             |
| License    | MIT                                           |
| Provider   | AMD (open source)                             |