Context engineering is the discipline of deliberately designing what information an AI agent has access to — and when — as it executes a long-running task. It is to agentic AI what schema design is to databases: getting it right is the difference between a reliable system and an unpredictable one.
How It Relates to Prompt Engineering and Context Management
This entry supersedes the earlier "Context Management" entry (now Hold). Context management — the practice of curating what information goes into an AI conversation — is a subset of context engineering. The distinction:
- Prompt engineering is about what you say to the model at the start of a task
- Context management is about selecting which files, docs, and information to include in a conversation
- Context engineering encompasses both, plus the ongoing, dynamic management of the information flowing through an agent's context window over time — as it reads files, calls tools, receives outputs, and builds up history
It's less about crafting the perfect sentence and more about architecting an information pipeline.
The term was coined by Tobi Lütke (Shopify CEO) in a June 2025 post: "the art of providing all the context for the task to be plausibly solvable by the LLM." Andrej Karpathy amplified it, calling it "the delicate art and science of filling the context window with just the right information for each step." Anthropic published an influential guide in September 2025; Thoughtworks placed the practice on their Technology Radar (Vol. 33, November 2025) at Assess. By early 2026, Birgitta Böckeler (Thoughtworks Distinguished Engineer) declared it "probably the most significant development of the year" in her analysis on martinfowler.com: "Context is the bottleneck for coding agents now."
Why It Matters in Practice
A typical agentic failure mode: an agent starts a complex task, accumulates 50,000 tokens of context (tool outputs, file contents, intermediate results), and by the third hour is "confused" — contradicting earlier decisions, forgetting constraints, or losing track of the goal. The model's reasoning quality degrades as context grows noisier. Crucially, throwing more tokens at the problem makes it worse, not better.
Context engineering addresses this by treating the context window as a managed resource:
1. Context selection: What does the agent actually need right now?
- Inject only the relevant files for the current sub-task, not the whole codebase
- Summarise completed sub-tasks rather than keeping full transcripts
- Use retrieval (RAG) to pull in specific information on demand
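The selection steps above can be sketched in a few lines. This is an illustrative sketch, not any particular framework's API: `Task`, `select_context`, and `summarise` are invented names, and `summarise` is a stand-in for a real LLM summarisation call.

```python
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    relevant_paths: list  # paths the planner tagged as relevant to this sub-task

def summarise(transcript: str, max_chars: int = 200) -> str:
    """Stand-in for an LLM summarisation call: keep only the head of the transcript."""
    return transcript if len(transcript) <= max_chars else transcript[:max_chars] + " …[summarised]"

def select_context(task: Task, repo: dict, completed: dict) -> str:
    """Build a context string from only the files this sub-task needs,
    plus summaries (not full transcripts) of completed sub-tasks."""
    parts = [f"TASK: {task.description}"]
    for path in task.relevant_paths:            # inject only relevant files
        parts.append(f"FILE {path}:\n{repo[path]}")
    for name, transcript in completed.items():  # summarise, don't replay
        parts.append(f"DONE {name}: {summarise(transcript)}")
    return "\n\n".join(parts)
```

The point of the sketch: the agent's window receives only the files tagged for the current sub-task, and finished work survives only as a short summary line.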
2. Context freshness: Stale information misleads agents
- Prune tool output that is no longer relevant
- Refresh state (re-read files) before taking action rather than relying on cached content from earlier in the session
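A minimal sketch of both freshness rules, with invented names throughout: tool outputs are dropped once acted on, and state is re-read immediately before acting rather than trusted from earlier in the session.

```python
class WorkingContext:
    def __init__(self):
        self.entries = []  # [tag, text, consumed] triples in window order

    def add_tool_output(self, text: str):
        self.entries.append(["tool", text, False])

    def mark_consumed(self, index: int):
        self.entries[index][2] = True

    def prune(self):
        """Drop tool outputs the agent has already acted on."""
        self.entries = [e for e in self.entries if not (e[0] == "tool" and e[2])]

def fresh_read(files: dict, path: str) -> str:
    """Stand-in for re-reading a file right before an edit; a real agent
    would hit the filesystem here, never a cached copy from earlier turns."""
    return files[path]
```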
3. Context structure: How information is formatted affects reasoning quality
- Place the most important constraints near the beginning and end of context (the "lost in the middle" phenomenon is real)
- Use structured formats (JSON, XML) for programmatically consumed information; natural prose for nuanced instructions
- Separate "ground truth" information (facts, state) from "reasoning space" (scratchpad)
4. Context checkpointing: For long-running tasks
- Periodically ask the agent to summarise progress, constraints still in effect, and the next planned step
- Use this summary to "reset" a fresh context rather than carrying the entire history forward
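The checkpoint-and-reset cycle above reduces to a small data structure. A hedged sketch with invented names (`Checkpoint`, `reset_context`): the fresh window is seeded only from the checkpoint, never from the old transcript.

```python
from dataclasses import dataclass

@dataclass
class Checkpoint:
    progress: str      # what has been done so far
    invariants: list   # constraints still in effect
    next_step: str     # the next planned action

def reset_context(cp: Checkpoint) -> str:
    """Build a fresh context from the checkpoint alone,
    discarding the accumulated history."""
    lines = [f"PROGRESS: {cp.progress}", "INVARIANTS:"]
    lines += [f"- {inv}" for inv in cp.invariants]
    lines.append(f"NEXT STEP: {cp.next_step}")
    return "\n".join(lines)
```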
5. Context isolation via subagents: The most powerful pattern for large tasks
- Spawn subagents with focused, scoped context windows so the main agent is never polluted with file-level detail
- The parent holds project-level context (goals, plan, progress); subagents handle search and file reading independently
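The isolation pattern can be sketched as follows, with all names invented: the subagent does the noisy file-level work inside its own scoped window, and only a short summary flows back into the parent's context.

```python
def subagent_search(query: str, repo: dict) -> str:
    # The subagent's own window holds the noisy file contents;
    # it is discarded when the subagent returns.
    scoped_window = [f"{path}:\n{src}" for path, src in repo.items()]
    hits = [path for path, src in repo.items() if query in src]
    return f"Files mentioning '{query}': {', '.join(hits) or 'none'}"

class ParentAgent:
    def __init__(self, goal: str):
        self.context = [f"GOAL: {goal}"]            # project-level context only

    def delegate_search(self, query: str, repo: dict):
        summary = subagent_search(query, repo)      # noise stays in the subagent
        self.context.append(summary)                # only the summary comes back
```

The parent's window grows by one line per delegated search, regardless of how many files the subagent read.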
What Forward-Looking Companies Are Doing
- Thoughtworks: Böckeler's February 2026 analysis on martinfowler.com uses Claude Code as the canonical example of advanced context engineering — CLAUDE.md as critical infrastructure, modular path-scoped rules, and subagent isolation. Their Tech Radar Vol. 33 (Nov 2025) placed context engineering at Assess, calling it central to how AI assistance is maturing.
- Spotify: Published the "Honk" engineering blog series on their background coding agent (November 2025) — a two-part practical demonstration of context engineering at scale. Their migrations require careful prompt construction specifying "what to change, what not to touch, and how to verify success."
- Anthropic: Claude Code's architecture is explicitly built around context engineering — CLAUDE.md hierarchy, auto-compaction, subagent isolation, and tool search for schema deferral. Claude's extended thinking with adaptive effort partially automates this at the model level.
- Stripe: Structured "blueprints" that constrain minions' behaviour at each step are an implicit form of context engineering — defining what information agents operate on at each stage.
The Bigger Picture: Harness Engineering
Context engineering is one component of the emerging practice of Harness Engineering — the broader discipline of building and maintaining the system that governs how agents operate. Per Böckeler's follow-up article on martinfowler.com (February 2026), a harness encompasses three concerns:
| Component | What It Means |
|---|---|
| Context engineering | What information the agent sees and when |
| Architectural constraints | Codebase structure, conventions, and guardrails that prevent agents from making unconstrained decisions |
| Garbage collection | Continuously removing dead code, stale docs, and noise so agents don't act on misleading information |
The key insight: the underlying model matters less than the system around it. LangChain improved their coding agent from 52.8% to 66.5% on Terminal Bench 2.0 (Top 30 → Top 5) by changing nothing about the model — only the harness.
Getting Started
If you're running agents and seeing inconsistent results, start here before blaming the model:
- Audit your context at failure points — what was in the window when it went wrong?
- Trim rather than grow — remove tool outputs once they've been acted on
- Add explicit state summaries — "so far we've done X, the invariants are Y, the next step is Z"
- Test with different context sizes — run the same task with a lean vs. full context and compare
- Use subagents for isolation — don't let file-search noise accumulate in your main agent's context
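The audit step above can be sketched as a thin wrapper around an agent loop (all names invented): snapshot the window before every action, so when a run goes wrong you can inspect exactly what the agent saw at the failure point.

```python
class AuditedRun:
    def __init__(self):
        self.window = []     # current context entries
        self.snapshots = []  # (step label, copy of window) pairs

    def step(self, label: str, new_entry: str):
        self.snapshots.append((label, list(self.window)))  # what the agent saw
        self.window.append(new_entry)                      # then grow the window

    def at_failure(self, label: str):
        """Return the window contents as they were when the named step ran."""
        return next(w for s, w in self.snapshots if s == label)
```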
Key Characteristics
| Property | Value |
|---|---|
| Type | Engineering discipline / practice |
| Applies to | Any LLM-powered agent system |
| Related | Prompt engineering, RAG workflows, Harness Engineering |
| Where it matters most | Long-running tasks, multi-step agents, production reliability |
| Term coined by | Tobi Lütke (Shopify CEO), June 2025; popularised by Andrej Karpathy |
| Industry validation | Thoughtworks Tech Radar Vol. 33 (Assess, Nov 2025); martinfowler.com analysis (Feb 2026) |