Technology Radar

Multi-Agent Workflows

agent · multi-agent
Trial

Multi-agent workflows — coordinating multiple AI agents with specialised roles to tackle complex engineering tasks in parallel — have crossed into Trial. Organisations such as Stripe, Spotify, and Shopify run them in production, the A2A protocol has reached v1.0, and purpose-built patterns for parallel agent execution are now standard in major coding tools.

Why It Moved from Assess to Trial

In February 2026, multi-agent workflows were Assess — the pattern was proven at research scale, but few organisations had it working reliably in production. Since then:

  • A2A v1.0 shipped (March 12, 2026): The Agent2Agent Protocol hit v1.0 under Linux Foundation governance, giving multi-agent systems a stable interoperability standard for cross-vendor agent coordination. Tyson Foods and Gordon Food Service are early production users for supply chain coordination.
  • Background coding agents reached scale: Stripe's Minions system generates 1,300+ PRs/week; Spotify's Honk runs across their entire service fleet; Shopify's Roast framework is open-sourced and in production. All of these are multi-agent in the sense of multiple specialised agents running in parallel.
  • Platform support is now standard: Claude Code's Agent Teams (sub-agents with dedicated context windows and git worktrees), OpenAI Codex's parallel cloud sandboxes, and GitHub Copilot's Coding Agent are all generally available. You don't need to build the infrastructure yourself.
  • Harness patterns are documented: The architectural patterns that make parallel agent execution reliable — isolated environments, verification loops, structured workflow definition — are now well-described (see Background Coding Agents and Harness Engineering).

What It Looks Like

Multi-agent workflows take several forms:

Parallel specialised agents (most common)

  • Planner: breaks down a feature request into tasks
  • Implementer: writes code in isolated branches/worktrees
  • Tester: generates and runs tests
  • Reviewer: checks for bugs, security, style

These agents run simultaneously on separate git worktrees, then merge — dramatically speeding up complex feature development.
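The fan-out-and-merge pattern above can be sketched in a few lines of Python. This is a minimal illustration, not any tool's actual implementation: the role functions are hypothetical stand-ins for LLM-backed agents, and temporary directories stand in for git worktrees.

```python
import concurrent.futures
import tempfile
from pathlib import Path

# Hypothetical role agents. In a real setup each would drive an LLM-backed
# coding agent inside its own git worktree; here plain functions and temp
# directories stand in for both.
def implement(workdir: Path) -> str:
    (workdir / "feature.py").write_text("def add(a, b):\n    return a + b\n")
    return "implemented"

def write_tests(workdir: Path) -> str:
    (workdir / "test_feature.py").write_text(
        "from feature import add\nassert add(1, 2) == 3\n"
    )
    return "tests written"

def review(workdir: Path) -> str:
    # A reviewer agent would scan the diff for bugs, security and style issues.
    return "review queued"

def run_parallel(roles):
    """Run each role agent concurrently, each in an isolated workspace."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {}
        for name, agent in roles.items():
            workdir = Path(tempfile.mkdtemp(prefix=f"{name}-"))  # one workspace per agent
            futures[pool.submit(agent, workdir)] = name
        for fut in concurrent.futures.as_completed(futures):
            results[futures[fut]] = fut.result()
    return results

print(run_parallel({"implementer": implement, "tester": write_tests, "reviewer": review}))
```

The isolation is the point: because no two agents share a workspace, their outputs can be merged (or discarded) independently, just as separate worktrees merge back into the main branch.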

Sequential agent pipelines (proven in production)

Stripe's Minions pattern assigns one agent type per task class, each driven by a Blueprint: a structured workflow that mixes deterministic code with LLM calls. The system generates 1,300+ PRs/week.
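A sequential pipeline of this kind can be sketched as an ordered list of steps, some deterministic and some delegated to a model. The step names and the `llm` stub below are illustrative; Stripe's actual Blueprint format is not public.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]

def llm(prompt: str) -> str:
    # Stand-in for a real model call.
    return f"patch for: {prompt}"

def find_target(ctx):
    # Deterministic step: locate the file to change.
    ctx["target"] = "billing/invoice.py"
    return ctx

def draft_patch(ctx):
    # LLM step: propose the change.
    ctx["patch"] = llm(f"fix deprecation in {ctx['target']}")
    return ctx

def run_checks(ctx):
    # Deterministic step: gate on tests before opening a PR.
    ctx["checks_passed"] = True
    return ctx

def run_blueprint(steps, ctx):
    """Execute steps in order, threading a shared context dict through them."""
    for step in steps:
        ctx = step.run(ctx)
    return ctx

result = run_blueprint(
    [Step("find_target", find_target),
     Step("draft_patch", draft_patch),
     Step("run_checks", run_checks)],
    {},
)
```

The design choice worth copying is the mix: everything that can be deterministic is, and the model is invoked only at the step that genuinely needs judgment.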

Cross-vendor agent networks (emerging)

A2A v1.0 enables scenarios such as a LangGraph orchestrator delegating to a specialist Semantic Kernel agent via the A2A protocol; neither needs to know how the other is built. This is the frontier: working, but not yet widespread.
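The discovery-then-delegate flow can be sketched as follows. This is illustrative only: the field names follow the general shape of an A2A agent card, but the authoritative v1.0 schema is defined by the specification, and the endpoint and skill IDs here are hypothetical.

```python
import json

# Illustrative agent card: the advertised identity and skills a remote agent
# publishes so orchestrators can discover it. Not the authoritative A2A schema.
AGENT_CARD = {
    "name": "db-migration-specialist",
    "description": "Plans and reviews schema migrations",
    "url": "https://agents.example.com/db-migration",  # hypothetical endpoint
    "skills": [{"id": "plan-migration", "description": "Draft a migration plan"}],
}

def discover(card: dict) -> bool:
    """An orchestrator reads a remote agent's card to decide whether to delegate."""
    return any(skill["id"] == "plan-migration" for skill in card["skills"])

def delegate(card: dict, task: str) -> dict:
    # In a real deployment this would be an HTTP call to card["url"] using the
    # A2A message format; here we just return a response envelope.
    return {"agent": card["name"], "task": task, "status": "accepted"}

if discover(AGENT_CARD):
    response = delegate(AGENT_CARD, "add index to invoices.customer_id")
    print(json.dumps(response))
```

The key property is that the orchestrator reasons only over the published card, never over the remote agent's internals, which is what makes cross-vendor coordination possible.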

Tools That Enable This

  • Claude Code: Agent Teams; sub-agents with dedicated context and git worktrees
  • OpenAI Codex: parallel cloud sandboxes, up to N concurrent tasks
  • Google ADK: native multi-agent patterns (sequential, parallel, handoff, loop)
  • Microsoft Agent Framework: orchestration patterns (sequential, concurrent, handoff, group chat)
  • A2A Protocol: cross-vendor agent coordination (v1.0 stable)
  • CrewAI / LangGraph: Python frameworks for custom role-based agent teams
  • Shopify Roast: open-source YAML orchestration for structured workflows

The Quality Caveat Remains

Research shows 40–62% of AI-generated code contains security vulnerabilities. Google's 2025 DORA Report found that high AI adoption correlates with a 9% increase in bug rates and a 154% increase in PR size. Multi-agent systems amplify both speed and quality risks.

The production deployments that are working (Stripe, Spotify, Shopify) succeed because they treat quality controls as non-negotiable constraints in the workflow — not afterthoughts. If you're not ready to invest in a robust verification loop, multi-agent is not ready for your team.

Why Not Adopt?

  • Most organisations are still in the "single-agent reliability" phase — get one agent working predictably before adding the coordination complexity
  • Cross-vendor A2A deployments (the most powerful use case) have only a handful of production examples
  • Observability tooling for multi-agent systems is immature — debugging failures in a fleet of agents is harder than debugging a single agent
  • Security and compliance audit trails for agent-to-agent data exchange are not standardised

Best Practices (Validated in Production)

  • Narrow task scope per agent: Stripe's Minions uses one agent type per task class
  • Verification loop as gate: Spotify's Honk runs lint → compile → test and doesn't merge unless green
  • Isolated environments: separate worktrees/containers so agents cannot interfere
  • Deterministic orchestration: a workflow engine with predefined rules, not open-ended agent reasoning
  • Human review gates: mandatory for security-sensitive code and infrastructure changes
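The verification-loop-as-gate practice reduces to a short script: each stage must exit 0 before an agent's branch is eligible to merge. The commands below are placeholders; substitute your project's real lint, build, and test invocations.

```python
import subprocess
import sys

# Placeholder stages; each command must exit 0 for the gate to pass.
# Replace with e.g. your real linter, compiler, and test runner.
STAGES = [
    ("lint",    [sys.executable, "-c", "print('lint ok')"]),
    ("compile", [sys.executable, "-c", "print('build ok')"]),
    ("test",    [sys.executable, "-c", "print('tests ok')"]),
]

def verify() -> bool:
    """Run every stage in order; fail fast on the first non-zero exit code."""
    for name, cmd in STAGES:
        if subprocess.run(cmd).returncode != 0:
            print(f"gate failed at: {name}")
            return False
    return True

mergeable = verify()  # True only if every stage is green
```

Treating the gate as a boolean the orchestrator cannot override is what makes it a constraint rather than an afterthought.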

Key Characteristics

  • Type: architectural pattern / technique
  • Maturity: production-proven at scale (Stripe, Spotify, Shopify)
  • Cross-vendor standard: A2A v1.0 (March 12, 2026)
  • Prerequisite: single-agent workflows working reliably first

Further Reading

Assess (previous entry, February 2026)

Multi-agent workflows — coordinating multiple AI agents with specialised roles to tackle complex engineering tasks in parallel — are moving from research novelty to production reality in 2026.

Why It's in Assess

As of early 2026, 57% of companies run AI agents in production, and organisations like Salesforce and NVIDIA report 90%+ adoption of agentic tools across their engineering organisations. The shift from single-agent assistance to coordinated agent teams is underway, but the practices for doing it reliably are still maturing.

What It Looks Like

Instead of one AI agent handling everything sequentially, multi-agent systems assign specialised roles:

  • Planner: breaks down a feature request into tasks
  • Architect: designs the technical approach
  • Implementer: writes the code
  • Tester: generates and runs tests
  • Reviewer: checks for bugs, security, style

These agents can run in parallel on separate branches or worktrees, then merge their work — dramatically speeding up complex feature development.

Tools That Enable This

  • Claude Code: Supports spawning sub-agents and parallel execution
  • Windsurf: Parallel Cascade sessions with Git worktrees
  • OpenAI Codex: Parallel sandboxed execution in the cloud
  • MetaGPT: Purpose-built multi-agent framework simulating a product team
  • CrewAI / LangGraph: Python frameworks for building custom agent teams

The Risk: Quality

Research shows 40–62% of AI-generated code contains security vulnerabilities. Google's 2025 DORA Report found that high AI adoption correlates with a 9% increase in bug rates and a 154% increase in PR size. Multi-agent systems amplify both the speed and the quality risks.

Best Practices (Emerging)

  1. Deterministic orchestration: Don't let agents decide what comes next — use a workflow engine with predefined rules
  2. Specialise, don't generalise: Separate agents for planning, implementation, and review outperform one general-purpose agent
  3. Mandatory human review gates: Especially for security-sensitive code and infrastructure changes
  4. Isolated environments: Use worktrees, containers, or sandboxes so agents can't accidentally interfere with each other
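The first practice, deterministic orchestration, can be sketched as a fixed transition table: the workflow engine, not the agents, decides what happens next. The step names and reported outcomes below are hypothetical.

```python
# Rule-based routing: transitions are a fixed table keyed on (step, outcome),
# so the control flow is auditable and never left to open-ended agent reasoning.
TRANSITIONS = {
    ("plan", "ok"): "implement",
    ("implement", "ok"): "test",
    ("test", "pass"): "review",
    ("test", "fail"): "implement",   # bounded retry: route back to the implementer
    ("review", "approved"): "done",
}

def run_workflow(outcomes, start="plan", max_steps=10):
    """`outcomes` maps each step to the result its agent reports (stubbed here)."""
    step, trace = start, []
    for _ in range(max_steps):
        trace.append(step)
        if step == "done":
            break
        step = TRANSITIONS[(step, outcomes[step])]
    return trace

trace = run_workflow({"plan": "ok", "implement": "ok", "test": "pass", "review": "approved"})
```

Because every legal transition is enumerated up front, an unexpected (step, outcome) pair fails loudly instead of letting an agent improvise the next action.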

Further Reading

  • Anthropic: Building Effective Agents — the most cited practical guide on multi-agent architecture patterns
  • MetaGPT — purpose-built multi-agent framework that simulates a software team
  • CrewAI — Python framework for role-based agent crews
  • LangGraph — graph-based orchestration for stateful multi-agent workflows

Newcomer Note

Start with single-agent workflows before attempting multi-agent orchestration. The complexity increases significantly, and the tooling for reliable multi-agent production workflows is still rapidly evolving.