The five patterns
If you've shipped a multi-agent system to production, you've seen the failure modes: a planner that loops forever; a tool worker that returns 30K tokens of JSON; a regulated path that picks a different worker on every run; a runaway loop that bills $500 before anyone notices; a worker that times out and the system silently returns garbage.
Each has a pattern that fixes it. Apply all five and your multi-agent system stops being scary.
1. Planner-worker separation
The single biggest win: separate the agent that decides what to do from the agents that do it. The planner has a small, focused prompt and access to a tool registry; the workers have specialized prompts, narrow tool access, and stronger guardrails.
Concrete benefit: when the planner fails an eval (hallucinates a step, picks the wrong tool), you fix the planner without retraining or revalidating any workers. The workers stay stable. The blast radius of a planner mistake is bounded.
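The split can be sketched in a few lines. This is a hypothetical shape, not a real framework: `plan`, `orchestrate`, and the worker registry are illustrative names, and in a real system `plan` would be an LLM call against the tool registry rather than a keyword check.

```typescript
// Planner-worker separation: the planner only chooses a step from a
// registry; each worker owns its own prompt, tools, and guardrails.
type PlanStep = { worker: string; input: string };

interface Worker {
  // Each worker has a specialized prompt and narrow tool access (not shown).
  run(input: string): string;
}

const registry: Record<string, Worker> = {
  refunds: { run: (input) => `refunds handled: ${input}` },
  shipping: { run: (input) => `shipping handled: ${input}` },
};

// Stand-in for the LLM planner: decides *which* worker, nothing else.
function plan(request: string): PlanStep {
  const worker = request.includes("refund") ? "refunds" : "shipping";
  return { worker, input: request };
}

function orchestrate(request: string): string {
  const step = plan(request);
  const worker = registry[step.worker];
  // A planner hallucination is caught here - bounded blast radius.
  if (!worker) throw new Error(`planner chose unknown worker: ${step.worker}`);
  return worker.run(step.input);
}
```

The point of the shape: a planner bug is fixed by changing `plan`; the workers and their guardrails never move.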
2. Tool-as-policy - the contract is the guardrail
Don't let tool definitions live next to model code. Define every tool with a strongly-typed schema, validate inputs and outputs, and enforce policy at the schema layer (e.g., "refund_amount cannot exceed $500 without human approval"). Policy in the schema means policy that survives prompt changes.
```typescript
const refundTool = defineTool({
  name: "issue_refund",
  description: "Issue a refund to a customer",
  // Policy lives in the schema - prompt changes can't bypass it.
  input: z.object({
    order_id: z.string().uuid(),
    amount_cents: z.number()
      .int()
      .positive()
      .max(500_00, "amount over $500 requires human approval"),
    reason: z.enum(["damaged", "wrong_item", "fraud", "other"]),
  }),
  // Output also typed - prevents "agent says success, system says nothing"
  output: z.object({
    refund_id: z.string().uuid(),
    status: z.enum(["completed", "pending_review"]),
  }),
  // Guardrail enforced regardless of agent decision
  policy: async ({ amount_cents }) => {
    if (amount_cents > 500_00) return { allow: false, reason: "review_required" };
    return { allow: true };
  },
});
```
3. Deterministic routing for regulated paths
Regulated workflows - medical advice, financial advice, legal positions - should not be routed by an LLM. Use a deterministic classifier on the inbound request and only invoke the LLM for the body of the response, not the path to it. Auditors love this; LLMs get to do what they're best at without being asked to make compliance decisions.
Source: Techimax BFSI engagement telemetry 2024–2026
| Routing approach | Routing accuracy (%) |
|---|---|
| LLM-only routing (zero-shot) | 84 |
| LLM + regex fallback | 91 |
| Trained classifier + LLM tie-break | 98 |
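The routing layer itself can be trivially simple. The sketch below uses fixed regex rules for brevity (all names are illustrative); as the table above suggests, a trained classifier does better in production, but the property that matters is the same - the same input always yields the same route, which an auditor can replay:

```typescript
// Deterministic routing: regulated topics are matched by fixed rules
// before any LLM sees the request. The LLM writes the response body;
// it never picks the path.
type Route = "medical" | "financial" | "general";

const RULES: Array<{ pattern: RegExp; route: Route }> = [
  { pattern: /\b(dosage|prescription|diagnos)/i, route: "medical" },
  { pattern: /\b(invest|mortgage|tax advice)/i, route: "financial" },
];

function classify(request: string): Route {
  for (const rule of RULES) {
    if (rule.pattern.test(request)) return rule.route;
  }
  return "general";
}
```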
4. Budget-bounded execution
Every agent invocation gets a token + tool-call budget. The budget is enforced at the orchestrator, not in the prompt. When the budget hits 80%, the orchestrator forces a graceful summary and exit. When it hits 100%, the orchestrator hard-kills and returns a structured error.
Prompt-level budget instructions don't work. The model agrees with them and then ignores them when the planning loop gets interesting. Enforce budgets in code.
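A minimal sketch of code-level enforcement, assuming a hypothetical orchestrator that calls `charge` after every model or tool step (class and thresholds are illustrative):

```typescript
// Budget-bounded execution: the budget lives in the orchestrator.
// The model never sees it and cannot talk its way past it.
interface Budget { maxTokens: number; maxToolCalls: number }

class BudgetExceeded extends Error {}

class BudgetedRun {
  private tokens = 0;
  private toolCalls = 0;
  constructor(private budget: Budget) {}

  // Called by the orchestrator after every step.
  charge(tokens: number, toolCalls = 0): "ok" | "wrap_up" {
    this.tokens += tokens;
    this.toolCalls += toolCalls;
    // 100%: hard-kill with a structured error the caller can handle.
    if (this.tokens > this.budget.maxTokens || this.toolCalls > this.budget.maxToolCalls) {
      throw new BudgetExceeded(
        `budget exhausted: ${this.tokens} tokens, ${this.toolCalls} tool calls`
      );
    }
    // 80%: signal the orchestrator to force a graceful summary and exit.
    if (this.tokens >= 0.8 * this.budget.maxTokens) return "wrap_up";
    return "ok";
  }
}
```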
5. Graceful degradation, not silent failure
When a worker fails - model timeout, tool 503, hallucinated tool output that can't be validated - the orchestrator should know what to do. The pattern is a degradation tree: try worker A → fall back to worker B → fall back to a deterministic baseline → fall back to a human handoff. Every step is logged; nothing fails silently.
| Level | Action | When |
|---|---|---|
| 1 | Primary worker (Claude 4 Sonnet) | Default path |
| 2 | Fallback worker (Claude 4 Haiku) | Primary 5xx, retry exhausted |
| 3 | Templated response from KB | Both LLMs unavailable; topic in template KB |
| 4 | Human routing with full context | Topic outside KB or escalation requested |
| 5 | Acknowledged outage message | All upstreams unavailable; degraded mode |
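The tree above reduces to ordered fallbacks. A minimal sketch, with illustrative names (`degrade`, the level shape) and a plain array standing in for the real worker calls:

```typescript
// Graceful degradation: try each level in order, log every attempt,
// and guarantee the final level cannot fail.
type Level = { level: number; name: string; run: (q: string) => string };

function degrade(levels: Level[], query: string, log: string[]): string {
  for (const level of levels) {
    try {
      const answer = level.run(query);
      log.push(`level ${level.level} (${level.name}): ok`);
      return answer;
    } catch (err) {
      // Nothing fails silently: every failed level is recorded.
      log.push(
        `level ${level.level} (${level.name}): failed (${(err as Error).message}), falling back`
      );
    }
  }
  // Last resort mirrors level 5: a static outage message that cannot throw.
  return "We're experiencing issues; your request has been queued.";
}
```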
What to build Monday
- Audit your current agent: is it planner-worker or one big prompt? If one big prompt, plan the split.
- Move every tool definition into a typed schema with policy. Delete any "don't do X" prompt lines whose rule now lives in the schema.
- Identify any regulated paths and replace LLM routing with a classifier.
- Add token + tool-call budgets to the orchestrator. 8K tokens / 12 tool calls is a reasonable starting point for a customer-care agent.
- Draw the degradation tree. Wire each level. Test by killing the primary worker.
Frequently asked questions
Are these patterns provider-specific?
No - they're orchestration patterns, not model patterns. We use them with Anthropic Claude, OpenAI GPT, and open-weight models behind a gateway. The patterns hold; the model behind the patterns can swap.
Doesn't planner-worker separation add latency?
Slightly - typically 200–400ms of extra latency per call versus a single big prompt. We trade that for bounded blast radius and per-worker eval tuning. For latency-sensitive, customer-facing paths, we cache the planner output where it's deterministic and skip planning altogether on routine routes.
What's the right framework - LangGraph, CrewAI, custom?
Below the patterns level, framework choice is mostly preference. We've shipped production systems on all three. The orchestration patterns matter more than the framework. If you're starting cold, LangGraph is well documented and we know it scales; CrewAI is faster for prototypes; a custom thin orchestrator pays off for highly regulated workloads where every line must be auditable.
How do you test multi-agent systems?
Per-worker eval suites + an end-to-end suite with real-customer-trace cases. The per-worker suites catch local regressions; the e2e suite catches orchestration regressions (planner picking wrong worker, degradation tree mis-firing). You need both.
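The per-worker half of that setup can be sketched as a tiny eval harness. This is a hypothetical shape, not a real eval framework; the case format and pass-rate gate are illustrative:

```typescript
// Per-worker eval suite: each case pairs an input with a predicate
// on the worker's output. The pass rate gates deployment of that worker.
type EvalCase = { input: string; expect: (output: string) => boolean };

function passRate(worker: (input: string) => string, cases: EvalCase[]): number {
  let passed = 0;
  for (const c of cases) {
    if (c.expect(worker(c.input))) passed++;
  }
  return passed / cases.length;
}
```

An end-to-end suite has the same shape, but the "worker" under test is the whole orchestrator and the cases are replayed customer traces.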