
Multi-agent orchestration patterns for the enterprise

Five battle-tested patterns for composing planner, tool, and worker agents at enterprise scale - without losing context, leaking budget, or shipping non-deterministic regulated workflows.

Techimax Engineering · Forward-deployed engineering team · 12 min read · Updated March 30, 2026

The five patterns

If you've shipped a multi-agent system to production, you've seen the failure modes: a planner that loops forever; a tool worker that returns 30K tokens of JSON; a regulated path that picks a different worker on every run; a runaway loop that bills $500 before anyone notices; a worker that times out and the system silently returns garbage.

Each has a pattern that fixes it. Apply all five and your multi-agent system stops being scary.

1. Planner-worker separation

The single biggest win: separate the agent that decides what to do from the agents that do it. The planner has a small, focused prompt and access to a tool registry; the workers have specialized prompts, narrow tool access, and stronger guardrails.

Concrete benefit: when the planner fails an eval (hallucinates a step, picks the wrong tool), you fix the planner without retraining or revalidating any workers. The workers stay stable. The blast radius of a planner mistake is bounded.
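A minimal sketch of what the separation looks like at the orchestrator level. The registry entries, worker names, and `validatePlan` helper are illustrative, not a specific framework's API: the point is that the planner only ever sees tool names and descriptions, and the orchestrator rejects any plan that falls outside a worker's declared scope before anything runs.

```typescript
type ToolEntry = { name: string; description: string };

// The planner sees only the registry - names and descriptions, no implementations.
const toolRegistry: ToolEntry[] = [
  { name: "lookup_order", description: "Fetch an order by id" },
  { name: "issue_refund", description: "Issue a refund to a customer" },
];

// Workers are narrow: one prompt, one small tool set, their own guardrails.
const workers: Record<string, { tools: string[]; systemPrompt: string }> = {
  refunds: {
    tools: ["lookup_order", "issue_refund"],
    systemPrompt: "You handle refund requests. Never exceed policy limits.",
  },
};

// The orchestrator validates the planner's output against the registry
// before any worker runs - bounding the blast radius of a bad plan.
function validatePlan(plan: { worker: string; tool: string }): boolean {
  const worker = workers[plan.worker];
  if (!worker) return false; // hallucinated worker -> reject
  if (!worker.tools.includes(plan.tool)) return false; // tool outside worker scope
  return toolRegistry.some((t) => t.name === plan.tool); // tool must exist at all
}
```

A hallucinated worker or out-of-scope tool is caught here, in code, before it costs a single model call.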

2. Tool-as-policy - the contract is the guardrail

Don't let tool definitions live next to model code. Define every tool with a strongly-typed schema, validate inputs and outputs, and enforce policy at the schema layer (e.g., "refund_amount cannot exceed $5,000 without human approval"). Policy in the schema means policy that survives prompt changes.

Tool definition with policy in the schema, not the prompt
import { z } from "zod";

// `defineTool` is the orchestrator's tool-registration helper.
const refundTool = defineTool({
  name: "issue_refund",
  description: "Issue a refund to a customer",
  // Policy lives in the schema - prompt changes can't bypass it.
  input: z.object({
    order_id: z.string().uuid(),
    amount_cents: z.number()
      .int()
      .positive()
      .max(500_00, "amount over $500 requires human approval"),
    reason: z.enum(["damaged", "wrong_item", "fraud", "other"]),
  }),
  // Output also typed - prevents "agent says success, system says nothing"
  output: z.object({
    refund_id: z.string().uuid(),
    status: z.enum(["completed", "pending_review"]),
  }),
  // Guardrail enforced regardless of agent decision
  policy: async ({ amount_cents }) => {
    if (amount_cents > 500_00) return { allow: false, reason: "review_required" };
    return { allow: true };
  },
});

3. Deterministic routing for regulated paths

Regulated workflows - medical advice, financial advice, legal positions - should not be routed by an LLM. Use a deterministic classifier on the inbound request and only invoke the LLM for the body of the response, not the path to it. Auditors love this; LLMs get to do what they're best at without being asked to make compliance decisions.

Routing accuracy by mechanism (regulated request classification)
Source: Techimax BFSI engagement telemetry 2024–2026

  LLM-only routing (zero-shot)          84% correct
  LLM + regex fallback                  91% correct
  Trained classifier + LLM tie-break    98% correct
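A minimal sketch of the deterministic path. The patterns and topic list below are illustrative stand-ins; in production this is a trained classifier with the LLM used only as a tie-break. What matters is the property the auditors care about: the same input always produces the same route, with no model in the path decision.

```typescript
type Route = "regulated" | "general";

// Illustrative stems for regulated topics - in production, a trained classifier.
const regulatedPatterns: RegExp[] = [
  /\b(diagnos|prescri|dosage|medical advice)/i,
  /\b(invest|portfolio|tax advice|financial advice)/i,
  /\b(legal advice|liabilit|contract dispute)/i,
];

// Deterministic: same input, same route, every run. The LLM is invoked
// afterwards for the body of the response, never for the path to it.
function routeRequest(text: string): Route {
  return regulatedPatterns.some((p) => p.test(text)) ? "regulated" : "general";
}
```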

4. Budget-bounded execution

Every agent invocation gets a token + tool-call budget. The budget is enforced at the orchestrator, not in the prompt. When the budget hits 80%, the orchestrator forces a graceful summary and exit. When it hits 100%, the orchestrator hard-kills and returns a structured error.

Prompt-level budget instructions don't work. The model agrees with them and then ignores them when the planning loop gets interesting. Enforce budgets in code.
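A sketch of what orchestrator-side enforcement looks like, assuming the model client reports per-step token counts (the `Budget` class and thresholds here are illustrative). The 80% and 100% triggers mirror the text: force a graceful summary, then hard-kill.

```typescript
type BudgetAction = "continue" | "summarize_and_exit" | "hard_kill";

class Budget {
  private tokensUsed = 0;
  private toolCalls = 0;

  constructor(
    private maxTokens: number,
    private maxToolCalls: number,
  ) {}

  // Called by the orchestrator after every step - never left to the prompt.
  record(tokens: number, toolCalls: number): BudgetAction {
    this.tokensUsed += tokens;
    this.toolCalls += toolCalls;
    // Whichever budget is closer to exhaustion drives the decision.
    const usage = Math.max(
      this.tokensUsed / this.maxTokens,
      this.toolCalls / this.maxToolCalls,
    );
    if (usage >= 1.0) return "hard_kill"; // return a structured error, stop all calls
    if (usage >= 0.8) return "summarize_and_exit"; // force a graceful wrap-up
    return "continue";
  }
}
```

With the starting point suggested below (8K tokens / 12 tool calls), `new Budget(8000, 12)` gives a customer-care agent room to work while capping the worst case in code.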

5. Graceful degradation, not silent failure

When a worker fails - model timeout, tool 503, hallucinated tool output that can't be validated - the orchestrator should know what to do. The pattern is a degradation tree: try worker A → fall back to worker B → fall back to a deterministic baseline → fall back to a human handoff. Every step is logged; nothing fails silently.

  Level | Action                            | When
  1     | Primary worker (Claude 4 Sonnet)  | Default path
  2     | Fallback worker (Claude 4 Haiku)  | Primary 5xx, retry exhausted
  3     | Templated response from KB        | Both LLMs unavailable; topic in template KB
  4     | Human routing with full context   | Topic outside KB or escalation requested
  5     | Acknowledged outage message       | All upstreams unavailable; degraded mode

Degradation tree for a customer-care agent
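The tree above can be walked with a few lines of orchestrator code. This is a simplified synchronous sketch (real workers are async and carry richer context); the shape to copy is that every level is attempted in order, every transition is logged, and the final level always returns a structured response rather than throwing.

```typescript
type Attempt = { level: number; action: string; run: () => string };

function degrade(levels: Attempt[], log: (msg: string) => void): string {
  for (const { level, action, run } of levels) {
    try {
      const result = run();
      log(`level ${level} (${action}) succeeded`);
      return result;
    } catch (err) {
      // Nothing fails silently: every fallback transition is recorded.
      log(`level ${level} (${action}) failed: ${(err as Error).message}`);
    }
  }
  // Final level: acknowledged outage - still a structured, logged response.
  log("all levels exhausted; returning outage message");
  return "We're experiencing an outage. Your request has been queued.";
}
```

Killing the primary worker in a test (as step 5 of the checklist below suggests) should produce exactly one "failed" log line followed by a level-2 success - if it doesn't, the tree is mis-wired.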

What to build Monday

  1. Audit your current agent: is it planner-worker or one big prompt? If one big prompt, plan the split.
  2. Move every tool definition into a typed schema with policy. Delete the "don't do X" lines from your prompts once that policy lives in the schema.
  3. Identify any regulated paths and replace LLM routing with a classifier.
  4. Add token + tool-call budgets to the orchestrator. 8K tokens / 12 tool calls is a reasonable starting point for a customer-care agent.
  5. Draw the degradation tree. Wire each level. Test by killing the primary worker.


Frequently asked questions

Are these patterns provider-specific?

No - they're orchestration patterns, not model patterns. We use them with Anthropic Claude, OpenAI GPT, and open-weight models behind a gateway. The patterns hold; the model behind the patterns can swap.

Doesn't planner-worker separation add latency?

Slightly - typically 200–400ms per call versus a single big prompt. We trade that latency for blast-radius bounding and individual-eval tuning per worker. For customer-facing latency-sensitive paths, we cache the planner output where it's deterministic and skip planning altogether for routine routes.

What's the right framework - LangGraph, CrewAI, custom?

Beneath the patterns, framework choice is mostly preference. We've shipped production systems on all three. The orchestration patterns matter more than the framework. If you're starting cold, LangGraph is well-documented and we know it scales; CrewAI is faster for prototypes; a custom thin orchestrator pays off for highly regulated workloads where every line must be auditable.

How do you test multi-agent systems?

Per-worker eval suites + an end-to-end suite with real-customer-trace cases. The per-worker suites catch local regressions; the e2e suite catches orchestration regressions (planner picking wrong worker, degradation tree mis-firing). You need both.
