When do you actually need a platform?
Most organizations don't need a platform until they have 3+ production agents. Below that, the platform overhead exceeds the benefit. Above 3, the duplication cost compounds - every new agent re-solves orchestration, evals, identity, audit. Gartner forecasts that by 2026, 40% of enterprise applications will include task-specific AI agents [3], which puts most digital-native enterprises past the platform threshold.
The decision rule we share with CTOs: you need a platform when (a) two or more agents share users, (b) compliance asks for a single audit trail across agents, or (c) the third agent is taking longer to ship than the first two combined. Any of these signals duplication that compounds.
What separates point automations from a platform
- Shared runtime: all agents share orchestration primitives - planner-worker, tool registry, retry policies, budget caps. New agents inherit them; bug fixes propagate. (A minimal sketch follows this list.)
- Shared eval discipline: eval suites compose. A common refusal-policy eval applies to every agent; per-agent suites layer on top. The org's eval bar stays consistent.
- Shared identity model: user identity propagates across agents and tools. The sales agent sees what the salesperson can see - not more. Identity isn't reinvented per agent.
- Shared audit: one audit stream for every agent action. Compliance reviews one log; on-call debugs in one APM; finance budgets against one decomposition.
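To make the inheritance concrete, here's a minimal sketch in Python. All names (Platform, RetryPolicy, register_tool) are hypothetical, not any specific framework's API; the point is that a new agent gets the tool registry, retry policy, identity propagation, and audit stream for free:

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class RetryPolicy:
    max_attempts: int = 3
    backoff_seconds: float = 1.0


@dataclass
class Platform:
    """Shared layers: every agent gets the same registry, retries, and audit."""
    tools: dict[str, Callable[..., Any]] = field(default_factory=dict)
    retry_policy: RetryPolicy = field(default_factory=RetryPolicy)
    audit: list[dict] = field(default_factory=list)

    def register_tool(self, name: str, fn: Callable[..., Any]) -> None:
        self.tools[name] = fn

    def call_tool(self, agent: str, user_id: str, name: str, **kwargs: Any) -> Any:
        # One audit stream: every call from every agent lands here.
        self.audit.append({"agent": agent, "user": user_id, "tool": name, "args": kwargs})
        last_err: Exception | None = None
        for _ in range(self.retry_policy.max_attempts):
            try:
                # Identity propagates: the tool sees the calling user, not the agent.
                return self.tools[name](user_id=user_id, **kwargs)
            except Exception as err:  # shared retry policy, inherited by all agents
                last_err = err
                time.sleep(self.retry_policy.backoff_seconds)
        raise last_err


platform = Platform()
platform.register_tool("crm_lookup", lambda user_id, account: {"account": account, "viewer": user_id})

# A new agent is a thin layer over the shared platform - nothing re-solved:
print(platform.call_tool(agent="sales-agent", user_id="u123", name="crm_lookup", account="ACME"))
```

A bug fix to the retry or audit logic here propagates to every agent on the platform; with point automations, the same fix ships N times.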
Per-agent cost, point automations vs a shared platform. Source: Techimax engagement data 2023–2026; comparable scopes

| Agent | Point automation (USD K) | On platform (USD K) |
|---|---|---|
| Agent #1 | 200 | 220 |
| Agent #5 | 380 | 180 |
| Agent #10 | 520 | 140 |
| Agent #20 | 700 | 100 |
What not to do
- Don't build a 12-month "AI platform" before shipping agents. Ship 2–3 agents first; then extract the platform from what they share.
- Don't centralize agent ownership in a platform team that doesn't ship product agents. Centralize the platform; embed the engineers in product.
- Don't pick a framework as the platform. The framework is one layer of the runtime - the platform is also evals, identity, and audit.
- Don't ship the platform before the first agent's runbooks. Platform reusability matters less than first-agent operability.
Reference architecture for an agentic platform
An agentic platform we ship is layered:
- at the bottom, a model gateway with provider adapters [1];
- above it, a tool registry with shared schemas, idempotency, and contract versioning;
- above that, the orchestration runtime (planner-worker, retries, budget caps);
- identity and authorization woven through every layer;
- an eval framework that composes (org-level + per-agent suites);
- one audit stream out to the customer's existing log infrastructure.
Each layer is independently testable, independently versionable, and independently swappable (a contract-level sketch follows the table below). The customer's product engineers ship agents into this platform in days, not weeks - because orchestration, identity, audit, and eval discipline are inherited, not rebuilt.
| Layer | Owns | Provides to agent engineers |
|---|---|---|
| Model gateway | Provider adapters, routing, cost caps, retries | One interface; eval-driven routing |
| Tool registry | Schemas, idempotency, versioning, fuzzing | Strongly-typed tools out of the box |
| Orchestration runtime | Planner-worker, budgets, circuit breakers | Compositional patterns; no DIY orchestration |
| Identity layer | RBAC, OAuth, tenant isolation | Authz that propagates correctly |
| Eval framework | Org evals + composition + CI gating | Per-agent suites that inherit refusal policy |
| Audit stream | Append-only events; lineage; retention | One log surface; compliance-ready |
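One way to keep the layers independently swappable is to define each as a narrow contract. A minimal sketch using Python Protocols; the interfaces are illustrative, not actual engagement code:

```python
from typing import Any, Protocol


class ModelGateway(Protocol):
    def complete(self, model: str, prompt: str, max_cost_usd: float) -> str: ...


class ToolRegistry(Protocol):
    def invoke(self, tool: str, version: str, args: dict[str, Any]) -> Any: ...


class IdentityLayer(Protocol):
    def check(self, user_id: str, resource: str, action: str) -> bool: ...


class AuditStream(Protocol):
    def emit(self, event: dict[str, Any]) -> None: ...


class Orchestrator:
    """Agents depend on the contracts, never on concrete vendors."""

    def __init__(self, gateway: ModelGateway, tools: ToolRegistry,
                 identity: IdentityLayer, audit: AuditStream):
        self.gateway, self.tools, self.identity, self.audit = gateway, tools, identity, audit

    def run_step(self, user_id: str, tool: str, args: dict[str, Any]) -> Any:
        # Authorization is checked before the tool runs, and every step is audited.
        if not self.identity.check(user_id, resource=tool, action="invoke"):
            raise PermissionError(tool)
        result = self.tools.invoke(tool, version="1", args=args)
        self.audit.emit({"user": user_id, "tool": tool, "args": args})
        return result
```

Swapping Pinecone for pgvector, or one IdP for another, then touches one concrete implementation behind one Protocol - no agent code changes.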
Where per-agent engineering effort goes once the platform exists. Source: Techimax engagement analysis 2024–2026

| Effort category | Share of per-agent effort (%) |
|---|---|
| Domain logic + prompts | 62 |
| Agent-specific evals | 18 |
| Surface UI integration | 12 |
| Platform extension (rare) | 5 |
| Operational handoff | 3 |
Build vs buy on platform layers
Build the discipline; buy or use OSS for the heavy infrastructure. The discipline (eval-gated CI, runbook templates, cross-agent identity propagation, OWASP coverage) is your differentiator and embeds your business constraints. The infrastructure (vector DB, model gateway, observability backend, identity provider) is commodity - pick best-of-breed and integrate.
Concrete pattern: pgvector or Pinecone for vectors, OpenTelemetry for traces [2], your existing IdP (Okta, Azure AD, Auth0) for identity, MCP servers for tool standards [4]. The platform team writes the integration glue and the discipline; nobody re-implements vector indexing.
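As an example of the "glue, not re-implementation" point: emitting a tool-call span with the standard opentelemetry-api is a few lines. A sketch follows; the gen_ai.* attribute names track the OpenTelemetry GenAI semantic conventions [2], which are still stabilizing, so treat the exact keys as an assumption:

```python
from opentelemetry import trace

tracer = trace.get_tracer("agent-platform")


def call_tool_traced(tool_name: str, model: str, args: dict) -> dict:
    with tracer.start_as_current_span(f"tool.{tool_name}") as span:
        span.set_attribute("gen_ai.operation.name", "execute_tool")
        span.set_attribute("gen_ai.request.model", model)
        # Log argument keys, not values - raw args may contain PII.
        span.set_attribute("tool.args.keys", ",".join(sorted(args)))
        result = {"ok": True}  # placeholder for the real tool invocation
        span.set_attribute("tool.success", result["ok"])
        return result
```

With no SDK configured this is a no-op; point the standard OTel SDK at your existing observability backend and every agent's tool calls land in the same trace store.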
Point automations duplicate work; platforms amortize it. The 11th agent on a platform costs less than the 5th - because the platform did the hard parts already.
How big should the platform team be?
We size platform teams against agent count: 2 engineers for up to 5 agents, 3–4 for 5–15, 5–8 for 15+. The platform team writes the rituals and maintains the layers; product engineers ship the agents on top. This shape mirrors Team Topologies' platform-team pattern [5] - enabling stream-aligned product teams without owning their work.
Anti-pattern: the platform team that becomes the bottleneck. If product engineers can't ship an agent without platform-team approval, the platform is gating instead of enabling. Platform teams should publish docs, support channels, and self-service onboarding; approval should live in the eval suite and the audit stream, not in a platform-team queue.
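What "approval lives in the eval suite" can look like in CI: a gate script that composes the org suite with the per-agent suite and fails the build below threshold. A hedged sketch - the suite names, thresholds, and run_suite harness are all hypothetical:

```python
import sys


def run_suite(name: str) -> float:
    """Stand-in for a real eval harness; returns the suite's pass rate."""
    return 1.0  # replace with the actual harness call


# Org-level gates are inherited by every agent; per-agent gates layer on top.
GATES = {
    "org/refusal-policy": 0.99,
    "sales-agent/task-accuracy": 0.95,
}


def main() -> int:
    failures = [name for name, threshold in GATES.items() if run_suite(name) < threshold]
    for name in failures:
        print(f"eval gate failed: {name}", file=sys.stderr)
    return 1 if failures else 0  # nonzero exit fails the CI job


if __name__ == "__main__":
    sys.exit(main())
```

No platform-team queue: a product engineer who passes the gates ships.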
References
- [1] Platform engineering - Internal Developer Platform community (2025)
- [2] OpenTelemetry GenAI semantic conventions - OpenTelemetry SIG (2025)
- [3] Predicts 2026: Generative AI - Gartner (2025)
- [4] Model Context Protocol specification - Anthropic (2024)
- [5] Team Topologies - Skelton & Pais (IT Revolution) (2024)
- [6] Multi-agent system design patterns - Anthropic engineering (2025)
Frequently asked questions
When should we extract a platform?
Around agent #3. After two production agents, the shared patterns are visible; before two, you don't know which patterns matter.
Build vs buy on the platform layers?
Build the discipline (evals, eval-gated CI, telemetry); buy or use OSS for the heavy infra (vector DB, observability, model gateway). The discipline is the differentiator; the infra is commodity.
How big should the platform team be?
3–4 engineers for organizations running 5–15 agents; 2 below that, 5–8 above. Team size scales with agent count - see the sizing rule above. The platform team writes the rituals; the product engineers ship the agents.
Should we use LangGraph, LlamaIndex, or build custom?
Frameworks are useful for orchestration patterns; use what fits. The platform sits above the framework - eval discipline, identity, audit don't come from frameworks. We've shipped on LangGraph, custom orchestrators, MCP-based stacks, and Bedrock Agents; the platform layer above is consistent.
How does this relate to MLOps platforms (Vertex, SageMaker)?
Complementary. MLOps platforms cover model training, registry, and serving. Agentic platforms cover the orchestration layer above - tool calling, prompts, multi-step workflows, agent-specific evals. We use both: MLOps for the model lifecycle, agentic platform for the application layer.
What's the right governance model for the platform?
Platform team owns the layers and the SLAs; product teams own the agents and their business outcomes. Compliance reviews the audit stream; security reviews the gateway; finance reviews the FinOps dashboard. One platform; many stakeholders; clear interfaces.
How do we handle multi-tenant isolation across agents?
Tenant ID propagates from the identity layer through every retrieval, tool call, and audit event. Enforced at the gateway and the retrieval layer; never trusted to the LLM. Same pattern across all agents on the platform - that's a shared-layer benefit.
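A minimal sketch of that propagation, with illustrative names (RequestContext, index.search); the key property is that the tenant filter is set server-side from the identity layer, never from model output:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestContext:
    user_id: str
    tenant_id: str  # resolved by the identity layer, not by the model


def retrieve(ctx: RequestContext, query: str, index) -> list:
    # The tenant filter is enforced here, server-side. Even if a prompt
    # injection asks for "all tenants", this filter never loosens.
    return index.search(query=query, filter={"tenant_id": ctx.tenant_id})
```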
What about agent-to-agent communication?
Treat it as crossing a trust boundary. Agent A calling agent B is an external tool call, with schema validation, identity propagation, and audit on both sides. Don't let agents share memory directly; communicate through typed contracts. This sounds bureaucratic, but it stops 90% of multi-agent failure modes [6].
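A sketch of the typed-contract boundary, assuming pydantic v2 for validation; the schema and field names are illustrative:

```python
from pydantic import BaseModel, ValidationError


class TriageRequest(BaseModel):
    caller_agent: str
    user_id: str  # identity propagates across the boundary
    ticket_id: str
    summary: str


def handle_triage(raw_payload: dict) -> dict:
    """Agent B's entry point: reject malformed calls before doing any work."""
    try:
        req = TriageRequest.model_validate(raw_payload)
    except ValidationError as err:
        return {"ok": False, "error": str(err)}
    # Audit on the receiving side, then do the work.
    return {"ok": True, "ticket_id": req.ticket_id}
```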