Multi-Agent Architectures on AWS — A Founder's Playbook

AWS Marketplace just published the production playbook for multi-agent AI systems. Here is the founder-grade translation.


Cisco Arias, Founder & CEO · 11 min read

AWS Marketplace just published a workshop deck called Multi-Agent Architectures: Building Agentic Systems on AWS. It is the most concrete, opinionated, production-grounded guidance on multi-agent AI I have seen come out of the major clouds — and it is squarely aimed at platform teams at large enterprises. The presenters, Dr. James Bland and Mike Brugnoni, have clearly shipped enough of these systems to know exactly where they fall apart.

I read it so you don't have to. Then, in the spirit of the last post — where HBR and AWS called out "AI-powered automation" as one of four pillars of modern engineering — I translated this one for the only audience I really care about: founders and operators who actually have to build this, ship it, and keep it from melting the company's reputation when it hallucinates.

First: do you actually need multi-agent?

The deck is refreshingly disciplined here. The default answer is no. A single agent with a good system prompt, the right tools, and durable memory will solve more problems than most engineers want to admit, with far less operational pain.

AWS lays out four triggers that, when they start firing in combination, mean it is time to graduate to multi-agent:

  1. Context saturation. Your single agent's prompt is now 40 KB of tool definitions and few-shot examples. It is forgetting things. Output quality drops as conversation length grows.
  2. Task specialization. The work spans genuinely different domains — security review, infra generation, frontend code, billing analysis — each of which benefits from its own prompt, tool list, and retrieval index.
  3. Latency / parallelism. Steps are independent and could happen at the same time, but a single agent is doing them in serial. Wall-clock time is hurting the product.
  4. Fault isolation. A failure in one capability (the codegen agent times out) should not take down the rest of the system (the user can still ask the documentation agent a question).

This is the same lesson microservices taught us a decade ago and that every senior engineer has lived through twice. Distributed systems are a tax you pay to solve a problem you actually have. Pay it on purpose.

The four planes — document these before you write any code

This is the most useful frame in the entire deck. Every production multi-agent system, AWS argues, has four distinct architectural planes. They map almost one-to-one onto AWS services, but the frame holds regardless of cloud.

Figure: The four planes of a production multi-agent system: Control plane (orchestration), Execution plane (specialized agents), State plane (shared memory), and Capability plane (tools and MCP servers). Adapted from AWS Marketplace, 2026.

  • Control plane. The orchestrator. Decomposes a request into subtasks, decides which agent gets what, monitors progress, and synthesizes results. On AWS: Bedrock Multi-Agent or Step Functions. Critically, it directs — it does not execute domain work itself.
  • Execution plane. The specialized agents. Each one has a narrow system prompt, a curated tool list, and its own retrieval index. They scale independently. One job, done well.
  • State plane. Shared memory. Durable task state in something like DynamoDB, fast session context in ElastiCache, domain knowledge in a vector store / Bedrock Knowledge Bases, archival logs in S3. Agents coordinate through shared state, not by calling each other directly.
  • Capability plane. Tools and MCP servers. Standardized, discoverable, governed at the server level with per-agent access control. Not buried in system prompts.

The deck's biggest piece of upfront advice is: draw this diagram for your system, by name, before you write the first agent. If you can't put a service in each of the four boxes, you do not yet have an architecture — you have a demo.
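A lightweight way to keep that diagram honest is to write it down as data and fail a check whenever a box is empty. The sketch below assumes a hypothetical `FourPlanes` record (not an AWS API); the service and agent names in it are illustrative placeholders.

```python
from dataclasses import dataclass, field, fields


@dataclass
class FourPlanes:
    """Hypothetical record of which concrete services fill each architectural plane."""
    control: list[str] = field(default_factory=list)     # orchestrator / workflow engine
    execution: list[str] = field(default_factory=list)   # specialized agents
    state: list[str] = field(default_factory=list)       # shared memory stores
    capability: list[str] = field(default_factory=list)  # tools and MCP servers

    def missing_planes(self) -> list[str]:
        """Names of planes that have no service assigned yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


demo = FourPlanes(
    control=["Step Functions"],
    execution=["security-review-agent", "infra-codegen-agent"],
    state=["DynamoDB", "ElastiCache"],
)
print(demo.missing_planes())  # ['capability'] -> still a demo, not an architecture
```

Running a check like this in CI turns "draw the diagram first" into a gate rather than a good intention.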

The four orchestration patterns — and where each one breaks

Once you have the four planes, the next decision is how the control plane delegates. AWS breaks this into four patterns, each with a sharp-edged failure mode founders should know about before they pick one:

1. Centralized supervisor

One supervisor agent owns task decomposition end-to-end. Easy to reason about, easy to debug. The supervisor is also a single point of failure and a single bottleneck. AWS pairs this with Bedrock Multi-Agent. Great for chat-shaped workloads. Risky for high-throughput pipelines.
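A minimal, model-free sketch of the supervisor shape, assuming hypothetical `security_agent` and `infra_agent` stubs in place of real LLM calls. The point is structural: every request passes through `supervise()`, which is what makes the pattern easy to debug and also what makes it the bottleneck.

```python
from typing import Callable


# Hypothetical specialist agents. In production each would wrap a model call
# with its own system prompt, tool list, and retrieval index.
def security_agent(task: str) -> str:
    return f"security review of {task!r}: no findings"


def infra_agent(task: str) -> str:
    return f"terraform plan for {task!r}"


WORKERS: dict[str, Callable[[str], str]] = {
    "security": security_agent,
    "infra": infra_agent,
}


def decompose(request: str) -> list[tuple[str, str]]:
    """Split a request into (worker, subtask) pairs.
    A real supervisor would ask a model to plan this; here it is hard-coded."""
    return [("infra", request), ("security", request)]


def supervise(request: str) -> str:
    """Decompose, delegate, synthesize. The supervisor directs; workers execute."""
    results = [WORKERS[name](subtask) for name, subtask in decompose(request)]
    # Synthesis is a plain join here; in practice it is usually another model call.
    return "\n".join(results)


print(supervise("add a public S3 bucket"))
```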

2. Skill-based dispatch

A thin router classifies the request and dispatches to a specialist agent — usually exposed as Lambda functions behind MCP servers. Fast, cheap, predictable. The trap is rigidity: skills don't compose, and anything cross-domain falls back into the router, which becomes a god object.
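A sketch of a thin router, assuming a hypothetical keyword table (real routers often use a small classifier model) and plain functions standing in for the MCP-backed skills. Watch where the cross-domain request ends up.

```python
from typing import Callable


# Hypothetical skill handlers; in the deck's framing each would sit behind an MCP server.
def billing_skill(query: str) -> str:
    return "billing: " + query


def docs_skill(query: str) -> str:
    return "docs: " + query


def general_agent(query: str) -> str:
    return "general: " + query


# Thin keyword classifier (an assumption for illustration).
ROUTES: dict[str, Callable[[str], str]] = {
    "invoice": billing_skill,
    "refund": billing_skill,
    "how do i": docs_skill,
}


def route(query: str, fallback: Callable[[str], str] = general_agent) -> str:
    """Dispatch to exactly one specialist; anything ambiguous goes to the fallback."""
    q = query.lower()
    matched = {handler for keyword, handler in ROUTES.items() if keyword in q}
    if len(matched) == 1:
        return matched.pop()(query)
    # Zero or several skills matched: the cross-domain case.
    return fallback(query)


print(route("Where is my invoice?"))                  # billing: Where is my invoice?
print(route("How do I get a refund on my invoice?"))  # general: How do I get a refund on my invoice?
```

The second query touches both billing and docs, so it lands in the fallback. Each time someone "fixes" that by adding special cases to `route()`, the router moves one step closer to the god object described above.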

3. Handoff chains

Agent A finishes, hands a structured payload to agent B, which hands to agent C. Easy with Step Functions and Choice states. Wonderful for well-defined pipelines (e.g., "ingest → classify → enrich → notify"). Becomes unmanageable the moment requirements bend the linear flow into a DAG. You will end up rewriting it.
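The linear shape is easy to express. This sketch assumes four hypothetical stage functions passing a growing dict, roughly what a chain of Step Functions Task states would hand between Lambdas.

```python
from typing import Callable

Stage = Callable[[dict], dict]


# Hypothetical stages; each returns a new structured payload for the next agent.
def ingest(p: dict) -> dict:
    return {**p, "text": p["raw"].strip()}


def classify(p: dict) -> dict:
    return {**p, "label": "bug" if "error" in p["text"].lower() else "question"}


def enrich(p: dict) -> dict:
    return {**p, "priority": "high" if p["label"] == "bug" else "low"}


def notify(p: dict) -> dict:
    return {**p, "notified": f"#triage-{p['priority']}"}


PIPELINE: list[Stage] = [ingest, classify, enrich, notify]


def run_chain(payload: dict, stages: list[Stage]) -> dict:
    for stage in stages:
        payload = stage(payload)  # strictly linear: A -> B -> C -> D
    return payload


result = run_chain({"raw": "  Error 500 on checkout  "}, PIPELINE)
print(result["label"], result["priority"], result["notified"])  # bug high #triage-high
```

The whole pipeline is a list. As soon as `classify` needs to send bugs and questions down different enrichment paths, a list no longer describes the flow; you need a graph, which is the rewrite this pattern warns about.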

4. Parallel fan-out and synthesis

EventBridge or SQS triggers a fleet of agents in parallel; a synthesizer agent merges their outputs. This is the highest-throughput pattern and the right answer for research-style workloads. It also contains the piece teams most often underbuild in practice: synthesis. Almost everyone writes "and then the synthesis agent merges the results" on a whiteboard, then discovers in production that "merging" is the entire problem.
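Fan-out is the easy half. This sketch uses `asyncio.gather` to run three hypothetical agents concurrently and spends its effort on the synthesizer instead. The verdict labels and the `CONSERVATIVE_ORDER` ranking are assumptions for illustration; the rule on disagreement (take the most conservative answer and flag the conflict rather than majority-voting) follows the conflicting-outputs fix in the zero-trust list below.

```python
import asyncio
from collections import Counter


# Hypothetical research agents; each returns a verdict label.
async def agent_a(q: str) -> str:
    await asyncio.sleep(0.01)
    return "safe"


async def agent_b(q: str) -> str:
    await asyncio.sleep(0.01)
    return "safe"


async def agent_c(q: str) -> str:
    await asyncio.sleep(0.01)
    return "unsafe"


# Lower index = more conservative. Unknown labels raise ValueError, so validate upstream.
CONSERVATIVE_ORDER = ["unsafe", "needs_review", "safe"]


def synthesize(verdicts: list[str]) -> dict:
    counts = Counter(verdicts)
    if len(counts) == 1:
        return {"verdict": verdicts[0], "conflict": False}
    # Disagreement: do not majority-vote a safety call; pick the most conservative answer.
    most_conservative = min(counts, key=CONSERVATIVE_ORDER.index)
    return {"verdict": most_conservative, "conflict": True, "votes": dict(counts)}


async def fan_out(q: str) -> dict:
    verdicts = await asyncio.gather(agent_a(q), agent_b(q), agent_c(q))  # runs concurrently
    return synthesize(list(verdicts))


print(asyncio.run(fan_out("is this dependency safe to upgrade?")))
# {'verdict': 'unsafe', 'conflict': True, 'votes': {'safe': 2, 'unsafe': 1}}
```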

The math founders don't want to hear

This is the slide that should be required reading for every founder pitching an "AI agent that does X, Y, and Z" demo.

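Non-determinism compounds. Assuming agent errors are independent and any single error breaks the final result, end-to-end accuracy is per-agent accuracy raised to the number of agents. If each agent in your pipeline is correct 90% of the time, a 4-agent pipeline is correct 65.6% of the time (0.9^4). If each agent is at 95%, the pipeline is 81.5%. To hit 96% end-to-end across 4 agents, each individual agent has to be at roughly 99%.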

Figure: Compounding accuracy across multi-agent pipelines. At 90% per agent, a four-agent pipeline drops to 66% end to end; at 95%, to 81%; at 99%, it stays at 96%. The compounding-accuracy curve every multi-agent founder should have on the wall. Math: SAMO. Source frame: AWS Marketplace, 2026.
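The arithmetic behind the chart, under the same simplifying assumptions (independent errors, and any single agent error breaks the result). Treat it as a rough planning model, not a prediction: validation and retries can recover some upstream mistakes, and real errors are rarely fully independent.

```python
def pipeline_accuracy(per_agent: float, n_agents: int) -> float:
    """End-to-end accuracy if errors are independent and any one error is fatal."""
    return per_agent ** n_agents


def required_per_agent(target: float, n_agents: int) -> float:
    """Per-agent accuracy needed to reach `target` end to end, same assumptions."""
    return target ** (1 / n_agents)


for p in (0.90, 0.95, 0.99):
    print(f"{p:.0%} per agent -> {pipeline_accuracy(p, 4):.1%} across 4 agents")
# 90% per agent -> 65.6% across 4 agents
# 95% per agent -> 81.5% across 4 agents
# 99% per agent -> 96.1% across 4 agents

print(f"{required_per_agent(0.96, 4):.2%}")  # 98.98%
```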

The deck names three defenses, and we have shipped variants of all three:

  • Schema at every boundary. Pydantic / JSON Schema on every input and every output. If it doesn't validate, it doesn't proceed — period.
  • Abstention as a feature. An agent that returns { "answer": null, "reason": "..." } is dramatically better than an agent that confidently makes something up. Train and prompt for this explicitly.
  • End-to-end evals. AWS specifically calls out AgentCore Evaluations and the Builtin.GoalSuccessRate metric. The pattern matters more than the tool — evaluate the workflow, not just each agent in isolation.
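A stdlib-only sketch of the first two defenses together: a strict contract for one agent's output, with `null` as a legal, explicit abstention. `AgentAnswer`, `parse_answer`, and `HandoffError` are hypothetical names; with Pydantic v2 the same contract would typically be a `BaseModel` validated with `model_validate_json`.

```python
import json
from dataclasses import dataclass
from typing import Optional


class HandoffError(ValueError):
    """Payload failed validation; surface it as a workflow event, not a silent log line."""


@dataclass(frozen=True)
class AgentAnswer:
    answer: Optional[str]  # None means the agent explicitly abstained
    reason: str            # why it answered or abstained


def parse_answer(raw: str) -> AgentAnswer:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise HandoffError(f"not JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != {"answer", "reason"}:
        raise HandoffError(f"unexpected shape: {data!r}")
    if data["answer"] is not None and not isinstance(data["answer"], str):
        raise HandoffError("answer must be a string or null")
    if not isinstance(data["reason"], str) or not data["reason"]:
        raise HandoffError("reason must be a non-empty string")
    return AgentAnswer(answer=data["answer"], reason=data["reason"])


abstained = parse_answer('{"answer": null, "reason": "no source covers Q3 revenue"}')
print(abstained.answer is None)  # True

try:
    parse_answer('{"answer": "42"')  # truncated model output
except HandoffError as err:
    print("blocked:", err)
```

The malformed case raises instead of flowing downstream, and the abstention survives as a typed value that the orchestrator can route to a human or a fallback.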

MCP vs A2A — different problems, different protocols

This is the single piece of the deck that I think will save founders the most pain over the next 18 months, because the terminology is genuinely confusing and the two protocols are often discussed interchangeably online. They are not interchangeable.

  • MCP (Model Context Protocol) is agent-to-tool. It is a client-server protocol. An agent (the client) discovers and calls capabilities (the server). Access control is enforced at the server level, per agent. This is the right shape for "my coding agent needs to call the GitHub API."
  • A2A (Agent-to-Agent) is agent-to-agent. Peer-to-peer. Uses Task objects and "agent cards" for discovery. This is the right shape for "my orchestrator needs to delegate a subtask to another autonomous agent that may itself decompose the problem."
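To make the difference concrete: MCP messages are JSON-RPC 2.0, and invoking one tool is a single `tools/call` request naming the tool and its arguments. An A2A agent card instead describes a whole agent that can accept delegated tasks. The tool name, endpoint URL, and card fields below are illustrative; the card is a simplified subset, so check the A2A specification for its required fields and message methods.

```python
import itertools
import json
from typing import Optional

_request_ids = itertools.count(1)


def mcp_tool_call(tool: str, arguments: dict) -> str:
    """Agent -> tool: a JSON-RPC 2.0 `tools/call` request for one named tool on an MCP server."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Agent -> agent: a simplified, A2A-style agent card. It advertises an autonomous
# agent that can take a whole delegated task, not a single function to call.
RESEARCH_AGENT_CARD = {
    "name": "market-research-agent",
    "description": "Decomposes and answers open-ended market questions.",
    "url": "https://agents.example.internal/research",  # hypothetical endpoint
    "skills": [{"id": "competitor-scan", "name": "Competitor scan"}],
}


def find_agent_for(skill_id: str, cards: list) -> Optional[dict]:
    """Discovery step: the first registered agent advertising the skill, if any."""
    return next((c for c in cards if any(s["id"] == skill_id for s in c["skills"])), None)


print(mcp_tool_call("github.create_issue", {"repo": "acme/app", "title": "Flaky test"}))
print(find_agent_for("competitor-scan", [RESEARCH_AGENT_CARD])["url"])
```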

Zero trust at every boundary — the five risks

Multi-agent systems multiply the attack surface in ways traditional services don't. The deck lists five risks worth memorizing:

  1. Overprivileged subagents. A specialist that needs read-only access to one S3 bucket somehow ended up with the orchestrator's IAM role. Fix: per-agent STS-issued credentials, never the parent role. Secrets in a real secrets manager (Bedrock + CyberArk in their example; AWS Secrets Manager or HashiCorp Vault in the wild).
  2. Prompt-injection propagation. A retrieved document contains adversarial instructions. The first agent passes them downstream. By the time the synthesizer reads them, the system has been jailbroken three hops back. Fix: Bedrock Guardrails (or equivalent input/output filters) at every boundary, not just the entry point.
  3. Conflicting outputs. Two parallel agents return contradictory answers and the synthesizer picks the wrong one. Fix: explicit conflict-resolution logic. For safety-critical decisions, the conservative answer wins by default.
  4. Credential exposure in handoffs. An agent puts raw credentials in a tool-call result that gets logged by the orchestrator. Fix: IAM roles and short-lived STS tokens. Never raw credentials in the handoff payload. Ever.
  5. Unauthorized agent discovery. Any agent on the network can call any other agent. Fix: an authoritative agent-card registry with identity-aware access control (Ping Identity in their example; whatever your IdP is in yours).
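A sketch of risks 4 and 5 enforced at a single boundary, assuming a hypothetical in-process allowlist (in practice backed by your agent-card registry and IdP) and a deliberately simple credential scanner. The regex matches AWS access key IDs (the `AKIA` long-term and `ASIA` temporary prefixes); it will not catch every secret format, so treat it as a tripwire next to STS and Guardrails, not a replacement for them.

```python
import re
from typing import Any

# AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) + 16 uppercase alphanumerics.
ACCESS_KEY_RE = re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b")
SECRET_FIELDS = {"secretaccesskey", "sessiontoken", "password", "api_key"}  # assumed list

# Hypothetical registry: which caller may invoke which agent.
ALLOWED_CALLS = {
    ("orchestrator", "billing-agent"),
    ("orchestrator", "docs-agent"),
}


class BoundaryViolation(Exception):
    pass


def find_secrets(obj: Any, path: str = "$") -> list:
    """Return JSON-path-like locations of anything that looks like a credential."""
    hits = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}"
            if str(key).lower() in SECRET_FIELDS:
                hits.append(child)
            hits.extend(find_secrets(value, child))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            hits.extend(find_secrets(value, f"{path}[{i}]"))
    elif isinstance(obj, str) and ACCESS_KEY_RE.search(obj):
        hits.append(path)
    return hits


def check_handoff(caller: str, callee: str, payload: dict) -> None:
    """Run at every agent boundary, before the payload is forwarded or logged."""
    if (caller, callee) not in ALLOWED_CALLS:
        raise BoundaryViolation(f"{caller} is not authorized to call {callee}")
    leaks = find_secrets(payload)
    if leaks:
        raise BoundaryViolation(f"credential-like data at {leaks}")


check_handoff("orchestrator", "docs-agent", {"question": "how do refunds work?"})  # passes
```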

If your security posture is still maturing, we wrote a startup-grade primer in Zero Trust Security: The Startup Guide.

Failure modes nobody warns you about

Beyond the standard distributed-systems failures, the deck calls out four that take a distinctive shape in multi-agent systems:

  • Cascading failures. One agent fails, the next agent retries forever, the orchestrator times out, the user sees nothing. Fix: circuit breakers between agents, durable workflow state (the deck names Temporal for resumability), and explicit per-edge timeouts.
  • Orchestration loops. Agent A asks B, B asks A back, infinite ping-pong. Fix: a hard execution timeout on every workflow (in Step Functions, the state machine's top-level TimeoutSeconds) plus explicit step-count limits. Hard caps, not advisory.
  • Conflicting parallel outputs. See the synthesis problem above. Fix: build the synthesis logic before you turn on parallelism, not after. The synthesizer is the most important agent in any fan-out system.
  • Context corruption. A handoff payload is malformed (the model emitted invalid JSON). Downstream agents inherit garbage and produce confidently wrong output. Fix: schema validation on every handoff, with the validator's failure being a first-class workflow event — not a silent log line.
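Two of those fixes in stdlib Python: a per-edge circuit breaker for cascading failures and a hard step budget for orchestration loops. Both classes are hypothetical sketches; the per-call timeout itself is assumed to come from whatever client invokes the downstream agent, and durable, resumable state (the Temporal part) is out of scope here.

```python
import time
from typing import Callable, Optional


class CircuitOpen(RuntimeError):
    """Raised instead of calling a downstream agent that is known to be failing."""


class StepLimitExceeded(RuntimeError):
    """Raised when a workflow exceeds its hard cap on agent-to-agent hops."""


class CircuitBreaker:
    """Per-edge breaker: after `max_failures` consecutive failures, fail fast for `cooldown_s`."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], str]) -> str:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                raise CircuitOpen("downstream agent marked unhealthy; failing fast")
            # Cooldown over: allow one trial call; a single failure re-opens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result


class StepBudget:
    """Hard cap on hops for one workflow run, so A -> B -> A ping-pong cannot run forever."""

    def __init__(self, max_steps: int):
        self.max_steps = max_steps
        self.steps = 0

    def tick(self, edge: str) -> None:
        self.steps += 1
        if self.steps > self.max_steps:
            raise StepLimitExceeded(f"step {self.steps} on {edge} exceeds cap of {self.max_steps}")


# A codegen agent that keeps timing out trips the breaker after the second failure.
breaker = CircuitBreaker(max_failures=2, cooldown_s=60)


def flaky_codegen() -> str:
    raise TimeoutError("codegen agent timed out")


for _ in range(3):
    try:
        breaker.call(flaky_codegen)
    except (TimeoutError, CircuitOpen) as err:
        print(type(err).__name__)  # TimeoutError, then TimeoutError, then CircuitOpen

# Two agents delegating back and forth hit the cap instead of looping.
budget = StepBudget(max_steps=5)
try:
    while True:
        budget.tick("A->B")
        budget.tick("B->A")
except StepLimitExceeded as err:
    print(err)  # step 6 on B->A exceeds cap of 5
```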

Observability is non-negotiable

You cannot debug a four-agent system by reading logs. You need distributed tracing, with trace IDs propagated through every handoff, every tool call, and every retrieval, surfaced in CloudWatch (or whatever your APM is) alongside LLM-specific tools like LangSmith and Langfuse.
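The core mechanic is small: every handoff carries the trace ID, and every hop logs it. This stdlib-only sketch assumes a hypothetical envelope format; in production you would more likely use OpenTelemetry, which propagates context with the W3C Trace Context standard, rather than hand-rolled fields.

```python
import contextvars
import json
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("agents")

# Trace ID for the current request; contextvars keeps it correct across asyncio tasks too.
current_trace = contextvars.ContextVar("current_trace")


def start_trace() -> str:
    trace_id = uuid.uuid4().hex
    current_trace.set(trace_id)
    return trace_id


def make_handoff(sender: str, receiver: str, body: dict) -> dict:
    """Wrap an inter-agent payload in a hypothetical envelope carrying trace and span IDs."""
    envelope = {
        "trace_id": current_trace.get(),
        "span_id": uuid.uuid4().hex[:16],
        "from": sender,
        "to": receiver,
        "body": body,
    }
    # One structured line per hop, so a log/APM backend can rebuild the request by trace_id.
    log.info(json.dumps({k: v for k, v in envelope.items() if k != "body"}))
    return envelope


def receive_handoff(envelope: dict) -> dict:
    """On the receiving side, adopt the caller's trace ID before doing any work."""
    current_trace.set(envelope["trace_id"])
    return envelope["body"]


tid = start_trace()
env = make_handoff("orchestrator", "codegen-agent", {"task": "write migration"})
body = receive_handoff(env)
```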

And evaluation, the deck argues, is layered:

  • Per-agent evals. Does each agent meet its quality bar on its own?
  • Workflow evals. Does the full system meet its end-to-end target on a metric like Builtin.GoalSuccessRate?
  • Differential evals. When you change a prompt or swap a model, does end-to-end quality go up or down? Run this on every prompt or model change, before it ships.

Without all three, you are flying blind and shipping vibes. With all three, you have a real engineering practice.
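A minimal differential-eval harness, assuming a hypothetical golden set of (input, expected) pairs and exact-match scoring; real suites usually use graded or model-judged scoring, but the comparison logic is the same.

```python
from typing import Callable, Optional

Case = tuple[str, str]  # (input, expected): hypothetical golden-set format
System = Callable[[str], Optional[str]]


def pass_rate(system: System, golden: list[Case]) -> float:
    return sum(system(q) == expected for q, expected in golden) / len(golden)


def differential_eval(baseline: System, candidate: System, golden: list[Case],
                      min_delta: float = 0.0) -> dict:
    """Score both versions on the same golden set and list per-case regressions."""
    before = pass_rate(baseline, golden)
    after = pass_rate(candidate, golden)
    regressions = [q for q, exp in golden if baseline(q) == exp and candidate(q) != exp]
    return {"before": before, "after": after,
            "ship": after - before >= min_delta, "regressions": regressions}


GOLDEN = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]
OLD = {"2+2": "4", "capital of France": "Paris", "3*3": "6"}  # stand-in for the old prompt
NEW = {"2+2": "4", "capital of France": "paris", "3*3": "9"}  # stand-in for the new prompt

report = differential_eval(OLD.get, NEW.get, GOLDEN)
print(report["regressions"])  # ['capital of France']
```

Here both versions score 2 out of 3, so the aggregate says "no change," yet the new prompt broke a case the old one got right. Listing per-case regressions next to the aggregate is what catches that.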

The six-step founder checklist

If you walk away from this post with nothing else, walk away with this. AWS closes the workshop with six concrete next steps. I've reordered them slightly to match how I would actually sequence the work for a startup:

  1. Document the four planes by name for your system. Pin the diagram on the wall before the first commit.
  2. Pick exactly one orchestration pattern for the workflow you are building. Don't mix.
  3. Set up shared state on day one. DynamoDB for durable workflow state, ElastiCache for fast session context, a vector store for retrieval. Add S3 for archival.
  4. Schema-first handoffs. Define every inter-agent payload as JSON Schema before you write the prompts. Validate at every edge.
  5. Distributed tracing before debugging. Wire trace propagation through every handoff before the first incident. You will thank yourself at 3am.
  6. Least-privilege per agent. Each agent gets its own STS-issued credentials. Never pass the orchestrator's credentials downstream. Ever.
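A sketch of step 6 with boto3's STS `assume_role`, assuming a pre-created role per agent (the ARN, agent name, bucket, and prefix are placeholders). The inline session policy can only narrow what the role already grants, so each agent's credentials end up scoped to one prefix and expire after 15 minutes.

```python
import json


def scoped_read_policy(bucket: str, prefix: str) -> str:
    """Session policy allowing read-only access to one prefix of one bucket."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
        }],
    }
    return json.dumps(policy)


def credentials_for_agent(sts_client, role_arn: str, agent_name: str,
                          bucket: str, prefix: str) -> dict:
    """Mint short-lived, per-agent credentials; never reuse the orchestrator's."""
    resp = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=f"agent-{agent_name}",  # per-agent session name shows up in CloudTrail
        Policy=scoped_read_policy(bucket, prefix),
        DurationSeconds=900,                    # 15 minutes, the STS minimum
    )
    # Contains AccessKeyId, SecretAccessKey, SessionToken, Expiration.
    return resp["Credentials"]

# Usage (requires boto3 and permission to assume the role):
#   import boto3
#   creds = credentials_for_agent(boto3.client("sts"),
#       "arn:aws:iam::123456789012:role/billing-agent", "billing", "acme-invoices", "2026/")
```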

Do those six things and you will be ahead of probably 80% of the multi-agent systems being built in production right now — including some at companies with much, much bigger budgets than yours.

The bottom line

Multi-agent systems are not magic. They are distributed systems with a probabilistic substrate. The "distributed systems" part is well understood — we have 20 years of patterns for it. The "probabilistic substrate" part is the new tax, and the AWS deck is the clearest articulation yet of what that tax actually costs and how to budget for it.

If you are evaluating whether your product needs a multi-agent backend, or you have already shipped one and the on-call rotation is getting ugly, that is exactly the conversation we have most weeks. Sometimes it ends with "you don't need multi-agent yet." Sometimes it ends with a nearshore pod, a clean four-planes diagram, and a path to production. Either way it ends with the right answer for the company you are actually building.

If you want a second opinion before you commit, come talk to us. And if you want the strategic frame around this in the broader context of modern engineering, the Four Pillars piece is the natural companion to this one.
