Most agentic projects don't need multi-agent. A single well-prompted LLM with three good tools will outperform a carefully orchestrated five-agent pipeline ninety percent of the time. The other ten percent — the ones where one model can't hold the whole problem in its head — are where Google's Agent Development Kit (ADK) earns its weight.
Google's Agent Development Kit (ADK) is an open-source, code-first framework for building production agents. It's model-agnostic (Gemini, Claude, Gemma, Ollama, vLLM, LiteLLM all plug in), deployment-agnostic (local, Cloud Run, GKE, or Vertex AI Agent Engine), and the thing it gets right that most agent frameworks get wrong is treating workflow control as a first-class concept, separate from LLM-driven reasoning. This post is a field guide to the parts that matter.
The mental model: three kinds of agent
Everything in ADK extends a single BaseAgent class, but the framework splits agents into three categories that map cleanly onto how you actually think about agentic work:
- LLM agents (LlmAgent, also aliased as Agent) — the LLM is in the driver's seat. It reasons, picks tools, decides when to hand off to another agent. Use these when the task genuinely benefits from a model's judgment on every turn.
- Workflow agents — deterministic. No LLM decides control flow. Three flavors: SequentialAgent, ParallelAgent, LoopAgent. Use these when you know the steps in advance and don't want a model improvising the order on you.
- Custom agents — subclass BaseAgent and implement _run_async_impl. Use these when you need control flow neither an LLM nor a built-in workflow can express — typically business rules, gating logic, or integrations that should never depend on a model's mood.
The first design decision in any ADK project is choosing which of these three categories each step in your workflow belongs to. Get that wrong and you spend the rest of the project debugging an LLM's improvisation when a for loop would have done the job.
Composition: three ways agents call each other
ADK gives you three ways to wire agents together, and the difference matters more than the docs make it sound.
1. Sub-agents (LLM-driven delegation)
Pass sub_agents=[...] to a parent LlmAgent and the parent's LLM can call transfer_to_agent to hand off control to a child it judges relevant. The parent uses each child's description field to make that judgment, so write descriptions like dispatch instructions, not marketing copy.
2. AgentTool (explicit invocation)
Wrap an agent in AgentTool and pass it via tools=[...]. Now the parent calls the agent the same way it calls any other tool — with arguments and a return value. The parent stays in control of the conversation. Use this when the child agent is a self-contained capability ("summarize this," "classify that") rather than a destination for the user's session.
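The control-flow difference is easy to see in miniature. Here is a framework-free sketch of the AgentTool idea, using plain Python rather than ADK's actual classes (as_tool and summarizer_agent are illustrative names, not ADK API): the child becomes an ordinary callable, so the parent keeps the conversation.

```python
# Framework-free sketch of the AgentTool idea: wrap an agent so the
# parent calls it like any other tool (arguments in, return value out)
# and keeps control of the conversation. Names are illustrative, not ADK API.

def summarizer_agent(text: str) -> str:
    # Stand-in for an LLM agent that is a self-contained capability.
    return text.split(". ")[0] + "."

def as_tool(agent):
    def tool(**kwargs):
        return agent(**kwargs)  # call-and-return: no handoff of control
    return tool

summarize = as_tool(summarizer_agent)
result = summarize(text="First sentence. Second sentence.")
```

Contrast with sub-agent delegation, where the parent would transfer control and the child would own the session from that point on.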
3. Workflow agents (deterministic composition)
Pass agents as sub_agents to a SequentialAgent, ParallelAgent, or LoopAgent. Now the order, concurrency, or repetition is fixed by code, not by an LLM's judgment. This is the lever most teams underuse.
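"Fixed by code" means exactly what it sounds like: deterministic composition is ordinary control flow. A minimal, framework-free sketch of the same shapes in plain asyncio (these mimic what the workflow agents pin down; they are not ADK internals):

```python
import asyncio
from functools import partial

# Framework-free sketch: deterministic composition is ordinary control
# flow. These mimic the shapes ADK's workflow agents fix in code;
# they are not ADK internals.

async def step(name: str, log: list) -> None:
    log.append(name)  # stand-in for an agent doing real work

async def parallel(agents, log):
    # ParallelAgent shape: concurrency is decided by code, not a model.
    await asyncio.gather(*(agent(log) for agent in agents))

async def sequential(agents, log):
    # SequentialAgent shape: order is decided by code, not a model.
    for agent in agents:
        await agent(log)

async def main():
    log = []
    extract = partial(parallel, [
        partial(step, "entities"),
        partial(step, "sentiment"),
        partial(step, "topics"),
    ])
    summarize = partial(step, "summary")
    await sequential([extract, summarize], log)
    return log

log = asyncio.run(main())
```

No model ever gets the chance to run the summary before the extractions, because no model is asked.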
A worked example: a document-analysis pipeline
The Google Cloud team's own walkthrough builds a travel concierge — flight, hotel, sightseeing, summary, review. We'll build something different to show the same primitives in a setting more agency clients ask for: a document-analysis pipeline.
Imagine you're building an internal tool for an ops team: feed in a long document — a meeting transcript, a contract, an incident report — and get back structured analysis. Three kinds of work happen:
- Independent extractions — entities, sentiment, topic. These don't depend on each other and can run concurrently.
- An iterative summary — draft a summary, review it, refine if needed. Bounded loop.
- A policy check — does the document contain anything that needs to be flagged before the analysis is shown? Deterministic; should never depend on a model's mood.
Three layers, three different reasons to use the agent type we picked:
- A ParallelAgent wraps the three independent extractors. Concurrency is mechanical — no LLM should decide to run them serially.
- A LoopAgent wraps a Drafter and a Reviewer with max_iterations=3. The Reviewer can emit an Event with escalate=True to break the loop early when it's satisfied. This is genuinely one of the cleanest patterns in ADK — bounded self-improvement with a hard cap.
- A custom BaseAgent handles policy validation. No LLM judgment; it reads session.state, applies hard rules (regulated terms, PII patterns, banned phrases), and emits an Event with the verdict.
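The loop's contract — iterate up to a cap, break early when the reviewer is satisfied — is worth internalizing. A framework-free sketch of the pattern (the drafter/reviewer stubs are illustrative; in ADK the reviewer signals via an Event with escalate=True, not a boolean return):

```python
# Framework-free sketch of the LoopAgent contract: run drafter then
# reviewer each pass, stop when the reviewer is satisfied or the cap
# is reached. In ADK the early exit is an Event with escalate=True;
# here a boolean stands in for it.

def run_refine_loop(drafter, reviewer, max_iterations=3):
    draft, iterations = None, 0
    for _ in range(max_iterations):
        iterations += 1
        draft = drafter(draft)
        if reviewer(draft):  # reviewer "escalates": good enough, break
            break
    return draft, iterations

drafter = lambda prev: (prev or "") + "x"  # each pass refines the draft
reviewer = lambda draft: len(draft) >= 2   # satisfied after two passes

draft, n = run_refine_loop(drafter, reviewer)
```

The hard cap is the point: even a never-satisfied reviewer costs at most three model calls per role.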
All three live inside one SequentialAgent that also serves as the parent for context passing. Each child can write to session.state via output_key — that's how the Drafter sees what the extractors found, and how the validator sees the final draft.
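The state-passing mechanics reduce to a shared dictionary keyed by each agent's output_key. A framework-free sketch with plain dicts (not ADK's Session or State objects; the agent stubs are illustrative) of how the Drafter sees what the extractors found:

```python
# Framework-free sketch of output_key context passing: each step writes
# its result into shared state under its output_key, and later steps
# read those keys back. Plain dicts stand in for session.state.

state = {}  # stands in for session.state

def run_step(agent, output_key, state):
    state[output_key] = agent(state)

entity_agent = lambda s: ["Acme Corp", "Q3 deadline"]
sentiment_agent = lambda s: "negative"
drafter = lambda s: f"Entities {s['entities']} discussed; tone {s['sentiment']}."

run_step(entity_agent, "entities", state)
run_step(sentiment_agent, "sentiment", state)
run_step(drafter, "draft", state)  # drafter reads the extractors' keys
```

The validator reads state["draft"] the same way; nothing is passed by magic, only by key.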
The code, in 25 lines
Stripped to the essential shape (no instructions, no tool wiring), the pipeline is roughly:
```python
from google.adk.agents import (
    LlmAgent, SequentialAgent, ParallelAgent, LoopAgent,
)
from policy_validator import PolicyValidator  # custom BaseAgent

extract_concurrently = ParallelAgent(
    name="Extract",
    sub_agents=[entity_agent, sentiment_agent, topic_agent],
)

refine = LoopAgent(
    name="DraftAndReview",
    max_iterations=3,
    sub_agents=[drafter, reviewer],
)

pipeline = SequentialAgent(
    name="AnalyzeDocument",
    sub_agents=[extract_concurrently, refine, PolicyValidator()],
)

coordinator = LlmAgent(
    name="Coordinator",
    model="gemini-2.5-flash",
    instruction="Route the user's document to the analysis pipeline.",
    sub_agents=[pipeline],
)
```
That's the whole thing. Twenty-five lines, four agent types, deterministic where it should be deterministic, LLM-driven where it should be model-driven. This is the shape to reach for any time you find yourself writing a multi-step agent pipeline.
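The one piece the snippet imports but doesn't show is the PolicyValidator's rule layer, which is exactly the part that must never depend on a model. A sketch of that layer alone (plain Python; the regex patterns and banned phrases are illustrative, and the BaseAgent/Event wiring around it is omitted):

```python
import re

# Sketch of the deterministic rule layer a PolicyValidator might apply.
# Patterns and phrases are illustrative; the BaseAgent/Event wiring
# that would surround this in ADK is omitted.

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US-SSN-shaped number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
]
BANNED_PHRASES = ["off the record", "guaranteed returns"]

def check_policy(draft: str) -> list[str]:
    """Return a list of flags; an empty list means the draft is clean."""
    flags = []
    for pat in PII_PATTERNS:
        if pat.search(draft):
            flags.append(f"pii:{pat.pattern}")
    lowered = draft.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            flags.append(f"banned:{phrase}")
    return flags
```

Because it's a pure function of the text, it is trivially unit-testable — which is the whole argument for keeping it out of an LLM's hands.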
Sessions, state, memory — the part that bites you
ADK distinguishes three concepts that other frameworks tend to smush together:
- Session — one ongoing interaction. A chronological log of Events.
- State — the working scratchpad within a session. Where output_key values land, where parent-to-child context passing happens.
- Memory — the long-term store across sessions. Searchable knowledge.
The pitfall: people stuff long-term knowledge into session.state and wonder why their agent slows down by turn fifty. State is for the active turn's scratchpad. Anything you want available next month belongs in memory.
Eval is in the box
ADK ships an eval CLI: adk eval <agent> <evalset>. Eval files come in two flavors — .test.json for unit-level cases and .evalset.json for integration scenarios — backed by Pydantic schemas.
Two metrics are wired by default: tool_trajectory_avg_score (did the agent invoke the tools you expected, in the order you expected, threshold 1.0) and response_match_score (ROUGE-1 against a reference response, threshold 0.8). Both are blunt instruments. Pair them with rubric-based metrics for anything where the right answer is judgmental.
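To see why response_match_score is blunt, it helps to see how little ROUGE-1 actually measures: unigram overlap. A minimal F1 version (the exact variant ADK scores with may differ; treat this as the idea, not the implementation):

```python
from collections import Counter

# Minimal ROUGE-1 F1 sketch: unigram overlap between candidate and
# reference. Illustrates the idea behind response_match_score; ADK's
# exact scoring variant may differ.

def rouge_1_f1(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

An exact restatement scores 1.0; an accurate paraphrase using different words scores near zero. That asymmetry is why word-overlap metrics need rubric-based company.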
We've written about evals as a deliverable elsewhere — the short version is: build the harness on day one, plug it into CI, and refuse to ship without it. ADK's eval CLI gives you the scaffolding for free; the rules of the game don't change.
Deployment: three good options
ADK has three first-class deployment targets, and the choice actually matters:
- Vertex AI Agent Engine — fully managed, auto-scaling, built specifically for ADK. The right answer when you want to stop thinking about infrastructure.
- Cloud Run — container-based, pay-per-use, scales to zero. The right answer when you want managed compute without the Vertex price floor.
- GKE — bring your own cluster. The right answer when you have specific compliance, networking, or GPU-scheduling needs.
For most projects in 2026, start on Cloud Run, graduate to Agent Engine when scale or session-affinity warrants it, and only reach for GKE if a specific constraint forces you there.
Where ADK ends and on-device begins
ADK is a cloud framework. It runs inside one of three Google Cloud surfaces; even with model-agnostic LLM bindings, the orchestration runtime itself is server-side Python. That's the right shape for a lot of agentic problems — but not all of them.
We build a lot of mobile-first products. The pattern that works for us is hybrid:
- On the device: the user-facing loop. Voice UX, real-time inference, anything touching personal data. Runs on Apple Foundation Models or a small on-device model via MLX or LiteRT. Never makes a network call unless the user explicitly opts in.
- In the cloud, behind ADK: the heavy orchestration. Multi-agent pipelines, long-form reasoning escalations, agent-to-agent coordination across users. ADK on Cloud Run, with sessions keyed by an opaque token the device generates per request — never a user identifier.
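The "opaque token per request" detail is cheap to get right with the standard library. A sketch of the device-side minting (the scheme is an assumption; the invariant that matters is that the token embeds no user identifier and two requests are not linkable):

```python
import secrets

# Sketch: mint an opaque, per-request session key on the device side.
# The scheme is illustrative. The invariant: the token carries no user
# identifier, and two requests from the same user are not linkable.

def mint_session_token() -> str:
    return secrets.token_urlsafe(32)  # 256 bits of CSPRNG randomness

t1, t2 = mint_session_token(), mint_session_token()
```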
The boundary is the most important architectural decision in the system. Done well, the user never knows there's a cloud at all unless they ask for the kind of work that genuinely needs one. Done poorly, you've built a thin on-device wrapper around a cloud product and inherited every downside of cloud-tethered AI.
What to do with this
If you're evaluating multi-agent frameworks, ADK is the most production-shaped option in the field — particularly the clean separation between deterministic workflow agents and LLM-driven agents. Most of the agency-grade pain we've seen with other frameworks comes from blurring that line.
If you're already building on ADK, the highest-leverage moves are: (1) move steps into SequentialAgent and ParallelAgent aggressively — every step that doesn't need a model's judgment shouldn't have one; (2) put any business rule that should never depend on a model in a custom BaseAgent; (3) build the eval harness early.
If you're building something mobile-first, treat ADK as the cloud half of a hybrid system, not the whole stack. The agency that wins in 2026 is the one whose user can pull the Wi-Fi cable out and not notice.
Further reading
- ADK official documentation — agent types, tools, sessions, deployment, eval. The single best place to start.
- Build multi-agentic systems using Google ADK — the Google Cloud blog post that walks through the travel-concierge example referenced above.
- google/adk-python on GitHub — the Python SDK source. The README has the cleanest quickstart in the project.
