If you've used Claude Code — Anthropic's coding agent CLI — you've already seen the Claude Agent SDK in action. The SDK is the same harness, the same tool loop, the same context-management plumbing, exposed as a library you can build on top of. It's available as claude-agent-sdk on PyPI and @anthropic-ai/claude-agent-sdk on npm.
It's a meaningfully different proposition from Google ADK. ADK gives you composition primitives — sequential, parallel, loop, custom — and asks you to wire them up. Claude's SDK gives you one well-instrumented agent loop and asks you to give it good tools and clear hooks. Different philosophies, both correct in different situations.
## The mental model: an agent loop, not a workflow graph
Anthropic's own framing, from Building Effective AI Agents: "The most successful agent implementations use simple, composable patterns rather than complex frameworks." The SDK embodies this. There is no SequentialAgent wrapping a ParallelAgent wrapping a LoopAgent. There is one agent, running one loop, with the right tools and the right hooks.
The loop has four phases:
- Gather context. File search, semantic search, subagents, web fetch, MCP queries.
- Take action. Bash commands, file edits, API calls, custom tool invocations.
- Verify work. Linting, tests, an LLM-as-judge subagent, hook-driven assertions.
- Iterate. If verification fails, gather more context and try again — until done or a turn cap is hit.
The lever Claude Agent SDK gives you is not workflow composition. It is a single autonomous loop, instrumented at every phase, with the freedom to add or constrain capability through tools, hooks, and skills.
## The primitives
Two entry points. Use them in different situations.
### query() — for one-shot tasks
An async generator. You give it a prompt and options; it runs the agent loop and yields messages until done. Good for non-interactive tasks: a one-shot script, a CI step, a scheduled job.
### ClaudeSDKClient — for sessions
A persistent client. async with opens a session; repeated client.query() calls build a multi-turn conversation; client.receive_response() streams events back. Use this when the user is going to keep typing.
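A minimal sketch of the session shape, assuming the package is installed and an API key is configured. The turns and tool allowlist are illustrative, and the import is deferred so the sketch can be read without the SDK present:

```python
import asyncio

async def run_session(turns):
    # Deferred import: keeps the sketch importable without the SDK installed.
    from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions

    options = ClaudeAgentOptions(allowed_tools=["Read", "Grep"])
    replies = []
    # async with opens one session; every query() below shares its history.
    async with ClaudeSDKClient(options=options) as client:
        for turn in turns:
            await client.query(turn)
            parts = []
            async for msg in client.receive_response():  # streams events back
                if hasattr(msg, "result"):
                    parts.append(msg.result)
            replies.append("".join(parts))
    return replies

if __name__ == "__main__":
    print(asyncio.run(run_session(["What does app.py do?", "Now add type hints."])))
```

The point of the shape: state lives in the client, not in your code, so the second turn can say "now refactor it" without restating context.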
### ClaudeAgentOptions — the dial
The single options object that shapes everything: which tools the agent can use (allowed_tools), which it absolutely cannot (disallowed_tools), the permission_mode, the system prompt, the model, the thinking budget, hook matchers, MCP server connections. Most of the engineering effort on a Claude SDK agent goes into shaping this object well.
## Tools: ten in the box, the rest you bring
The SDK ships ten or so built-in tools — the same ones Claude Code uses: Read, Write, Edit, Bash, Glob, Grep, WebSearch, WebFetch, AskUserQuestion, plus an Agent tool for spawning subagents. The first decision on any new agent is which of these you allow. The allowlist is your security model.
Custom tools come in via the @tool decorator and create_sdk_mcp_server() — the SDK turns your Python or TypeScript functions into in-process MCP servers the agent can invoke. The naming convention is uniform: a built-in tool is called by its bare name (WebSearch), an MCP-served tool by mcp__<server>__<tool>. That uniformity is the point — the agent doesn't care whether a tool is local or remote, built-in or custom.
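A sketch of that shape, assuming the documented decorator signature (name, description, input schema); the order-lookup tool and server name are hypothetical, and the import is deferred so the sketch stays readable without the SDK installed:

```python
def build_order_options():
    # Deferred import: keeps the sketch importable without the SDK installed.
    from claude_agent_sdk import tool, create_sdk_mcp_server, ClaudeAgentOptions

    @tool("lookup_order", "Look up an order by ID", {"order_id": str})
    async def lookup_order(args):
        # A real implementation would call your order service here.
        return {"content": [{"type": "text", "text": f"Order {args['order_id']}: shipped"}]}

    # In-process MCP server: no separate process to stand up.
    server = create_sdk_mcp_server(name="orders", version="1.0.0", tools=[lookup_order])
    return ClaudeAgentOptions(
        mcp_servers={"orders": server},
        # Uniform naming: the custom tool is allowed the same way a built-in is.
        allowed_tools=["Read", "mcp__orders__lookup_order"],
    )
```

Note the allowlist entry: the custom tool is gated by exactly the same mechanism as Read, which is what makes the uniformity useful.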
## Hooks: the part most teams underuse
This is where Claude Agent SDK genuinely differs from every other agent framework on the market. Hooks fire at specific points in the loop and can observe, modify, or block the agent's behavior:
- PreToolUse — before any tool is invoked. Block dangerous bash commands. Strip PII from arguments. Require a permission prompt for sensitive operations.
- PostToolUse — after a tool returns. Inject additional context the agent needs. Reformat noisy output. Trigger a side effect (logging, audit).
- UserPromptSubmit — before the prompt reaches the model. Validate, expand, or rewrite. Useful for command-style interfaces ("/summarize URL" becomes a richer prompt).
- Stop — when the agent decides it's done. The last chance to inspect the result, run a final verification, or extend the conversation.
- PreCompact, SubagentStart, SubagentStop, SessionStart, SessionEnd, Notification, PermissionRequest — and a dozen more for niche situations.
Most teams reach for hooks late, after they've already had an incident. The teams that ship clean reach for them first, before the agent has done anything dangerous.
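As a concrete sketch of the UserPromptSubmit case, here is a hook that expands "/summarize URL" into a richer prompt. The rewrite logic is plain Python; the return envelope follows the hookSpecificOutput convention, and the additionalContext field name is an assumption worth checking against the current SDK docs:

```python
import asyncio

URL_COMMAND = "/summarize "

def expand_summarize(prompt: str) -> str:
    # Pure rewrite logic: "/summarize <url>" becomes a richer research prompt.
    if not prompt.startswith(URL_COMMAND):
        return prompt
    url = prompt[len(URL_COMMAND):].strip()
    return (
        f"Fetch {url}, summarize it in under 200 words, "
        "and cite the page title and publication date."
    )

async def user_prompt_hook(input_data, tool_use_id, ctx):
    original = input_data.get("prompt", "")
    rewritten = expand_summarize(original)
    if rewritten == original:
        return {}  # not a command; let the prompt through untouched
    # additionalContext is an assumption about the UserPromptSubmit envelope.
    return {
        "hookSpecificOutput": {
            "hookEventName": "UserPromptSubmit",
            "additionalContext": rewritten,
        }
    }
```

Keeping the rewrite in its own pure function means it can be unit-tested without spinning up an agent at all.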
## A worked example: a research agent that verifies itself
A canonical Claude SDK use case: take a research question, gather context from the web and from a local document store, draft an answer, have a subagent grade it, iterate until the grade clears a threshold. All four loop phases get exercised.
### The code, in 22 lines
```python
import asyncio
from claude_agent_sdk import query, ClaudeAgentOptions, HookMatcher

async def block_dangerous_bash(input_data, tool_use_id, ctx):
    cmd = input_data.get("tool_input", {}).get("command", "")
    if any(b in cmd for b in ["rm -rf /", "curl | sh", ":(){"]):
        return {"hookSpecificOutput": {
            "hookEventName": "PreToolUse", "permissionDecision": "deny"}}
    return {}

options = ClaudeAgentOptions(
    allowed_tools=["Read", "Grep", "WebSearch", "WebFetch", "Bash"],
    permission_mode="default",
    hooks={"PreToolUse": [HookMatcher(matcher="Bash", hooks=[block_dangerous_bash])]},
)

async def main():
    async for msg in query(prompt="Research and summarize the latest LiteRT release notes; cite sources.", options=options):
        if hasattr(msg, "result"):
            print(msg.result)

asyncio.run(main())
```

That's a complete agent. It can search the web, fetch pages, read local files, grep them, and run shell commands — but it cannot run a shell command that contains rm -rf /, because the PreToolUse hook intercepts and denies. That's the shape every production Claude SDK agent should have: an allowlist of capabilities and a hook layer that enforces the rules that capability alone can't.
## Skills: capability you can ship as Markdown
Skills are the SDK's answer to "how do I give Claude a recipe for handling X?" They're Markdown files (SKILL.md) with YAML frontmatter, living at .claude/skills/ in your project or ~/.claude/skills/ globally. Each one describes when to use it, what it knows, and any reference workflow. Claude discovers them automatically when relevant — no registration code, no API call.
The mental model: tools are what the agent can do; skills are what the agent knows how to do well. A bash tool lets Claude run shell commands; a "deploy-this-service" skill tells Claude the specific sequence and gotchas of deploying your service.
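A sketch of what .claude/skills/deploy-this-service/SKILL.md might look like. The frontmatter fields follow the skills convention; the commands and gotcha are invented for illustration:

```markdown
---
name: deploy-this-service
description: Use when deploying the payments service to staging or production.
---

# Deploying the payments service

1. Run the test suite first: `make test`. Never deploy on red.
2. Build and push the image: `make release TAG=$(git rev-parse --short HEAD)`.
3. Apply the manifests, staging first: `kubectl apply -f k8s/staging/`.

Gotcha: the service needs about 90 seconds to warm its cache after a deploy;
do not treat the first failed health check as a rollback signal.
```

The description line is what Claude matches against when deciding the skill is relevant, so it should read like a trigger condition, not a title.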
## Subagents: same loop, different prompt
Spawn a subagent by including Agent in the allowed tools and providing one or more AgentDefinitions. Each subagent runs the same loop with its own system prompt and tool allowlist. Most useful for: an LLM-as-judge that grades outputs without seeing the orchestrator's context, a specialist that has access to a sensitive tool the parent shouldn't, or a cost-optimized smaller-model worker.
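A sketch of the LLM-as-judge case, assuming the options object takes named agent definitions as described above; the grader prompt and model choice are illustrative, and the import is deferred so the sketch stays readable without the SDK installed:

```python
def build_judge_options():
    # Deferred import: keeps the sketch importable without the SDK installed.
    from claude_agent_sdk import ClaudeAgentOptions, AgentDefinition

    return ClaudeAgentOptions(
        # Agent in the allowlist is what lets the orchestrator spawn subagents.
        allowed_tools=["Read", "WebSearch", "WebFetch", "Agent"],
        agents={
            "judge": AgentDefinition(
                description="Grades a draft answer against the original question.",
                prompt=(
                    "You are a strict grader. Score the draft 1-10 for accuracy "
                    "and sourcing, and list concrete defects. Do not rewrite it."
                ),
                tools=["Read"],  # no web access: judge only what's on disk
                model="haiku",   # cost-optimized smaller-model worker
            ),
        },
    )
```

The judge's narrow allowlist is the point: it grades without the orchestrator's context or its tools, which keeps the verdict independent and the cost low.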
## MCP: first-class, not wrapped
The Claude Agent SDK is the most MCP-native framework on the market. Connect a server with two lines in options.mcp_servers; its tools show up as mcp__<server>__<name> alongside the built-ins, indistinguishable to the agent. The same create_sdk_mcp_server() helper lets you publish your own MCP servers without standing up a separate process.
This is one of the strongest reasons to choose Claude SDK if your architecture is already MCP-shaped (or you intend to make it that way). Other frameworks treat MCP as one integration option among many; Anthropic treats it as a primitive.
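Here is the connection as a plain config sketch. The server name "docs" and its tool names are hypothetical, and the filesystem server is used only as a stand-in for any external stdio-launched MCP server:

```python
# Shape of an external-server entry for the options object (mcp_servers in
# Python). "docs" and the tool names below are illustrative.
mcp_servers = {
    "docs": {
        "type": "stdio",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "./docs"],
    },
}

# The served tools then surface under the uniform mcp__<server>__<tool> names,
# and get allowlisted exactly like built-ins:
allowed_tools = [f"mcp__docs__{t}" for t in ("read_file", "list_directory")]
```

Nothing else is required: no wrapper classes, no adapter layer, just the config entry and the allowlist.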
## Where Claude Agent SDK and Google ADK diverge
Both are good. They're aimed at different problems.
- ADK shines when your problem decomposes cleanly into named workflow phases — sequential then parallel then loop, with deterministic control flow. The worked example in our ADK post (a document analyzer with three concurrent extractions, a bounded refinement loop, and a deterministic policy validator) is canonical ADK shape.
- Claude Agent SDK shines when one well-instrumented agent with the right tools and the right guardrails can carry the task. Coding agents, research assistants, support-triage agents, documentation generators — anywhere the task is exploratory but well-bounded.
- If you're committed to Claude and want the production-shaped harness Anthropic uses internally, Claude SDK gets you there faster. If you need to swap models or run heterogeneous agents, ADK's model-agnostic posture matters more.
## Cost, latency, and the things that bite you
Claude Sonnet 4.5 is the workhorse model behind most Agent SDK use cases right now. A typical task — research a topic with two web fetches, draft a 500-word answer, verify with a subagent — costs somewhere between high-single-digit and low-double-digit cents. Long-running agentic tasks (Anthropic has reported Sonnet 4.5 maintaining focus across 30-hour multi-step tasks) compound that cost linearly.
Two practical levers for cost: prompt caching (transparent, but requires repeated context — well-suited to sessions) and extended thinking budget (configurable via thinking in options; trade thinking tokens for output quality on hard tasks). Most teams miss the first one and over-spend on the second.
## Where on-device fits
The Claude Agent SDK is a cloud framework. Anthropic doesn't ship an on-device model. So the same hybrid pattern from our on-device manifesto applies, sharper:
- On the device: the user-facing loop, voice UX, anything personal. Apple Foundation Models and small on-device LLMs handle the responsive surface.
- Behind a Claude SDK agent: the heavy reasoning, the long tool sequences, the multi-source synthesis. Reached only when the on-device tier escalates.
The boundary is the same one we draw with ADK. The difference: Claude Agent SDK's hook layer makes it easier to enforce the rule that an opaque request token, not the user's identity, is what crosses that boundary.
## What to do with this
If you're evaluating agent SDKs, the honest answer is: use Claude Agent SDK if your task is one well-bounded agentic loop, use ADK if your task is a workflow graph of multiple agents, and use both if you have both shapes in your product. There is no rule that says you can't.
If you're already on Claude SDK, the highest-leverage moves are: (1) audit your allowed_tools — most agents we see are over-permissioned; (2) put hooks at every layer that touches sensitive data, not just the obvious ones; (3) move repetitive instructions into Skills so the prompt stays clean; (4) build the eval harness early.
## Further reading
- Claude Agent SDK overview — official documentation. Start here.
- Building Effective AI Agents — Anthropic's opinionated guide on agent patterns. The intellectual foundation for the whole SDK.
- anthropics/claude-agent-sdk-python — the Python source on GitHub. README has the cleanest quickstart.
- How to use the Claude Agent SDK — DataCamp's practical walkthrough with three progressively complex examples (one-shot, custom tools, REPL with hooks).
