
Building agents with LangGraph — a field guide.

LangGraph is the most-used production agent framework in the field. Here's the mental model, the primitives that earn their weight, and the durable-execution story most teams pick it for.

May 3, 2026·13 min read

If you survey what's actually running in production agent deployments today, LangGraph wins by volume. LinkedIn, Klarna, Replit, Uber, Elastic, and a long list of public companies cite it by name in production case studies. It's also the framework most teams reach for when they outgrow LangChain's original AgentExecutor.

The reason isn't flashy. LangGraph is opinionated about two things most agent frameworks under-emphasize: graph-as-control-flow (your agent is a graph of nodes, not a pile of prompts) and durable execution (every step is checkpointed, every workflow can resume from any point, every conversation survives a server restart). LangGraph 1.0 went GA in October 2025 — the first stable major release in the durable-agent framework space. This post is a field guide to the parts that actually matter.

The mental model: an agent is a graph

LangGraph models an agent as a state machine: a graph of nodes (steps that read and update state) connected by edges (transitions, sometimes conditional). State flows through. The agent loop is just an edge that loops back.

This sounds abstract but it's the cleanest model for a specific failure mode: agent loops that need to make non-trivial control-flow decisions. Should we re-prompt with more context? Hand off to a specialist? Stop and ask a human? In LangChain's original AgentExecutor, those decisions lived inside an opaque ReAct loop. In LangGraph, they're first-class edges you can see, edit, and inspect.

LangChain's own framing: LangGraph "exposes the logic of AgentExecutor in a far more natural and modifiable way." That's the lever.

The core primitives

  • StateGraph — the builder. You add nodes and edges, then compile() into a runnable agent.
  • MessagesState — the prebuilt typed state with a messages key and an add_messages reducer (so updates append rather than overwrite). Use this 80% of the time.
  • START, END — sentinel nodes for entry and exit.
  • add_node(), add_edge(), add_conditional_edges() — wire the graph. Conditional edges take a function that returns the name of the next node based on state.
  • create_agent() (1.0+) — the prebuilt ReAct agent factory. Most projects start here and drop down to StateGraph only when the prebuilt isn't flexible enough.
  • InMemorySaver / SqliteSaver / PostgresSaver — checkpointers. Pass one to compile(checkpointer=...) and the graph persists state at every step.
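
Wired together, those primitives form the canonical agent loop. Here's a minimal sketch; call_model, should_continue, run_tools, and llm are hypothetical stand-ins for your own model call, router, and tool executor:

from langgraph.graph import StateGraph, MessagesState, START, END

def call_model(state: MessagesState):
    # Call your LLM; the add_messages reducer appends the reply.
    return {"messages": [llm.invoke(state["messages"])]}

def should_continue(state: MessagesState):
    # Conditional edge: route on whether the last reply requested a tool.
    return "tools" if state["messages"][-1].tool_calls else END

builder = StateGraph(MessagesState)
builder.add_node("agent", call_model)
builder.add_node("tools", run_tools)  # hypothetical tool-executing node
builder.add_edge(START, "agent")
builder.add_conditional_edges("agent", should_continue, ["tools", END])
builder.add_edge("tools", "agent")  # the agent loop is just this edge
graph = builder.compile()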

The killer feature: durable execution

LangGraph's persistence layer is the reason most teams pick it for production. From the official docs: "LangGraph has a built-in persistence layer that saves graph state as checkpoints. When you compile a graph with a checkpointer, a snapshot of the graph state is saved at every step of execution."

Concretely: every conversation has a thread_id. Every step writes a checkpoint. If your server restarts mid-conversation, the next request with the same thread ID picks up exactly where it left off. If a long-running workflow gets interrupted, it resumes from the last completed node. Time-travel is a feature: graph.get_state_history(config) returns ordered snapshots so you can replay, branch, or roll back.
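
In code, persistence is one compile argument plus one config key. A minimal sketch, reusing the builder from the sketch above (the thread ID value is arbitrary):

from langgraph.checkpoint.memory import InMemorySaver

app = builder.compile(checkpointer=InMemorySaver())
config = {"configurable": {"thread_id": "conversation-42"}}

app.invoke({"messages": [("user", "Summarize our options")]}, config)
# A restart here loses nothing: the next invoke with the same
# thread_id picks up from the last checkpoint.

# Time-travel: ordered snapshots, one per step.
for snapshot in app.get_state_history(config):
    print(snapshot.config["configurable"]["checkpoint_id"], snapshot.next)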

Human-in-the-loop is the same primitive: an interrupt at a node pauses the graph; the human reviews, modifies state if needed, and resumes. No bespoke infrastructure required.
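
A sketch of that flow with the interrupt/Command API; the approval node and its payload are hypothetical, and app and config are the compiled graph and thread config from the sketch above:

from langgraph.graph import MessagesState
from langgraph.types import Command, interrupt

def approval_node(state: MessagesState):
    # Execution pauses here; the payload is surfaced to the caller.
    decision = interrupt({"question": "Approve this plan?"})
    return {"messages": [("assistant", f"Reviewer said: {decision}")]}

# The run stops at the interrupt. A human reviews, edits state if
# needed, then resumes the same thread:
app.invoke(Command(resume="approved"), config)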

A worked example: a research agent with a supervisor

A canonical multi-agent pattern: a supervisor that routes to specialists. LangGraph ships langgraph-supervisor as a separate package; it wraps a few specialist agents and gives the supervisor a tool to hand off to each. State (conversation history) is shared.

Figure 1 · LangGraph supervisor + specialist graph: START → Supervisor (create_supervisor, routes via tool calls) hands off to the Research agent (create_react_agent, tools=[web_search]) or the Math agent (create_react_agent, tools=[add, multiply]), then completes to END. A checkpointer persists state at every node; get_state_history allows resuming from any checkpoint.

The code, in 18 lines

from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor
from langgraph.checkpoint.postgres import PostgresSaver

# web_search, add, and multiply are ordinary tool functions defined
# elsewhere; POSTGRES_URL is your Postgres connection string.
research_agent = create_react_agent(
    model="openai:gpt-4o", tools=[web_search], name="research")

math_agent = create_react_agent(
    model="openai:gpt-4o", tools=[add, multiply], name="math")

with PostgresSaver.from_conn_string(POSTGRES_URL) as checkpointer:
    checkpointer.setup()  # create the checkpoint tables on first run
    workflow = create_supervisor(
        [research_agent, math_agent], model=ChatOpenAI(model="gpt-4o"))
    app = workflow.compile(checkpointer=checkpointer)

That's a complete supervisor agent with two specialists, durable execution, and resume-from-anywhere. Swap PostgresSaver for InMemorySaver during development. The graph the supervisor compiles is the one in Figure 1.
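
Invoking it is the same pattern as any checkpointed graph. A hypothetical call that exercises both specialists:

config = {"configurable": {"thread_id": "demo-thread"}}
result = app.invoke(
    {"messages": [("user", "Find the population of Norway, then multiply it by 3")]},
    config)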

The deployment story: LangGraph Platform

LangGraph Server (now branded under LangSmith Deployment) is the managed runtime. Three first-class concepts:

  • Assistants — a deployed graph configured with a specific model, prompt, and toolset.
  • Threads — persistent conversations.
  • Runs — individual invocations on a thread.

The platform also supports RemoteGraph (deployed agents calling each other), MCP, and Google's A2A protocol. Three deployment tiers: Cloud, Hybrid (your cloud, their control plane), and Self-Hosted. "Same runtime, same APIs. What changes is who manages the infrastructure."
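
Those three concepts map one-to-one onto the Python SDK (pip install langgraph-sdk). A hedged sketch; the deployment URL is a placeholder:

import asyncio
from langgraph_sdk import get_client

client = get_client(url="https://my-deployment.example.com")

async def main():
    assistant = (await client.assistants.search())[0]  # a deployed, configured graph
    thread = await client.threads.create()             # a persistent conversation
    await client.runs.wait(                            # one invocation on that thread
        thread["thread_id"], assistant["assistant_id"],
        input={"messages": [("user", "hello")]})

asyncio.run(main())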

LangSmith Studio: filling the observability gap most teams have

LangSmith is the observability piece — and LangSmith Studio specifically is the difference between "our agent works" and "our agent works in production." It connects to any agent server, visualizes the graph, lets you step through state at each checkpoint, modify state mid-execution, and explore alternative paths from any node.

Production agent debugging without something like Studio is guesswork. Most LangGraph teams adopt it within a few weeks of going live.

Production users — who actually ships LangGraph

  • LinkedIn — "hierarchical agent system built on LangGraph" for AI-powered recruiting.
  • Klarna — "Klarna's AI Assistant, powered by LangGraph and LangSmith, handles customer support tasks for 85 million active users — reducing customer resolution time by 80%."
  • Replit — multi-agent system for their AI coding copilot, with human-in-the-loop.
  • Uber — "network of specialized agents" for large-scale code migrations and unit-test generation.
  • Elastic — multi-agent network for real-time threat detection.

The supervisor pattern shows up over and over in these production deployments. It's the right shape when one agent should hold the conversation while specialists do the actual work.

Where LangGraph fits next to ADK and Claude SDK

  • LangGraph — heterogeneous models, durable execution, time-travel debugging, complex graphs. The right answer when your problem is a graph, not a single loop, and especially when you need to resume something after failure.
  • Google ADK — workflow-agent composition, vertically integrated with Vertex AI Agent Engine. The right answer when you're on Google Cloud and want managed infrastructure under your agent.
  • Claude Agent SDK — single well-instrumented agent loop with strong hooks. The right answer when one capable agent with the right tools and the right guardrails carries the task.

These are not interchangeable. They're solving different shapes of problem. Most production AI shops we work with end up using more than one.

Where on-device fits

LangGraph is a server-side framework. It runs in Cloud Run, Kubernetes, or LangGraph Cloud — never on the user's device. The same hybrid pattern we use for ADK and Claude SDK applies here, with a twist: LangGraph's durable execution is most useful for long-running tasks, which by definition aren't latency-sensitive — and long-running tasks are exactly the ones that should not be on-device.

The on-device tier handles real-time interaction. The LangGraph tier handles the multi-hour research synthesis or the overnight code-migration job. Different time horizons, different deployment surfaces, the same product.

What to do with this

If you're evaluating agent frameworks and your problem has any of these shapes — multi-agent supervision, long-running workflows, human-in-the-loop, or production durability requirements — start with LangGraph. The community is the largest, the production references are the most substantive, and durable execution is genuinely a differentiator.

If you're already on LangGraph, the highest-leverage moves are: (1) wire a real checkpointer (Postgres in prod, not InMemorySaver); the durable-execution story doesn't work without it. (2) Adopt LangSmith Studio early; in our experience it speeds up debugging severalfold. (3) Use the prebuilt create_agent until you can name a specific reason it isn't enough.

Want this in your project?

Every engagement ships with an eval harness.

Start a project →