CrewAI is what happens when you stop thinking about agents as functions and start thinking about them as employees. It's an open-source Python framework, built from scratch (independent of LangChain), with one opinionated idea: multi-agent work is role-based.
You don't define a graph or a workflow. You hire a team. Each agent has a role ("Senior Data Researcher"), a goal ("uncover cutting-edge developments"), and a backstory ("you're a seasoned researcher with a knack for..."). Tasks are the unit of work. The Crew is the team. The Process is the management style. That's the whole framework, top to bottom.
It works. CrewAI claims more than 450 million agentic workflows per month and customers including DocuSign, PwC, IBM, and Johnson & Johnson — with named outcomes: 75% faster first contact with leads (DocuSign), code-generation accuracy from 10% to 70% (PwC), 90% reduction in development time for curriculum design (General Assembly).
The mental model: role, goal, backstory
Most agent frameworks ask you to think about flow — what happens, in what order, with what state. CrewAI asks you to think about roles — who does what, what their job is, what they're good at. The flow falls out of the role definitions and the sequential or hierarchical process that stitches them together.
This is genuinely a different mental model. It works especially well when:
- The task naturally decomposes into specialist roles (research → analyze → write).
- You want non-technical stakeholders to read and edit agent definitions (a role and goal in plain English is much easier than a state graph).
- You're prototyping a workflow that's only loosely defined and want the agents themselves to figure out the collaboration.
It works less well when you need precise control flow, durable execution, or heterogeneous models. The framework is opinionated; if you're fighting the opinions, you're using the wrong tool.
The five primitives
- Agent — role, goal, backstory, tools, llm, plus knobs like `allow_delegation` and `memory`.
- Task — description, expected output, the agent who's assigned, optional context (other tasks whose output feeds this one). The task is the unit of work, not the agent.
- Crew — the team. Bundles agents, tasks, and a process.
- Process — the management style: `sequential` (each task feeds the next) or `hierarchical` (a manager LLM plans and delegates).
- Flow — the newer, event-driven layer. Use Flows for the overall structure; use Crews inside Flow steps when a step needs a team.
Crews vs. Flows — the decision that matters
CrewAI's 2025 positioning shift is real and load-bearing. The official guidance: "Use a Flow to define the overall structure, state, and logic of your application. Use a Crew within a Flow step when you need a team of agents to perform a specific, complex task that requires autonomy."
In practice:
- Crews = autonomous collaboration. Good for tasks where you want the agents to figure out the micro-steps. Less good for production reliability — the agents can drift.
- Flows = event-driven orchestration with decorators (`@start()`, `@listen()`, `@router()`, `@human_feedback()`). State is persisted, executions can resume. This is the production-shaped surface.
Most production CrewAI apps end up Flow-first with Crews invoked inside specific steps. That's the pattern the docs steer you toward, and it's the right one.
A worked example: a content-research crew
The canonical CrewAI use case: content production. A researcher gathers material, an analyst extracts insights, a writer turns it into a draft. Three agents, three tasks, sequential process, done.
The code, in 22 lines
```python
from crewai import Agent, Task, Crew, Process
# web_search is assumed: any CrewAI-compatible search tool, e.g. crewai_tools.SerperDevTool
researcher = Agent(role="Senior Data Researcher",
    goal="Uncover cutting-edge findings about {topic}",
    backstory="You are a seasoned researcher with a knack for synthesis.",
    tools=[web_search])
analyst = Agent(role="Insights Analyst",
    goal="Extract themes and gaps from research",
    backstory="You read between the lines of any report.")
writer = Agent(role="Content Writer",
    goal="Write an engaging draft from the analyst's findings",
    backstory="You write for technical readers without dumbing down.")
research_task = Task(description="Research {topic}", agent=researcher,
    expected_output="A list of 10 findings with sources.")
analyze_task = Task(description="Synthesize themes", agent=analyst,
    context=[research_task], expected_output="Three named themes and the gaps between them.")
draft_task = Task(description="Draft article", agent=writer,
    context=[analyze_task], expected_output="A 1200-word draft.")
crew = Crew(agents=[researcher, analyst, writer],
    tasks=[research_task, analyze_task, draft_task],
    process=Process.sequential, memory=True)
result = crew.kickoff(inputs={"topic": "on-device agents in 2026"})
```

That's a complete content-research pipeline. Three agents, three tasks, sequential process, shared memory. Notice that nowhere did we write "step 1, step 2, step 3." The agents figured it out from their roles and the task descriptions.
The unified memory model
CrewAI consolidated short-term, long-term, entity, and external memory into a single Memory class. Set memory=True on the Crew and the team has access to a shared store with three knobs: semantic similarity, recency decay, and importance weighting. The composite ranking is configurable but the defaults are sane.
The default LLM for memory operations is gpt-4o-mini — small and cheap, which is correct. You don't need a frontier model to decide whether two notes are about the same topic.
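To make the three knobs concrete, here is an illustrative composite score as a weighted sum. The weights, half-life, and formula below are invented for illustration — the actual ranking is a CrewAI internal, configurable through the Memory settings:

```python
import math
import time

def memory_score(similarity, stored_at, importance,
                 w_sim=0.5, w_rec=0.3, w_imp=0.2, half_life_s=86_400):
    """Illustrative composite ranking: semantic similarity, recency
    decay, and importance weighting, blended as a weighted sum."""
    age = time.time() - stored_at
    recency = math.exp(-math.log(2) * age / half_life_s)  # halves every day
    return w_sim * similarity + w_rec * recency + w_imp * importance

# A fresh, highly similar note outranks an old, loosely related one.
fresh = memory_score(similarity=0.9, stored_at=time.time(), importance=0.5)
stale = memory_score(similarity=0.6, stored_at=time.time() - 7 * 86_400, importance=0.5)
```

The useful intuition: similarity alone is not the ranking. A note can match the query well and still lose to a more recent or more important one, which is why the defaults matter less than knowing the blend exists.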
Hierarchical process and the manager LLM
Set process=Process.hierarchical and pass a manager_llm, and CrewAI promotes one of the agents to project-manager status. The manager plans the sequence, allocates tasks, reviews outputs, and decides what to do next. It's the right shape when you don't know in advance which task needs to run, or when the order depends on intermediate findings.
Hierarchical mode is more expensive (the manager makes decisions on every step) and less predictable (the manager is an LLM). Use it where you genuinely need the autonomy; otherwise stick with sequential.
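Reusing the agents and tasks from the worked example above, switching to hierarchical mode is a configuration change, not a rewrite. A sketch (`manager_llm` accepts a model name string; a `manager_agent` can be passed instead if you want a custom-defined manager):

```python
crew = Crew(
    agents=[researcher, analyst, writer],
    tasks=[research_task, analyze_task, draft_task],
    process=Process.hierarchical,
    manager_llm="gpt-4o-mini",  # keep the manager small unless you need a smart one
)
```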
CrewAI AMP: the enterprise surface
CrewAI ships an enterprise platform branded AMP (Agent Management Platform), with two deployment surfaces: AMP Cloud (managed, with a visual editor) and AMP Factory (private VPC on AWS, Azure, or GCP). Native trigger integrations include Gmail, Slack, Salesforce, Outlook, Teams, OneDrive, and HubSpot — useful for "an agent runs when this email arrives"-style automations.
AMP also adds workflow tracing, agent training, task guardrails, and RBAC. For organizations comparing CrewAI to LangGraph Cloud or Vertex AI Agent Engine, the integration list is a real differentiator.
Production users and outcomes
From CrewAI's public customer page (numbers are their published claims, not independently audited):
- DocuSign — 75% faster first contact with leads.
- PwC — code-generation accuracy from 10% to 70%.
- General Assembly — 90% reduction in development time for curriculum design.
- Gelato — enriches 3,000+ leads per month.
- Piracanjuba — 95% response accuracy for customer support.
Where CrewAI fits next to the others
- CrewAI — fastest path from "we have a workflow with named roles" to a working multi-agent system. Best when the team is the abstraction.
- LangGraph — when you need durable execution, time-travel, or heterogeneous models.
- Google ADK — when explicit workflow primitives (sequential / parallel / loop) need to be first-class and you want Vertex AI deploy targets.
- Claude Agent SDK — when one well-instrumented agent loop carries the task.
Where on-device fits
CrewAI is server-side. Same hybrid pattern as the others: on-device for the user-facing surface, CrewAI behind it for team-shaped work. The pattern fits especially well when the cloud-side work is naturally role-decomposable — a research synthesis, a multi-step content workflow, a multi-team process.
What to do with this
If you have a workflow you can describe in plain English as "a researcher does X, then an analyst does Y, then a writer does Z," CrewAI is the smallest possible step from that description to a working system. The framework leans into the metaphor — your agent definitions read like job postings.
If you're already on CrewAI, the highest-leverage moves are: (1) graduate from Crews-only to Flow-first as soon as you need durability or human-in-the-loop; (2) write task descriptions that look like ticket templates (specific, bounded, with an expected output); (3) keep the manager LLM for hierarchical mode small unless you genuinely need a smart manager.
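Point (2) in code form — a hypothetical ticket-shaped task, with scope and acceptance criteria spelled out (the wording is an example, not from CrewAI's docs):

```python
bounded_task = Task(
    description=(
        "Research {topic}. Scope: vendor docs and peer-reviewed sources "
        "from the last 12 months. Out of scope: forum threads and social media."
    ),
    expected_output=(
        "A list of exactly 10 findings, each with a one-line summary "
        "and a source URL."
    ),
    agent=researcher,
)
```

The specificity is the guardrail: a bounded description and a checkable expected output give the agent (and you) a pass/fail criterion instead of a vibe.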
Further reading
- CrewAI documentation — canonical entry point.
- Flows concept guide — the production-shaped surface most teams should adopt.
- crewAIInc/crewAI on GitHub — framework source.
- crewai.com — enterprise platform (AMP) and customer outcomes.
