MCP

Shipping an MCP server for a cooking workflow — what we learned.

We extracted RecipeGuide's tool layer into a public Model Context Protocol server. Here's how we designed the tool boundaries, handled streaming, and what we'd do differently.

May 3, 2026·11 min read

We had been shipping RecipeGuide for about a month — an on-device cooking agent with nine tools — when we started getting the same email from three different developers in the same week: can I use your tools in my own agent? The polite answer was "not really, they're bound to a SwiftUI app." The honest answer was "the tool surface is the most reusable part of this thing — we should ship it as an MCP server."

So we did. This is a field-notes account of designing the boundaries, handling streaming, the auth and rate-limit story, and the things we got wrong on the first pass.

Why MCP, and not just a REST API

The Model Context Protocol is what makes a tool reusable across agents. A REST API is a contract between two specific services. An MCP server is a contract any compliant client — Claude Desktop, an Anthropic SDK app, an open-source agent framework — can speak without any code on our side knowing that client exists.

For a cooking workflow that's the right shape. The tools themselves (scan a pantry, match recipes, balance macros) are not secret. The value is in the data we've curated and the substitution logic we've tuned. Letting any agent call them cheaply doesn't cannibalize anything — it grows the surface area people can use this thing in.

Designing tool boundaries — the hard part

Most of the work in shipping an MCP server is not the protocol. It's deciding what counts as a tool. We started with eleven candidates and ended with seven. The four we cut taught us the most.

Cut #1: tools that were really one tool with branching

We had separate find_recipe_by_ingredient and find_recipe_by_cuisine tools. Both are searches with different filters. Splitting them meant the agent had to decide between them before it knew what the user wanted, and the wrong choice ate a turn. We collapsed them into match_recipes with optional ingredients, cuisine, and max_time parameters.

The rule we drew from this: a tool boundary is a decision boundary. Two tools should exist when the agent has enough information to pick correctly between them. Otherwise it's one tool with parameters.
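As a sketch, the merged definition becomes a single schema where every filter is optional. This is a hypothetical tool definition in Python dict form, not our actual schema; field descriptions beyond ingredients, cuisine, and max_time are illustrative:

```python
# Hypothetical merged tool definition: one search tool, parameters
# instead of sibling tools. Every filter is optional, so the agent
# never has to choose a branch before it knows what the user wants.
MATCH_RECIPES = {
    "name": "match_recipes",
    "description": "Search recipes by any combination of filters.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "ingredients": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Ingredients the recipe must use.",
            },
            "cuisine": {"type": "string"},
            "max_time": {"type": "integer", "description": "Max total minutes."},
        },
        "required": [],  # all filters optional: one decision deferred
    },
}
```

An empty `required` list is the point: the agent can call the tool with whatever it knows so far instead of guessing between two narrower tools.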

Cut #2: tools that were syntax sugar for the LLM

We had a summarize_recipe tool. It took a recipe and returned a one-line description. We removed it. An LLM doesn't need a tool to summarize — it needs a tool when it needs to reach outside its weights for data, computation, or side effects. If you can imagine the model doing it without a tool, the tool is dead weight.

Cut #3: convenience composites

We had a plan_dinner tool that internally called scan_pantry, match_recipes, and balance_macros in sequence. Convenient — and a trap. It hides the tool-routing decisions from the agent, which means the agent can't recover when one step fails. The agent should compose. The MCP server should expose primitives.

We did keep one composite — generate_shopping_list — because the steps inside it are lossless and deterministic. Heuristic: compose when the inner steps are mechanical, expose primitives when the inner steps need judgment.
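Since the diff inside generate_shopping_list is pure arithmetic, it can be sketched in a few lines. This is a minimal version, assuming quantities are already in a common unit:

```python
from collections import Counter

def generate_shopping_list(recipe: dict, pantry: dict) -> dict:
    """Diff a recipe's ingredient quantities against the pantry.

    Purely mechanical: no step inside needs the agent's judgment,
    which is why this composite survived the cut.
    """
    needed = Counter(recipe)
    needed.subtract(Counter(pantry))
    # Keep only the items the pantry falls short on.
    return {item: qty for item, qty in needed.items() if qty > 0}
```

For example, a recipe needing 500 g of flour against a pantry holding 200 g yields a list with 300 g of flour and nothing else.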

Cut #4: tools whose return shape would never stabilize

Our first design had a get_nutrition_facts tool whose output was "whatever the underlying provider returned." Every external nutrition API has a different schema. Letting that leak through MCP means every consuming agent has to special-case our provider. We replaced it with a normalized envelope and lost information we don't use anyway.
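The normalization layer amounts to mapping each provider's payload onto one stable envelope at the boundary. A sketch, with hypothetical provider names and raw field names:

```python
def normalize_nutrition(provider: str, raw: dict) -> dict:
    """Map a provider-specific payload onto one stable envelope.

    Provider names and raw field names here are hypothetical; the
    point is that consuming agents only ever see the normalized
    shape, never the upstream schema.
    """
    if provider == "provider_a":
        facts = {"calories": raw["kcal"], "protein_g": raw["protein"]}
    elif provider == "provider_b":
        facts = {"calories": raw["energy"]["value"], "protein_g": raw["macros"]["p"]}
    else:
        raise ValueError(f"unknown nutrition provider: {provider}")
    return {"unit_basis": "per_serving", "facts": facts}
```

Swapping providers then becomes an internal change: no consuming agent has to special-case anything.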

The seven tools we shipped

  • scan_pantry — list current pantry items with quantities
  • match_recipes — search recipes by ingredients, cuisine, time, dietary constraints
  • get_recipe — fetch the full recipe by ID
  • substitute_ingredient — propose a substitution given a constraint
  • balance_macros — estimate macros for a recipe and suggest adjustments
  • generate_shopping_list — diff a recipe against the pantry and emit a list
  • scale_recipe — adjust quantities for a serving count

The shape that fell out: read-only retrieval primitives plus small deterministic transformations. The agent does the reasoning. The server does not.

Streaming — and why we don't use it for most tools

MCP supports streaming responses. We use it for exactly one tool: match_recipes, where matches arrive over a few seconds and a streaming UI feels alive. For the others, streaming is worse than batch. scale_recipe takes 4ms. Streaming a 4ms response adds protocol overhead and a worse client experience.

The instinct most teams have on a new protocol is to use every feature it supports. The discipline is to use only the features that make the consumer's life better. Streaming is for tools where the first token of the answer is useful before the last one arrives.
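The split looks like this in miniature. A Python sketch, not our server code: a real MCP server would push streamed items as notifications over its transport, but the shape of the decision is the same:

```python
from typing import Iterable, Iterator

def match_recipes_stream(backend_hits: Iterable[dict]) -> Iterator[dict]:
    """Stream: yield each match the moment the backend produces it,
    so the client can render partial results while the search runs."""
    for recipe in backend_hits:
        yield recipe

def scale_recipe(recipe: dict, servings: int) -> dict:
    """Batch: a few-millisecond deterministic transform. Streaming
    this would only add protocol overhead for the consumer."""
    factor = servings / recipe["servings"]
    return {
        "servings": servings,
        "ingredients": {name: qty * factor for name, qty in recipe["ingredients"].items()},
    }
```

The first token of a match list is useful on its own; the first half of a scaled ingredient dict is not.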

Auth and rate limits

We landed on per-API-key bearer auth, with three tiers: a free tier for tinkerers (60 calls/day), a developer tier (10K calls/day, $9/month), and a custom tier for production agents. The free tier matters. It is the single biggest determinant of whether a developer ever tries your MCP server. We make it generous on purpose.

Rate limits are enforced at the edge with token-bucket counters. The thing we got wrong on the first pass: returning a generic 429 with no guidance. We now return a 429 with an MCP-typed error including the bucket window, current count, and reset time. Agents handle this beautifully — they back off and retry. Generic 429s caused agents to loop or fail.
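The mechanism is small enough to sketch. A minimal per-key token bucket that, on rejection, returns the structured payload an agent can act on; the payload field names are illustrative, not our wire format:

```python
import time
from typing import Optional

class TokenBucket:
    """Per-API-key token bucket. On rejection it returns a typed 429
    payload with enough context for the agent to back off and retry,
    instead of a bare status code."""

    def __init__(self, capacity: float, refill_per_s: float, clock=time.monotonic):
        self.capacity = capacity          # burst size
        self.refill_per_s = refill_per_s  # sustained rate
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def check(self) -> Optional[dict]:
        """Return None if the call is allowed, else a typed 429 payload."""
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_s)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return None
        return {
            "code": "rate_limited",
            "status": 429,
            "limit_per_second": self.refill_per_s,
            "retry_after_seconds": round((1 - self.tokens) / self.refill_per_s, 2),
        }
```

Injecting the clock keeps the bucket testable without sleeping, which matters once the limiter sits in an eval harness.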

Publishing as open-source

The server is MIT-licensed on GitHub. The data and the matching logic sit behind the API. This is the boundary we picked deliberately:

  • Open: the protocol shim, the tool schemas, the example clients, the eval harness for the public tools.
  • Closed: the curated recipe corpus, the substitution model, the macro-balancing heuristics.

The thing we worried about — that opening the shim would invite forks that bypass our service — turned out to be a non-issue. Forks require running your own corpus and your own tuned substitution model, which is the part that's actually hard. Opening the shim cost us nothing and earned us a stream of pull requests, three of which we shipped in the first month.

Things we got wrong, in order

  1. JSON Schema too loose. Our first tool definitions accepted free-text fields where enums would do. Agents passed plausible-but-wrong strings ("medium" instead of "medium-heat"). Tightening to enums cut the malformed call rate from 14% to under 1%.
  2. No idempotency keys. generate_shopping_list is idempotent in spirit but we didn't expose a key. Agents that retried after timeouts ended up with duplicate state on the consumer side. We added an optional request_id parameter; the next consumer thanked us immediately.
  3. Errors as strings. We started with { error: "recipe not found" }. Agents would parse the string and route on it, which broke when we changed copy. We moved to a typed error envelope with a stable code, a human message, and an optional details field.
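The envelope from mistake #3 is a one-function fix. A sketch, with illustrative code strings:

```python
from typing import Optional

def error_envelope(code: str, message: str, details: Optional[dict] = None) -> dict:
    """Typed error shape: agents route on the stable `code`, while the
    human-readable `message` is free to change without breaking anyone."""
    error = {"code": code, "message": message}
    if details is not None:
        error["details"] = details
    return {"error": error}
```

For example, `error_envelope("recipe_not_found", "No recipe with that ID.", {"recipe_id": "r_123"})` gives the agent a stable string to branch on and a payload to surface to the user.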

When to ship an MCP server vs. embed tools directly

Not every workflow should ship as an MCP server. The decision rule we use:

  • Ship MCP when the tools are stable, the consumer is plural, and the value is in the data or computation behind the tool — not in the UI around it.
  • Embed when the tools are coupled to your UX (a tool that returns "the next step the user should see" is not an MCP tool — it's an app component), when there's one consumer, or when the contract is still in flux.

For RecipeGuide's on-device cook mode, the tools stay embedded — they're tied to the AVFoundation speech pipeline and SwiftUI view state. For everything that can stand alone, we extracted to MCP.

What this means for your project

If you're thinking about MCP, the fastest way to learn what belongs in a server is to draw the line through your existing app between "reusable primitives" and "UX-coupled components." The primitives go in the server. The UX-coupled bits stay in the app. The agent — whether yours or someone else's — is the thing that composes them.

We can usually do this extraction in the second week of a 14-day PoC, which is also when most teams realize they've been shipping their best tools as private functions instead of as a contract anyone could use.
