The assistant

Potter is the chat panel (⌘L). It's deliberately a side panel, not the product: the squad and the project task graph are the primary AI surfaces.

Providers

On-device: Apple's Foundation Models, in-process. No download, no API key, nothing leaves the Mac. Needs Apple Intelligence enabled in System Settings. It's private and free, but the least powerful of the providers, so it's the fallback rather than the first reach.
Cloud: Anthropic's claude-opus-4-6 over the Messages API, streamed with URLSession SSE. Available as a one-click switch when ANTHROPIC_API_KEY is set in the environment before launch. The key is only ever read from the environment; the pre-commit hook blocks anything that looks like an sk-ant- key from being committed.
Codex: OpenAI's gpt-4o over the Chat Completions API, streamed with URLSession SSE — same shape as the Anthropic backend. Used as the cloud fallback when only OPENAI_API_KEY is set (Anthropic still wins when both keys are present). The key is read from the environment only.
Kiln account: hosted, metered agents through your Kiln account. There is no key of your own to bring; calls authenticate with the account's bearer token instead.
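The key precedence described above (Anthropic wins when both keys are set, OpenAI is used when it is the only one) can be sketched as a small pure function. This is a minimal illustration, not the app's code: `CloudBackend` and `cloudBackend(in:)` are hypothetical names, and treating an empty value as unset is an assumption. Only the two environment variable names come from the text.

```swift
import Foundation

// Hypothetical type for illustration only.
enum CloudBackend: Equatable {
    case anthropic(key: String)
    case openAI(key: String)
}

/// Picks a cloud backend from an environment dictionary.
/// Anthropic wins when both keys are present.
/// Assumption: blank values are treated the same as missing ones.
func cloudBackend(in environment: [String: String]) -> CloudBackend? {
    func key(_ name: String) -> String? {
        guard let value = environment[name]?
                .trimmingCharacters(in: .whitespacesAndNewlines),
              !value.isEmpty else { return nil }
        return value
    }
    if let k = key("ANTHROPIC_API_KEY") { return .anthropic(key: k) }
    if let k = key("OPENAI_API_KEY") { return .openAI(key: k) }
    return nil
}

// The keys are read from the process environment only, never from disk.
let backend = cloudBackend(in: ProcessInfo.processInfo.environment)
```

Taking the environment as a parameter keeps the rule testable without touching the real process environment.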

The default tier (#potter-default)

Potter and the squad reach for the most potent model available out of the box, treating the weak on-device model as a last resort rather than the first pick: Kiln account → Claude → on-device. Settings ▸ General ▸ Providers ▸ Potter's default model lets you change where they start — Kiln, Claude, or on-device. Whichever you pick is a starting point: a tier that isn't signed in or keyed falls through to the next in potency order, so the picker never strands a call. PotterDefault is the pure setting (its order is the fallback ladder), threaded through Potter.pickProvider, and the choice still bows to two harder constraints — an agent that pins its own model, and the on-device-only leash (privacy/budget/ambient), which never reaches the cloud.
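A fallback ladder of this kind might look like the sketch below. The tier order follows the text (Kiln, then Claude, then on-device); `Tier`, `pickTier`, and their parameters are hypothetical and are not the real `PotterDefault` or `Potter.pickProvider`. Two behaviours are assumptions: the walk only goes downward from the chosen starting tier, and an unavailable pinned model falls back to the ladder.

```swift
// Hypothetical sketch; declared strongest first so rawValue doubles as rank.
enum Tier: Int, CaseIterable {
    case kiln, claude, onDevice
}

/// Starts at `preferred` and walks down the ladder until a tier is usable.
/// `onDeviceOnly` models the privacy/budget/ambient leash and never returns
/// a cloud tier. `pinned` models an agent that names its own model.
/// Both outrank the user's chosen default.
func pickTier(preferred: Tier,
              pinned: Tier? = nil,
              onDeviceOnly: Bool = false,
              isAvailable: (Tier) -> Bool) -> Tier? {
    if onDeviceOnly {
        return isAvailable(.onDevice) ? .onDevice : nil
    }
    // Assumption: a pinned tier that is unavailable falls back to the ladder.
    if let pinned = pinned, isAvailable(pinned) {
        return pinned
    }
    let ladder = Tier.allCases.filter { $0.rawValue >= preferred.rawValue }
    return ladder.first(where: isAvailable)
}
```

For example, with only Claude keyed and Apple Intelligence on, a default of Kiln resolves to Claude, while the on-device-only leash resolves to on-device regardless of the default.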

Including providers

Settings ▸ General ▸ Providers has a toggle for each backend (on-device, Claude, Codex). The two cloud tiers ship on; the weak on-device model defaults off and is opted into here (an unset toggle falls back to the provider's Provider.defaultIncluded). Turning a provider off removes it from every availability check, so it disappears throughout — the assistant picker, the squad, ambient agents, and every routed call. The toggle only ever subtracts: a provider you leave on still needs its API key (or Apple Intelligence) to actually run. The single seam is Backends.included(_:), which gates cloudAvailable() / codexAvailable() / onDeviceAvailable(), so every consumer honors the choice without its own check.
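The "only ever subtracts" rule can be illustrated with a single gate that every availability check passes through. This is a simplified sketch under assumptions: the real code exposes `cloudAvailable()`, `codexAvailable()`, and `onDeviceAvailable()`, which are collapsed here into one hypothetical `available(_:)`, and the `toggles` dictionary and `hasCredentials` closure stand in for persisted settings and key detection.

```swift
enum Provider: String, CaseIterable {
    case onDevice, claude, codex

    /// Default when the toggle has never been set:
    /// the cloud tiers ship on, the on-device model ships off.
    var defaultIncluded: Bool { self != .onDevice }
}

struct Backends {
    /// Stand-in for the persisted toggles; a missing entry means "unset".
    var toggles: [Provider: Bool] = [:]
    /// Stand-in for "API key present" or "Apple Intelligence enabled".
    var hasCredentials: (Provider) -> Bool

    /// The single seam: an unset toggle falls back to the provider default.
    func included(_ provider: Provider) -> Bool {
        toggles[provider] ?? provider.defaultIncluded
    }

    /// Inclusion can only remove a provider; an included provider
    /// still needs its credentials to run.
    func available(_ provider: Provider) -> Bool {
        included(provider) && hasCredentials(provider)
    }
}
```

Because callers ask `available(_:)` rather than checking keys themselves, switching a provider off hides it everywhere without any consumer needing its own check.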

Context anchoring

You can feed context straight into a prompt with #. Where a leading @Name routes a turn to a teammate, an inline #ref feeds the model what it needs to answer — your actual files, not a guess:

#path/to/file splices a file's contents.
#some/folder lists a directory.
#docs hands over the whole feature library; #docs:editor picks one page.
#search:query ranks the project for query and feeds back the strongest hits as path:line — snippet, the same grounded search behind the navigation cards. It lets the model find where something lives and pull in only that, instead of being handed the whole tree — agent-side function tooling that costs nothing on-device.
#review folds in the comments left on the current diff — the same review "Request changes" acts on.
#plan folds in your current plan and its steps — what you're working towards (.kiln/plan.local.json).
#web:https://… (or a bare #https://…) fetches a page and strips it to text.

# and not @ is deliberate — @ already addresses a teammate, so the two read cleanly side by side: "@Wren, look at #Editor/Buffer.swift". Start typing # and the composer offers a strip of matches — the docs/search:/web:/review keywords plus fuzzy-ranked project paths; click one to complete it. Anchors resolve when you send, just ahead of your message, and each block is capped so one #docs can't blow the on-device window. A path that can't be read becomes a short "couldn't read" note in the prompt rather than vanishing silently, so you can tell the model didn't get what you meant. Paths resolve against the open workspace; a .. that would climb out of the project is refused.

ContextAnchor is the pure parser (which #ref is which); ContextResolver does the reading and fetching. Both are covered by tests.
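To make the split concrete, here is a minimal sketch of what a pure anchor parser could look like. It is not the real `ContextAnchor`: the `Anchor` enum, `classify`, `anchors(in:)`, and `staysInsideWorkspace` are hypothetical names, and tokenising on whitespace is a simplifying assumption (so a search query is a single word here, and trailing punctuation is not stripped).

```swift
enum Anchor: Equatable {
    case docs(page: String?)
    case search(query: String)
    case review
    case plan
    case web(url: String)
    case path(String)   // file or folder; a resolver would decide which
}

/// Classifies one whitespace-delimited token. Returns nil for non-anchors.
func classify(_ token: String) -> Anchor? {
    guard token.hasPrefix("#"), token.count > 1 else { return nil }
    let body = String(token.dropFirst())
    if body == "docs" { return .docs(page: nil) }
    if body.hasPrefix("docs:") { return .docs(page: String(body.dropFirst(5))) }
    if body.hasPrefix("search:") { return .search(query: String(body.dropFirst(7))) }
    if body == "review" { return .review }
    if body == "plan" { return .plan }
    if body.hasPrefix("web:") { return .web(url: String(body.dropFirst(4))) }
    if body.hasPrefix("https://") { return .web(url: body) }
    return .path(body)
}

/// Finds every anchor in a message, in order of appearance.
func anchors(in message: String) -> [Anchor] {
    message.split(whereSeparator: { $0.isWhitespace })
        .compactMap { classify(String($0)) }
}

/// Refuses absolute paths and any `..` that would climb out of the workspace.
func staysInsideWorkspace(_ relativePath: String) -> Bool {
    guard !relativePath.hasPrefix("/") else { return false }
    var depth = 0
    for part in relativePath.split(separator: "/") where part != "." {
        depth += (part == "..") ? -1 : 1
        if depth < 0 { return false }
    }
    return true
}
```

Keeping the parser free of file and network access is what makes it easy to test; the reading, fetching, and per-block capping would live in the resolver.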

Acting on a review in chat

You don't have to reach for #review by hand. When a message is about the open review — "action the review comments", "address the feedback", "work through the requested changes" — Potter folds the diff's comments in for you, so it acts on them instead of asking you to paste them back. The comments come from the same .kiln/review-comments.local.json the Diff surface writes, read back for the current branch, so this works even when the Diff view isn't open. ReviewIntent is the pure trigger test and ReviewBrief.chatContext assembles the block; both have tests. This is the chat-side complement to Request changes, which fans the same comments out to one worker per file.

Knowing the plan

The same goes for your current plan (the local PR-in-waiting). "Execute the plan", "what's the next step?", and the like used to land on a Potter that had never been shown the plan, so it asked which plan you meant. Now Potter folds the plan and its steps in whenever a message is about it — PlanIntent is the pure trigger, PlanBrief.chatContext renders the block, and PlanStore reads it from .kiln/plan.local.json. The squad survey carries the same block, so a directed run asked to carry the plan out works the steps instead of guessing. Reach for #plan to pin it explicitly.

One streaming primitive

Every backend turns a prompt into one AsyncThrowingStream<AssistantEvent, Error> — text deltas, reasoning deltas, token usage, and a truncation marker. Two kinds of caller sit on top:

Buffered (Potter.oneShot): accumulates the text events and returns the finished string. The ambient features (section summaries, comment tightening, squad reviews) use this. A buffered call follows the configured default tier (most potent available), with one exception: the always-on ambient features pin themselves on-device to stay free, since local inference costs nothing.
Live (Potter.streamEvents): hands the stream back so the caller can render each delta as it lands. Chat uses this, and so do streamed squad runs.

The same stream cancels cleanly: cancelling the consuming task tears down the underlying network request through the stream's onTermination, so pressing stop actually stops the call.
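The shape of this primitive can be sketched with the standard library's `AsyncThrowingStream`. The event cases mirror the four kinds named above, but their payloads, the `fakeBackend` helper, and this free-function `oneShot` are assumptions for illustration; the real `Potter.oneShot` signature is not shown in this document.

```swift
// Hypothetical event payloads; the text only names the four kinds.
enum AssistantEvent: Equatable {
    case text(String)
    case reasoning(String)
    case usage(inputTokens: Int, outputTokens: Int)
    case truncated
}

/// A stand-in backend that emits canned events. A real backend would start
/// its network request where `producer` is created and cancel that request
/// from `onTermination`.
func fakeBackend(_ events: [AssistantEvent]) -> AsyncThrowingStream<AssistantEvent, Error> {
    AsyncThrowingStream { continuation in
        let producer = Task {
            for event in events {
                if Task.isCancelled { break }
                continuation.yield(event)
            }
            continuation.finish()
        }
        // Runs when the consumer finishes or is cancelled, so stopping the
        // consuming task also stops the producer.
        continuation.onTermination = { _ in producer.cancel() }
    }
}

/// Buffered caller: keeps only the text deltas and returns the finished
/// string, plus whether the reply was marked truncated.
func oneShot(_ stream: AsyncThrowingStream<AssistantEvent, Error>) async throws
    -> (text: String, truncated: Bool) {
    var text = ""
    var truncated = false
    for try await event in stream {
        switch event {
        case .text(let delta): text += delta
        case .truncated: truncated = true
        case .reasoning, .usage: break
        }
    }
    return (text, truncated)
}
```

A live caller would iterate the same stream with `for try await` and render each delta as it arrives instead of accumulating it.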

Streaming and thinking

Chat replies stream token by token. Turn on live agents (the waveform glyph in the squad header) and squad reviews stream into the feed too — you watch the agent write, and the raw reply resolves into structured notes and proposals when it settles. Turn on thinking (the brain glyph, cloud only) and the model's reasoning streams in alongside the answer, dimmed and collapsible above it so the conclusion still reads first.

While a reply is still landing it renders as plain text; the markdown parse runs once, when the text settles, rather than on every delta.

Robustness

The cloud backends share a session with real streaming timeouts (an inactivity watchdog plus an overall ceiling) instead of the default request timeout, which could guillotine a long generation. A failed request surfaces the server's own error message, not a bare status code. Transient overload and rate-limit responses back off and retry — but only before any output has been shown, so a retry never duplicates a partial reply. A reply that hits the output ceiling is marked truncated rather than read as complete.
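The retry rule (back off on transient failures, but never once output has been shown) reduces to a small pure decision. The sketch below is illustrative only: `RequestFailure`, `RetryPolicy`, the attempt limit, and the exponential schedule are all assumptions, since the text does not state the real limits or delays.

```swift
import Foundation

// Hypothetical classification of a failed request.
enum RequestFailure {
    case overloaded
    case rateLimited
    case other(status: Int)
}

struct RetryPolicy {
    var maxAttempts = 3                 // assumed value
    var baseDelay: TimeInterval = 0.5   // assumed value, in seconds

    /// Returns how long to wait before retrying, or nil to give up.
    /// `attempt` is the 1-based number of the attempt that just failed.
    func delay(after failure: RequestFailure,
               attempt: Int,
               outputShown: Bool) -> TimeInterval? {
        // Once any text has reached the user, a retry would duplicate it.
        guard !outputShown, attempt < maxAttempts else { return nil }
        switch failure {
        case .overloaded, .rateLimited:
            return baseDelay * pow(2, Double(attempt - 1))
        case .other:
            return nil   // not transient, so surface the server's message
        }
    }
}
```

Separating the decision from the networking keeps the "only before any output" guarantee easy to verify in isolation.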

Fitting the on-device window

Apple's Foundation model has a small, fixed context window shared between the prompt and its reply, so a big input — a whole source file handed to a squad worker, a failing CI log — can overflow it. ContextBudget is the pure arithmetic that protects against that:

Routing (#233): the default tier already favours the cloud, so most calls never touch the on-device window. When on-device is the only model available, an oversized prompt is fitted to the window rather than dispatched to fail. Pinning an agent on-device (or the global on-device-only switch) always keeps the work local.
Trimming (#234): when a call does run on-device, including the on-device-only case, the prompt is fitted to the window first. Oversized prompts keep their head and tail and elide the middle, because the useful context is usually densest at the edges (a file's signature and its end, a log's start and its failing tail).
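The head-and-tail fit can be sketched in a few lines. This is not the real `ContextBudget`: `promptBudget` and `fit` are hypothetical names, the marker text is invented, and characters stand in for tokens, which a real implementation would count differently. No window size is assumed; the caller supplies it.

```swift
/// The window is shared between the prompt and the reply, so the prompt
/// only gets what is left after reserving room for the answer.
func promptBudget(window: Int, reservedForReply: Int) -> Int {
    max(0, window - reservedForReply)
}

/// Keeps the head and tail of `text` and replaces the middle with a marker,
/// so the result is at most `limit` characters long.
func fit(_ text: String,
         limit: Int,
         marker: String = "\n[... middle elided ...]\n") -> String {
    guard text.count > limit else { return text }
    let room = limit - marker.count
    // Too small to hold the marker at all: fall back to a plain prefix.
    guard room > 0 else { return String(text.prefix(max(0, limit))) }
    let head = (room + 1) / 2   // the head gets the odd character
    let tail = room - head
    return String(text.prefix(head)) + marker + String(text.suffix(tail))
}
```

Usage: `fit(log, limit: promptBudget(window: w, reservedForReply: r))` keeps a log's opening lines and its failing tail while dropping the middle.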
