
The squad

The squad is kiln's agent-multiplayer layer (⌘L toggles the squad & assistant panel): named AI teammates with visible presence — dots in the sidebar, chips on the editor tab bar, a roster and activity feed in the panel. Each agent wears a profile icon drawn from its name (an SF Symbol over its roster color, #190), so a teammate is recognisable at a glance in the roster and its inspector, not just a coloured dot. Members are ephemeral; they join for a session and their presence shows which file each one is "in". When a member finishes a job it doesn't vanish — for a two-minute grace window it keeps reading as active (a "just finished" line, still counted in the panel's N working header), so you retain oversight of what just completed before it settles to idle (#241). An ephemeral member that then sits idle for five minutes drifts back out of the roster on its own, so one-shots and finished squad runs don't pile up as ghosts (#236). This only reaps ephemeral members — the pair, pinned standing jobs (pulse, stewards), and the project's defined agents are left alone.
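The lifecycle described above can be read as a pure function of timestamps. The following is a minimal sketch, not kiln's implementation: the `Member` record, the status labels, and the choice to start the idle clock when the grace window ends are all assumptions made for illustration. Only the two-minute and five-minute windows come from the text.

```python
from dataclasses import dataclass
from typing import Optional

GRACE_SECONDS = 2 * 60      # "just finished" window (from the text)
IDLE_REAP_SECONDS = 5 * 60  # idle time before an ephemeral member leaves (from the text)


@dataclass
class Member:
    # Hypothetical record; kiln's real member type is not shown in this document.
    name: str
    ephemeral: bool
    finished_at: Optional[float] = None  # None while a job is still running


def presence(member: Member, now: float) -> str:
    """Return 'working', 'just finished', 'idle', or 'gone' (illustrative labels)."""
    if member.finished_at is None:
        return "working"
    since_finish = now - member.finished_at
    if since_finish < GRACE_SECONDS:
        return "just finished"
    # Assumption: the idle clock starts once the grace window has ended.
    idle_for = since_finish - GRACE_SECONDS
    if member.ephemeral and idle_for >= IDLE_REAP_SECONDS:
        return "gone"
    # Non-ephemeral members (the pair, pinned jobs, defined agents) are never reaped.
    return "idle"
```

Keeping the rule pure makes the boundaries easy to check: an ephemeral member that finished at time 0 reads as "just finished" at 60 seconds, "idle" at 200 seconds, and "gone" at 420 seconds.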

Potter, the resident assistant, leads the roster — always present, above the ephemeral members — and it's the squad's coordinator: every handoff lands on Potter first, who plans and then hands the work to the specialist agents who do it (#potter-coordinator). Its row carries a backend chip: tap it to move Potter between on-device and cloud (and your Kiln account, when you're signed in), cycling whichever backends are wired up. It's the same switch that used to sit as a lone glyph in the panel header, now where Potter reads as a teammate. The chip shows Potter's current backend at a glance, and the dot pulses while a reply is streaming.

Every defined agent shows up too, even before a run drafts it (#228). Below the live members the roster lists an asleep bench: each roster seed and each .kiln/squad.json / squad.local.json agent that no live member stands for yet, dimmed with a moon and its model line. Tap one to wake it into the roster as an idle worker — the whole bench reads at a glance, not just whoever happens to be working.

All squad inference goes through Potter.oneShot, which follows the configured default tier (most potent available — Kiln → Claude → on-device, set in Settings ▸ Providers, #potter-default). Flip on live agents in the panel header (the waveform glyph) and a run streams into the feed instead: you watch the agent write, and the raw reply resolves into structured notes and proposals when it lands. With thinking on too (the brain glyph, cloud only), the model's reasoning streams in dimmed above the answer. The per-agent oversight view (the active-tasks pane opens it) shows the same live timeline with relative timestamps, so you can see when each step happened and roughly how long it took.

Ways to put the squad to work

  • Pairing. Invite a pair onto the active buffer and it reviews as you pause typing.
  • Squad runs. Set the squad on the project: Potter, the coordinator, surveys the codebase (reading each file's first-comment headline as frontmatter, plus a structural outline of the heftiest source files so it plans over real declarations and their dependencies, not just filenames) and hands files to workers, who read their file — along with the signatures of the symbols that file calls but declares elsewhere (#agent-potency), so a call-site edit isn't a guess — and propose edits. Every dispatch reads in the feed as Potter delegating to the named agent.
  • Explain selection (⌘⇧E). Highlight code you don't understand and an explainer walks it; the explanation can be committed back as comments.
  • Quick edit (⌘⇧K, #212). Highlight a passage — in CODE, or in the editable code of NOTES and DOC — and a small ✦ Quick edit pill floats above it, Cursor-style. Tap it (or press ⌘⇧K) and type one instruction — a teammate rewrites just those lines and offers the change as a buffer-only proposal. Nothing touches disk until you apply it.
  • To-do dispatch. Any row in the to-do view can be handed to an agent. Click "send an agent" to run on the squad's default model, or open the menu beside it to pin that one run to a specific tier — on-device, or a cloud model like Opus 4.8 or Fable 5 (#200). The same picker rides the "Hand off" control on a failing job and a failing CI run. When the failure names a source file, Potter delegates that file to a worker who proposes a fix; when it names none — a port already in use, a missing tool — Potter reads the output itself and says what it makes of it, rather than dropping the failure on an idle on-device worker. (Letting the squad auto-triage and route to the right tier is a deferred follow-up.)
  • Action a note (#146). Every member-authored note in the feed carries an "action this" chip: the note's author (or a visitor, if they've left) re-reads the file with the note as their assignment and comes back through the normal proposal path. One use per note — actioned notes show a quiet receipt.
  • Ambient mode. Opt-in, from the moon toggle in the panel header: a beat after a file becomes active, a scout drifts in and only speaks if something is worth saying. It fails silently — it was never asked for. On-device by default (see below).
  • Pseudocode first (#90). Write the structure as comments and run Weave: Draft From Pseudocode: a member grows the implementation under your outline, arriving as a normal proposal.
  • Issue vetting (#100). Hover a GitHub issue in the to-do view and hit vet: a member judges whether it's actionable, applies suggested labels (only ones the repo already has), and posts clarifying questions as one signed comment.
  • Scope an issue (#235, first slice). Hover a GitHub issue and hit scope: a capable member reads it skeptically — does the premise even hold? — then scopes the concrete change (plan, files it would touch, risks), posts that analysis as a signed comment on the issue, and, only when the work is genuinely actionable, asks you in the squad chat for the go-ahead before doing anything. A stale or under-specified issue gets the comment and stops there; it never asks to proceed. Actually opening the PR (in an isolated worktree, off your local work) leans on the worktree-job mechanism in #178 and lands once you say yes.
  • Grow a plugin (#34). Improve kiln: Grow a Plugin… shapes a wish into a .kiln/plugins/*.json palette command — read/build only, behind a safety gate.
  • Brief an agent. Squad: Brief an Agent… (⌘K) takes a one-line instruction and hands it to whoever's paired on the file in front of you — or a visitor, if no one is — who acts on it and comes back through the normal note/proposal path.
  • Type past the palette. When a ⌘K command search finds no match, the typed text isn't a dead end — an Ask the squad: … row offers to hand it to the squad as a free-form instruction. Hit Return and the text goes to Potter in the panel chat, which opens to catch the streaming reply. So you can type a half-remembered command or a plain request, and if nothing matches it still does something useful.

Inspecting an agent

Click any roster row to open that agent in the active-tasks pane: the bottom tray rises (if it wasn't already) and pins its detail to that member's live oversight — its thought timeline, its proposed diffs, and a chat bar scoped to just this teammate — even when the agent is idle and so has no running task row of its own. It's the same oversight surface a running agent's task row opens, so an agent reads the same whether you reach it from the roster or from the tray. Picking another row in the tray hands the detail back to the list.

Each roster row carries an ellipsis (⋯) menu on hover, and the oversight pane repeats it in its header — Edit agent… opens the edit modal (the add-an-agent sheet, prefilled with that agent's name, model, skills, and instructions), so reshaping a teammate is one click from wherever you're watching it. Saving upserts to .kiln/squad.local.json, the same path the add an agent chip writes.

A presence dot in the sidebar or the who's here chip on the editor still opens the agent's lightweight inspector sheet instead — a quick read of what it's up to that mounts at the window root, so it works even with the squad panel closed. The inspector's top card is its live job (current activity, the task it was handed, the file it's in); below sits its character and its own slice of the activity feed. The roster row itself also shows that headline, so an agent working away from any open file reads clearly instead of a bare "reading".

Chatting in the panel

The composer at the foot of the panel talks to Potter, the resident assistant; your turns and Potter's streaming replies land in the same feed as the squad's activity, so one place holds the whole conversation (#174).

Open a message with @Name to address a specific teammate instead (#174). That member answers in Potter's place — in character, on its own persona, skills, and model — and the reply wears its name and color in the feed. Typing a leading @ raises a strip of the squad you can address; click one to complete the mention. Matching is case-insensitive and only routes when the @Name opens the message; a name buried mid-message, or one that matches no one currently in the squad, just goes to Potter.
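The routing rule above is small enough to sketch directly. This is an illustration under stated assumptions, not the app's code: the `route` function and its return shape are hypothetical, and names are assumed to be single word-like tokens.

```python
import re
from typing import List, Tuple

# A mention only counts when it opens the message (no leading text allowed).
MENTION = re.compile(r"^@(\w+)\b\s*(.*)$", re.DOTALL)


def route(message: str, squad: List[str]) -> Tuple[str, str]:
    """Return (recipient, body). Falls back to Potter when nothing matches."""
    match = MENTION.match(message)
    if match:
        wanted = match.group(1).lower()
        for name in squad:
            if name.lower() == wanted:  # case-insensitive match
                return name, match.group(2)
    # A mid-message mention or an unknown name goes to Potter untouched.
    return "Potter", message
```

With a squad of `["Scout", "Pike"]`, the message `@scout check this` routes to Scout with the body `check this`, while `ask @Scout` and `@Nobody hi` both go to Potter unchanged.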

Every chat turn carries its speaker's profile icon (#262): your own turns wear a person glyph, a teammate's reply wears its name-derived face in its roster color, and an unattributed Potter reply wears kiln's own mark — the same recognisable avatar the roster uses, so who-said-what reads at a glance.

Pasting images (#chat-images)

Drop a screenshot into the conversation. Paste a copied image with ⌘V, drag one onto the composer from Finder or a browser, or click the photo button to pull a copied image off the clipboard (or pick a file). Staged images show as thumbnails above the field, each with a remove control, and ride your next message — so "what's wrong with this layout?" over a screenshot just works. The same affordances live in the per-agent oversight chat, so you can show one teammate a picture too. An image-only turn (no words) is fair game.

Images travel as PNG bytes folded into the turn, the multimodal shape the cloud models read: Claude and GPT-4o see the picture; the on-device model and the hosted Kiln tier read only your words. So reach for a cloud backend when the question is about an image.

Ask the chat where something lives — "where is the highlighter", "find the diff commit", "which file holds the control port" — and the panel runs a real search over the workspace before anyone answers. The hits land as a navigation card in the feed: each row is a path · line a search actually turned up, with the matching source line beneath it. Click a row and the editor opens that file and lands you on the line, centred and selected. Because the rows come from a grounded search (CodeSearch), they point at real locations — never a line number a model guessed. The same hits are fed to the responder, so Potter's prose cites the places the card links to.

Phrase it as a command — "open the AppState file", "go to the chat sender", "take me to the reveal" — and the panel goes one step further: it still posts the card, and opens the strongest hit for you. NavIntent is the pure classifier that tells find me where (just show the card) from take me there (show it and open the top hit); ordinary chat is left untouched.

Asking Potter to make a change (#potter-do-the-work)

Ask the chat to make a change — "fix the author key in the blog post", "rename foo to bar", "update the docusaurus config", or just "do it" after Potter sketches a plan — and Potter does the work instead of handing you a to-do list. Paste a build error or a stack trace and the same thing happens: Potter solves it rather than narrating the fix. It takes the request as the squad's coordinator: works out which files the change touches, then hands each to a worker that proposes a real buffer edit. The request decides the target — the file a path in the error points at, a named symbol — and the file you happen to have open is only a hint, not assumed to be the one that changes. The proposals land in the feed for you to review and apply, exactly like an architect's run — nothing hits disk until you say so.

FixIntent is the pure classifier behind the imperative path: a verb-led command ("fix", "rename", "update", "refactor"…) or a bare confirmation ("do it", "apply that") fires it. ErrorIntent is its sibling for pasted diagnostics — a build failure, compiler error, or stack trace reads as "solve this", unless you frame it as a question ("what does this error mean?"), which stays a prose reply. Conceptual questions, status asks ("update me on…"), and (for FixIntent) pasted code are likewise left to a normal reply. Review-flavoured asks ("action the review comments") stay on their own path, where the diff's comments fold in as context. When Potter can't pin the change to a file, it says so rather than guessing — name the file or open it and try again.

You don't even have to phrase it as a command. Report the symptom — "the build is failing", "this throws an error", "the export button doesn't work" — and Potter takes it as a fix request too, because a breakage you describe is one you want gone, not narrated. (Earlier it would explain the fix and leave you to apply it by hand; now it does the work.) The same goes for asking after the breakage: "why is the build failing?", "what's wrong with this code?", "where's the bug?". Asking why something is broken is asking to have it fixed — so a diagnostic question routes to the fix too, instead of a correct diagnosis you then have to action by hand. Genuinely conceptual questions ("what does this function do?", "how does the squad work?") still get a plain reply.

Taken together, FixIntent fires on four shapes: a verb-led command ("fix", "rename", "update", "refactor"…), a bare confirmation ("do it", "apply that"), a symptom report ("X is broken", "the build is failing", "it doesn't work"), or a diagnostic question about that breakage ("why is the build failing?", "what's wrong here?"). Conceptual questions ("how does the squad work?"), status asks ("update me on…"), and pasted code are still left to a normal prose reply.

Offering to action a recommendation (#potter-offer-action)

The flip side: when you ask a genuine question and Potter answers in prose rather than making an edit, a good answer often still carries a concrete recommendation — "you should extract this into a helper", "I'd debounce the save handler". You shouldn't have to retype that as a command to get it done. So once the reply settles, Potter offers an "action this" chip beneath it. Tap it and the recommendation is handed straight to a directed run — the same path "fix the typo" drives — and a buffer proposal comes back to review. Nothing hits disk until you say so, and the chip is one use per reply: once spent, it leaves a quiet "actioned" receipt.

ReplyAction is the pure classifier behind the offer — the read-the-answer counterpart to FixIntent's read-the-question. It fires only on Potter's own replies (a @mentioned teammate answers in its own voice, not as the coordinator) that carry a recommendation phrase ("you should", "consider…", "I'd recommend", "the cleanest approach is…"); a plain explanation ("this function returns the name") gets no chip, because there would be nothing for it to action. So the two halves bracket the conversation: a request routes straight to the work, and an answer that recommends work offers to carry it out — neither leaves you copying Potter's words back into the composer.

The ambient layer

Three squad members never wait to be asked. All on-device by default:

  • Pulse (#82). In ambient mode, a scout takes the project's pulse every four minutes — the most recently touched source file gets a read-through, real problems only. The status bar carries the verdict as a small dot; attention also lands in the feed.
  • Stewards (#94). A steward is bound to one key document (CHANGELOG.md, README.md) for the session. Saving a source file cues it to check whether its document still tells the truth; drift arrives as a proposal. Five-minute cooldown, never nags.
  • The nanny (#42). Every proposal gets a second pair of eyes the moment it lands: one sentence in the feed if the change could bite, silence otherwise.

Cloud ambience (#140)

A project that wants sharper ambient reviews can opt the ambient agents — the nanny, the pulse, the stewards, and proactive reviews — into the cloud, via two layered-config keys in .kiln/config.json:

  • squadAmbientCloud (bool, default false) — when true and an ANTHROPIC_API_KEY is present, ambient calls prefer the cloud. squadOnDeviceOnly still wins, so a hard "never reach for the cloud" rule can't be undone by a later layer's opt-in.
  • squadAmbientModel (string, default claude-opus-4-6) — which Claude model ambient calls use, so high-frequency ambience can run on a cheaper tier without touching the architect.

No config means no cloud: the default is unchanged. Ghost completions and section summaries stay on-device-only either way — they're latency-sensitive and high-volume, not agents.
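The precedence between these keys can be sketched as a small resolver. This is a hedged illustration: `ambient_backend` is a hypothetical name, and the config is assumed to be the already-merged result of the layered config. The key names, the `ANTHROPIC_API_KEY` requirement, and the default model id come from the text.

```python
from typing import Mapping, Optional, Tuple

DEFAULT_AMBIENT_MODEL = "claude-opus-4-6"  # default stated in the text


def ambient_backend(config: Mapping, env: Mapping) -> Tuple[str, Optional[str]]:
    """Return ('cloud', model) or ('on-device', None) for an ambient call."""
    # The hard on-device rule wins over any opt-in from a later layer.
    if config.get("squadOnDeviceOnly", False):
        return ("on-device", None)
    # The opt-in only takes effect when a key is actually present.
    if config.get("squadAmbientCloud", False) and env.get("ANTHROPIC_API_KEY"):
        return ("cloud", config.get("squadAmbientModel", DEFAULT_AMBIENT_MODEL))
    # No config means no cloud.
    return ("on-device", None)
```

An empty config resolves to on-device, and so does `squadAmbientCloud: true` without a key in the environment.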

On-device concurrency cap (#on-device-memory)

The ambient agents fan out — the nanny fires per proposal, a steward per bound file, the pulse on its timer, proactive reviews on every buffer switch — and on-device inference keeps the model's weights and activations resident for the life of each session. Several landing at once can stack their footprint and starve the machine. So every on-device call funnels through one gate that bounds how many run in parallel:

  • onDeviceConcurrency (number, default 2) — how many on-device inferences may run at once. Two is enough to stop the fan-out piling up in memory while leaving a slot free for latency-sensitive work (a ghost completion, a section summary — both on-device); drop it to 1 to fully serialize, or raise it on a Mac with headroom. A value below 1 clamps to 1 (a cap of zero would admit nothing and deadlock on-device inference). The cap reconfigures live from config — no relaunch — and raising it wakes any queued calls immediately.

Cloud calls aren't gated: their memory lives on someone else's machine. The cap is only about local inference, so it applies whether or not ambient cloud is on.
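A gate with these properties (a bounded number of parallel calls, a clamp to at least one, live reconfiguration that wakes queued callers) can be sketched with a condition variable. The class below is an assumption-laden illustration, not kiln's gate: `OnDeviceGate`, `slot`, and `set_limit` are hypothetical names.

```python
import threading
from contextlib import contextmanager


class OnDeviceGate:
    """Bounds how many on-device inferences run at once (illustrative sketch)."""

    def __init__(self, limit: int = 2):
        self._cond = threading.Condition()
        self._limit = max(1, int(limit))  # a cap below 1 clamps to 1
        self._running = 0

    @property
    def limit(self) -> int:
        return self._limit

    def set_limit(self, limit: int) -> None:
        """Reconfigure live; raising the cap wakes queued callers at once."""
        with self._cond:
            self._limit = max(1, int(limit))
            self._cond.notify_all()

    @contextmanager
    def slot(self):
        """Hold one of the gate's slots for the duration of the with-block."""
        with self._cond:
            while self._running >= self._limit:
                self._cond.wait()
            self._running += 1
        try:
            yield
        finally:
            with self._cond:
                self._running -= 1
                self._cond.notify_all()
```

A caller would wrap each local inference in `with gate.slot(): ...`. Cloud calls would simply not pass through the gate.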

Cost meter (#185)

Flipping ambient cloud on used to be a blank check — nothing measured what the watchers spent. Now, whenever a key is present, the squad header carries a spend gauge: the running US-dollar estimate for this session, drawn from the token usage Anthropic reports on every cloud call (input on message_start, cumulative output on message_delta). On-device calls are free and never counted.

Tap the gauge to set a per-session budget — $1, $5, $20, a custom amount, or off — and the gauge turns amber once spend crosses the cap. The total resets each launch (it's never persisted); only the chosen budget survives a relaunch. Cost is a rough estimate from the published per-model rates, not a billing figure.

The cap is now a real cutoff, not just a colour (#agent-potency). Once spend reaches the budget, the squad stops adding cloud cost: a call that would have gone to the cloud falls back to the free on-device model when it can serve, and only refuses outright — "session budget reached" — when the work genuinely needs the cloud and there's no local fallback. Raise or clear the cap to keep going.
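The meter and the cutoff can be sketched together. This is a rough illustration only: the `CostMeter` class is hypothetical, and the rate table uses a made-up model name with placeholder prices, not published rates.

```python
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL rates in USD per million tokens: (input, output).
# Placeholder numbers for illustration, not real pricing.
RATES = {"example-model": (3.00, 15.00)}


@dataclass
class CostMeter:
    budget: Optional[float] = None  # None means the budget is off
    spent: float = 0.0              # per-session total, never persisted

    def record(self, model: str, input_tokens: int, output_tokens: int) -> None:
        """Add one cloud call's reported token usage to the running estimate."""
        rate_in, rate_out = RATES[model]
        self.spent += (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

    def over_budget(self) -> bool:
        return self.budget is not None and self.spent >= self.budget

    def choose_backend(self, needs_cloud: bool) -> str:
        """Pick a backend for a call that would otherwise go to the cloud."""
        if not self.over_budget():
            return "cloud"
        if needs_cloud:
            # No local fallback can serve this work, so refuse outright.
            raise RuntimeError("session budget reached")
        return "on-device"  # free fallback once the cap is hit
```

On-device calls never reach `record`, which matches the rule that local inference is free and uncounted.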

Auto-tiering (#agent-potency)

On-device inference is free, so the squad keeps work local by default — but some assignments (a tricky refactor, a concurrency or security question, a large file) are where a top-tier cloud model earns its keep. With squadAutoTier: true in .kiln/config.json, a cheap pure triage reads each assignment and bumps the deep-looking ones to the architect's cloud tier, leaving shallow work on-device. It only ever upgrades an unpinned worker, and only when a cloud key is present and squadOnDeviceOnly is off, so it never adds spend a project didn't ask for. Off by default.

CLI handoff flags (#188)

When the squad escalates a file to a full coding-agent session — Hand Off This File (⌘K), or waking a CLI model from the roster — kiln builds the command line from three layered-config keys in .kiln/config.json, each unset by default so the CLI keeps its own defaults:

  • handoffModel (string) — the --model a cold-shell handoff launches with: an alias (sonnet, opus, haiku, fable) or a full model id. The roster's CLI models pin their own model, so this only applies to the Hand Off This File command.
  • handoffEffort (string) — the --effort passed on both a handoff and a wake: low, medium, high, xhigh, or max. An unrecognised value is dropped rather than passed.
  • handoffPermissionMode (string) — the --permission-mode: default, acceptEdits, plan, auto, dontAsk, or bypassPermissions. Also filtered against the known set.
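How the three keys could turn into a Claude Code command line is sketched below. The function name is hypothetical and the model value is passed through as-is (the text says it is validated, but not how). The flag names and the accepted values are the ones listed above.

```python
from typing import List, Mapping

EFFORTS = {"low", "medium", "high", "xhigh", "max"}
PERMISSION_MODES = {
    "default", "acceptEdits", "plan", "auto", "dontAsk", "bypassPermissions",
}


def claude_handoff_args(config: Mapping) -> List[str]:
    """Build the handoff argv; unset or unrecognised values are dropped."""
    args = ["claude"]
    model = config.get("handoffModel")
    if model:
        # Assumption: model validation happens elsewhere and is not shown here.
        args += ["--model", model]
    effort = config.get("handoffEffort")
    if effort in EFFORTS:
        args += ["--effort", effort]
    mode = config.get("handoffPermissionMode")
    if mode in PERMISSION_MODES:
        args += ["--permission-mode", mode]
    return args
```

An empty config yields just `["claude"]`, so the CLI keeps its own settings, and an unknown effort such as `"turbo"` is silently left out rather than passed along.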

Claude Code or Codex (#openai-codex)

Every CLI surface — the roster, the wake split-button, Hand Off This File, the Start Here palette commands, and the headless watchers below — speaks two agents: Claude Code (claude) and OpenAI's Codex CLI (codex). Each is gated by its app-wide provider toggle in Settings ▸ General ▸ Providers: turn Codex · OpenAI off and the codex roster, Codex: Start Here, and codex handoffs disappear everywhere; turn Claude · Anthropic off and Claude Code goes the same way — the same "this only ever subtracts" rule the cloud backends already follow. The roster wakes each tier on its own binary, so you can run a Claude session and a Codex session side by side.

Codex spells its flags differently — it has no --effort or --permission-mode, and runs headless through codex exec rather than claude -p — so the shared handoff* knobs map down to just the validated --model for it. One layered-config key picks which agent the cold-shell handoff and the headless watchers drive:

  • cliAgent (string) — claude (default) or codex. The roster wakes each model on its own agent regardless; this only steers Hand Off This File and the autonomous/research runs. An unknown value falls back to Claude Code. The Claude Code-only session controls — remote control and turbo — stay hidden on a Codex row, since Codex has no such modes.

Autonomous spotter handoff (#178)

The handoffs above type into a live terminal for you to watch. The autonomous path is the hands-off cousin: a spotter's finding becomes a full Claude Code agent that works in an isolated git worktree off HEAD and opens a PR — no human in the loop. The shared working tree is never touched, so other agents' uncommitted edits stay safe.

A finding is the grounded unit of work: a one-line TITLE, an anchor (a FILE and/or an ISSUE number), and an ACCEPT line — the single observable check that proves it's done. It's parsed leniently from the squad wire format, fields in any order. A finding with no title, no acceptance, or no anchor is treated as a note, not a handoff.
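A lenient parser for such a finding might look like the sketch below. The exact wire syntax is not given in this document, so the `KEY: value` line shape is an assumption, as are the `Finding` record and the `parse_finding` name. The field names and the grounding rule come from the text.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed line shape: "KEY: value", any case, any order, stray lines ignored.
FIELD = re.compile(r"^\s*(TITLE|FILE|ISSUE|ACCEPT)\s*:\s*(.+?)\s*$", re.IGNORECASE)


@dataclass
class Finding:
    title: str
    accept: str
    file: Optional[str] = None
    issue: Optional[int] = None


def parse_finding(text: str) -> Optional[Finding]:
    """Return a grounded Finding, or None when the text is only a note."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields.setdefault(match.group(1).upper(), match.group(2))
    issue = None
    digits = re.search(r"\d+", fields.get("ISSUE", ""))
    if digits:
        issue = int(digits.group())
    file = fields.get("FILE")
    has_anchor = file is not None or issue is not None
    if not fields.get("TITLE") or not fields.get("ACCEPT") or not has_anchor:
        return None  # no title, no acceptance, or no anchor: treat as a note
    return Finding(fields["TITLE"], fields["ACCEPT"], file, issue)
```

Returning `None` rather than raising keeps the "note, not a handoff" outcome an ordinary result the caller can branch on.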

The pipeline, all inside a throwaway worktree under .git/kiln-worktrees/: create the branch (auto/<issue>-<slug>-<id>) off HEAD, run claude -p headless with a deliberately skeptical prompt (verify the premise first; reply NO-CHANGE: … and edit nothing if the finding is stale or wrong; otherwise make the smallest correct change and commit), then — only if the agent actually committed a diff — push and gh pr create. The worktree is always removed afterward, success or failure. A declining agent, an empty diff, or any failed step ends without a PR.

This is off by default and gated three ways, every one of which must pass: the project opts in with autoHandoff: true in .kiln/config.json, a cloud key is present (a worktree agent never runs on-device), and the finding is grounded. It reuses the handoff* CLI flags above for model/effort/permission, and the project's cliAgent decides whether the worktree agent is a headless Claude Code (claude -p) or Codex (codex exec) run.

First slice (#178): the structured finding, the worktree job, and the gate. Deduping repeat findings and the spotter→handoff wiring into the ambient layer are deferred.

Issue research watcher (#378)

An ambient watcher for the tracker itself: every five minutes it diffs gh issue list against a seen-set in .kiln/issue-watch.local.json (machine-local; the first poll seeds it silently, so an existing backlog never stampedes). Each new issue gets a cheap on-device gate — would research help whoever picks this up? — and only a yes escalates.

The research is a headless coding-agent run (claude -p or codex exec, per the project's cliAgent, same mechanism as #178) with a skeptical prompt: read the codebase, hit the web only when the issue touches external APIs or platform behavior, make no edits, and reply NO-RESEARCH: … rather than padding an issue that needs none. The result parks in the feed as a draft card — the comment body, a Post button, a Discard button. Nothing reaches GitHub until a human posts it; the posted comment is signed — kiln research.

Two layered-config keys in .kiln/config.json:

  • issueWatch (bool, default false) — the master switch. Also needs a cloud key; the gate runs on the ambient policy, but the research agent is never on-device.
  • issueWatchModel (string) — the --model for the research run, e.g. claude-fable-5. Unset falls back to handoffModel. Effort carries over from handoffEffort; a permission mode is never passed — research reads, it doesn't write.

A burst of new issues drains at most two per tick; the rest stay unseen for the next beat, so nothing is silently dropped.
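The seen-set behaviour (silent seeding on the first poll, at most two new issues per tick, nothing dropped) can be sketched as one pure step. The `poll` function and its signature are hypothetical. Reading and writing `.kiln/issue-watch.local.json` and calling `gh issue list` are left out.

```python
from typing import List, Optional, Set, Tuple


def poll(
    open_issues: List[int],
    seen: Optional[Set[int]],
    max_per_tick: int = 2,
) -> Tuple[List[int], Set[int]]:
    """Return (issues to send to the gate this tick, updated seen-set)."""
    if seen is None:
        # First poll: seed silently so an existing backlog never stampedes.
        return [], set(open_issues)
    fresh = [number for number in open_issues if number not in seen]
    batch = fresh[:max_per_tick]
    # Only the drained issues are marked seen; the rest wait for the next tick.
    return batch, seen | set(batch)
```

Because undrained issues stay out of the seen-set, a burst of three new issues yields two on this tick and the third on the next.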

Personas and project agents

Members draw names, colors, and personas from the roster (hover a roster row to read one). A project can reshape them (#92) and add brand-new ones (#166) through agent definitions in two layers, mirroring the layered config:

  • .kiln/squad.json — the committed, team-shared layer.
  • .kiln/squad.local.json — a gitignored, per-user layer (the .local convention). It wins on a name clash, so you can tune or add agents just for yourself without touching what the team gets.

Each file is an array of agents. A name that matches a roster seed reshapes that seed; any other name defines a new agent that drafts ahead of the seeds whenever a member is needed, with a stable color and profile icon drawn from its name.

[
  {
    "name": "Scout",
    "model": "claude-haiku-4-5",
    "skills": ["release-checklist", "api-conventions"],
    "persona": "Reviews public API changes against the conventions doc. Blunt about breakage."
  }
]
  • model — on-device pins the agent to the local model; a Claude model id sends its calls to the cloud on that model (key required); a kiln:<id> value (or a bare kiln: for the tier default) routes the agent through your signed-in Kiln account instead of a BYOK key; omit it for the squad's default policy. squadOnDeviceOnly still wins.
  • skills — .claude/skills/<slug> slugs (#44); each skill's SKILL.md rides along in the agent's system prompt, capped per skill.
  • persona — the agent's instructions. An agent with no persona is dropped.
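The two-layer merge can be sketched as follows. Two points are assumptions, since the text does not settle them: the local entry replaces the shared one wholesale (rather than merging field by field), and names are compared exactly. The `merge_agents` name is hypothetical.

```python
import json
from typing import List


def merge_agents(shared: List[dict], local: List[dict]) -> List[dict]:
    """Merge squad.json and squad.local.json; the local layer wins on a clash."""
    merged = {}
    for layer in (shared, local):  # later layer overrides earlier
        for agent in layer:
            name = str(agent.get("name", "")).strip()
            if name:
                merged[name] = agent  # assumption: whole-agent replacement
    # An agent with no persona is dropped.
    return [a for a in merged.values() if str(a.get("persona", "")).strip()]


shared_layer = json.loads(
    '[{"name": "Scout", "persona": "Reviews API changes."}, {"name": "Ghost"}]'
)
local_layer = json.loads(
    '[{"name": "Scout", "model": "on-device", "persona": "Local override."}]'
)
agents = merge_agents(shared_layer, local_layer)
```

Here `agents` holds a single Scout entry carrying the local persona and model. Ghost is dropped because it has no persona.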

The add an agent chip in the squad panel writes this for you: name, model, skill checkboxes, instructions. The model picker offers the squad default, on-device, a cloud (BYOK) model id, and — when you're signed in to a Kiln account — a kiln segment whose dropdown lists the models your account exposes (so you can build an agent on a Kiln-account model, not just a BYOK one). Start from a role — Pike, Quill, Thorn, or Hawk — and the sheet fills in the persona, suggests the name, and wears the face that role's name resolves to; the active role stays highlighted until you hand-edit the instructions, and switching roles carries the suggested name along (a name you typed yourself is left alone). The live identity preview in the header shows the colour and face the typed name will wear before you save. Saving upserts the agent into .kiln/squad.local.json (never the shared file) and it joins the roster on the spot. New Squad Agent… (⌘K) is the same thing from the command palette — it asks for a name then a persona and upserts the agent the same way.

Proposals

Agent edits arrive as proposals rendered as line diffs (an LCS over lines, capped — proposals are snippets, never whole files). Applying one changes the buffer, never the disk; saving stays a human act.

A proposal is the feed's decision moment, so it reads as a distinct card rather than another paragraph of chat. An accent rail and a tracked PROPOSED CHANGE eyebrow mark it out; the author and file sit beneath, with a +N −M line stat trailing so you know the size of the edit before reading it. The diff caps at 24 lines with a quiet "+N more" footer, the verification verdict wears the same pass/fail badge as the rest of the app, and the rail and eyebrow turn sage once you apply — the card visibly changes state from "waiting on you" to "applied · unsaved".
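For readers unfamiliar with an LCS line diff, the sketch below shows the textbook technique along with the added/removed counts and the 24-line display cap. It is a generic illustration, not kiln's renderer. The function names are hypothetical.

```python
from typing import List, Tuple

Diff = List[Tuple[str, str]]  # (tag, line) with tag ' ', '-' or '+'


def line_diff(old: List[str], new: List[str]) -> Diff:
    """Diff two line lists via a longest-common-subsequence table."""
    n, m = len(old), len(new)
    # lcs[i][j] is the LCS length of old[i:] and new[j:].
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:
        if old[i] == new[j]:
            out.append((" ", old[i]))
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            out.append(("-", old[i]))
            i += 1
        else:
            out.append(("+", new[j]))
            j += 1
    out += [("-", line) for line in old[i:]]
    out += [("+", line) for line in new[j:]]
    return out


def line_stat(diff: Diff) -> Tuple[int, int]:
    """Return (added, removed), the numbers behind a +N -M stat."""
    added = sum(1 for tag, _ in diff if tag == "+")
    removed = sum(1 for tag, _ in diff if tag == "-")
    return added, removed


def capped(diff: Diff, cap: int = 24) -> Tuple[Diff, int]:
    """Return the first `cap` rows and how many rows the footer should report."""
    return diff[:cap], max(0, len(diff) - cap)
```

The table is quadratic in the number of lines, which is one reason a diff of this kind suits capped snippets better than whole files.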

Applying is forgiving about cosmetic drift (#agent-potency). On-device models routinely echo the OLD lines with the trailing whitespace shaved or the punctuation prettified (straight quotes turned curly, - turned em-dash), which used to mean a perfectly good edit died at apply time over a stray space. When the verbatim match misses, a line-by-line tolerant pass folds that drift back, lands the edit, and keeps every surrounding line byte-for-byte — and still refuses anything absent or ambiguous rather than guessing.
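A tolerant apply pass of this kind might be sketched as below. It is an illustration under assumptions: the folded characters are limited to the drift the paragraph names (trailing whitespace, curly quotes, em-dash), and `fold` and `apply_edit` are hypothetical names.

```python
from typing import List, Optional

# Fold "prettified" punctuation back to its plain form.
FOLD = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2014": "-",                  # em-dash
})


def fold(line: str) -> str:
    """Normalise a line for comparison only; never written to the buffer."""
    return line.rstrip().translate(FOLD)


def apply_edit(buffer: List[str], old: List[str], new: List[str]) -> Optional[List[str]]:
    """Replace the unique occurrence of `old` in `buffer` with `new`.

    Tries a verbatim match first, then a tolerant one. Returns None when
    the old lines are absent or match more than one place.
    """
    if not old:
        return None
    for key in (lambda s: s, fold):
        want = [key(line) for line in old]
        hits = [
            i for i in range(len(buffer) - len(old) + 1)
            if [key(line) for line in buffer[i:i + len(old)]] == want
        ]
        if len(hits) == 1:
            i = hits[0]
            # Surrounding lines are copied as-is, byte for byte.
            return buffer[:i] + new + buffer[i + len(old):]
        if len(hits) > 1:
            return None  # ambiguous: refuse rather than guess
    return None  # absent
```

Folding is applied to both sides of the comparison, so the buffer itself is never rewritten by the normalisation.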

Each unapplied proposal also carries a refine affordance (#agent-potency, the deferred half of #185): type a one-line steer — "keep the guard, tighten the message" — and the proposal's author (or a visitor, if it's left) re-reads the file with the prior edit and your instruction, then comes back with an updated proposal through the same buffer-only path. It builds on the edit on the table rather than starting over, so you can converge on a change in a couple of turns instead of taking it or leaving it.

Verify in the loop (#agent-potency)

With squadVerify: true in .kiln/config.json, a squad worker's edit is checked before you ever see it: the proposal is staged onto disk, run through the project's cheapest truthful check (make lint, swift build, … — the same VerifyPass the "verify first" button uses, which restores the disk byte-for-byte afterward), and a failure feeds the build error straight back to the worker for one corrective pass. The surfaced proposal then wears the verdict, so a green badge means the project's own tooling already signed off. Off by default — verification spends a build.

The wire format

Agents answer in a strict shape — NOTE: lines, PROPOSAL: blocks, ASSIGN: lines for the architect — parsed leniently in SquadReply/SquadPlan, because small on-device models drift. A reply with no parseable structure becomes one note. Change prompts and parser together.
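A lenient parser in this spirit is sketched below. The document names the three markers but not their exact layout, so everything else here is an assumption: markers are matched case-insensitively at the start of a line, a PROPOSAL: block is taken to run until the next marker, and the `parse_reply` name and dictionary shape are hypothetical.

```python
import re
from typing import Dict, List

MARKER = re.compile(r"^\s*(NOTE|PROPOSAL|ASSIGN)\s*:\s*(.*)$", re.IGNORECASE)


def parse_reply(text: str) -> List[Dict]:
    """Split a reply into NOTE / PROPOSAL / ASSIGN items, tolerating drift."""
    items: List[Dict] = []
    current = None
    for line in text.splitlines():
        match = MARKER.match(line)
        if match:
            current = {
                "kind": match.group(1).upper(),
                "head": match.group(2).strip(),
                "body": [],
            }
            items.append(current)
        elif current is not None and current["kind"] == "PROPOSAL":
            # Assumption: a proposal block owns the lines up to the next marker.
            current["body"].append(line)
        # Any other stray line (greetings, filler) is ignored.
    if not items and text.strip():
        # No parseable structure: the whole reply becomes one note.
        return [{"kind": "NOTE", "head": text.strip(), "body": []}]
    return items
```

Because the prompt defines what the parser must accept, a sketch like this would need to change in step with the prompt, as the paragraph above advises.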

Presence beyond the window

When the app isn't frontmost, squad activity surfaces as native notifications — only then, so presence follows you out of the window without nagging you inside it. Needs a real .app bundle (./run.sh); a bare swift run binary skips it.