
The workflow spine

kiln used to have four peer surfaces for getting work done — the inbox, plans, the squad, and the activity tray — each with its own mental model, linked only by the user's memory. A real piece of work crosses all four: a signal arrives, you decide to do it, an agent works it, a proposal lands, the diff becomes a PR. But nothing held that work's identity as it travelled. Start a plan and the agent you dispatched didn't know the plan existed; its proposal didn't tick a step; the activity tray couldn't say which plan a running agent was serving.

The spine is the thread that fixes this. One narrative now runs through every surface:

Signal → Intent → Work → Change → Ship
Inbox → Plan → Agents → Diff → PR

[Figure: The Plan split button's dropdown, showing a three-step Plan flow (Plan, Review & commit, Review changes) in the title bar.]

The plan is the unit of intent — the spine — and the other surfaces are phases or lenses of it:

  • The Inbox (⌘T) is the funnel: unowned signals (issues, TODOs, CI, loops) entering. Its job is triage — dismiss, hand off to an agent, or promote into a plan. Every row carries one split button: add to plan (the signal becomes a step on the current plan, keeping a #42 / Foo.swift:10 source chip so it remembers where it came from) or make a plan (a fresh plan from the signal) when there's no current plan. That universal verb is what merges the two backlogs — Inbox signals and plan steps stop being retyped into each other.
  • The Plan is the committed intent: the local PR-in-waiting, carrying the issue link, branch, base, and a step checklist. It's home base.
  • The Agents (the squad) are the labour applied to a plan's steps. A run is "working step X of plan Y."
  • The Activity tray is the live "now" lens on that labour — a runtime monitor, not a place work lives.
  • The Diff is the output: steps converge into it, and it ships as the PR.
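The Inbox's one verb can be sketched in a few lines. This is a minimal illustration, not kiln's code: Signal, Step, Plan, and promote(_:into:) are hypothetical stand-ins. It shows a signal becoming a step that keeps its source chip, added to the current plan when there is one, or seeding a fresh plan when there isn't.

```swift
import Foundation

// Hypothetical stand-ins for kiln's types, for illustration only.
struct Signal {
    let title: String
    let source: String // the chip text, e.g. "#42" or "Foo.swift:10"
}

struct Step {
    let id = UUID()
    var title: String
    var source: String? // remembers where the step came from; nil for typed steps
}

struct Plan {
    let id = UUID()
    var title: String
    var steps: [Step] = []
}

// The split button's verb: "add to plan" when there is a current plan,
// otherwise "make a plan" seeded from the signal.
func promote(_ signal: Signal, into current: inout Plan?) {
    let step = Step(title: signal.title, source: signal.source)
    if var plan = current {
        plan.steps.append(step)
        current = plan
    } else {
        current = Plan(title: signal.title, steps: [step])
    }
}

var current: Plan? = nil
promote(Signal(title: "Fix crash on launch", source: "#42"), into: &current)       // make a plan
promote(Signal(title: "Remove stale TODO", source: "Foo.swift:10"), into: &current) // add to plan
```

Because the step carries its source rather than a copy of the signal, the Inbox row and the plan step never need retyping into each other.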

The WorkRef: a pointer from labour back to intent

The technical seam that threads the surfaces is a small value, WorkRef (Plan/WorkRef.swift): a pointer from a unit of labour to the intent it serves.

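```swift
import Foundation

// A pointer from a unit of labour (squad run, active task, proposal) to the intent it serves.
struct WorkRef: Codable, Equatable, Hashable {
    let planID: UUID
    var stepID: UUID? // nil = general work towards the plan
}
```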

A squad run, an active task, and a proposal can each carry one. It's a value the labour holds, not a back-pointer the plan owns, so the plan stays the single source of truth: a ref whose plan or step has gone simply resolves to nothing rather than dangling. The pure resolution and transitions live in PlanWork, outside the actor, so the spine logic can be tested without a workspace.
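To make the "resolves to nothing" rule concrete, here is a sketch of a pure resolver. WorkRef is copied from above; Plan, Step, and resolve(_:in:) are hypothetical stand-ins, not PlanWork's actual API.

```swift
import Foundation

struct WorkRef: Codable, Equatable, Hashable {
    let planID: UUID
    var stepID: UUID? // nil = general work towards the plan
}

// Hypothetical stand-ins for the plan model.
struct Step { let id: UUID; var title: String }
struct Plan { let id: UUID; var title: String; var steps: [Step] }

// Hypothetical pure resolver: the plans are the source of truth, so a ref whose
// plan or step has gone resolves to nil instead of dangling.
func resolve(_ ref: WorkRef, in plans: [Plan]) -> (plan: Plan, step: Step?)? {
    guard let plan = plans.first(where: { $0.id == ref.planID }) else { return nil }
    guard let stepID = ref.stepID else { return (plan: plan, step: nil) } // general work
    guard let step = plan.steps.first(where: { $0.id == stepID }) else { return nil }
    return (plan: plan, step: step)
}

let cacheStepID = UUID()
let offline = Plan(id: UUID(), title: "Offline mode",
                   steps: [Step(id: cacheStepID, title: "Cache responses")])
let live = WorkRef(planID: offline.id, stepID: cacheStepID)
let dangling = WorkRef(planID: offline.id, stepID: UUID()) // its step was deleted
```

Since the resolver takes plain values and returns plain values, it can be exercised with no workspace or actor in sight.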

The status ladder

A plan step used to be a bare done/not-done toggle. Now it reads its place on a ladder, driven by the labour flowing through it rather than hand-toggled:

todo → working → proposed → done
  • todo — nobody's on it.
  • working — an agent is on it; a WorkRef points here.
  • proposed — the agent came back with a proposal awaiting review.
  • done — landed, or approved into the diff.

done stays bridged to the old binary done flag (and the on-disk field), so existing call sites, TODO.md checkboxes, and files written before the ladder existed all still read true. The transitions are monotonic where it matters: reconciling a proposal set only ever walks a step forward toward review — a done step is never reopened.
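A sketch of the ladder and its bridge follows, assuming a Codable Step that stores its status next to the legacy done key. The type names, coding keys, and advance(to:) are hypothetical. It decodes a pre-ladder file from the flag alone, keeps writing the old field, and only ever moves a step forward.

```swift
import Foundation

// Hypothetical sketch of the ladder; kiln's Plan.Step.Status may differ in detail.
enum Status: String, Codable, CaseIterable, Comparable {
    case todo, working, proposed, done

    // Ladder order is declaration order.
    static func < (a: Status, b: Status) -> Bool {
        allCases.firstIndex(of: a)! < allCases.firstIndex(of: b)!
    }
}

struct Step: Codable {
    var title: String
    var status: Status = .todo

    // The old binary flag, bridged: true exactly when the step is done.
    var done: Bool { status == .done }

    private enum CodingKeys: String, CodingKey { case title, status, done }

    init(title: String, status: Status = .todo) {
        self.title = title
        self.status = status
    }

    init(from decoder: Decoder) throws {
        let c = try decoder.container(keyedBy: CodingKeys.self)
        title = try c.decode(String.self, forKey: .title)
        if let stored = try c.decodeIfPresent(Status.self, forKey: .status) {
            status = stored
        } else {
            // A file written before the ladder: only the binary flag exists.
            let legacyDone = try c.decodeIfPresent(Bool.self, forKey: .done) ?? false
            status = legacyDone ? .done : .todo
        }
    }

    func encode(to encoder: Encoder) throws {
        var c = encoder.container(keyedBy: CodingKeys.self)
        try c.encode(title, forKey: .title)
        try c.encode(status, forKey: .status)
        try c.encode(done, forKey: .done) // keep writing the field older readers expect
    }

    // Monotonic: a step only ever moves forward, so done is never reopened.
    mutating func advance(to next: Status) {
        if next > status { status = next }
    }
}

let legacy = Data(#"{"title":"Ship it","done":true}"#.utf8)
let decoded = try! JSONDecoder().decode(Step.self, from: legacy)

var step = Step(title: "Cache responses")
step.advance(to: .working)
step.advance(to: .proposed)
step.advance(to: .working) // ignored: the ladder never walks backward
```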

How the loop closes

  1. Dispatch. On a plan step, the send an agent action (PlanCover) mints the WorkRef, marks the step working, and hands the ref to the squad via runTodo(work:). The plan cover also offers a batch verb: Send off the agents mints a ref for every todo step at once and hands the lot to runPlan, which works them in sequence under one squad task (falling back to a full architect run when no steps remain). It's paired with a quieter start coding yourself action. Both take you off the plan surface (the louder one opens the squad), because home base is somewhere you return to, not somewhere you implement from.
  2. Carry. The dispatched squad member carries the ref; the proposal it raises inherits it, so the work stays tied to the intent it came from — no re-typing.
  3. Reconcile. When the squad's proposal set changes, RootView's proposal observer finds the plan-backed proposals awaiting review and moves each one's step to proposed (monotonically, never backwards).
  4. Finish. Applying a proposal walks its step the last rung to done; reconciliation only ever takes it as far as proposed.
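The loop can be sketched as pure transitions on a plan. The functions dispatch, dispatchAll, reconcile, and apply, and the stand-in types, are hypothetical; the squad itself, runTodo(work:), and runPlan are not modelled. Every move goes through one forward-only advance, which is what keeps a done step from being reopened.

```swift
import Foundation

// Hypothetical stand-ins; only WorkRef's fields and the status names come from the text.
struct WorkRef: Hashable { let planID: UUID; var stepID: UUID? }

enum Status: Int, Comparable {
    case todo, working, proposed, done
    static func < (a: Status, b: Status) -> Bool { a.rawValue < b.rawValue }
}

struct Step { let id = UUID(); var title: String; var status: Status = .todo }

struct Plan {
    let id = UUID()
    var steps: [Step]

    // The one place status changes: forward only, so done is never reopened.
    mutating func advance(_ stepID: UUID?, to next: Status) {
        guard let i = steps.firstIndex(where: { $0.id == stepID }) else { return }
        if next > steps[i].status { steps[i].status = next }
    }
}

struct Proposal { let work: WorkRef?; var awaitingReview = true }

// 1. Dispatch: mark the step working and mint the ref handed to the squad.
func dispatch(_ stepID: UUID, in plan: inout Plan) -> WorkRef {
    plan.advance(stepID, to: .working)
    return WorkRef(planID: plan.id, stepID: stepID)
}

// Batch verb: one ref per todo step, in plan order. An empty result is the
// case where the text says kiln falls back to a full architect run.
func dispatchAll(in plan: inout Plan) -> [WorkRef] {
    var refs: [WorkRef] = []
    for step in plan.steps where step.status == .todo {
        refs.append(dispatch(step.id, in: &plan))
    }
    return refs
}

// 3. Reconcile: steps behind plan-backed proposals awaiting review reach proposed.
func reconcile(_ plan: inout Plan, with proposals: [Proposal]) {
    for p in proposals where p.awaitingReview && p.work?.planID == plan.id {
        plan.advance(p.work?.stepID, to: .proposed)
    }
}

// 4. Finish: applying a proposal walks its step to done.
func apply(_ proposal: Proposal, to plan: inout Plan) {
    guard proposal.work?.planID == plan.id else { return }
    plan.advance(proposal.work?.stepID, to: .done)
}

var plan = Plan(steps: [Step(title: "Cache responses"), Step(title: "Offline banner")])
let first = plan.steps[0].id
let ref = dispatch(first, in: &plan)   // todo -> working
let proposal = Proposal(work: ref)     // 2. Carry: the proposal inherits the ref
reconcile(&plan, with: [proposal])     // working -> proposed
apply(proposal, to: &plan)             // proposed -> done
reconcile(&plan, with: [proposal])     // no effect: done stays done
let batch = dispatchAll(in: &plan)     // the remaining todo step goes out
```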

The plan cover becomes a live dashboard: each step shows its glyph and an in-flight word (working, review), so you read where the work is, not just what you typed.

Drafting a plan with AI

Forming a plan is the blank page a pull request never has — a fresh title and an empty intent box. Two sparkle buttons on the cover (PlanCover) let an AI help fill it:

  • Draft (next to the intent box) writes the "why" — two or three sentences in your voice — from the title, the linked issue, and any diff so far. It reads Redraft once there's intent to replace.
  • Suggest steps (next to the steps header) proposes three to six concrete steps, appended as normal, editable rows tagged ai so they read as suggestions to keep or drop, never auto-committed work.

Both live in PlanDraft (Plan/PlanDraft.swift) — the same pure-prompt-builder shape as DiffContext/DiffAudit: a system prompt, a pure …Prompt(…) builder, and a parseSteps that strips whatever list markers the model adds and de-duplicates, all tested without a model. The diff body reuses DiffContext.userPrompt, so a draft is grounded in the same budget-trimmed text the diff reviews send. Unlike the ambient on-device reads, drafting a plan is deliberate and once-per-plan, so it prefers the cloud (preferCloud) when a key is present and falls back to on-device — the DiffAudit policy, not the Pulse one.
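The parsing half is the easiest part to picture. This is a hypothetical cleaner in the spirit of parseSteps, not its shipped rules: the marker set (dashes, asterisks, bullets, and numbers followed by "." or ")") and the case-insensitive de-duplication are assumptions.

```swift
import Foundation

// Hypothetical cleaner in the spirit of parseSteps: strip the list markers a
// model tends to add and drop duplicates. The marker set and the
// case-insensitive de-duplication are assumptions.
func parseSteps(_ reply: String) -> [String] {
    var seen = Set<String>()
    var steps: [String] = []
    for raw in reply.split(whereSeparator: \.isNewline) {
        var line = raw.trimmingCharacters(in: .whitespaces)
        // Bullet markers: "- ", "* ", "• ".
        if let marker = ["- ", "* ", "• "].first(where: { line.hasPrefix($0) }) {
            line = String(line.dropFirst(marker.count))
        }
        // Numbered markers: digits followed by "." or ")".
        let digits = line.prefix(while: \.isNumber)
        if !digits.isEmpty,
           let mark = line.dropFirst(digits.count).first, mark == "." || mark == ")" {
            line = String(line.dropFirst(digits.count + 1))
        }
        line = line.trimmingCharacters(in: .whitespaces)
        // Keep the first spelling of each step; skip blanks and repeats.
        guard !line.isEmpty, seen.insert(line.lowercased()).inserted else { continue }
        steps.append(line)
    }
    return steps
}

let reply = """
1. Add a cache layer
2) add a cache layer
- Wire the offline banner
• Write tests
"""
print(parseSteps(reply)) // ["Add a cache layer", "Wire the offline banner", "Write tests"]
```

A number is only treated as a marker when "." or ")" follows it, so a step such as "3D printing support" keeps its leading digit.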

The spine made visible

The same WorkRef surfaces on the two live lenses, so a running agent can say what it's serving, not just that it's busy. WorkRefChip (Plan/WorkRefChip.swift) is a compact capsule — the plan's title, led by the glyph of the step it's on — shown in the Activity tray's task rows and the AgentOversight header. It resolves against the live plans and renders nothing for a dangling ref. The glyph/tint/in-flight-word reading lives once on Plan.Step.Status, so the plan cover, the tray, and the oversight pane can't drift.
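Keeping the reading in one place might look like the sketch below. Status with glyph and inFlightWord, and chipLabel(for:plans:), are hypothetical; the glyph characters are placeholders and tint is omitted. The in-flight words match the ones the plan cover shows, and a dangling ref yields nil so the chip renders nothing. Showing a bare title for a plan-level ref (no step) is also an assumption.

```swift
import Foundation

enum Status { case todo, working, proposed, done }

// One shared reading of a status, so every surface renders it the same way.
// Glyph characters are placeholders; "working" and "review" are the cover's words.
extension Status {
    var glyph: String {
        switch self {
        case .todo: return "○"
        case .working: return "◐"
        case .proposed: return "◑"
        case .done: return "●"
        }
    }

    var inFlightWord: String? {
        switch self {
        case .working: return "working"
        case .proposed: return "review"
        case .todo, .done: return nil
        }
    }
}

// Hypothetical stand-ins for the plan model and the ref.
struct WorkRef { let planID: UUID; var stepID: UUID? }
struct Step { let id: UUID; var status: Status }
struct Plan { let id: UUID; var title: String; var steps: [Step] }

// Chip text: the plan's title led by its step's glyph, or nil (render nothing)
// when the ref dangles.
func chipLabel(for ref: WorkRef, plans: [Plan]) -> String? {
    guard let plan = plans.first(where: { $0.id == ref.planID }) else { return nil }
    guard let stepID = ref.stepID else { return plan.title }
    guard let step = plan.steps.first(where: { $0.id == stepID }) else { return nil }
    return "\(step.status.glyph) \(plan.title)"
}

let cacheStep = Step(id: UUID(), status: .working)
let offline = Plan(id: UUID(), title: "Offline mode", steps: [cacheStep])
print(chipLabel(for: WorkRef(planID: offline.id, stepID: cacheStep.id), plans: [offline]) ?? "")
```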

The tray also shows the path ahead, not just the work in flight. The current plan's not-yet-started steps (status todo) queue into a Next up band beneath the running rows: one row per step, in plan order, so the squad's backlog is visible alongside what it's already doing. A step leaves the band the moment it's dispatched (it becomes the agent working it) and does not return once done. The rows are queue markers, not selectable tasks: tapping one opens the plan rather than a shell. The banding is pure (ActiveTaskList), so "Next up" sits below Active and above Stale without any view-side ordering.
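A pure banding pass could look like this sketch. Band, Row, PlanStep, and bandedRows(tasks:currentPlanSteps:) are hypothetical, since the text names ActiveTaskList but not its interface. Sorting on the pair (band, original index) keeps each band in its incoming order, so queued steps stay in plan order.

```swift
// Hypothetical types: the text names ActiveTaskList, not its interface.
enum Band: Int, Comparable {
    case active, nextUp, stale
    static func < (a: Band, b: Band) -> Bool { a.rawValue < b.rawValue }
}

enum Row: Equatable {
    case task(name: String, isStale: Bool) // a running (or stale) squad task
    case queued(stepTitle: String)         // a todo step; tapping opens the plan

    var band: Band {
        switch self {
        case .task(_, let isStale): return isStale ? .stale : .active
        case .queued: return .nextUp
        }
    }
}

struct PlanStep { var title: String; var isTodo: Bool }

// Pure banding: Active, then the current plan's todo steps as Next up (in plan
// order), then Stale. Dispatched and done steps are not todo, so they never queue.
func bandedRows(tasks: [Row], currentPlanSteps: [PlanStep]) -> [Row] {
    let queued = currentPlanSteps.filter { $0.isTodo }.map { Row.queued(stepTitle: $0.title) }
    // Sorting on (band, original index) keeps each band in its incoming order.
    return (tasks + queued).enumerated()
        .sorted { ($0.element.band, $0.offset) < ($1.element.band, $1.offset) }
        .map { $0.element }
}

let rows = bandedRows(
    tasks: [.task(name: "old run", isStale: true),
            .task(name: "Cache responses", isStale: false)],
    currentPlanSteps: [PlanStep(title: "Cache responses", isTodo: false),
                       PlanStep(title: "Offline banner", isTodo: true),
                       PlanStep(title: "Write tests", isTodo: true)])
```

The dispatched "Cache responses" step appears only as its running task, never as a queued row.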

The versatility guardrail

The spine is structure when you want it, never mandatory overhead. kiln serves two users:

  • The planner: issue → plan → steps → dispatch → ship.
  • The cracker-on (the direct workflow mode case): chat to Potter, dispatch a signal, edit, ship the diff — no plan at all.

Ambient agents (Nanny, Pulse, Stewards), issue vetting, and CI fixes are maintenance, not intent — they stay unscoped. Only deliberate work threads through a plan. The value is the option to thread, not the obligation.