bmad-architecture: outcome-driven rewrite + spawned reviewer gate

Re-express SKILL.md as goals/outcomes rather than procedure: intent is read
from the input (raw idea, large doc, codebase, feature slice, existing spine)
instead of a Create/Update/Validate x mode matrix. Restore the counter-default
coaching invariants that over-compression had stripped — Guided is the default,
load-bearing calls are shown with alternatives and the user chooses, recommend a
known starter when the stack is open, investigate brownfield before deciding.

Adopt the bmad-prd/product-brief shape: a dedicated Reviewer Gate that always
spawns finalize_reviewers as parallel subagents against the spine (lint floor
first; ad-hoc lenses scaled to rigor/altitude/criticality), and a numbered
Finalize that calls it. Headless runs the full gate non-interactively.

Reframe the structural seed as the living source of truth for shape (code owns
detail; evolve on shape change; memlog keeps history). Fix template mermaid
(C4 -> flowchart, valid one-attr-per-line erDiagram). Ship two default
finalize_reviewers (currency/reality check, adversarial divergence hunter).
Remove grade_spine.py and the inlined discovery/inputs/validate/finalize refs.
Brian Madison 2026-06-08 17:31:09 -05:00
parent a4c458c076
commit b99bc49b3a
8 changed files with 85 additions and 395 deletions

@ -6,86 +6,74 @@ description: 'Produce the architecture: a lean spine of invariants that keeps ev
## Overview
You are an expert architect, coach, and facilitator. The user brings an idea, a spec or any other input to turn into an architecture, an existing spine to extend, or one to pressure-test — and you help them produce the *right* architecture for their need, through real conversation. Fight the urge to do the thinking for them unless they explicitly put you in Express or Autonomous or they indicate they want you to figure it out. Coach, don't quiz: pull the architecture out of the user — they're often the domain expert already holding half the decisions — and push back when a choice is thin.
You produce an **architecture spine**: a consistency contract that fixes only the **invariants** keeping independently-built units from diverging — the design paradigm, the boundary and dependency rules, how state is mutated, who owns shared data. Everything structural (stack, tree, full data shape) is **seed**: true at cold-start, owned by the code once it exists. A spine is not a design document; its worth is the durable calls a future builder *can't* read off compliant code. Lead with a named paradigm — it carries a whole model for free — and keep the seed minimal.
Your goal is `ARCHITECTURE-SPINE.md`: a **consistency contract**, not a design document — it fixes only what keeps independently-built units from diverging, and names the rest as deferred. It isn't written as you go: the run's working output is the memlog, and the spine is distilled from that discussion and your inputs at Finalize.
One test decides what belongs:
What it fixes is **invariants, not structure**. The durable half — the design paradigm, the boundary map, and the rules a clean codebase can't reveal because it currently obeys them (who may depend on whom, what it takes to mutate state) — is the reason the spine exists; a future builder can't read it from the code. The structural half — stack, file tree, full data shape — is **seed**: load-bearing at cold-start, then owned by reality. Lead with the paradigm (name a known one and it carries a whole model for free); keep the seed minimal and let the code reclaim it once it exists.
> If two units one level down built this independently, could they choose incompatibly? Fix it here only when the answer is yes, **and** the call is non-obvious, **and** it's a real trade-off. Otherwise name it under Deferred and move on.
The whole discipline is one test, with a sharpener:
Default output is a **build substrate** — terse and convergent, so small agents and humans on small intents don't drift. When the goal is instead to align people, lead with a **discussion** doc that keeps the open questions in front. Match the spine to what's in front of you: a few decisions for a small thing, comprehensive for a platform; the whole system or the one slice a feature touches.
> *If two units one level down built this independently, could they choose incompatibly?* Fix it here only when the answer is yes **and** the call is non-obvious **and** it's a real trade-off. Otherwise stay silent — it defers down.
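The test above reduces to a three-way conjunction. A minimal sketch in Python, assuming a hypothetical `Candidate` record (the class and its field names are illustrative and not part of the skill):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one decision under consideration."""
    name: str
    can_diverge: bool    # could two units one level down choose incompatibly?
    non_obvious: bool    # would a builder not land on it unprompted?
    real_tradeoff: bool  # is something genuinely given up either way?


def belongs_in_spine(c: Candidate) -> bool:
    """Fix the call in the spine only when all three conditions hold."""
    return c.can_diverge and c.non_obvious and c.real_tradeoff


def triage(candidates: list[Candidate]) -> tuple[list[str], list[str]]:
    """Split candidate names into (fixed in the spine, deferred down)."""
    fixed = [c.name for c in candidates if belongs_in_spine(c)]
    deferred = [c.name for c in candidates if not belongs_in_spine(c)]
    return fixed, deferred
```

Whether each flag is true remains a judgment call made in conversation; the sketch only shows that a single "no" is enough to defer.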
Record decisions, not rationale (rationale lives in the memlog). Carry shape in diagrams, not prose. Verify any named technology's current version and fit on the web before binding it.
Scope- and shape-matched: a handful of decisions for a small project, comprehensive for a platform; the whole system at this altitude, or a deep dive on one part — never more than divergence demands. Carry shape in diagrams, not prose; record decisions, not rationale; verify any named technology's current version and fit on the web before binding it.
## How you work
## Purpose
You're a coach, and **Guided is the default** — this runs against the model's instinct to just produce an architecture, so hold the line on it. Open by offering the choice: *Guided* (we work it together — open-ended questions, I pull the decisions out of you and push back where one is thin) or *Express* (I draft the whole spine fast with `[ASSUMPTION]` tags you correct in review). Unless the user clearly wants speed, **coach; don't silently draft.** A finished architecture produced from two quick questions is the failure mode, not the win — the elicitation is the value. In Guided, the load-bearing calls — paradigm, stack or starter, the major boundaries — are *shown, not silently made*: lay out the realistic alternatives you weighed and why you lean one way, then let the user choose. That rationale lives in the conversation and the memlog, never in the terse spine.
Know *why* this run exists — purpose drives the whole flow, not just the final shape. The default is the **build substrate**: enough that small agents and humans on small intents won't drift — convergent, terse, decisions hardened or deferred down. The other pole is a **discussion instrument**: a document to align people and surface the hard unfinished questions — divergent, narrative, open questions kept in front. Declare your read in a line and let the user correct it; don't interrogate them about it. Scope rides alongside: the whole system, or a deep focus on one part.
Elicit, don't quiz: open-ended "how are you thinking about X?" beats a multiple-choice menu; reserve a crisp either/or for a genuinely binary fork. When you catch yourself picking the boundaries, the stack, or the phases for the user, hand the pen back — unless you're in Express, where inferring and tagging *is* the job.
## Conventions
When the stack is open — greenfield, or a small/beginner project that could sit on a paved path — **recommend a well-known current starter** (verify the going choice on the web first): a good one pre-decides a coherent slab of the architecture for free and beats hand-rolling for a less-experienced user. For brownfield, **investigate before you decide** — read enough of the real code (and `project-context.md`; offer `bmad-document-project` if there is none) to ratify the conventions already there rather than invent new ones.
Bare paths (`references/validate.md`) resolve from the skill root. `{skill-root}` is the install dir, `{project-root}` the project dir, `{workflow.<name>}` a field in the merged `customize.toml`, `{doc_workspace}` the bound run folder.
## Read the input to know the job
**The memlog** (`.memlog.md`) is the run's canonical memory and what a resume reloads (it replaces the old decision-log). Every decision, constraint, option, version, assumption, question, or direction lands as one append-only line — for a decision, capture what it binds and the divergence it prevents. All writes go through `scripts/memlog.py` (don't read it back except on resume):
The input itself tells you what kind of job this is — read it rather than quizzing the user about it. A spec package (`SPEC.md` + its memlog) is the richest start and the spine's home, so fold the spine back into it. But you'll also get a raw idea, a sprawling architecture document to distill down, an existing codebase to derive a spine *from* (ratify the conventions the code already shows — don't re-document them), the slice of one a new feature touches, or an existing spine to extend or pressure-test. Prefer a `.memlog.md` over re-reading the source it came from. Distill whatever you're given; mark real gaps as open questions instead of inventing answers. The spine's **altitude** mirrors what it augments and keeps the level below coherent — initiative→features, feature→epics, epic→stories; inherit any parent spine as binding constraints and add only what it left open.
- `python3 {skill-root}/scripts/memlog.py init --workspace {doc_workspace} --field scope="<what this governs>" --field purpose="<build-substrate|discussion|...>" --field altitude="<initiative|feature|epic>" --field mode="<guided|express|autonomous>"`
- `python3 {skill-root}/scripts/memlog.py append --workspace {doc_workspace} --type <decision|constraint|option|version|assumption|question|direction> --text "<gist>"` — omit `--type` for a plain note.
- `python3 {skill-root}/scripts/memlog.py set --workspace {doc_workspace} --key status --value complete` — at wrap-up.
## How a run works
The **memlog** (`.memlog.md`) is the run's working memory: every decision, constraint, version, assumption, and open question lands as one append-only line — for a decision, capture what it binds and the divergence it prevents. The spine is **distilled from the memlog at the end**, not written as you go. Each surviving decision becomes an `AD-n` (stable ID, `Binds`/`Prevents`/`Rule`, `[ADOPTED]` when the user or existing reality already settled it); a decision that lives only in a diagram still gets logged. Resume a prior run by reloading its memlog.
Writes go through the script (don't read the file back except on resume):
- `python3 {skill-root}/scripts/memlog.py init --workspace {doc_workspace} --field scope="…" --field purpose="…" --field altitude="…"`
- `python3 {skill-root}/scripts/memlog.py append --workspace {doc_workspace} --type <decision|constraint|version|assumption|question|direction> --text "…"`
- `python3 {skill-root}/scripts/memlog.py set --workspace {doc_workspace} --key status --value complete`
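As an illustration, the `append` call above could be composed programmatically. This sketch only builds the argument list from the flags shown in the text; it does not run `memlog.py` and assumes nothing about the script's internals. The helper name and the example paths are hypothetical.

```python
import shlex

# Entry types listed for `append` in the text above.
MEMLOG_TYPES = {"decision", "constraint", "version", "assumption", "question", "direction"}


def memlog_append_cmd(skill_root: str, doc_workspace: str, entry_type: str, text: str) -> list[str]:
    """Build the argv for one `memlog.py append` call without executing it."""
    if entry_type not in MEMLOG_TYPES:
        raise ValueError(f"unknown memlog type: {entry_type!r}")
    return [
        "python3", f"{skill_root}/scripts/memlog.py", "append",
        "--workspace", doc_workspace,
        "--type", entry_type,
        "--text", text,
    ]


cmd = memlog_append_cmd(
    "/opt/skills/bmad-architecture",  # hypothetical install dir
    "/tmp/run-001",                   # hypothetical run folder
    "decision",
    "AD-3 binds all writers; prevents two owners of one entity",
)
print(shlex.join(cmd))  # to execute, pass `cmd` to subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the gist itself contains quotes or semicolons.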
Paths: bare paths resolve from the skill root; `{skill-root}` is the install dir, `{project-root}` the project dir, `{workflow.<name>}` a merged `customize.toml` field, `{doc_workspace}` the bound run folder.
## On Activation
1. Resolve customization: `python3 {project-root}/_bmad/scripts/resolve_customization.py --skill {skill-root} --key workflow`. On failure, read `{skill-root}/customize.toml` and use defaults.
2. Run `{workflow.activation_steps_prepend}`. Treat `{workflow.persistent_facts}` as foundational context (the `file:` default loads any `project-context.md`, which this skill always honors). Consult `{workflow.external_sources}` on demand, preferring it over web research when an entry matches.
3. Load `{project-root}/_bmad/bmm/config.yaml` (+ `config.user.yaml`). Resolve `{user_name}`, `{communication_language}`, `{document_output_language}`, `{planning_artifacts}`, `{project_knowledge}`, `{project_name}`, `{date}`; missing keys → neutral defaults, never block.
4. If headless, follow `references/headless.md` for the whole run (headless always runs **Autonomous**). Otherwise greet `{user_name}` **by name** in `{communication_language}` and stay there for every turn; mention `bmad-party-mode` and `bmad-advanced-elicitation` are available any time. Scan the first message for misroute: requirements → `bmad-prd`; UX → `bmad-ux`; the capabilities contract → `bmad-spec`; epic breakdown → `bmad-create-epics-and-stories`; agent/skill → `bmad-workflow-builder`.
5. Detect intent — **Create** (no spine), **Update** (existing), **Validate** (critique only); if ambiguous, ask. On any intent, if a workspace for the target artifact already exists under `{workflow.spine_output_path}`, surface it with its `updated` stamp and offer to resume — reloading context from its `.memlog.md` rather than restarting (a mid-run compaction recovers the same way). For Create this is the unfinished-run check (`.memlog.md` status ≠ complete).
1. Resolve customization: `python3 {project-root}/_bmad/scripts/resolve_customization.py --skill {skill-root} --key workflow` (on failure read `{skill-root}/customize.toml`, use defaults). Run `{workflow.activation_steps_prepend}`, then `{workflow.activation_steps_append}`. Hold `{workflow.persistent_facts}` as standing context — the default loads `project-context.md`, load-bearing for brownfield — and consult `{workflow.external_sources}` on demand.
2. Load `{project-root}/_bmad/bmm/config.yaml` (+ `config.user.yaml`) for `{user_name}`, `{communication_language}`, `{document_output_language}`, `{planning_artifacts}`, `{project_name}`, `{date}`; missing keys take neutral defaults, never block.
3. Headless (no interactive user) → follow `references/headless.md` for the whole run. Otherwise greet `{user_name}` in `{communication_language}`. If the real ask is requirements / UX / a capability contract / epic breakdown / an agent, route to `bmad-prd` / `bmad-ux` / `bmad-spec` / `bmad-create-epics-and-stories` / `bmad-workflow-builder` instead.
4. If a run folder for this target already exists under `{workflow.spine_output_path}`, offer to resume from its memlog rather than restart.
Run `{workflow.activation_steps_append}`; if either hook ran, confirm before proceeding.
## Inputs & Altitude
**Input hierarchy.** The canonical input is a **bmad-spec package** (`SPEC.md` + companions + its memlog) — when present it's the source of truth and the spine's home; the spine folds back into it. When a folder is handed in or you scan for inputs, **propose any `.memlog.md` you find as the default selection** — distilled memory beats re-reading the source it came from. Raw docs (PRD, research, transcripts) are offered, not auto-selected. A spec is preferred, not required: on raw intent, distill what's given and mark gaps as open questions.
**Altitude.** The spine mirrors the altitude of what it augments and keeps the level below coherent — **initiative** keeps features, **feature** keeps epics, **epic** keeps stories. Detect it; when unsure, ask. Inherit any parent spine as binding constraints; add only what the level above left open.
**Adopt what holds; challenge what doesn't.** Adopt named infra/libraries/deployment where they fit; where one looks stale, mis-scaled, or conflicting, surface it rather than silently inheriting or overriding. When two authorities actually collide — a binding parent-spine constraint versus the live code, or versus a decision the user asserts as settled — don't silently pick a winner: surface the conflict, log it (in headless, list it in `conflicts_with_prior_decisions[]`), and let the user decide which holds.
**Greenfield vs brownfield.** Greenfield with no starter dictated: recommend one if it fits — a good starter pre-decides a coherent slab. Brownfield: read the conventions from the code and `project-context.md` so the spine ratifies reality — don't re-document what the code already says; that's seed the code now owns.
## Intent Modes
**Create.** Bind `{doc_workspace}` to `{workflow.spine_output_path}/{workflow.run_folder_pattern}/`, write `ARCHITECTURE-SPINE.md` frontmatter from `{workflow.spine_template}`, `memlog.py init`, tell the user the path. Run Discovery → Finalize.
**Update.** Read `ARCHITECTURE-SPINE.md`, its `.memlog.md`, and the driving spec; bootstrap a thin memlog if missing. Apply the change, surfacing conflicts with prior decisions (especially any `AD-n` others bind to) before committing. As code lands, **trim the structural seed the codebase now owns rather than maintaining it** — the spine keeps the invariants, reality keeps the structure. Then Finalize.
**Validate.** Critique without changing. Load `references/validate.md`.
## Discovery
Order: **Open the floor → Calibrate → Offer the working mode → mode-scoped work.** Reach the working mode in a few turns — don't hold a user in a hurry hostage to upstream probing.
**Open the floor first.** Before drilling anything, invite the whole picture — and because the user often holds the real architectural intent, explicitly ask what they already have or have decided: a stack or platform in mind, existing infra/deployment, hard constraints, and any spec, PRD, brainstorm, or repo to read (path or paste — big docs get subagent-extracted). Read what exists before asking what's missing; "anything else?" catches what they almost forgot. Granular questions before the dump interrupt it and miss the room. If it emerges mid-Discovery that there's no real requirement yet — you'd be architecting vapor — redirect to `bmad-prd` or `bmad-spec` rather than pressing on. At any point the user can ask "where are we?" — produce an interim snapshot from the memlog (the synthesis pipeline in `references/validate.md`) without ending the run.
**Calibrate what reshapes the run** — read it from what they gave you and confirm, don't quiz:
- **Depth & stakes** — a quick prototype to start building, or a full definition driven to completion (throwaway / internal / product / regulated)? This sets how hard you harden now versus defer, and how much reviewer rigor the Finalize gate earns.
- **Ground** — greenfield or brownfield. Brownfield: read conventions from the code and ratify reality — and *don't* ask what to build it with, that's already answered. Greenfield: the stack is open to recommend.
- Plus your one-line read of **purpose** and **altitude** (above), offered for correction.
**Offer the working mode** in the user's language — their choice, not yours:
- **Guided** — we work the architecture together: I open-floor what's in your head, pull the decisions out, push back where a choice is thin, and shape the spine as we go. Best when you hold strong opinions or want a spine you trust.
- **Express** — I batch only the calls that genuinely change the architecture's shape into one compact round — and, if greenfield, offer to just pick a sensible boring AI-buildable stack — then draft the full spine with `[ASSUMPTION]` tags you correct in review. Best when you want to get building.
- **Autonomous** — I ask nothing and infer everything from what you gave me, then draft. Truly garbage-in, garbage-out: only as good as the input. Pick it deliberately. (Headless always runs here.) Once the draft lands, I'll offer to walk it together (Guided via Update) to correct what I assumed.
**Elicit, don't quiz.** In Guided, open-ended "tell me about X" beats a menu; reserve crisp multiple-choice for a genuinely binary fork (offline-first vs always-online). When you catch yourself choosing the boundaries, the stack, or the phases, stop — that's authoring; hand the pen back. Express and Autonomous suspend this on purpose — there, inferring and tagging *is* the job.
**The divergence hunt** is the core move. In Guided, frame it for the user once as you start: you're locking down only what would let two builders diverge and deliberately leaving everything else open — so each deferral reads as protection from over-committing early, not an unfinished job. Walk the units one level below and find where two independent builders could choose incompatibly — hunting the **invariants** code can't later reveal: paradigm, boundaries and who may depend on whom, state mutation, contracts and shared-data ownership. A paradigm or decision the user asserts as settled is **adopted, not re-derived** — record it as an `AD-n` tagged `[ADOPTED]`, verify its fit (flag only if it looks wrong), and narrow the hunt to what it leaves open. Each survivor of the three-part test earns an `AD-n` (Binds + Prevents + Rule) or a convention — logged to the memlog as you go. A decision carried by a **diagram is still a decision**: write it to its own file in `{doc_workspace}` and log a memlog line linking it, never let a choice live only inside a picture; structure stays seed. Where they can't diverge, defer it under **Deferred**. When the user volunteers something out of scope — a stray requirement, a rejected alternative and why — capture it (memlog, or `addendum.md` for depth that belongs downstream) rather than letting it drop. Verify named technologies on the web (current version, still maintained, still the going approach); research subagents fire freely and the parent gets a digest.
For a new spine, bind `{doc_workspace}` to `{workflow.spine_output_path}/{workflow.run_folder_pattern}/`, seed `ARCHITECTURE-SPINE.md` from `{workflow.spine_template}`, `memlog.py init`, and tell the user the path.
## Reviewer Gate
Used by Validate and at Finalize — opt-in, lens-selectable (reviewers are parallel subagents, separate sessions, real cost) and stakes-calibrated: a prototype may skip it, a regulated build earns the full menu. At Finalize, offer it (easy skip); user picks all / some / none. **`references/validate.md` owns the canonical reviewer menu, the subagent prompts, and the synthesis pipeline** — load it whenever the gate runs. Cheap first: before spending subagents, run `python3 {skill-root}/scripts/lint_spine.py --workspace {doc_workspace}` and fix what it flags — the mechanical half (placeholders, broken `AD-n` IDs, missing Binds/Prevents/Rule, unpinned deps) settled deterministically, so subagents spend judgment on the semantic half (is each Rule actually enforceable?).
Used by the Validate intent and at Finalize (step 3). Cheap deterministic pass first: `python3 {skill-root}/scripts/lint_spine.py --workspace {doc_workspace}` settles the mechanical misses (placeholders, duplicate `AD` IDs, missing Binds/Prevents/Rule, unpinned deps), so reviewers spend judgment on the semantic half.
Assemble the menu: a **rubric walker** that judges the spine against the good-spine checklist below, **+ every entry in `{workflow.finalize_reviewers}` (always run)**, + ad-hoc lenses you invent or offer as the spine's rigor, altitude, and criticality warrant — a security/compliance lens for regulated stakes, a seam reviewer cross-team, a data-integrity lens for a heavy data model. Scale the set to the stakes: a throwaway prototype may run it quietly or skip; a high-criticality or platform-altitude spine earns more lenses and the explicit all / subset / skip menu.
Dispatch every entry as a **parallel subagent against `ARCHITECTURE-SPINE.md`** (prefix convention: `skill:` / `file:` / plain text). Each writes its full review to `{doc_workspace}/review-{slug}.md` and returns ONLY a compact summary (verdict, top 25 findings, file path) — the parent never holds full review text. An inline self-check does not count: the independent context is the point, because a fresh reviewer finds the divergences the author talks past. If subagents are unavailable, run sequentially — write the file first, then flush it from context.
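The dispatch described above can be sketched roughly as follows. This is an assumption-laden illustration, not the skill's implementation: `review` is a hypothetical stand-in for whatever spawns a subagent, threads stand in for separate sessions, and the summary holds only a verdict and a file path.

```python
import re
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def parse_reviewer(entry: str) -> tuple[str, str]:
    """Classify an entry as ('skill', name), ('file', path), or ('text', prompt)."""
    for prefix in ("skill", "file"):
        if entry.startswith(prefix + ":"):
            return prefix, entry[len(prefix) + 1:].strip()
    return "text", entry.strip()


def slugify(value: str, limit: int = 40) -> str:
    """Make a filesystem-safe slug for the review file name."""
    slug = re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")
    return slug[:limit].strip("-") or "review"


def run_gate(entries: list[str], workspace: Path, review: Callable[[str, str], str]) -> list[dict]:
    """Run every reviewer in parallel; return compact summaries only."""
    def one(entry: str) -> dict:
        kind, payload = parse_reviewer(entry)
        full_text = review(kind, payload)  # hypothetical subagent call
        path = workspace / f"review-{slugify(payload)}.md"
        path.write_text(full_text, encoding="utf-8")  # full review lives on disk
        verdict = full_text.splitlines()[0] if full_text else ""
        return {"verdict": verdict, "path": str(path)}

    with ThreadPoolExecutor() as pool:
        return list(pool.map(one, entries))  # map preserves entry order


with tempfile.TemporaryDirectory() as tmp:
    summaries = run_gate(
        ["skill:sec-review", "Attack the spine as an adversary"],
        Path(tmp),
        review=lambda kind, payload: f"PASS\nreviewed {kind}: {payload}",  # stub reviewer
    )
    print([s["verdict"] for s in summaries])
```

Two entries whose payloads share the same first 40 slug characters would overwrite each other's file here; a real gate would need unique slugs.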
**Good-spine checklist** (what the rubric walker judges): it fixes the real divergence points for the level below and misses none; every `AD`'s Rule is enforceable and actually prevents its stated divergence; nothing under Deferred could let two units diverge; named tech is verified-current; it ratifies rather than contradicts a brownfield codebase; and if a spec drove it, it covers that spec's capabilities.
Surface findings tiered, never dumped: a one-sentence gate verdict, then critical + high; medium/low roll into a tail ("plus N more in {file}"). Per finding: autofix, discuss, defer to Deferred / open items, or ignore. **At Finalize this is your own gate — apply the clear fixes rather than handing over a list; surface only what genuinely needs the user.** Under the **Validate intent**, fold every reviewer's output into one bespoke HTML + markdown report and open the HTML.
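One possible shape for the tiered surfacing described above, as a hedged sketch: the finding dictionaries with `severity` and `title` keys are an assumed format, not one the skill defines.

```python
SEVERITY_ORDER = ("critical", "high", "medium", "low")


def surface(verdict: str, findings: list[dict], review_file: str) -> str:
    """Render the verdict, the critical and high findings, and a tail count."""
    rank = {name: i for i, name in enumerate(SEVERITY_ORDER)}
    shown = sorted(
        (f for f in findings if f["severity"] in ("critical", "high")),
        key=lambda f: rank[f["severity"]],  # stable sort keeps input order within a tier
    )
    tail = sum(1 for f in findings if f["severity"] in ("medium", "low"))
    lines = [verdict] + [f"[{f['severity']}] {f['title']}" for f in shown]
    if tail:
        lines.append(f"plus {tail} more in {review_file}")
    return "\n".join(lines)
```

The medium and low findings are counted but never printed, so the reader sees the file path instead of the full list.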
## Finalize
Create and Update close through `references/finalize.md`. Load it when Discovery (or an Update change) is done.
Tell the user the sequence in a sentence, then walk it; reviewer fixes land before polish.
1. **Distill.** Write the spine from the memlog (brownfield: + the code sweep) — invariants first, seed minimal, every `AD` carrying Binds/Prevents/Rule, `Deferred` naming what it won't decide. No placeholders; never invent to fill a gap. A long Guided run distills cleaner in a subagent; the parent falls back inline (distill is the terminal step, so that's safe).
2. **Reconcile inputs.** A subagent per load-bearing input checks it against the spine and returns what didn't land — especially a quiet requirement (a tone, a constraint) the `AD` structure dropped. Before the gate.
3. **Reviewer pass.** Run `## Reviewer Gate`. Resolve before polish.
4. **Triage.** Open questions and `[ASSUMPTION]` tags: blockers (unsafe for what's next) resolved one at a time; the rest deferred with a revisit condition in the memlog.
5. **Renderings & polish.** The terse spine is the deliverable unless the user indicated a purpose other than serving a downstream agent that builds the app; offer a fuller rendering (HTML or Markdown solution design, deck, C4 set) and apply `{workflow.doc_standards}` polish only to such prose, never to the spine.
6. **External handoffs.** Run `{workflow.external_handoffs}`; surface returned URLs/IDs. Offer to hand the spine to `bmad-spec` as a companion, keeping `AD` IDs stable so downstream can cite them.
7. **Close.** Set `status: final`, `updated: {date}`; `memlog.py set status complete`. Share paths. Next: `bmad-spec`, `bmad-create-epics-and-stories`, or — epic altitude — `bmad-create-story`; `bmad-help` to route.
8. Run `{workflow.on_complete}`.
## Validate
The standalone intent — critique an existing spine without changing it. Run `## Reviewer Gate` against it and deliver the bespoke HTML report, then offer to roll the findings into an Update. (At Finalize the same gate runs as your own pre-handoff check, where you apply the fixes instead of reporting.)

@ -21,7 +21,7 @@ companions: []
> A consistency contract, not a design document. It fixes the **invariants** that keep the
> independently-built level below ({features | epics | stories}) coherent — the durable rules a
> clean codebase can't reveal. Structure is **seed**: true at cold-start, owned by the code after.
> clean codebase can't reveal. Structure is **seed**: the code owns the detail, the spine keeps the shape.
> Decisions, not rationale (that lives in the memlog). Diagrams over prose.
## Design Paradigm
@ -62,22 +62,35 @@ don't apply.
## Structural Seed
Cold-start scaffolding only — once the code exists it is the source of truth; regenerate or trim
these, don't maintain them. Keep minimal.
Cold-start scaffolding, kept minimal. The code owns the **detail** (every file, every column) — don't
mirror it here. But this stays the living source of truth for **shape**: evolve it when the shape
itself changes — a new container, a new core entity, a stack bump — and let the memlog keep the history.
- **Stack & Versions** — the substrate (mirrors frontmatter `stack`).
- **System Shape** — C4 context / container.
- **Data Model** — an ERD of entities and relationships (ownership/mutation rules live above).
- **System Shape** — a container/context view. Use `flowchart` with a `subgraph` per boundary; C4 mermaid is experimental and won't render in most viewers.
- **Data Model** — an ERD of entities and relationships, one attribute per line (ownership/mutation rules live above).
- **Project Structure** — a minimal source tree, only as deep as consistency needs.
```mermaid
C4Container
title Containers — {name}
flowchart TD
user(["{actor}"])
subgraph sys["{system boundary}"]
a["{container}<br/>{tech} — {role}"]
end
db[("{datastore}")]
ext["{external system}"]
user --> a
a --> db
a -->|{via port}| ext
```
```mermaid
erDiagram
ENTITY_A ||--o{ ENTITY_B : "{relationship}"
ENTITY_A {
uuid id PK
string name
}
```
```text

@ -30,7 +30,7 @@ activation_steps_append = []
# "Our org is AWS-only -- do not propose GCP or Azure."
# "file:{project-root}/docs/engineering-standards.md"
persistent_facts = [
"file:{project-root}/**/project-context.md",
"file:{output_folder}/project-context.md",
]
# Executed when the workflow completes (after the spine is final and the user has been told).
@ -78,14 +78,17 @@ external_sources = []
external_handoffs = []
# --- Finalize reviewers ---
# Reviewers spawned at the Reviewer Gate (Finalize and the Validate intent) alongside the rubric
# walker. The skill assembles the lens menu (rubric walker + these + ad-hoc lenses the spine
# warrants) and lets the user pick all / a subset / none.
# Extra review lenses spawned as parallel subagents at the validation gate (Finalize and the
# Validate intent), on top of the skill's built-in good-spine checklist and the lint_spine.py
# mechanical floor. Stakes-gated: high-stakes / cross-team spines run them, throwaway ones may skip.
#
# Entries follow the standard prefix convention:
# "skill:NAME" invoke the named review skill as a subagent against ARCHITECTURE-SPINE.md
# "file:PATH" load the file as a review prompt; spawn an adversarial subagent applying it
# plain text use the text directly as the subagent's review prompt
#
# Resolved on-demand (not at activation). Override TOML may append. Empty by default.
finalize_reviewers = []
# Resolved on-demand (not at activation). Override TOML may append.
finalize_reviewers = [
"Verify every committed decision was web-researched or reality-checked rather than asserted from training data: current library/framework versions, that each named technology still exists and fits, and — greenfield — the live defaults of any starter it leans on. Flag anything that could be out of date and wasn't confirmed against the web, the existing project, or the current starter.",
"Attack the spine as an adversary: construct two units one level down that each obey every AD to the letter yet still build incompatibly — clashing shared-data shapes, two owners of one entity, conflicting state-mutation paths. Every pair you find is a hole to close with a new or tightened AD.",
]

@ -1,13 +0,0 @@
# Finalize
The Create/Update closing sequence — load it when Discovery (or an Update change) is done. State the sequence in a sentence, then walk it; distill first, polish only what needs it, render and hand off last.
1. **Distill.** A subagent writes the artifact from the memlog, sources, and (brownfield) the code sweep — invariants first, seed minimal, each `AD-n` carrying Binds/Prevents/Rule only, `Deferred` naming what it won't decide. No placeholders ("TBD", "similar to AD-2") — that's a distill failure. Surface gaps; never invent. If subagents are unavailable, the parent distills inline from the memlog (safe — distill is the terminal step).
2. **Emit the spine, then offer renderings of it.** The **spine is the canonical capture and the default deliverable** (build-substrate); when the purpose is discussion, lead instead with a report that foregrounds the open challenges. Once it exists, *offer* fuller renderings for a specific audience or use — a full prose architecture document, a design/API addendum, a slide deck, a C4 set, a cross-team alignment brief — each one re-presenting the spine for an audience, not new substance. Offered, never auto-emitted — produce only what the user picks.
3. **Reconcile inputs.** A subagent checks each input against the output; surface load-bearing claims (especially constraints) that didn't land.
4. **Reviewer Gate.** Run it (`references/validate.md` owns the menu); resolve before polish.
5. **Triage.** Open questions and `[ASSUMPTION]` tags — blockers (unsafe for what's next) resolved one at a time, the rest deferred with a revisit condition in the memlog.
6. **Polish — fuller documents only.** `{workflow.doc_standards}` are prose-editorial passes; apply them **only to a fuller prose document produced above** (the discussion report, full architecture doc, design addendum), as separate sessions, structural before prose. **Never run them on the spine or other short, structured outputs** — the spine is terse and carries its decisions in `AD-n` blocks and diagrams by design, and prose-smoothing fights it. The spine's quality pass is `lint_spine.py` plus the Reviewer Gate, not `doc_standards`.
7. **Offer an HTML view.** Once the spine is final, offer to render a **self-contained HTML** view of it (and of any fuller document produced) — inline CSS, no external dependencies — written to `{doc_workspace}` and opened in the browser: `python3 -c "import webbrowser, pathlib; webbrowser.open(pathlib.Path('{doc_workspace}/ARCHITECTURE-SPINE.html').resolve().as_uri())"`. Same framing as the other renderings.
8. **Augment the spec.** Offer to hand the spine to `bmad-spec` (update intent) as a companion; `bmad-spec` owns `SPEC.md`. Keep `AD-n` IDs stable so downstream units can cite the decision they implement. Run `{workflow.external_handoffs}`; surface returned URLs/IDs.
9. **Close.** Set frontmatter `status: final`, `updated: {date}`; `memlog.py set status complete`. Share paths. Next: `bmad-spec`, `bmad-create-epics-and-stories`, or (epic altitude) `bmad-create-story`; `bmad-help` to route. Run `{workflow.on_complete}`.


@ -1,40 +1,10 @@
# Headless Mode
# Headless
Load this file when bmad-architecture is invoked headless (no interactive user). Follow it for the whole run. Headless always runs the **Autonomous** working mode — infer everything, ask nothing, tag inferred calls and record gaps as open questions.
No interactive user: infer everything, ask nothing, but never invent — record inferences as `assumptions[]` and gaps that need a human as `open_questions[]`. Detect headless from a `headless: true` flag, a non-interactive / no-TTY invocation, an activation hook that declares it, or a first message that pre-supplies all inputs and asks for an artifact path back; when ambiguous, default to interactive.
## Detection
Drive the run from the payload in the first message — `intent`, `altitude`, `purpose`, the driving input (spec package / PRD / raw intent / brownfield path), a parent spine path at lower altitude, and `doc_workspace` if a specific folder is required. Infer anything absent from the inputs or workspace; don't invent stack, constraints, or scope to fill a gap. You still verify named tech on the web (you can't ask, but you can check) and still drive every write through `scripts/memlog.py`. Run the full Reviewer Gate non-interactively: `scripts/lint_spine.py` plus **every `{workflow.finalize_reviewers}` lens as a parallel subagent** (and any ad-hoc lens the spine's criticality warrants). Headless skips only the human picking from the menu — never the reviewers themselves; apply the clear fixes and record anything unresolved in `open_questions[]`. For a true authority collision, list it in `conflicts_with_prior_decisions[]`. For the Validate intent, always write the report to `{doc_workspace}` and add `"offer_to_update": true`. If intent stays ambiguous after inference, halt blocked.
Headless is in effect when any of the following holds:
- the caller sets a `headless: true` flag (or the harness equivalent),
- the invocation is from another skill or a non-interactive runner (no TTY, no user message stream),
- `{workflow.activation_steps_prepend}` declares headless,
- the first message pre-supplies all inputs and asks for an artifact path back.
When ambiguous, default to interactive.
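
The detection rule above is an "any one signal" check with an interactive fallback. A minimal sketch, assuming a hypothetical `Invocation` record whose field names are illustrative only and are not part of the skill:

```python
from dataclasses import dataclass


@dataclass
class Invocation:
    """Hypothetical snapshot of how the skill was invoked (illustrative names)."""
    headless_flag: bool = False         # caller set `headless: true` or the harness equivalent
    non_interactive: bool = False       # invoked by another skill or a runner with no TTY
    activation_declares: bool = False   # an activation step declared headless
    presupplied_inputs: bool = False    # first message supplies all inputs and asks for a path back


def is_headless(inv: Invocation) -> bool:
    """Headless holds when any single signal is present; with none, stay interactive."""
    return any(
        (inv.headless_flag, inv.non_interactive, inv.activation_declares, inv.presupplied_inputs)
    )
```

With no signal set, `is_headless(Invocation())` is false, which matches the "when ambiguous, default to interactive" rule.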
## Inputs the caller is expected to provide
Free-form structured payload in the first message; every field below when applicable:
- `intent`: `"create"`, `"update"`, or `"validate"`. If absent, infer from the artifact set.
- `altitude`: `"initiative"`, `"feature"`, or `"epic"` (the spine mirrors the altitude of the spec/intent it augments and keeps the level below coherent). If absent, infer (a unit's capability subset + a parent spine ⇒ the child altitude; a top-level initiative spec or raw intent ⇒ initiative).
- `purpose`: `"build-substrate"` (default) or `"discussion"` (a doc to align people / surface open challenges). If absent, default to build-substrate and record it.
- For **Create**: the driving input. Canonical is a bmad-spec package (`SPEC.md` + companions + its memlog); if a folder is given, prefer any `.memlog.md` found, then `SPEC.md`, then raw docs. Also accepts a PRD, raw intent text, or a brownfield repo path, plus any stack/scope constraints; the parent spine path when at a lower altitude; `doc_workspace` if a specific run folder is required (else the default binds).
- For **Update**: the existing `ARCHITECTURE-SPINE.md` path (or a workspace containing one), and a change signal.
- For **Validate**: the existing `ARCHITECTURE-SPINE.md` path (or workspace). Workspace defaults to the spine's containing directory.
Anything not provided is inferred from inputs/workspace or recorded as `assumptions[]` / `open_questions[]`. Do not invent stack choices, constraints, or scope to fill gaps — record them.
## General
Do not ask, do not greet. Complete the intent from what is provided, what exists in `{doc_workspace}`, or what you can discover yourself (version verification, a brownfield convention sweep). Still drive every write through `scripts/memlog.py` — the memlog is the audit trail. If intent remains ambiguous after inference, halt with `status: "blocked"` and a `reason`.
Web research still applies: verify current versions and that named tech still fits before binding it — you can't ask, but you can check. Populate `assumptions[]` with every value inferred without caller confirmation; populate `open_questions[]` with every gap needing a human decision (an unresolved divergence point left undecided, or a prior-input tech choice that looks ill-fitting but can't be challenged interactively, is an open question, not a silent omission). Use `status: "partial"` when the spine was produced but `open_questions[]` is non-empty or critical inputs were inferred (Create from raw intent with no spec; Update on a vague signal; Validate that could not load the spine).
## JSON response
End with JSON only. Omit keys for artifacts not produced.
End with JSON only, omitting keys for artifacts not produced:
```json
{
@ -45,7 +15,7 @@ End with JSON only. Omit keys for artifacts not produced.
"doc_workspace": "<resolved run folder>",
"spine": "{doc_workspace}/ARCHITECTURE-SPINE.md",
"memlog": "{doc_workspace}/.memlog.md",
"companions": ["{doc_workspace}/architecture-diagrams.md"],
"companions": [],
"assumptions": [],
"open_questions": [],
"conflicts_with_prior_decisions": [],
@ -53,14 +23,4 @@ End with JSON only. Omit keys for artifacts not produced.
}
```
`complete` = stands on its own · `partial` = caller should review before downstream use · `blocked` = no spine produced.
## Mode-specific overrides
**Create.** No one is present to pick reviewers, so skip the Reviewer Gate's subagent reviewers; still run the deterministic `scripts/lint_spine.py` self-check and record the skipped review as an `assumptions[]` note. When several independent inputs exist, the Finalize reconcile step may fan out one subagent per input.
**Update.** Apply the change, log it to `.memlog.md` with rationale, and surface any conflict with a prior decision (especially an `AD-n` other artifacts bind to) in `conflicts_with_prior_decisions[]`.
**Validate.** Write `validation-report.md` to `{doc_workspace}` regardless of finding count (skip the rubric's interactive surfacing). Include `"offer_to_update": true` in the JSON.
**Augment the spec.** Do not invoke `bmad-spec` interactively. If a driving spec path was provided, record in `open_questions[]` (or an `assumptions[]` note) that the spine is ready to be adopted as a spec companion, leaving the actual handoff to the caller.
`complete` stands alone · `partial` (spine produced, but `open_questions[]` non-empty or critical inputs inferred) means review before downstream use · `blocked` means no spine produced.
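
The three status values reduce to a small decision function. This is a sketch under the definitions given above; the function and parameter names are hypothetical and not part of the skill:

```python
def derive_status(
    spine_produced: bool,
    open_questions: list[str],
    critical_inputs_inferred: bool,
) -> str:
    """Map the run outcome onto the `status` field of the JSON response."""
    if not spine_produced:
        return "blocked"    # no spine produced
    if open_questions or critical_inputs_inferred:
        return "partial"    # caller should review before downstream use
    return "complete"       # stands on its own
```

Checking `blocked` first keeps the ordering unambiguous: a run that produced no spine is never reported as `partial`, however many open questions it recorded.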


@ -1,95 +0,0 @@
# Validate
The Validate intent playbook. Standalone — it critiques an existing architecture spine without changing it and ends after the user has seen the report; it does not run Finalize. The synthesis pipeline is also reused for mid-session report requests during Create/Update.
## Orient
Note the paths — `.memlog.md`, the driving spec (if any), and `ARCHITECTURE-SPINE.md` — but don't read them in the parent. The rubric walker and any heavy-read subagents own the reads and return extracts (specify exact return format); the parent assembles from those.
## Run the Reviewer Gate
This file owns the canonical reviewer menu (SKILL.md routes here). Run the gate against `ARCHITECTURE-SPINE.md`; selected reviewers run as parallel subagents, each writing `{doc_workspace}/review-{slug}.md` and returning a compact summary.
- **rubric walker** — the default entry; pipeline below.
- **consistency auditor** — mechanically walks the Capability → Architecture Map for orphans, uncovered capabilities, and terminology drift. On by default under Validate intent (where mechanical orphan-walking matters most).
- **adversarial divergence-hunter** — refutational reviewer (prompt below); on by default whenever stakes are high (regulated, enterprise, cross-team), since a missed divergence point is the spine's costliest failure. Lower-stakes runs may skip it.
- **`{workflow.finalize_reviewers}`** plus any **ad-hoc lens** the content warrants (a security/compliance lens for regulated stakes, and similar).
Validate additionally runs the synthesis pipeline below.
## Rubric-walker pipeline
First run `python3 {skill-root}/scripts/lint_spine.py --workspace {doc_workspace}` and hand its JSON to the walker, so the mechanical half of decision-integrity (literal placeholders, duplicate or non-monotonic `AD-n` IDs, `AD-n` blocks missing Binds/Prevents/Rule, unpinned `name@version` stack entries) is already settled and the walker spends judgment on the semantic half. Spawn the rubric walker as a subagent with this prompt:
> You are validating an architecture **spine** — a consistency contract that fixes only the **invariants** (paradigm, boundaries, who-may-depend-on-whom, state mutation) keeping the independently-built level below (features, epics, or stories, per its altitude) coherent, treating stack/tree/data-shape as disposable **seed**. Read its `.memlog.md`, the driving spec if one exists, and `ARCHITECTURE-SPINE.md`. Judge each dimension below — *strong / adequate / thin / broken* — and write findings only where they add information. Cite specific spine locations and quote phrases. Severity ranks impact on the spine's job (cross-unit consistency), not how easy the fix is.
>
> Dimensions:
> 1. **Consistency coverage** — does it fix the real divergence points for the units one level below? Actively hunt for conflict points it *missed* (where two independent builders could still diverge). This is the primary lens.
> 2. **Leanness, form & altitude** — does every entry survive the three-part test (units *could* diverge ∧ non-obvious ∧ real trade-off)? Flag premature design detail a single unit below should own, structural seed maintained as if durable, prose/bullet walls a C4 diagram/ERD/tree would carry better, and any rationale narrative (rationale belongs in the memlog). The spine should **lead with a named paradigm** and put invariants before seed.
> 3. **Decision integrity** — each `AD-n` has a stable unique ID, a `Binds` scope pointing at real capabilities/areas, a `Prevents` divergence, and an actual enforceable `Rule` — and *no* rationale line. The durable rules a clean codebase can't reveal (dependency boundaries, state mutation) are actually captured. IDs never reused or renumbered.
> 4. **Diagram leverage** — shape carried by C4 (L1-3 as warranted), shared data by an ERD, structure by a minimal source tree; diagrams accurate and matching the prose/decisions, not decorative.
> 5. **Version & fit currency** — named technologies carry verified current versions (not recalled guesses), and prior-input tech choices that were adopted still fit — anything stale, mis-scaled, or unverified is flagged rather than silently inherited.
> 6. **Deferred discipline** — decisions pushed down are *named* under Deferred with a reason, not silently omitted.
> 7. **Brownfield fidelity** (only if brownfield) — the spine ratifies the conventions actually present in the codebase rather than contradicting them.
> 8. **Spec fit** (only if a spec drove it) — the Capability → Architecture Map covers the spec's capabilities and the spine honors its constraints.
>
> Write your review to `{doc_workspace}/review-rubric.md`: a one-paragraph overall verdict, then per-dimension judgment + findings. Return ONLY a compact summary (overall verdict, dimension verdicts, finding counts by severity, file path).
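
One of the mechanical checks the pipeline hands to the lint step, duplicate or non-monotonic `AD-n` IDs, can be illustrated as follows. This is a sketch and not the actual `lint_spine.py`; it assumes, hypothetically, that each decision is introduced by a markdown heading such as `### AD-3 ...`:

```python
import re

# Assumption: AD blocks start with a markdown heading like "### AD-3 Title".
AD_HEADING = re.compile(r"^#{1,6}\s+AD-(\d+)\b", re.MULTILINE)


def check_ad_ids(spine_text: str) -> list[str]:
    """Report duplicated AD-n IDs and IDs that appear out of ascending order."""
    problems: list[str] = []
    seen: set[int] = set()
    highest = 0
    for match in AD_HEADING.finditer(spine_text):
        n = int(match.group(1))
        if n in seen:
            problems.append(f"AD-{n} is duplicated")
        elif n < highest:
            problems.append(f"AD-{n} appears after AD-{highest} (non-monotonic)")
        seen.add(n)
        highest = max(highest, n)
    return problems
```

Settling this kind of check in a script leaves the walker's judgment for the semantic half, such as whether a `Rule` is actually enforceable.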
## Adversarial divergence-hunter
Refutational, not evaluative, and orthogonal to the rubric walker's judgment (stakes-gating is in the menu above). Spawn it as a subagent with this prompt:
> You are an adversarial reviewer of an architecture **spine** — a consistency contract whose one job is to stop the units one level below (features, epics, or stories, per its altitude) from being built incompatibly. Your stance is refutation, not evaluation: assume there is a hole and find it. Read its `.memlog.md`, the driving spec if one exists, and `ARCHITECTURE-SPINE.md`. Then attack:
> 1. **Hunt the missed divergence.** Walk the units one level down and try to construct two that, each obeying the spine to the letter, still build incompatibly — different shapes for shared data, two owners of the same entity, incompatible contracts across a boundary, conflicting state-mutation paths. Every pair you can construct is a hole the spine must close.
> 2. **Break the rules.** For each `AD-n`, try to satisfy its `Rule` while still causing the divergence its `Prevents` claims to stop. If you can, the Rule is not enforceable and the AD is theater.
> 3. **Probe the deferrals.** For each item under `Deferred`, check it is genuinely safe to defer — that two units could not diverge on it. If they could, it was deferred wrongly and belongs in the spine.
> 4. **Stress the seam** (cross-team / brownfield) — where the spine binds to existing services or team boundaries, find the integration or ownership assumption that does not actually hold.
>
> Report only real holes, each as: the two divergent builds you constructed, the spine location that should have prevented it, and the minimal fix (a new `AD-n`, a tightened `Rule`, or a deferral pulled back in). Do not restate what the spine got right; a confirmed hole is High or Critical severity. Write your review to `{doc_workspace}/review-divergence-hunter.md`; return ONLY a compact summary (hole count by severity, the sharpest one, file path).
## Synthesis pipeline
Once every selected reviewer has returned, the parent consolidates one markdown report. **Do not skip under Validate intent** — it is the persistent artifact the user opens.
1. Read every `{doc_workspace}/review-*.md`.
2. Get the grade from the script — don't derive it by hand. Pipe the rubric walker's per-dimension verdicts and each reviewer's severity counts to `python3 {skill-root}/scripts/grade_spine.py`; it returns `grade`, `severity_totals`, and the deciding `reason`. Payload shape: `{"dimensions": {"consistency": "strong", ...}, "reviewers": [{"slug": "rubric", "severity": {"critical": 0, "high": 1, ...}}, ...]}`.
3. Write `{doc_workspace}/validation-report.md`:
```markdown
# Architecture Spine Validation — {name}
- **Spine:** `{path}`
- **Altitude:** {system | epic}
- **Run at:** {ISO timestamp}
- **Grade:** {Excellent | Good | Fair | Poor}
## Overall verdict
{synthesis paragraph; add a second paragraph if extra reviewers materially shift the picture}
## Dimension verdicts
- Consistency coverage — {verdict}
- Leanness & altitude — {verdict}
- (etc. for each assessed dimension)
## Findings by severity
### Critical (n)
**[Dimension or Reviewer]** — Title (§ location)
{Note}
Fix: {suggested fix}
### High (n)
...
### Medium (n)
...
### Low (n)
...
## Reviewer files
- `review-rubric.md`
- (any extra `review-{slug}.md`)
```
Re-running validation overwrites the report in place; individual `review-*.md` files are preserved for drill-in.
## Close
Surface the report path. Always offer to roll findings into an Update.


@ -1,92 +0,0 @@
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.10"
# ///
"""grade-spine — derive a validation grade deterministically from reviewer output.

The grade is a pure function of (per-dimension rubric verdicts, summed severity counts).
An LLM re-deriving that threshold ladder by hand every run can drift and miscount; a
script gives the same input the same grade every time. The synthesis prompt keeps the
judgment (the verdict paragraph) and hands the mechanical count-and-map here.

Input is JSON on stdin (or --input FILE):

    {
      "dimensions": {"consistency": "strong", "leanness": "thin", ...},
      "reviewers": [{"slug": "rubric", "severity": {"critical": 0, "high": 1, "medium": 2, "low": 0}},
                    {"slug": "divergence-hunter", "severity": {"high": 1}}]
    }

reviewers[].severity counts are summed; a bare top-level "severity" dict is accepted as an
alternative to a single-reviewer list. Output is JSON on stdout:

    {"grade": "Fair", "severity_totals": {...}, "thin": 1, "broken": 0, "reason": "..."}

Grade ladder (most-severe wins):

    Poor       any broken dimension OR any critical finding
    Fair       any high finding OR two-plus thin dimensions
    Good       exactly one thin dimension, no high/critical
    Excellent  all dimensions strong/adequate, no high/critical
"""

from __future__ import annotations

import argparse
import json
import sys
from pathlib import Path

SEVERITIES = ("critical", "high", "medium", "low")


def sum_severity(payload: dict) -> dict:
    """Sum severity counts across reviewers[], or fall back to a bare top-level `severity`."""
    totals = {k: 0 for k in SEVERITIES}
    reviewers = payload.get("reviewers")
    sources = [r.get("severity") or {} for r in reviewers] if reviewers else [payload.get("severity") or {}]
    for src in sources:
        for k, v in src.items():
            if k in totals:
                totals[k] += int(v)
    return totals


def grade(dimensions: dict, severity: dict) -> dict:
    sev = {k: int(severity.get(k, 0)) for k in SEVERITIES}
    verdicts = [str(v).strip().lower() for v in (dimensions or {}).values()]
    broken = sum(1 for v in verdicts if v == "broken")
    thin = sum(1 for v in verdicts if v == "thin")

    if sev["critical"] > 0 or broken > 0:
        g, reason = "Poor", "any critical finding or broken dimension caps the grade at Poor"
    elif sev["high"] > 0 or thin >= 2:
        g, reason = "Fair", "a high finding or two-plus thin dimensions caps the grade at Fair"
    elif thin == 1:
        g, reason = "Good", "one thin dimension, no high/critical"
    else:
        g, reason = "Excellent", "all dimensions strong/adequate, no high/critical"
    return {"grade": g, "severity_totals": sev, "thin": thin, "broken": broken, "reason": reason}


def main(argv: list[str] | None = None) -> int:
    ap = argparse.ArgumentParser(description="Derive an architecture-spine validation grade from reviewer output.")
    ap.add_argument("-i", "--input", help="read the JSON payload from this file instead of stdin")
    ap.add_argument("-o", "--output", help="write JSON here instead of stdout")
    args = ap.parse_args(argv)

    raw = Path(args.input).read_text(encoding="utf-8") if args.input else sys.stdin.read()
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as e:
        print(json.dumps({"error": f"invalid JSON input: {e}"}), file=sys.stderr)
        return 2

    result = grade(payload.get("dimensions", {}), sum_severity(payload))
    out = json.dumps(result, indent=2)
    if args.output:
        Path(args.output).write_text(out + "\n", encoding="utf-8")
    else:
        print(out)
    return 0


if __name__ == "__main__":
    sys.exit(main())


@ -1,74 +0,0 @@
# /// script
# requires-python = ">=3.10"
# dependencies = ["pytest>=8.0"]
# ///
"""Tests for grade_spine.py. Run: uv run --with pytest pytest scripts/tests/test_grade_spine.py

The grade is a pure function of dimension verdicts and summed severity counts; each test
pins one branch of the ladder, plus the most-severe-wins precedence and the reviewer sum.
"""

import importlib.util
import sys
from pathlib import Path

import pytest

_SPEC = importlib.util.spec_from_file_location(
    "grade_spine", Path(__file__).resolve().parent.parent / "grade_spine.py"
)
grade_spine = importlib.util.module_from_spec(_SPEC)
sys.modules["grade_spine"] = grade_spine
_SPEC.loader.exec_module(grade_spine)

grade = grade_spine.grade
sum_severity = grade_spine.sum_severity

ALL_STRONG = {"consistency": "strong", "leanness": "adequate", "decisions": "strong"}


def test_excellent_all_strong_no_findings():
    assert grade(ALL_STRONG, {})["grade"] == "Excellent"


def test_good_one_thin():
    assert grade({"a": "strong", "b": "thin"}, {})["grade"] == "Good"


def test_fair_two_thin():
    assert grade({"a": "thin", "b": "thin"}, {})["grade"] == "Fair"


def test_fair_any_high():
    assert grade(ALL_STRONG, {"high": 1})["grade"] == "Fair"


def test_poor_any_critical():
    assert grade(ALL_STRONG, {"critical": 1})["grade"] == "Poor"


def test_poor_broken_dimension():
    assert grade({"a": "strong", "b": "broken"}, {})["grade"] == "Poor"


def test_critical_outranks_high_and_thin():
    assert grade({"a": "thin"}, {"critical": 1, "high": 3})["grade"] == "Poor"


def test_medium_and_low_do_not_lower_grade():
    assert grade(ALL_STRONG, {"medium": 5, "low": 9})["grade"] == "Excellent"


def test_sum_severity_across_reviewers():
    payload = {"reviewers": [
        {"slug": "rubric", "severity": {"high": 1}},
        {"slug": "divergence", "severity": {"high": 2, "critical": 1}},
    ]}
    assert sum_severity(payload) == {"critical": 1, "high": 3, "medium": 0, "low": 0}


def test_sum_severity_bare_fallback():
    assert sum_severity({"severity": {"medium": 2}}) == {"critical": 0, "high": 0, "medium": 2, "low": 0}


if __name__ == "__main__":
    sys.exit(pytest.main([__file__, "-q"]))