feat: add bmad-checkpoint skill for guided human change review

Copies the av-human-review experiment skill into BMAD-METHOD as bmad-checkpoint, following established multi-step skill conventions (SKILL.md → workflow.md → step chain). Registered in module-help.csv under 4-implementation phase. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 07:40:58 -06:00 · 2026-03-27 07:40:58 -06:00 · 92a7dc203c
parent 2c5436f672
commit 92a7dc203c
7 changed files with 436 additions and 0 deletions
--- a/src/bmm-skills/4-implementation/bmad-checkpoint/SKILL.md
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint/SKILL.md
@ -0,0 +1,6 @@
+---
+name: bmad-checkpoint
+description: 'Guided walkthrough of a change, from purpose and context into details. Use when the user says "walk me through this change", "human review", or "review walkthrough".'
+---
+
+Follow the instructions in ./workflow.md.
--- a/src/bmm-skills/4-implementation/bmad-checkpoint/step-01-orientation.md
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint/step-01-orientation.md
@ -0,0 +1,119 @@
+# Step 1: Orientation
+
+Display: `[Orientation] → Walkthrough → Detail Pass → Testing`
+
+## RULES
+
+- The conversation context before this skill was triggered IS your starting point — not a blank slate.
+- The user may have already mentioned a spec file, a commit, a branch, a PR, or described the change. Use what they gave you.
+- For PR references, resolve to a commit or branch using whatever tools are available (e.g., `gh pr view` for GitHub). If resolution fails, ask for a SHA or branch.
+
+## FIND THE CHANGE
+
+1. **What did the user already tell you?** Look at the conversation for a spec path, commit ref, branch, PR, or description of the change.
+2. **Fill in the other half.** If they gave a spec, look for `baseline_commit` in its frontmatter. If they gave a commit, check `{implementation_artifacts}` for a spec whose `baseline_commit` matches. If they gave both, use both.
+3. **If they gave nothing**, check current branch and HEAD. Confirm: "I see HEAD is `<short-sha>` on `<branch>` — is this the change you want to review?"
+4. **If you still can't identify a change**, ask:
+   - What changed and why?
+   - Which commit, branch, or PR should I look at?
+   - Do you have a spec, bug report, or anything else that explains what this change is supposed to do?
+
+   If after 3 exchanges you still can't identify a change, HALT.
+
+## DETERMINE WHAT YOU HAVE
+
+Set `change_type` based on how the user referred to the change:
+
+| User referenced | `change_type` |
+|---|---|
+| A pull request | `PR` |
+| A commit | `commit` |
+| A branch | `branch` |
+| A description (e.g. "the auth refactor") | use their words (e.g. `auth refactor`) |
+| Nothing specific / ambiguous | `change` |
+
+| What you have | Review Mode | Intent Source |
+|---|---|---|
+| Spec with `## Suggested Review Order` | `full-trail` | Spec's Intent section |
+| Spec without trail | `spec-only` | Spec's Intent section |
+| Commit ref, no spec | `bare-commit` | Commit message |
+
+**Terse commit messages** (under 10 words): Scan the diff for the primary change pattern and draft a one-sentence intent. Confirm with the user before proceeding.
+
+## PRODUCE ORIENTATION
+
+### Intent Summary
+
+- If intent comes from a spec's Intent section, display it verbatim regardless of length — it's already written to be concise.
+- For other sources (commit messages, bug reports, user description): if ≤200 tokens, display verbatim. If longer, distill to ≤200 tokens and link to the full source.
+- Format: `> **Intent:** {summary}`
+
+### Surface Area Stats
+
+Compute from `git diff --stat` against the appropriate baseline:
+
+- **With spec**: Use `baseline_commit` from frontmatter. If missing, diff `HEAD~1..HEAD` and tell the user stats reflect only the latest commit.
+- **Bare commit**: Diff against parent (`commit~1..commit`). For merge commits, use `--first-parent`. For initial commits, diff against the empty tree (`4b825dc..commit`).
+
+Display as:
+
+```
+N files changed · M modules touched · ~L lines of logic · B boundary crossings · P new public interfaces
+```
+
+- **Files changed**: from `git diff --stat`
+- **Modules touched**: distinct top-level directories with changes
+- **Lines of logic**: added/modified lines excluding blanks, imports, formatting. `~` because approximate.
+- **Boundary crossings**: changes spanning more than one top-level module. `0` if single module.
+- **New public interfaces**: new exports, endpoints, public methods. `0` if none.
+
+If git is unavailable or a command fails, show what you can and note what's missing.
+
+### Present
+
+```
+[Orientation] → Walkthrough → Detail Pass → Testing
+
+> **Intent:** {intent_summary}
+
+{stats line}
+```
+
+## FALLBACK TRAIL GENERATION
+
+**Skip this section if review mode is `full-trail`.**
+
+When no Suggested Review Order exists, generate one from the diff and codebase context. A generated trail is lower quality than an author-produced one, but far better than none.
+
+1. Get the full diff against the appropriate baseline (same rules as Surface Area Stats).
+2. Read changed files in full — not just diff hunks. Surrounding code reveals intent that hunks alone miss.
+3. If a spec exists, use its Intent section to anchor concern identification.
+4. Identify 2–5 concerns: cohesive design intents that each explain *why* behind a cluster of changes. Prefer functional groupings and architectural boundaries over file-level splits. A single-concern change is fine — don't invent groupings.
+5. For each concern, select 1–4 `path:line` stops — locations where the concern is most visible. Prefer entry points, decision points, and boundary crossings over mechanical changes.
+6. Lead with the entry point — the highest-leverage stop a reviewer should see first. Inside each concern, order stops so each builds on the previous. End with peripherals (tests, config, types).
+7. Format each stop as a workspace-relative markdown link — basename and line as link text, path with `#L` anchor as target:
+
+```
+**{Concern name}**
+
+- {one-line framing, ≤15 words}
+  [`file.ts:42`](/src/path/to/file.ts#L42)
+```
+
+When there is only one concern, omit the bold label — just list the stops directly.
+
+Present after the orientation output:
+
+```
+I built a review trail for this {change_type} (no author-produced trail was found):
+
+{generated trail}
+```
+
+Set review mode to `full-trail`. The generated trail is the Suggested Review Order for subsequent steps.
+
+If git is unavailable or the diff cannot be retrieved, skip this section and proceed with the original review mode. Note: "Could not generate trail — git unavailable."
+
+## NEXT
+
+Read fully and follow `./step-02-walkthrough.md`
--- a/src/bmm-skills/4-implementation/bmad-checkpoint/step-02-walkthrough.md
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint/step-02-walkthrough.md
@ -0,0 +1,88 @@
+# Step 2: Walkthrough
+
+Display: `Orientation → [Walkthrough] → Detail Pass → Testing`
+
+## RULES
+
+- Organize by **concern**, not by file. A concern is a cohesive design intent — e.g., "input validation," "state management," "API contract." One file may appear under multiple concerns; one concern may span multiple files.
+- Every code reference uses clickable `path:line` format per the standing rule.
+- **Front-load then shut up.** Present the entire walkthrough in a single message. Do not ask questions mid-walkthrough. Do not pause between concerns. After presenting, go quiet — the human reads, clicks, navigates at their own pace.
+- The walkthrough activates **design judgment**, not correctness checking. Frame each concern as "here's what this change does and why" — the human evaluates whether it's the right approach for the system.
+
+## BUILD THE WALKTHROUGH
+
+### Identify Concerns
+
+**With Suggested Review Order** (`full-trail` mode):
+
+1. Read the Suggested Review Order stops from the spec (or from conversation context if generated by step-01 fallback).
+2. Resolve each stop to a file in the current repo. Output in `path:line` format per the standing rule.
+3. Read the diff to understand what each stop actually does.
+4. Group stops by concern. Stops that share a design intent belong together even if they're in different files. A stop may appear under multiple concerns if it serves multiple purposes.
+
+**Without Suggested Review Order** (`spec-only` or `bare-commit` mode):
+
+1. Get the diff against the appropriate baseline (same rules as step 1).
+2. Identify concerns by reading the diff for cohesive design intents:
+   - Functional groupings — what user-facing behavior does each cluster of changes support?
+   - Architectural layers — does the change cross boundaries (API → service → data)?
+   - Design decisions — where did the author choose between alternatives?
+3. For each concern, identify the key code locations as `path:line` stops.
+
+### Order for Comprehension
+
+Sequence concerns top-down: start with the highest-level intent (the "what and why"), then drill into supporting implementation. Within each concern, order stops so each one builds on the previous. The reader should never encounter a reference to something they haven't seen yet.
+
+If the change has a natural entry point (e.g., a new public API, a config change, a UI entry point), lead with it.
+
+### Write Each Concern
+
+For each concern, produce:
+
+1. **Heading** — a short phrase naming the design intent (not a file name, not a module name).
+2. **Why** — 1–2 sentences: what problem this concern addresses, why this approach was chosen over alternatives. If the spec documents rejected alternatives, reference them here.
+3. **Stops** — each stop on its own line: `path:line` followed by a brief phrase (not a sentence) describing what this location does for the concern. Keep framing under 15 words per stop.
+
+Target 2–5 concerns for a typical change. A single-concern change is fine — don't invent groupings. A change with more than 7 concerns is a signal the scope may be too large, but present it anyway.
+
+## PRESENT
+
+Output the full walkthrough as a single message with this structure:
+
+```
+Orientation → [Walkthrough] → Detail Pass → Testing
+```
+
+Then each concern group using this format:
+
+```
+### {Concern Heading}
+
+{Why — 1–2 sentences}
+
+- `path:line` — {brief framing}
+- `path:line` — {brief framing}
+- ...
+```
+
+End the message with:
+
+```
+---
+
+Take your time — click through the stops, read the diff, trace the logic. While you are reviewing, you can:
+- "run advanced elicitation on the error handling"
+- "party mode on whether this schema migration is safe"
+- or just ask anything
+
+When you are done with the walkthrough:
+- **"detail pass"** — I'll surface the highest-risk spots and run code review
+- **"testing"** — I'll suggest how to observe the behavior
+- **"I've seen enough"** — tell me what to do about this {change_type} and we wrap up
+```
+
+**Do NOT speak again until the human responds.**
+
+## NEXT
+
+When the human signals readiness for the next phase, read fully and follow `./step-03-detail-pass.md`
--- a/src/bmm-skills/4-implementation/bmad-checkpoint/step-03-detail-pass.md
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint/step-03-detail-pass.md
@ -0,0 +1,103 @@
+# Step 3: Detail Pass
+
+Display: `Orientation → Walkthrough → [Detail Pass] → Testing`
+
+## RULES
+
+- Every code reference uses clickable `path:line` format per the standing rule.
+- **Front-load then shut up.** Present all findings in a single message. Do not drip-feed.
+- The detail pass surfaces what the human should **think about**, not what the code got wrong. Machine hardening already handled correctness. This activates risk awareness.
+- The LLM detects risk category by pattern. The human judges significance. Do not assign severity scores or numeric rankings — ordering by blast radius (below) is sequencing for readability, not a severity judgment.
+- If no high-risk spots exist, say so explicitly. Do not invent findings.
+
+## IDENTIFY RISK SPOTS
+
+Scan the diff for changes touching risk-sensitive patterns. Look for 2–5 spots where a mistake would have the highest blast radius — not the most complex code, but the code where being wrong costs the most.
+
+Risk categories to detect:
+
+- `[auth]` — authentication, authorization, session, token, permission, access control
+- `[public API]` — new/changed endpoints, exports, public methods, interface contracts
+- `[schema]` — database migrations, schema changes, data model modifications, serialization
+- `[billing]` — payment, pricing, subscription, metering, usage tracking
+- `[infra]` — deployment, CI/CD, environment variables, config files, infrastructure
+- `[security]` — input validation, sanitization, crypto, secrets, CORS, CSP
+- `[config]` — feature flags, environment-dependent behavior, defaults
+- `[other]` — anything risk-sensitive that doesn't fit the above (e.g., concurrency, data privacy, backwards compatibility). Use a descriptive tag.
+
+Sequence spots so the highest blast radius comes first (how much breaks if this is wrong), not by diff order or file order. If more than 5 spots qualify, show the top 5 and note: "N additional spots omitted — ask if you want the full list."
+
+If the change has no spots matching these patterns, state: "No high-risk spots found in this change — the diff speaks for itself." Do not force findings.
+
+## SURFACE MACHINE HARDENING FINDINGS
+
+Check whether the spec has a `## Spec Change Log` section with entries (populated by adversarial review loops).
+
+- **If entries exist:** Read them. Surface findings that are instructive for the human reviewer — not bugs that were already fixed, but decisions the review loop flagged that the human should be aware of. Format: brief summary of what was flagged and what was decided.
+- **If no entries or no spec:** Skip this section entirely. Do not mention it.
+
+## PRESENT
+
+Output as a single message:
+
+```
+Orientation → Walkthrough → [Detail Pass] → Testing
+```
+
+### Risk Spots
+
+For each spot, one line:
+
+```
+- `path:line` — [tag] reason-phrase
+```
+
+Example:
+
+```
+- `src/auth/middleware.ts:42` — [auth] New token validation bypasses rate limiter
+- `migrations/003_add_index.sql:7` — [schema] Index on high-write table, check lock behavior
+- `api/routes/billing.ts:118` — [billing] Metering calculation changed, verify idempotency
+```
+
+### Machine Hardening (only if findings exist)
+
+```
+### Machine Hardening
+
+- Finding summary — what was flagged, what was decided
+- ...
+```
+
+### Closing menu
+
+End the message with:
+
+```
+---
+
+You've seen the design and the risk landscape. From here:
+- **"dig into [area]"** — I'll deep-dive that specific area with correctness focus
+- **"testing"** — I'll suggest how to observe the behavior
+- **"I've seen enough"** — tell me what to do about this {change_type} and we wrap up
+```
+
+**Do NOT speak again until the human responds.**
+
+## TARGETED RE-REVIEW
+
+When the human says "dig into [area]" (e.g., "dig into the auth changes", "dig into the schema migration"):
+
+1. If the specified area does not map to any code in the diff, say so: "I don't see [area] in this change — did you mean something else?" Return to the closing menu.
+2. Identify all code locations in the diff relevant to the specified area.
+3. Read each location in full context (not just the diff hunk — read surrounding code).
+4. Shift to **correctness mode**: trace edge cases, check boundary conditions, verify error handling, look for off-by-one errors, race conditions, resource leaks.
+5. Present findings as a compact list — each finding is `path:line` + what you found + why it matters.
+6. If nothing concerning is found, say so: "Looked closely at [area] — nothing concerning. The implementation is solid."
+7. After presenting, show only the closing menu (not the full risk spots list again).
+
+The human can trigger multiple targeted re-reviews. Each time, present new findings and the closing menu only.
+
+## NEXT
+
+When the human signals readiness for testing, read fully and follow `./step-04-testing.md`
--- a/src/bmm-skills/4-implementation/bmad-checkpoint/step-04-testing.md
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint/step-04-testing.md
@ -0,0 +1,95 @@
+# Step 4: Testing
+
+Display: `Orientation → Walkthrough → Detail Pass → [Testing]`
+
+## RULES
+
+- Every code reference uses clickable `path:line` format per the standing rule.
+- **Front-load then shut up.** Present all suggestions in a single message. Do not drip-feed.
+- This is **experiential**, not analytical. The detail pass asked "did you think about X?" — this says "you could see X with your own eyes."
+- Do not prescribe. The human decides whether observing the behavior is worth their time. Frame suggestions as options, not obligations.
+- Do not duplicate CI, test suites, or automated checks. Assume those exist and work. This is about manual observation — the kind of confidence-building no automated test provides.
+- If the change has no user-visible behavior, say so explicitly. Do not invent observations.
+
+## IDENTIFY OBSERVABLE BEHAVIOR
+
+Scan the diff and spec for changes that produce behavior a human could directly observe. Categories to look for:
+
+- **UI changes** — new screens, modified layouts, changed interactions, error states
+- **CLI/terminal output** — new commands, changed output, new flags or options
+- **API responses** — new endpoints, changed payloads, different status codes
+- **State changes** — database records, file system artifacts, config effects
+- **Error paths** — bad input, missing dependencies, edge conditions
+
+For each observable behavior, determine:
+
+1. **What to do** — the specific action (command to run, button to click, request to send)
+2. **What to expect** — the observable result that confirms the change works
+3. **Why bother** — one phrase connecting this observation to the change's intent (omit if obvious from context)
+
+Target 2–5 suggestions for a typical change. If more than 5 qualify, prioritize by how much confidence the observation provides relative to effort. A change with zero observable behavior is fine — do not pad with trivial observations.
+
+## PRESENT
+
+Output as a single message:
+
+```
+Orientation → Walkthrough → Detail Pass → [Testing]
+```
+
+Then the testing suggestions using this format:
+
+```
+### How to See It Working
+
+**{Brief description}**
+Do: {specific action}
+Expect: {observable result}
+
+**{Brief description}**
+Do: {specific action}
+Expect: {observable result}
+```
+
+Include code blocks for commands or requests where helpful.
+
+If the change has no observable behavior, replace the suggestions with:
+
+```
+### How to See It Working
+
+This change is internal — no user-visible behavior to observe. The diff and tests tell the full story.
+```
+
+### Closing
+
+End the message with:
+
+```
+---
+
+You've seen the change and how to verify it. From here:
+- **"I've seen enough"** — we wrap up this {change_type}
+- or just ask anything
+```
+
+**Do NOT speak again until the human responds.**
+
+## WRAP-UP
+
+When the human says "I've seen enough" or signals they're done:
+
+```
+---
+
+Review complete. What's the call on this {change_type}?
+- **Approve** — ship it
+- **Request changes** — I'll help draft feedback
+- **Discuss** — something's still on your mind
+```
+
+Act on their decision:
+
+- **Approve**: Acknowledge briefly. If reviewing a PR, offer to approve via `gh pr review --approve` — but confirm with the human before executing, since this is a visible action on a shared resource.
+- **Request changes**: Ask what needs changing. Help draft specific, actionable feedback tied to `path:line` locations.
+- **Discuss**: Open conversation — answer questions, explore concerns, dig into any aspect.
--- a/src/bmm-skills/4-implementation/bmad-checkpoint/workflow.md
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint/workflow.md
@ -0,0 +1,24 @@
+---
+main_config: '{project-root}/_bmad/bmm/config.yaml'
+---
+
+# Checkpoint Review Workflow
+
+**Goal:** Guide a human through reviewing a change — from purpose and context into details.
+
+You are assisting the user in reviewing a change.
+
+**Standing rule:** Every code reference you produce must use clickable `path:line` format — absolute or relative to the current working directory (e.g., `src/auth/middleware.ts:42`).
+
+## INITIALIZATION
+
+Load and read full config from `{main_config}` and resolve:
+
+- `implementation_artifacts`
+- `communication_language`
+
+YOU MUST ALWAYS SPEAK OUTPUT in your Agent communication style with the config `{communication_language}`.
+
+## FIRST STEP
+
+Read fully and follow `./step-01-orientation.md` to begin.
--- a/src/bmm-skills/module-help.csv
+++ b/src/bmm-skills/module-help.csv
@ -27,5 +27,6 @@ BMad Method,bmad-create-story,Create Story,CS,"Story cycle start: Prepare first
 BMad Method,bmad-create-story,Validate Story,VS,Validates story readiness and completeness before development work begins.,validate,,4-implementation,bmad-create-story:create,bmad-dev-story,false,implementation_artifacts,story validation report
 BMad Method,bmad-dev-story,Dev Story,DS,Story cycle: Execute story implementation tasks and tests then CR then back to DS if fixes needed.,,4-implementation,bmad-create-story:validate,,true,,
 BMad Method,bmad-code-review,Code Review,CR,Story cycle: If issues back to DS if approved then next CS or ER if epic complete.,,4-implementation,bmad-dev-story,,false,,
+BMad Method,bmad-checkpoint,Checkpoint,CK,Guided walkthrough of a change from purpose and context into details. Use for human review of commits branches or PRs.,,4-implementation,,,false,,
 BMad Method,bmad-qa-generate-e2e-tests,QA Automation Test,QA,Generate automated API and E2E tests for implemented code. NOT for code review or story validation — use CR for that.,,4-implementation,bmad-dev-story,,false,implementation_artifacts,test suite
 BMad Method,bmad-retrospective,Retrospective,ER,Optional at epic end: Review completed work lessons learned and next epic or if major issues consider CC.,,4-implementation,bmad-code-review,,false,implementation_artifacts,retrospective