feat: add bmad-checkpoint skill for guided human change review
Copies the av-human-review experiment skill into BMAD-METHOD as bmad-checkpoint, following established multi-step skill conventions (SKILL.md → workflow.md → step chain). Registered in module-help.csv under 4-implementation phase. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
2c5436f672
commit
92a7dc203c
|
|
@ -0,0 +1,6 @@
|
|||
---
|
||||
name: bmad-checkpoint
|
||||
description: 'Guided walkthrough of a change, from purpose and context into details. Use when the user says "walk me through this change", "human review", or "review walkthrough".'
|
||||
---
|
||||
|
||||
Follow the instructions in ./workflow.md.
|
||||
|
|
@ -0,0 +1,119 @@
|
|||
# Step 1: Orientation
|
||||
|
||||
Display: `[Orientation] → Walkthrough → Detail Pass → Testing`
|
||||
|
||||
## RULES
|
||||
|
||||
- The conversation context before this skill was triggered IS your starting point — not a blank slate.
|
||||
- The user may have already mentioned a spec file, a commit, a branch, a PR, or described the change. Use what they gave you.
|
||||
- For PR references, resolve to a commit or branch using whatever tools are available (e.g., `gh pr view` for GitHub). If resolution fails, ask for a SHA or branch.
|
||||
|
||||
## FIND THE CHANGE
|
||||
|
||||
1. **What did the user already tell you?** Look at the conversation for a spec path, commit ref, branch, PR, or description of the change.
|
||||
2. **Fill in the other half.** If they gave a spec, look for `baseline_commit` in its frontmatter. If they gave a commit, check `{implementation_artifacts}` for a spec whose `baseline_commit` matches. If they gave both, use both.
|
||||
3. **If they gave nothing**, check current branch and HEAD. Confirm: "I see HEAD is `<short-sha>` on `<branch>` — is this the change you want to review?"
|
||||
4. **If you still can't identify a change**, ask:
|
||||
- What changed and why?
|
||||
- Which commit, branch, or PR should I look at?
|
||||
- Do you have a spec, bug report, or anything else that explains what this change is supposed to do?
|
||||
|
||||
If after 3 exchanges you still can't identify a change, HALT.
|
||||
|
||||
## DETERMINE WHAT YOU HAVE
|
||||
|
||||
Set `change_type` based on how the user referred to the change:
|
||||
|
||||
| User referenced | `change_type` |
|
||||
|---|---|
|
||||
| A pull request | `PR` |
|
||||
| A commit | `commit` |
|
||||
| A branch | `branch` |
|
||||
| A description (e.g. "the auth refactor") | use their words (e.g. `auth refactor`) |
|
||||
| Nothing specific / ambiguous | `change` |
|
||||
|
||||
| What you have | Review Mode | Intent Source |
|
||||
|---|---|---|
|
||||
| Spec with `## Suggested Review Order` | `full-trail` | Spec's Intent section |
|
||||
| Spec without trail | `spec-only` | Spec's Intent section |
|
||||
| Commit ref, no spec | `bare-commit` | Commit message |
|
||||
|
||||
**Terse commit messages** (under 10 words): Scan the diff for the primary change pattern and draft a one-sentence intent. Confirm with the user before proceeding.
|
||||
|
||||
## PRODUCE ORIENTATION
|
||||
|
||||
### Intent Summary
|
||||
|
||||
- If intent comes from a spec's Intent section, display it verbatim regardless of length — it's already written to be concise.
|
||||
- For other sources (commit messages, bug reports, user description): if ≤200 tokens, display verbatim. If longer, distill to ≤200 tokens and link to the full source.
|
||||
- Format: `> **Intent:** {summary}`
|
||||
|
||||
### Surface Area Stats
|
||||
|
||||
Compute from `git diff --stat` against the appropriate baseline:
|
||||
|
||||
- **With spec**: Use `baseline_commit` from frontmatter. If missing, diff `HEAD~1..HEAD` and tell the user stats reflect only the latest commit.
|
||||
- **Bare commit**: Diff against parent (`commit~1..commit`). For merge commits, use `--first-parent`. For initial commits, diff against the empty tree (`4b825dc..commit`).
|
||||
|
||||
Display as:
|
||||
|
||||
```
|
||||
N files changed · M modules touched · ~L lines of logic · B boundary crossings · P new public interfaces
|
||||
```
|
||||
|
||||
- **Files changed**: from `git diff --stat`
|
||||
- **Modules touched**: distinct top-level directories with changes
|
||||
- **Lines of logic**: added/modified lines excluding blanks, imports, formatting. `~` because approximate.
|
||||
- **Boundary crossings**: changes spanning more than one top-level module. `0` if single module.
|
||||
- **New public interfaces**: new exports, endpoints, public methods. `0` if none.
|
||||
|
||||
If git is unavailable or a command fails, show what you can and note what's missing.
|
||||
|
||||
### Present
|
||||
|
||||
```
|
||||
[Orientation] → Walkthrough → Detail Pass → Testing
|
||||
|
||||
> **Intent:** {intent_summary}
|
||||
|
||||
{stats line}
|
||||
```
|
||||
|
||||
## FALLBACK TRAIL GENERATION
|
||||
|
||||
**Skip this section if review mode is `full-trail`.**
|
||||
|
||||
When no Suggested Review Order exists, generate one from the diff and codebase context. A generated trail is lower quality than an author-produced one, but far better than none.
|
||||
|
||||
1. Get the full diff against the appropriate baseline (same rules as Surface Area Stats).
|
||||
2. Read changed files in full — not just diff hunks. Surrounding code reveals intent that hunks alone miss.
|
||||
3. If a spec exists, use its Intent section to anchor concern identification.
|
||||
4. Identify 2–5 concerns: cohesive design intents that each explain *why* behind a cluster of changes. Prefer functional groupings and architectural boundaries over file-level splits. A single-concern change is fine — don't invent groupings.
|
||||
5. For each concern, select 1–4 `path:line` stops — locations where the concern is most visible. Prefer entry points, decision points, and boundary crossings over mechanical changes.
|
||||
6. Lead with the entry point — the highest-leverage stop a reviewer should see first. Inside each concern, order stops so each builds on the previous. End with peripherals (tests, config, types).
|
||||
7. Format each stop as a workspace-relative markdown link — basename and line as link text, path with `#L` anchor as target:
|
||||
|
||||
```
|
||||
**{Concern name}**
|
||||
|
||||
- {one-line framing, ≤15 words}
|
||||
[`file.ts:42`](/src/path/to/file.ts#L42)
|
||||
```
|
||||
|
||||
When there is only one concern, omit the bold label — just list the stops directly.
|
||||
|
||||
Present after the orientation output:
|
||||
|
||||
```
|
||||
I built a review trail for this {change_type} (no author-produced trail was found):
|
||||
|
||||
{generated trail}
|
||||
```
|
||||
|
||||
Set review mode to `full-trail`. The generated trail is the Suggested Review Order for subsequent steps.
|
||||
|
||||
If git is unavailable or the diff cannot be retrieved, skip this section and proceed with the original review mode. Note: "Could not generate trail — git unavailable."
|
||||
|
||||
## NEXT
|
||||
|
||||
Read fully and follow `./step-02-walkthrough.md`
|
||||
|
|
@ -0,0 +1,88 @@
|
|||
# Step 2: Walkthrough
|
||||
|
||||
Display: `Orientation → [Walkthrough] → Detail Pass → Testing`
|
||||
|
||||
## RULES
|
||||
|
||||
- Organize by **concern**, not by file. A concern is a cohesive design intent — e.g., "input validation," "state management," "API contract." One file may appear under multiple concerns; one concern may span multiple files.
|
||||
- Every code reference uses clickable `path:line` format per the standing rule.
|
||||
- **Front-load then shut up.** Present the entire walkthrough in a single message. Do not ask questions mid-walkthrough. Do not pause between concerns. After presenting, go quiet — the human reads, clicks, navigates at their own pace.
|
||||
- The walkthrough activates **design judgment**, not correctness checking. Frame each concern as "here's what this change does and why" — the human evaluates whether it's the right approach for the system.
|
||||
|
||||
## BUILD THE WALKTHROUGH
|
||||
|
||||
### Identify Concerns
|
||||
|
||||
**With Suggested Review Order** (`full-trail` mode):
|
||||
|
||||
1. Read the Suggested Review Order stops from the spec (or from conversation context if generated by step-01 fallback).
|
||||
2. Resolve each stop to a file in the current repo. Output in `path:line` format per the standing rule.
|
||||
3. Read the diff to understand what each stop actually does.
|
||||
4. Group stops by concern. Stops that share a design intent belong together even if they're in different files. A stop may appear under multiple concerns if it serves multiple purposes.
|
||||
|
||||
**Without Suggested Review Order** (`spec-only` or `bare-commit` mode):
|
||||
|
||||
1. Get the diff against the appropriate baseline (same rules as step 1).
|
||||
2. Identify concerns by reading the diff for cohesive design intents:
|
||||
- Functional groupings — what user-facing behavior does each cluster of changes support?
|
||||
- Architectural layers — does the change cross boundaries (API → service → data)?
|
||||
- Design decisions — where did the author choose between alternatives?
|
||||
3. For each concern, identify the key code locations as `path:line` stops.
|
||||
|
||||
### Order for Comprehension
|
||||
|
||||
Sequence concerns top-down: start with the highest-level intent (the "what and why"), then drill into supporting implementation. Within each concern, order stops so each one builds on the previous. The reader should never encounter a reference to something they haven't seen yet.
|
||||
|
||||
If the change has a natural entry point (e.g., a new public API, a config change, a UI entry point), lead with it.
|
||||
|
||||
### Write Each Concern
|
||||
|
||||
For each concern, produce:
|
||||
|
||||
1. **Heading** — a short phrase naming the design intent (not a file name, not a module name).
|
||||
2. **Why** — 1–2 sentences: what problem this concern addresses, why this approach was chosen over alternatives. If the spec documents rejected alternatives, reference them here.
|
||||
3. **Stops** — each stop on its own line: `path:line` followed by a brief phrase (not a sentence) describing what this location does for the concern. Keep framing under 15 words per stop.
|
||||
|
||||
Target 2–5 concerns for a typical change. A single-concern change is fine — don't invent groupings. A change with more than 7 concerns is a signal the scope may be too large, but present it anyway.
|
||||
|
||||
## PRESENT
|
||||
|
||||
Output the full walkthrough as a single message with this structure:
|
||||
|
||||
```
|
||||
Orientation → [Walkthrough] → Detail Pass → Testing
|
||||
```
|
||||
|
||||
Then each concern group using this format:
|
||||
|
||||
```
|
||||
### {Concern Heading}
|
||||
|
||||
{Why — 1–2 sentences}
|
||||
|
||||
- `path:line` — {brief framing}
|
||||
- `path:line` — {brief framing}
|
||||
- ...
|
||||
```
|
||||
|
||||
End the message with:
|
||||
|
||||
```
|
||||
---
|
||||
|
||||
Take your time — click through the stops, read the diff, trace the logic. While you are reviewing, you can:
|
||||
- "run advanced elicitation on the error handling"
|
||||
- "party mode on whether this schema migration is safe"
|
||||
- or just ask anything
|
||||
|
||||
When you are done with the walkthrough:
|
||||
- **"detail pass"** — I'll surface the highest-risk spots and run code review
|
||||
- **"testing"** — I'll suggest how to observe the behavior
|
||||
- **"I've seen enough"** — tell me what to do about this {change_type} and we wrap up
|
||||
```
|
||||
|
||||
**Do NOT speak again until the human responds.**
|
||||
|
||||
## NEXT
|
||||
|
||||
When the human signals readiness for the next phase, read fully and follow `./step-03-detail-pass.md`
|
||||
|
|
@ -0,0 +1,103 @@
|
|||
# Step 3: Detail Pass
|
||||
|
||||
Display: `Orientation → Walkthrough → [Detail Pass] → Testing`
|
||||
|
||||
## RULES
|
||||
|
||||
- Every code reference uses clickable `path:line` format per the standing rule.
|
||||
- **Front-load then shut up.** Present all findings in a single message. Do not drip-feed.
|
||||
- The detail pass surfaces what the human should **think about**, not what the code got wrong. Machine hardening already handled correctness. This activates risk awareness.
|
||||
- The LLM detects risk category by pattern. The human judges significance. Do not assign severity scores or numeric rankings — ordering by blast radius (below) is sequencing for readability, not a severity judgment.
|
||||
- If no high-risk spots exist, say so explicitly. Do not invent findings.
|
||||
|
||||
## IDENTIFY RISK SPOTS
|
||||
|
||||
Scan the diff for changes touching risk-sensitive patterns. Look for 2–5 spots where a mistake would have the highest blast radius — not the most complex code, but the code where being wrong costs the most.
|
||||
|
||||
Risk categories to detect:
|
||||
|
||||
- `[auth]` — authentication, authorization, session, token, permission, access control
|
||||
- `[public API]` — new/changed endpoints, exports, public methods, interface contracts
|
||||
- `[schema]` — database migrations, schema changes, data model modifications, serialization
|
||||
- `[billing]` — payment, pricing, subscription, metering, usage tracking
|
||||
- `[infra]` — deployment, CI/CD, environment variables, config files, infrastructure
|
||||
- `[security]` — input validation, sanitization, crypto, secrets, CORS, CSP
|
||||
- `[config]` — feature flags, environment-dependent behavior, defaults
|
||||
- `[other]` — anything risk-sensitive that doesn't fit the above (e.g., concurrency, data privacy, backwards compatibility). Use a descriptive tag.
|
||||
|
||||
Sequence spots so the highest blast radius comes first (how much breaks if this is wrong), not by diff order or file order. If more than 5 spots qualify, show the top 5 and note: "N additional spots omitted — ask if you want the full list."
|
||||
|
||||
If the change has no spots matching these patterns, state: "No high-risk spots found in this change — the diff speaks for itself." Do not force findings.
|
||||
|
||||
## SURFACE MACHINE HARDENING FINDINGS
|
||||
|
||||
Check whether the spec has a `## Spec Change Log` section with entries (populated by adversarial review loops).
|
||||
|
||||
- **If entries exist:** Read them. Surface findings that are instructive for the human reviewer — not bugs that were already fixed, but decisions the review loop flagged that the human should be aware of. Format: brief summary of what was flagged and what was decided.
|
||||
- **If no entries or no spec:** Skip this section entirely. Do not mention it.
|
||||
|
||||
## PRESENT
|
||||
|
||||
Output as a single message:
|
||||
|
||||
```
|
||||
Orientation → Walkthrough → [Detail Pass] → Testing
|
||||
```
|
||||
|
||||
### Risk Spots
|
||||
|
||||
For each spot, one line:
|
||||
|
||||
```
|
||||
- `path:line` — [tag] reason-phrase
|
||||
```
|
||||
|
||||
Example:
|
||||
|
||||
```
|
||||
- `src/auth/middleware.ts:42` — [auth] New token validation bypasses rate limiter
|
||||
- `migrations/003_add_index.sql:7` — [schema] Index on high-write table, check lock behavior
|
||||
- `api/routes/billing.ts:118` — [billing] Metering calculation changed, verify idempotency
|
||||
```
|
||||
|
||||
### Machine Hardening (only if findings exist)
|
||||
|
||||
```
|
||||
### Machine Hardening
|
||||
|
||||
- Finding summary — what was flagged, what was decided
|
||||
- ...
|
||||
```
|
||||
|
||||
### Closing menu
|
||||
|
||||
End the message with:
|
||||
|
||||
```
|
||||
---
|
||||
|
||||
You've seen the design and the risk landscape. From here:
|
||||
- **"dig into [area]"** — I'll deep-dive that specific area with correctness focus
|
||||
- **"testing"** — I'll suggest how to observe the behavior
|
||||
- **"I've seen enough"** — tell me what to do about this {change_type} and we wrap up
|
||||
```
|
||||
|
||||
**Do NOT speak again until the human responds.**
|
||||
|
||||
## TARGETED RE-REVIEW
|
||||
|
||||
When the human says "dig into [area]" (e.g., "dig into the auth changes", "dig into the schema migration"):
|
||||
|
||||
1. If the specified area does not map to any code in the diff, say so: "I don't see [area] in this change — did you mean something else?" Return to the closing menu.
|
||||
2. Identify all code locations in the diff relevant to the specified area.
|
||||
3. Read each location in full context (not just the diff hunk — read surrounding code).
|
||||
4. Shift to **correctness mode**: trace edge cases, check boundary conditions, verify error handling, look for off-by-one errors, race conditions, resource leaks.
|
||||
5. Present findings as a compact list — each finding is `path:line` + what you found + why it matters.
|
||||
6. If nothing concerning is found, say so: "Looked closely at [area] — nothing concerning. The implementation is solid."
|
||||
7. After presenting, show only the closing menu (not the full risk spots list again).
|
||||
|
||||
The human can trigger multiple targeted re-reviews. Each time, present new findings and the closing menu only.
|
||||
|
||||
## NEXT
|
||||
|
||||
When the human signals readiness for testing, read fully and follow `./step-04-testing.md`
|
||||
|
|
@ -0,0 +1,95 @@
|
|||
# Step 4: Testing
|
||||
|
||||
Display: `Orientation → Walkthrough → Detail Pass → [Testing]`
|
||||
|
||||
## RULES
|
||||
|
||||
- Every code reference uses clickable `path:line` format per the standing rule.
|
||||
- **Front-load then shut up.** Present all suggestions in a single message. Do not drip-feed.
|
||||
- This is **experiential**, not analytical. The detail pass asked "did you think about X?" — this says "you could see X with your own eyes."
|
||||
- Do not prescribe. The human decides whether observing the behavior is worth their time. Frame suggestions as options, not obligations.
|
||||
- Do not duplicate CI, test suites, or automated checks. Assume those exist and work. This is about manual observation — the kind of confidence-building no automated test provides.
|
||||
- If the change has no user-visible behavior, say so explicitly. Do not invent observations.
|
||||
|
||||
## IDENTIFY OBSERVABLE BEHAVIOR
|
||||
|
||||
Scan the diff and spec for changes that produce behavior a human could directly observe. Categories to look for:
|
||||
|
||||
- **UI changes** — new screens, modified layouts, changed interactions, error states
|
||||
- **CLI/terminal output** — new commands, changed output, new flags or options
|
||||
- **API responses** — new endpoints, changed payloads, different status codes
|
||||
- **State changes** — database records, file system artifacts, config effects
|
||||
- **Error paths** — bad input, missing dependencies, edge conditions
|
||||
|
||||
For each observable behavior, determine:
|
||||
|
||||
1. **What to do** — the specific action (command to run, button to click, request to send)
|
||||
2. **What to expect** — the observable result that confirms the change works
|
||||
3. **Why bother** — one phrase connecting this observation to the change's intent (omit if obvious from context)
|
||||
|
||||
Target 2–5 suggestions for a typical change. If more than 5 qualify, prioritize by how much confidence the observation provides relative to effort. A change with zero observable behavior is fine — do not pad with trivial observations.
|
||||
|
||||
## PRESENT
|
||||
|
||||
Output as a single message:
|
||||
|
||||
```
|
||||
Orientation → Walkthrough → Detail Pass → [Testing]
|
||||
```
|
||||
|
||||
Then the testing suggestions using this format:
|
||||
|
||||
```
|
||||
### How to See It Working
|
||||
|
||||
**{Brief description}**
|
||||
Do: {specific action}
|
||||
Expect: {observable result}
|
||||
|
||||
**{Brief description}**
|
||||
Do: {specific action}
|
||||
Expect: {observable result}
|
||||
```
|
||||
|
||||
Include code blocks for commands or requests where helpful.
|
||||
|
||||
If the change has no observable behavior, replace the suggestions with:
|
||||
|
||||
```
|
||||
### How to See It Working
|
||||
|
||||
This change is internal — no user-visible behavior to observe. The diff and tests tell the full story.
|
||||
```
|
||||
|
||||
### Closing
|
||||
|
||||
End the message with:
|
||||
|
||||
```
|
||||
---
|
||||
|
||||
You've seen the change and how to verify it. From here:
|
||||
- **"I've seen enough"** — we wrap up this {change_type}
|
||||
- or just ask anything
|
||||
```
|
||||
|
||||
**Do NOT speak again until the human responds.**
|
||||
|
||||
## WRAP-UP
|
||||
|
||||
When the human says "I've seen enough" or signals they're done:
|
||||
|
||||
```
|
||||
---
|
||||
|
||||
Review complete. What's the call on this {change_type}?
|
||||
- **Approve** — ship it
|
||||
- **Request changes** — I'll help draft feedback
|
||||
- **Discuss** — something's still on your mind
|
||||
```
|
||||
|
||||
Act on their decision:
|
||||
|
||||
- **Approve**: Acknowledge briefly. If reviewing a PR, offer to approve via `gh pr review --approve` — but confirm with the human before executing, since this is a visible action on a shared resource.
|
||||
- **Request changes**: Ask what needs changing. Help draft specific, actionable feedback tied to `path:line` locations.
|
||||
- **Discuss**: Open conversation — answer questions, explore concerns, dig into any aspect.
|
||||
|
|
@ -0,0 +1,24 @@
|
|||
---
|
||||
main_config: '{project-root}/_bmad/bmm/config.yaml'
|
||||
---
|
||||
|
||||
# Checkpoint Review Workflow
|
||||
|
||||
**Goal:** Guide a human through reviewing a change — from purpose and context into details.
|
||||
|
||||
You are assisting the user in reviewing a change.
|
||||
|
||||
**Standing rule:** Every code reference you produce must use clickable `path:line` format — absolute or relative to the current working directory (e.g., `src/auth/middleware.ts:42`).
|
||||
|
||||
## INITIALIZATION
|
||||
|
||||
Load and read full config from `{main_config}` and resolve:
|
||||
|
||||
- `implementation_artifacts`
|
||||
- `communication_language`
|
||||
|
||||
YOU MUST ALWAYS SPEAK OUTPUT in your Agent communication style with the config `{communication_language}`.
|
||||
|
||||
## FIRST STEP
|
||||
|
||||
Read fully and follow `./step-01-orientation.md` to begin.
|
||||
|
|
@ -27,5 +27,6 @@ BMad Method,bmad-create-story,Create Story,CS,"Story cycle start: Prepare first
|
|||
BMad Method,bmad-create-story,Validate Story,VS,Validates story readiness and completeness before development work begins.,validate,,4-implementation,bmad-create-story:create,bmad-dev-story,false,implementation_artifacts,story validation report
|
||||
BMad Method,bmad-dev-story,Dev Story,DS,Story cycle: Execute story implementation tasks and tests then CR then back to DS if fixes needed.,,4-implementation,bmad-create-story:validate,,true,,
|
||||
BMad Method,bmad-code-review,Code Review,CR,Story cycle: If issues back to DS if approved then next CS or ER if epic complete.,,4-implementation,bmad-dev-story,,false,,
|
||||
BMad Method,bmad-checkpoint,Checkpoint,CK,Guided walkthrough of a change from purpose and context into details. Use for human review of commits branches or PRs.,,4-implementation,,,false,,
|
||||
BMad Method,bmad-qa-generate-e2e-tests,QA Automation Test,QA,Generate automated API and E2E tests for implemented code. NOT for code review or story validation — use CR for that.,,4-implementation,bmad-dev-story,,false,implementation_artifacts,test suite
|
||||
BMad Method,bmad-retrospective,Retrospective,ER,Optional at epic end: Review completed work lessons learned and next epic or if major issues consider CC.,,4-implementation,bmad-code-review,,false,implementation_artifacts,retrospective
|
||||
|
|
|
|||
|
Can't render this file because it has a wrong number of fields in line 2.
|
Loading…
Reference in New Issue