diff --git a/src/bmm-skills/4-implementation/bmad-checkpoint/SKILL.md b/src/bmm-skills/4-implementation/bmad-checkpoint/SKILL.md
new file mode 100644
index 000000000..facf95289
--- /dev/null
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint/SKILL.md
@@ -0,0 +1,6 @@
+---
+name: bmad-checkpoint
+description: 'Guided walkthrough of a change, from purpose and context into details. Use when the user says "walk me through this change", "human review", or "review walkthrough".'
+---
+
+Follow the instructions in ./workflow.md.
diff --git a/src/bmm-skills/4-implementation/bmad-checkpoint/step-01-orientation.md b/src/bmm-skills/4-implementation/bmad-checkpoint/step-01-orientation.md
new file mode 100644
index 000000000..cb4beac93
--- /dev/null
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint/step-01-orientation.md
@@ -0,0 +1,119 @@
+# Step 1: Orientation
+
+Display: `[Orientation] → Walkthrough → Detail Pass → Testing`
+
+## RULES
+
+- The conversation context before this skill was triggered IS your starting point — not a blank slate.
+- The user may have already mentioned a spec file, a commit, a branch, a PR, or described the change. Use what they gave you.
+- For PR references, resolve to a commit or branch using whatever tools are available (e.g., `gh pr view` for GitHub). If resolution fails, ask for a SHA or branch.
+
+## FIND THE CHANGE
+
+1. **What did the user already tell you?** Look at the conversation for a spec path, commit ref, branch, PR, or description of the change.
+2. **Fill in the other half.** If they gave a spec, look for `baseline_commit` in its frontmatter. If they gave a commit, check `{implementation_artifacts}` for a spec whose `baseline_commit` matches. If they gave both, use both.
+3. **If they gave nothing**, check current branch and HEAD. Confirm: "I see HEAD is `{sha}` on `{branch}` — is this the change you want to review?"
+4. **If you still can't identify a change**, ask:
+   - What changed and why?
+   - Which commit, branch, or PR should I look at?
+   - Do you have a spec, bug report, or anything else that explains what this change is supposed to do?
+
+   If after 3 exchanges you still can't identify a change, HALT.
+
+## DETERMINE WHAT YOU HAVE
+
+Set `change_type` based on how the user referred to the change:
+
+| User referenced | `change_type` |
+|---|---|
+| A pull request | `PR` |
+| A commit | `commit` |
+| A branch | `branch` |
+| A description (e.g. "the auth refactor") | use their words (e.g. `auth refactor`) |
+| Nothing specific / ambiguous | `change` |
+
+| What you have | Review Mode | Intent Source |
+|---|---|---|
+| Spec with `## Suggested Review Order` | `full-trail` | Spec's Intent section |
+| Spec without trail | `spec-only` | Spec's Intent section |
+| Commit ref, no spec | `bare-commit` | Commit message |
+
+**Terse commit messages** (under 10 words): Scan the diff for the primary change pattern and draft a one-sentence intent. Confirm with the user before proceeding.
+
+## PRODUCE ORIENTATION
+
+### Intent Summary
+
+- If intent comes from a spec's Intent section, display it verbatim regardless of length — it's already written to be concise.
+- For other sources (commit messages, bug reports, user description): if ≤200 tokens, display verbatim. If longer, distill to ≤200 tokens and link to the full source.
+- Format: `> **Intent:** {summary}`
+
+### Surface Area Stats
+
+Compute from `git diff --stat` against the appropriate baseline:
+
+- **With spec**: Use `baseline_commit` from frontmatter. If missing, diff `HEAD~1..HEAD` and tell the user stats reflect only the latest commit.
+- **Bare commit**: Diff against the parent (`commit~1..commit`); since `~1` resolves to the first parent, this also covers merge commits (`git diff` itself has no `--first-parent` option). For initial commits, diff against the empty tree (`4b825dc..commit`).
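The baseline selection above can be sketched as a small shell helper; `baseline_range` and its `initial` flag are hypothetical names for illustration, and a real run would feed the result to `git diff --stat`:

```shell
# Hypothetical sketch of the baseline rules, not part of the skill itself.
# Prints the rev range to hand to `git diff --stat`.
baseline_range() {
  commit="$1"; baseline="$2"; kind="$3"   # kind: "" or "initial" for a root commit
  empty_tree=4b825dc642cb6eb9a060e54bf8d69288fbee4904   # git's well-known empty-tree object
  if [ -n "$baseline" ]; then
    echo "${baseline}..${commit}"     # spec frontmatter provided baseline_commit
  elif [ "$kind" = "initial" ]; then
    echo "${empty_tree}..${commit}"   # root commit: compare against the empty tree
  else
    echo "${commit}~1..${commit}"     # bare commit: ~1 resolves to the first parent
  fi
}

# Example: stats for a bare HEAD review would then be
#   git diff --stat "$(baseline_range HEAD '' '')"
```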
+
+Display as:
+
+```
+N files changed · M modules touched · ~L lines of logic · B boundary crossings · P new public interfaces
+```
+
+- **Files changed**: from `git diff --stat`
+- **Modules touched**: distinct top-level directories with changes
+- **Lines of logic**: added/modified lines excluding blanks, imports, formatting. `~` because approximate.
+- **Boundary crossings**: changes spanning more than one top-level module. `0` if single module.
+- **New public interfaces**: new exports, endpoints, public methods. `0` if none.
+
+If git is unavailable or a command fails, show what you can and note what's missing.
+
+### Present
+
+```
+[Orientation] → Walkthrough → Detail Pass → Testing
+
+> **Intent:** {intent_summary}
+
+{stats line}
+```
+
+## FALLBACK TRAIL GENERATION
+
+**Skip this section if review mode is `full-trail`.**
+
+When no Suggested Review Order exists, generate one from the diff and codebase context. A generated trail is lower quality than an author-produced one, but far better than none.
+
+1. Get the full diff against the appropriate baseline (same rules as Surface Area Stats).
+2. Read changed files in full — not just diff hunks. Surrounding code reveals intent that hunks alone miss.
+3. If a spec exists, use its Intent section to anchor concern identification.
+4. Identify 2–5 concerns: cohesive design intents that each explain *why* behind a cluster of changes. Prefer functional groupings and architectural boundaries over file-level splits. A single-concern change is fine — don't invent groupings.
+5. For each concern, select 1–4 `path:line` stops — locations where the concern is most visible. Prefer entry points, decision points, and boundary crossings over mechanical changes.
+6. Lead with the entry point — the highest-leverage stop a reviewer should see first. Inside each concern, order stops so each builds on the previous. End with peripherals (tests, config, types).
+7. Format each stop as a workspace-relative markdown link — basename and line as link text, path with `#L` anchor as target:
+
+```
+**{Concern name}**
+
+- {one-line framing, ≤15 words} [`file.ts:42`](/src/path/to/file.ts#L42)
+```
+
+When there is only one concern, omit the bold label — just list the stops directly.
+
+Present after the orientation output:
+
+```
+I built a review trail for this {change_type} (no author-produced trail was found):
+
+{generated trail}
+```
+
+Set review mode to `full-trail`. The generated trail is the Suggested Review Order for subsequent steps.
+
+If git is unavailable or the diff cannot be retrieved, skip this section and proceed with the original review mode. Note: "Could not generate trail — git unavailable."
+
+## NEXT
+
+Read fully and follow `./step-02-walkthrough.md`
diff --git a/src/bmm-skills/4-implementation/bmad-checkpoint/step-02-walkthrough.md b/src/bmm-skills/4-implementation/bmad-checkpoint/step-02-walkthrough.md
new file mode 100644
index 000000000..ffe8b138c
--- /dev/null
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint/step-02-walkthrough.md
@@ -0,0 +1,88 @@
+# Step 2: Walkthrough
+
+Display: `Orientation → [Walkthrough] → Detail Pass → Testing`
+
+## RULES
+
+- Organize by **concern**, not by file. A concern is a cohesive design intent — e.g., "input validation," "state management," "API contract." One file may appear under multiple concerns; one concern may span multiple files.
+- Every code reference uses clickable `path:line` format per the standing rule.
+- **Front-load then shut up.** Present the entire walkthrough in a single message. Do not ask questions mid-walkthrough. Do not pause between concerns. After presenting, go quiet — the human reads, clicks, navigates at their own pace.
+- The walkthrough activates **design judgment**, not correctness checking. Frame each concern as "here's what this change does and why" — the human evaluates whether it's the right approach for the system.
+
+## BUILD THE WALKTHROUGH
+
+### Identify Concerns
+
+**With Suggested Review Order** (`full-trail` mode):
+
+1. Read the Suggested Review Order stops from the spec (or from conversation context if generated by step-01 fallback).
+2. Resolve each stop to a file in the current repo. Output in `path:line` format per the standing rule.
+3. Read the diff to understand what each stop actually does.
+4. Group stops by concern. Stops that share a design intent belong together even if they're in different files. A stop may appear under multiple concerns if it serves multiple purposes.
+
+**Without Suggested Review Order** (`spec-only` or `bare-commit` mode):
+
+1. Get the diff against the appropriate baseline (same rules as step 1).
+2. Identify concerns by reading the diff for cohesive design intents:
+   - Functional groupings — what user-facing behavior does each cluster of changes support?
+   - Architectural layers — does the change cross boundaries (API → service → data)?
+   - Design decisions — where did the author choose between alternatives?
+3. For each concern, identify the key code locations as `path:line` stops.
+
+### Order for Comprehension
+
+Sequence concerns top-down: start with the highest-level intent (the "what and why"), then drill into supporting implementation. Within each concern, order stops so each one builds on the previous. The reader should never encounter a reference to something they haven't seen yet.
+
+If the change has a natural entry point (e.g., a new public API, a config change, a UI entry point), lead with it.
+
+### Write Each Concern
+
+For each concern, produce:
+
+1. **Heading** — a short phrase naming the design intent (not a file name, not a module name).
+2. **Why** — 1–2 sentences: what problem this concern addresses, why this approach was chosen over alternatives. If the spec documents rejected alternatives, reference them here.
+3.
**Stops** — each stop on its own line: `path:line` followed by a brief phrase (not a sentence) describing what this location does for the concern. Keep framing under 15 words per stop.
+
+Target 2–5 concerns for a typical change. A single-concern change is fine — don't invent groupings. A change with more than 7 concerns is a signal the scope may be too large, but present it anyway.
+
+## PRESENT
+
+Output the full walkthrough as a single message with this structure:
+
+```
+Orientation → [Walkthrough] → Detail Pass → Testing
+```
+
+Then each concern group using this format:
+
+```
+### {Concern Heading}
+
+{Why — 1–2 sentences}
+
+- `path:line` — {brief framing}
+- `path:line` — {brief framing}
+- ...
+```
+
+End the message with:
+
+```
+---
+
+Take your time — click through the stops, read the diff, trace the logic. While you are reviewing, you can:
+- "run advanced elicitation on the error handling"
+- "party mode on whether this schema migration is safe"
+- or just ask anything
+
+When you are done with the walkthrough:
+- **"detail pass"** — I'll surface the highest-risk spots and run code review
+- **"testing"** — I'll suggest how to observe the behavior
+- **"I've seen enough"** — tell me what to do about this {change_type} and we wrap up
+```
+
+**Do NOT speak again until the human responds.**
+
+## NEXT
+
+When the human signals readiness for the next phase, read fully and follow `./step-03-detail-pass.md`
diff --git a/src/bmm-skills/4-implementation/bmad-checkpoint/step-03-detail-pass.md b/src/bmm-skills/4-implementation/bmad-checkpoint/step-03-detail-pass.md
new file mode 100644
index 000000000..5f42139d8
--- /dev/null
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint/step-03-detail-pass.md
@@ -0,0 +1,103 @@
+# Step 3: Detail Pass
+
+Display: `Orientation → Walkthrough → [Detail Pass] → Testing`
+
+## RULES
+
+- Every code reference uses clickable `path:line` format per the standing rule.
+- **Front-load then shut up.** Present all findings in a single message. Do not drip-feed.
+- The detail pass surfaces what the human should **think about**, not what the code got wrong. Machine hardening already handled correctness. This activates risk awareness.
+- The LLM detects risk category by pattern. The human judges significance. Do not assign severity scores or numeric rankings — ordering by blast radius (below) is sequencing for readability, not a severity judgment.
+- If no high-risk spots exist, say so explicitly. Do not invent findings.
+
+## IDENTIFY RISK SPOTS
+
+Scan the diff for changes touching risk-sensitive patterns. Look for 2–5 spots where a mistake would have the highest blast radius — not the most complex code, but the code where being wrong costs the most.
+
+Risk categories to detect:
+
+- `[auth]` — authentication, authorization, session, token, permission, access control
+- `[public API]` — new/changed endpoints, exports, public methods, interface contracts
+- `[schema]` — database migrations, schema changes, data model modifications, serialization
+- `[billing]` — payment, pricing, subscription, metering, usage tracking
+- `[infra]` — deployment, CI/CD, environment variables, config files, infrastructure
+- `[security]` — input validation, sanitization, crypto, secrets, CORS, CSP
+- `[config]` — feature flags, environment-dependent behavior, defaults
+- `[other]` — anything risk-sensitive that doesn't fit the above (e.g., concurrency, data privacy, backwards compatibility). Use a descriptive tag.
+
+Sequence spots so the highest blast radius comes first (how much breaks if this is wrong), not by diff order or file order. If more than 5 spots qualify, show the top 5 and note: "N additional spots omitted — ask if you want the full list."
+
+If the change has no spots matching these patterns, state: "No high-risk spots found in this change — the diff speaks for itself." Do not force findings.
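As a rough illustration of the category scan, a keyword pass over the diff can surface candidate lines; `scan_risk` and its keyword list are hypothetical sketch material, and the actual category judgment stays with the model as described above:

```shell
# Hypothetical first-pass scan, not the skill's mandated mechanism: keyword
# matching only narrows where to look; significance is still a human call.
scan_risk() {
  # Reads a unified diff on stdin; tags added lines matching risk keywords.
  grep '^+' | grep -v '^+++' | while IFS= read -r line; do
    case "$line" in
      *token*|*session*|*permission*) printf 'auth\t%s\n' "$line" ;;
      *schema*|*migration*)           printf 'schema\t%s\n' "$line" ;;
      *payment*|*subscription*)       printf 'billing\t%s\n' "$line" ;;
    esac
  done
}
```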
+
+## SURFACE MACHINE HARDENING FINDINGS
+
+Check whether the spec has a `## Spec Change Log` section with entries (populated by adversarial review loops).
+
+- **If entries exist:** Read them. Surface findings that are instructive for the human reviewer — not bugs that were already fixed, but decisions the review loop flagged that the human should be aware of. Format: brief summary of what was flagged and what was decided.
+- **If no entries or no spec:** Skip this section entirely. Do not mention it.
+
+## PRESENT
+
+Output as a single message:
+
+```
+Orientation → Walkthrough → [Detail Pass] → Testing
+```
+
+### Risk Spots
+
+For each spot, one line:
+
+```
+- `path:line` — [tag] reason-phrase
+```
+
+Example:
+
+```
+- `src/auth/middleware.ts:42` — [auth] New token validation bypasses rate limiter
+- `migrations/003_add_index.sql:7` — [schema] Index on high-write table, check lock behavior
+- `api/routes/billing.ts:118` — [billing] Metering calculation changed, verify idempotency
+```
+
+### Machine Hardening (only if findings exist)
+
+```
+### Machine Hardening
+
+- Finding summary — what was flagged, what was decided
+- ...
+```
+
+### Closing menu
+
+End the message with:
+
+```
+---
+
+You've seen the design and the risk landscape. From here:
+- **"dig into [area]"** — I'll deep-dive that specific area with correctness focus
+- **"testing"** — I'll suggest how to observe the behavior
+- **"I've seen enough"** — tell me what to do about this {change_type} and we wrap up
+```
+
+**Do NOT speak again until the human responds.**
+
+## TARGETED RE-REVIEW
+
+When the human says "dig into [area]" (e.g., "dig into the auth changes", "dig into the schema migration"):
+
+1. If the specified area does not map to any code in the diff, say so: "I don't see [area] in this change — did you mean something else?" Return to the closing menu.
+2. Identify all code locations in the diff relevant to the specified area.
+3.
Read each location in full context (not just the diff hunk — read surrounding code).
+4. Shift to **correctness mode**: trace edge cases, check boundary conditions, verify error handling, look for off-by-one errors, race conditions, resource leaks.
+5. Present findings as a compact list — each finding is `path:line` + what you found + why it matters.
+6. If nothing concerning is found, say so: "Looked closely at [area] — nothing concerning. The implementation is solid."
+7. After presenting, show only the closing menu (not the full risk spots list again).
+
+The human can trigger multiple targeted re-reviews. Each time, present new findings and the closing menu only.
+
+## NEXT
+
+When the human signals readiness for testing, read fully and follow `./step-04-testing.md`
diff --git a/src/bmm-skills/4-implementation/bmad-checkpoint/step-04-testing.md b/src/bmm-skills/4-implementation/bmad-checkpoint/step-04-testing.md
new file mode 100644
index 000000000..327d0a4ec
--- /dev/null
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint/step-04-testing.md
@@ -0,0 +1,95 @@
+# Step 4: Testing
+
+Display: `Orientation → Walkthrough → Detail Pass → [Testing]`
+
+## RULES
+
+- Every code reference uses clickable `path:line` format per the standing rule.
+- **Front-load then shut up.** Present all suggestions in a single message. Do not drip-feed.
+- This is **experiential**, not analytical. The detail pass asked "did you think about X?" — this says "you could see X with your own eyes."
+- Do not prescribe. The human decides whether observing the behavior is worth their time. Frame suggestions as options, not obligations.
+- Do not duplicate CI, test suites, or automated checks. Assume those exist and work. This is about manual observation — the kind of confidence-building no automated test provides.
+- If the change has no user-visible behavior, say so explicitly. Do not invent observations.
+
+## IDENTIFY OBSERVABLE BEHAVIOR
+
+Scan the diff and spec for changes that produce behavior a human could directly observe. Categories to look for:
+
+- **UI changes** — new screens, modified layouts, changed interactions, error states
+- **CLI/terminal output** — new commands, changed output, new flags or options
+- **API responses** — new endpoints, changed payloads, different status codes
+- **State changes** — database records, file system artifacts, config effects
+- **Error paths** — bad input, missing dependencies, edge conditions
+
+For each observable behavior, determine:
+
+1. **What to do** — the specific action (command to run, button to click, request to send)
+2. **What to expect** — the observable result that confirms the change works
+3. **Why bother** — one phrase connecting this observation to the change's intent (omit if obvious from context)
+
+Target 2–5 suggestions for a typical change. If more than 5 qualify, prioritize by how much confidence the observation provides relative to effort. A change with zero observable behavior is fine — do not pad with trivial observations.
+
+## PRESENT
+
+Output as a single message:
+
+```
+Orientation → Walkthrough → Detail Pass → [Testing]
+```
+
+Then the testing suggestions using this format:
+
+```
+### How to See It Working
+
+**{Brief description}**
+Do: {specific action}
+Expect: {observable result}
+
+**{Brief description}**
+Do: {specific action}
+Expect: {observable result}
+```
+
+Include code blocks for commands or requests where helpful.
+
+If the change has no observable behavior, replace the suggestions with:
+
+```
+### How to See It Working
+
+This change is internal — no user-visible behavior to observe. The diff and tests tell the full story.
+```
+
+### Closing
+
+End the message with:
+
+```
+---
+
+You've seen the change and how to verify it. From here:
+- **"I've seen enough"** — we wrap up this {change_type}
+- or just ask anything
+```
+
+**Do NOT speak again until the human responds.**
+
+## WRAP-UP
+
+When the human says "I've seen enough" or signals they're done:
+
+```
+---
+
+Review complete. What's the call on this {change_type}?
+- **Approve** — ship it
+- **Request changes** — I'll help draft feedback
+- **Discuss** — something's still on your mind
+```
+
+Act on their decision:
+
+- **Approve**: Acknowledge briefly. If reviewing a PR, offer to approve via `gh pr review --approve` — but confirm with the human before executing, since this is a visible action on a shared resource.
+- **Request changes**: Ask what needs changing. Help draft specific, actionable feedback tied to `path:line` locations.
+- **Discuss**: Open conversation — answer questions, explore concerns, dig into any aspect.
diff --git a/src/bmm-skills/4-implementation/bmad-checkpoint/workflow.md b/src/bmm-skills/4-implementation/bmad-checkpoint/workflow.md
new file mode 100644
index 000000000..cb07e1272
--- /dev/null
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint/workflow.md
@@ -0,0 +1,24 @@
+---
+main_config: '{project-root}/_bmad/bmm/config.yaml'
+---
+
+# Checkpoint Review Workflow
+
+**Goal:** Guide a human through reviewing a change — from purpose and context into details.
+
+You are assisting the user in reviewing a change.
+
+**Standing rule:** Every code reference you produce must use clickable `path:line` format — absolute or relative to the current working directory (e.g., `src/auth/middleware.ts:42`).
+
+## INITIALIZATION
+
+Load and read the full config from `{main_config}` and resolve:
+
+- `implementation_artifacts`
+- `communication_language`
+
+YOU MUST ALWAYS produce output in your Agent communication style, in the configured `{communication_language}`.
+
+## FIRST STEP
+
+Read fully and follow `./step-01-orientation.md` to begin.
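For reference, the two keys resolved during initialization might look like this in a project's `config.yaml`; the values below are illustrative assumptions, not the real schema:

```yaml
# Illustrative only: actual keys and values live in {project-root}/_bmad/bmm/config.yaml.
implementation_artifacts: _bmad/output/implementation   # hypothetical artifacts directory
communication_language: English
```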
diff --git a/src/bmm-skills/module-help.csv b/src/bmm-skills/module-help.csv
index 899dfd8e2..baf016b27 100644
--- a/src/bmm-skills/module-help.csv
+++ b/src/bmm-skills/module-help.csv
@@ -27,5 +27,6 @@ BMad Method,bmad-create-story,Create Story,CS,"Story cycle start: Prepare first
 BMad Method,bmad-create-story,Validate Story,VS,Validates story readiness and completeness before development work begins.,validate,,4-implementation,bmad-create-story:create,bmad-dev-story,false,implementation_artifacts,story validation report
 BMad Method,bmad-dev-story,Dev Story,DS,Story cycle: Execute story implementation tasks and tests then CR then back to DS if fixes needed.,,4-implementation,bmad-create-story:validate,,true,,
 BMad Method,bmad-code-review,Code Review,CR,Story cycle: If issues back to DS if approved then next CS or ER if epic complete.,,4-implementation,bmad-dev-story,,false,,
+BMad Method,bmad-checkpoint,Checkpoint,CK,Guided walkthrough of a change from purpose and context into details. Use for human review of commits branches or PRs.,,4-implementation,,,false,,
 BMad Method,bmad-qa-generate-e2e-tests,QA Automation Test,QA,Generate automated API and E2E tests for implemented code. NOT for code review or story validation — use CR for that.,,4-implementation,bmad-dev-story,,false,implementation_artifacts,test suite
 BMad Method,bmad-retrospective,Retrospective,ER,Optional at epic end: Review completed work lessons learned and next epic or if major issues consider CC.,,4-implementation,bmad-code-review,,false,implementation_artifacts,retrospective