Merge 64c2ae3e8c into 07d72394fd

Merge branch 'main' into main
fix(checkpoint): add explicit HALT before decision menu in wrapup step (#2184)
7 changed files with 108 additions and 11 deletions
--- a/docs/explanation/checkpoint-preview.md
+++ b/docs/explanation/checkpoint-preview.md
@@ -0,0 +1,92 @@
+---
+title: "Checkpoint Preview"
+description: LLM-assisted human-in-the-loop review that guides you through a change from purpose to details
+sidebar:
+  order: 3
+---
+
+`bmad-checkpoint-preview` is an interactive, LLM-assisted human-in-the-loop review workflow. It walks you through a code change — from purpose and context into details — so you can make an informed decision about whether to ship, rework, or dig deeper.
+
+![Checkpoint Preview workflow diagram](/diagrams/checkpoint-preview-diagram.png)
+
+## The Typical Flow
+
+You run `bmad-quick-dev`. It clarifies your intent, builds a spec, implements the change, and when it's done it appends a review trail to the spec file and opens it in your editor. You look at the spec and see the change touched 20 files across several modules.
+
+You could eyeball the diff. But 20 files is where eyeballing starts to fail — you lose the thread, miss a connection between two distant changes, or approve something you didn't fully understand. So instead, you say "checkpoint" and the LLM walks you through it.
+
+That handoff — from autonomous implementation back to human judgment — is the primary use case. Quick-dev runs long with minimal supervision. Checkpoint Preview is where you take back the wheel.
+
+## Why It Exists
+
+Code review has two failure modes. In one, the reviewer skims the diff, nothing jumps out, and they approve. In the other, they methodically read every file but lose the thread — they see the trees and miss the forest. Both result in the same outcome: the review didn't catch the thing that mattered.
+
+The underlying issue is sequencing. A raw diff presents changes in file order, which is almost never the order that builds understanding. You see a helper function before you know why it exists. You see a schema change before you understand what feature it supports. The reviewer has to reconstruct the author's intent from scattered clues, and that reconstruction is where attention fails.
+
+Checkpoint Preview solves this by making the LLM do the reconstruction work. It reads the diff, the spec (if one exists), and the surrounding codebase, then presents the change in an order designed for comprehension — not for `git diff`.
+
+## How It Works
+
+The workflow has five steps. Each step builds on the previous one, progressively shifting from "what is this?" toward "should we ship it?"
+
+### 1. Orientation
+
+The workflow identifies the change (from a PR, commit, branch, spec file, or the current git state) and produces a one-line intent summary plus surface area stats: files changed, modules touched, lines of logic, boundary crossings, and new public interfaces.
+
+This is the "is this what I think it is?" moment. Before reading any code, the reviewer confirms they're looking at the right thing and calibrates their expectations for scope.
+
+### 2. Walkthrough
+
+The change is organized by **concern** — cohesive design intents like "input validation" or "API contract" — not by file. Each concern gets a short explanation of *why* this approach was chosen, followed by clickable `path:line` stops that the reviewer can follow through the code.
+
+This is the design judgment step. The reviewer evaluates whether the approach is right for the system, not whether the code is correct. Concerns are sequenced top-down: the highest-level intent first, then supporting implementation. The reviewer never encounters a reference to something they haven't seen yet.
+
+### 3. Detail Pass
+
+After the reviewer understands the design, the workflow surfaces 2-5 spots where a mistake would have the highest blast radius. These are tagged by risk category — `[auth]`, `[schema]`, `[billing]`, `[public API]`, `[security]`, and others — and ordered by how much breaks if they're wrong.
+
+This is not a bug hunt. Automated tests and CI handle correctness. The detail pass activates risk awareness: "here are the places where being wrong costs the most." If the reviewer wants to go deeper on a specific area, they can say "dig into [area]" for a targeted correctness-focused re-review.
+
+If the spec went through adversarial review loops (machine hardening), those findings are surfaced here too — not the bugs that were fixed, but the decisions that the review loop flagged that the reviewer should be aware of.
+
+### 4. Testing
+
+Suggests 2-5 ways to manually observe the change working. Not automated test commands — manual observations that build confidence no test suite provides. A UI interaction to try, a CLI command to run, an API request to send, with expected results for each.
+
+If the change has no user-visible behavior, it says so. No invented busywork.
+
+### 5. Wrap-Up
+
+The reviewer makes the call: approve, rework, or keep discussing. If approving a PR, the workflow can help with `gh pr review --approve`. If reworking, it helps diagnose whether the problem was the approach, the spec, or the implementation, and helps draft actionable feedback tied to specific code locations.
+
+## It's a Conversation, Not a Report
+
+The workflow presents each step as a starting point, not a final word. Between steps — or in the middle of one — you can talk to the LLM, ask questions, challenge its framing, or pull in other skills to get a different perspective:
+
+- **"run advanced elicitation on the error handling"** — push the LLM to reconsider and refine its analysis of a specific area
+- **"party mode on whether this schema migration is safe"** — bring multiple agent perspectives into a focused debate
+- **"run code review"** — generate structured agentic findings with adversarial and edge-case analysis
+
+The checkpoint workflow doesn't lock you into a linear path. It gives you structure when you want it and gets out of the way when you want to explore. The five steps are there to make sure you see the whole picture, but how deep you go at each step — and what tools you bring in — is entirely up to you.
+
+## The Review Trail
+
+The walkthrough step works best when it has a **Suggested Review Order** — a list of stops the spec author wrote to guide reviewers through the change. When a spec includes this, the workflow uses it directly.
+
+When no author-produced trail exists, the workflow generates one from the diff and codebase context. A generated trail is lower quality than an author-produced one, but far better than reading changes in file order.
+
+## When to Use It
+
+The primary scenario is the handoff from `bmad-quick-dev`: the implementation is done, the spec file is open in your editor with a review trail appended, and you need to decide whether to ship. Say "checkpoint" and go.
+
+It also works standalone:
+
+- **Reviewing a PR** — especially one with more than a handful of files or cross-cutting changes
+- **Onboarding to a change** — when you need to understand what happened on a branch you didn't write
+- **Sprint review** — the workflow can pick up stories marked `review` in your sprint status file
+
+Invoke it by saying "checkpoint" or "walk me through this change." It works in any terminal, but you'll get more out of it inside an IDE — VS Code, Cursor, or similar — because the workflow produces `path:line` references at every step. In an IDE-embedded terminal those are clickable, so you can jump from file to file as you follow the review trail.
+
+## What It Is Not
+
+Checkpoint Preview is not a substitute for automated review. It does not run linters, type checkers, or test suites. It does not assign severity scores or produce pass/fail verdicts. It is a reading guide that helps a human apply their judgment where it matters most.
--- a/src/bmm-skills/4-implementation/bmad-checkpoint-preview/SKILL.md
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint-preview/SKILL.md
@@ -13,7 +13,7 @@ You are assisting the user in reviewing a change.

 - **Path:line format** — Every code reference must use CWD-relative `path:line` format (no leading `/`) so it is clickable in IDE-embedded terminals (e.g., `src/auth/middleware.ts:42`).
 - **Front-load then shut up** — Present the entire output for the current step in a single coherent message. Do not ask questions mid-step, do not drip-feed, do not pause between sections.
-- **Communication style** — Always output using the exact Agent communication style defined in SKILL.md and the loaded config.
+- **Language** — Speak in `{communication_language}`. Write any file output in `{document_output_language}`.

 ## INITIALIZATION

@@ -22,6 +22,7 @@ Load and read full config from `{project-root}/_bmad/bmm/config.yaml` and resolv
 - `implementation_artifacts`
 - `planning_artifacts`
 - `communication_language`
+- `document_output_language`

 ## FIRST STEP

--- a/src/bmm-skills/4-implementation/bmad-checkpoint-preview/generate-trail.md
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint-preview/generate-trail.md
@@ -33,6 +33,6 @@ I built a review trail for this {change_type} (no author-produced trail was foun
 {generated trail}
 ```

-Set review mode to `full-trail`. The generated trail is the Suggested Review Order for subsequent steps.
+The generated trail serves as the Suggested Review Order for subsequent steps. Set `review_mode` to `full-trail` — a trail now exists, so all downstream steps should treat it as one.

 If git is unavailable or the diff cannot be retrieved, return to step-01 with: "Could not generate trail — git unavailable."
--- a/src/bmm-skills/4-implementation/bmad-checkpoint-preview/step-01-orientation.md
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint-preview/step-01-orientation.md
@@ -51,7 +51,7 @@ Set `review_mode` — pick the first match:

 1. **`full-trail`** — ENRICH found a spec with a `## Suggested Review Order` section. Intent source: spec's Intent section.
 2. **`spec-only`** — ENRICH found a spec but it has no Suggested Review Order. Intent source: spec's Intent section.
-3. **`bare-commit`** — no spec found. Intent source: commit message. If the commit message is terse (under 10 words), scan the diff for the primary change pattern and draft a one-sentence intent. Confirm with the user before proceeding.
+3. **`bare-commit`** — no spec found. Intent source: commit message. If the commit message is terse (under 10 words), scan the diff for the primary change pattern and draft a one-sentence intent. Flag it as `[inferred]` in the output so the user can correct it.

 ## PRODUCE ORIENTATION
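
The `review_mode` selection in the hunk above can be sketched in Python. This is a minimal illustration, not the skill's implementation: the function names `pick_review_mode` and `draft_intent` are hypothetical, the spec is assumed to be available as plain text (or `None` when ENRICH found no spec), and `inferred_summary` stands in for the one-sentence intent the LLM would draft from the diff.

```python
import re
from typing import Optional


def pick_review_mode(spec_text: Optional[str]) -> str:
    """Return the first matching review mode (hypothetical helper)."""
    if spec_text is None:
        # No spec found: the intent comes from the commit message.
        return "bare-commit"
    # Assumption: a trail exists when the spec has this exact H2 heading.
    if re.search(r"^## Suggested Review Order[ \t]*$", spec_text, flags=re.MULTILINE):
        return "full-trail"
    return "spec-only"


def draft_intent(commit_message: str, inferred_summary: str) -> str:
    """Pick the intent line for bare-commit mode (hypothetical helper).

    A terse message (under 10 words) is replaced by the inferred summary,
    flagged with [inferred] so the user can correct it.
    """
    if len(commit_message.split()) < 10:
        return f"[inferred] {inferred_summary}"
    # Otherwise use the commit subject (first line) as the intent.
    return commit_message.strip().splitlines()[0]
```

For example, `pick_review_mode(None)` falls through to `bare-commit`, and `draft_intent("fix typo", "Correct a typo in the docs")` returns the summary with the `[inferred]` flag rather than pausing to ask the user.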

@@ -63,24 +63,26 @@ Set `review_mode` — pick the first match:

 ### Surface Area Stats

-Best-effort stats from `git diff --stat`. Try these baselines in order:
+Best-effort stats derived from the diff. Try these baselines in order:

 1. `baseline_commit` from the spec's frontmatter.
 2. Branch merge-base against `main` (or the default branch).
 3. `HEAD~1..HEAD` (latest commit only — tell the user).
 4. If git is unavailable or all of the above fail, skip stats and note: "Could not compute stats."

+Use `git diff --stat` and `git diff --numstat` for file-level counts, and scan the full diff content for the richer metrics.
+
 Display as:

 ```
 N files changed · M modules touched · ~L lines of logic · B boundary crossings · P new public interfaces
 ```

-- **Files changed**: from `git diff --stat`.
-- **Modules touched**: distinct top-level directories with changes.
-- **Lines of logic**: added/modified lines excluding blanks, imports, formatting. `~` because approximate.
+- **Files changed**: count from `git diff --stat`.
+- **Modules touched**: distinct top-level directories with changes (from `--stat` file paths).
+- **Lines of logic**: added/modified lines excluding blanks, imports, formatting. Scan diff content; `~` because approximate.
 - **Boundary crossings**: changes spanning more than one top-level module. `0` if single module.
-- **New public interfaces**: new exports, endpoints, public methods. `0` if none.
+- **New public interfaces**: new exports, endpoints, public methods found in the diff. `0` if none.

 Omit any metric you cannot compute rather than guessing.
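
The file-level counts described above can be sketched in Python from `git diff --numstat` output, which prints one `added<TAB>deleted<TAB>path` row per file and uses `-` for the counts of binary files. This is only a sketch under stated assumptions: `parse_numstat`, `surface_area`, and `format_stats` are hypothetical names, files at the repository root are grouped under a made-up `.` module, and rename rows (`old => new`) are not handled. The richer metrics (lines of logic, boundary crossings, new public interfaces) need a scan of the diff content and are left out rather than guessed.

```python
from typing import Dict, List, Tuple


def parse_numstat(numstat: str) -> List[Tuple[int, int, str]]:
    """Parse `git diff --numstat` text into (added, deleted, path) rows."""
    rows = []
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        added, deleted, path = parts
        # Binary files report "-" for both counts; treat them as 0 lines.
        rows.append((
            int(added) if added.isdigit() else 0,
            int(deleted) if deleted.isdigit() else 0,
            path,
        ))
    return rows


def surface_area(numstat: str) -> Dict[str, int]:
    """Compute only the metrics that numstat alone can support."""
    rows = parse_numstat(numstat)
    # Assumption: a "module" is the first path segment; root files map to ".".
    modules = {
        path.split("/", 1)[0] if "/" in path else "."
        for _, _, path in rows
    }
    return {"files_changed": len(rows), "modules_touched": len(modules)}


def format_stats(stats: Dict[str, int]) -> str:
    """Render the available metrics in the `·`-separated display format."""
    return (
        f"{stats['files_changed']} files changed"
        f" · {stats['modules_touched']} modules touched"
    )
```

Feeding it the text produced by `git diff --numstat <baseline>..HEAD` yields a partial stats line; any metric that cannot be computed is simply absent, in line with the rule above.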

@@ -96,7 +98,7 @@ Omit any metric you cannot compute rather than guessing.

 ## FALLBACK TRAIL GENERATION

-If review mode is not `full-trail`, read fully and follow `./generate-trail.md` to build one from the diff. Then return here and continue to NEXT.
+If review mode is not `full-trail`, read fully and follow `./generate-trail.md` to build one from the diff. Then return here and continue to NEXT. If trail generation fails (e.g., git unavailable), the original review mode is preserved — step-02 handles this with its non-trail path.

 ## NEXT

--- a/src/bmm-skills/4-implementation/bmad-checkpoint-preview/step-02-walkthrough.md
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint-preview/step-02-walkthrough.md
@@ -11,14 +11,14 @@ Display: `Orientation → [Walkthrough] → Detail Pass → Testing`

 ### Identify Concerns

-**With Suggested Review Order** (`full-trail` mode):
+**With Suggested Review Order** (`full-trail` mode — the normal path, including when step-01 generated a trail):

 1. Read the Suggested Review Order stops from the spec (or from conversation context if generated by step-01 fallback).
 2. Resolve each stop to a file in the current repo. Output in `path:line` format per the standing rule.
 3. Read the diff to understand what each stop actually does.
 4. Group stops by concern. Stops that share a design intent belong together even if they're in different files. A stop may appear under multiple concerns if it serves multiple purposes.

-**Without Suggested Review Order** (`spec-only` or `bare-commit` mode):
+**Without Suggested Review Order** (fallback when trail generation failed, e.g., git unavailable):

 1. Get the diff against the appropriate baseline (same rules as step 1).
 2. Identify concerns by reading the diff for cohesive design intents:
--- a/src/bmm-skills/4-implementation/bmad-checkpoint-preview/step-05-wrapup.md
+++ b/src/bmm-skills/4-implementation/bmad-checkpoint-preview/step-05-wrapup.md
@@ -15,6 +15,8 @@ Review complete. What's the call on this {change_type}?
 - **Discuss** — something's still on your mind
 ```

+HALT — do not proceed until the user makes their choice.
+
 ## ACT ON DECISION

 - **Approve**: Acknowledge briefly. If the human wants to patch something before shipping, help apply the fix interactively. If reviewing a PR, offer to approve via `gh pr review --approve` — but confirm with the human before executing, since this is a visible action on a shared resource.
--- a/website/public/diagrams/checkpoint-preview-diagram.png
+++ b/website/public/diagrams/checkpoint-preview-diagram.png
Author	SHA1	Message	Date
miendinh	d77691fe90	Merge `64c2ae3e8c` into `07d72394fd`	2026-04-02 12:20:44 +07:00
miendinh	64c2ae3e8c	Merge branch 'main' into main	2026-04-02 12:20:42 +07:00
Alex Verkhovsky	07d72394fd	fix(checkpoint): add explicit HALT before decision menu in wrapup step (#2184) Skill validator (STEP-04) flagged the decision menu in step-05 as missing an explicit halt instruction between presenting the menu and acting on the user's choice, risking LLM auto-advance.	2026-04-01 22:52:46 -06:00
Alex Verkhovsky	7ef45d472c	docs(checkpoint): add explainer page and workflow diagram (#2183) * docs(checkpoint): add explainer page and workflow diagram Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs(checkpoint): replace excalidraw source with exported PNG diagram Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 22:20:48 -06:00
Alex Verkhovsky	2ea917ef5c	fix(checkpoint): address review findings from adversarial triage (#2180) Clarify review_mode state transition intent in generate-trail, label step-02 walkthrough branches as normal vs fallback, replace circular communication style rule with config variable refs, swap confirm gate for [inferred] flag, and clarify stats data source as full diff.	2026-04-01 10:43:08 -07:00