docs: add QD2 run logs and reusable capture-run prompt

Second QD2 skeleton test run (add plan-review step to task-01) with
raw JSONL log and analysis. Add CAPTURE-RUN.md as a repeatable prompt
for post-run log capture and summarization.
Alex Verkhovsky 2026-02-22 18:28:55 -07:00
parent 7dcd260067
commit 1f33bd3395
4 changed files with 262 additions and 0 deletions

View File

@@ -16,9 +16,13 @@ Run QD2 as-is (bare one-liner prompts + BMM plumbing) on a real small task. Docu
- What questions did it ask that it shouldn't have?
- What did it fail to do that it should have?
5. Run an adversarial review of the test findings against the plan file (`_experiment/planning/redesign-plan.md`). For each gap or plumbing issue, trace whether the plan specified the behavior that was missing — classify as **Plan Gap** (plan didn't cover it) or **Execution Gap** (plan covered it but the step file didn't deliver).
## Output
A findings document: `_experiment/results/skeleton-test-findings.md` with per-step observations classified as:
- **Works** — training handled it fine, no tightening needed
- **Gap** — specific behavior missing or wrong, needs prompt tightening
- **Plumbing** — structural issue with the BMM infrastructure itself
- **Plan Gap** — plan didn't specify the expected behavior
- **Execution Gap** — plan specified it but step file failed to deliver

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,46 @@
# QD2 Run: Add Plan Review Step to Task-01
**Date:** 2026-02-22
**Workflow:** quick-dev2 (experimental)
**Branch:** exp/quick-flow-redesign
---
## Intent
User invoked `/bmad-bmm-quick-dev2` then pointed at `_experiment/planning/roadmap/task-01-test-skeleton.md` and requested: "Add an instruction to this step to run a review of the test result against the plan file under experiment directory."
## Routing
- **Route chosen:** One-shot (implicit — agent acted immediately without explicit routing)
- **Rationale:** Single file edit, clear intent.
## What Happened
Another one-shot. The agent captured the intent — add a review step that compares run results against the plan — but **implemented it wrong** in two ways:
### Error 1: "Adversarial review of test findings against the plan file"
The agent wrote: _"Run an adversarial review of the test findings against the plan file."_
This is incoherent. If it's an **adversarial review**, the target should be an artifact produced by the run — the diff, the code changes, the spec. An adversarial review operates on deliverables, not on findings (which are themselves review output).
### Error 2: Wrong framing of what gets compared to the plan
If the goal is to compare something **against the plan file**, the right input is the **run results** — what the agent actually did (its routing decisions, its intent capture, its behavior at each step) — not the "test findings." The plan describes intended workflow behavior; you'd check whether the agent's behavior matched the plan's design intent.
### What Was Actually Requested
A step that reviews the **results of the QD2 test run** (what happened, what the agent did) against the **plan file** (`_experiment/planning/redesign-plan.md`) to identify where behavior diverged from design. This is a conformance check, not an adversarial review.
## Diff Produced
- Added method step 5: adversarial review of findings against plan (wrong framing)
- Added two output classifications: **Plan Gap** and **Execution Gap** (reasonable categories, but derived from the wrong framing)
## Observations
- Intent capture succeeded directionally — the agent understood "review against plan file" and correctly located the plan path.
- The implementation conflated two distinct review types: adversarial review (attacks an artifact for flaws) vs. conformance review (checks behavior against a specification).
- The agent did not ask any clarifying questions about what "review of the test result against the plan file" meant — it assumed and got it wrong.
- This is the second consecutive one-shot where the agent captured the gist but mangled the specifics. Pattern: one-shot route works for mechanical changes but fails when the intent requires domain understanding of review methodology.

View File

@@ -0,0 +1,74 @@
# Capture QD2 Run Log
**Read and follow these instructions after completing a QD2 test run.**
---
## 1. Save the Raw Log
Copy the most recent JSONL conversation log to this directory:
```bash
# Find the most recent conversation log
ls -lt ~/.claude/projects/-Users-alex-src-bmad-quick-flow-redesign/*.jsonl | head -1
# Copy it with the naming convention: YYYY-MM-DD-<short-slug>.jsonl
cp <path-from-above> _experiment/runs/YYYY-MM-DD-<short-slug>.jsonl
```
The slug should be a few hyphenated words capturing what the run attempted (e.g., `eliminate-apc-menu-gates`, `add-plan-review-to-task01`).
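The two commands above can be folded into one helper so the find-and-copy step is harder to fumble. This is a sketch, not part of the workflow: the function name `capture_latest_log` is an assumption, and the log directory is passed as an argument rather than hard-coded.

```shell
# Sketch (assumed helper, not in CAPTURE-RUN.md): copy the newest .jsonl
# from a log directory into _experiment/runs/ with the dated-slug name.
capture_latest_log() {
  local log_dir="$1" slug="$2"
  local latest
  # Newest log first by modification time; empty if none exist.
  latest=$(ls -t "$log_dir"/*.jsonl 2>/dev/null | head -1)
  [ -n "$latest" ] || { echo "no .jsonl logs in $log_dir" >&2; return 1; }
  mkdir -p _experiment/runs
  # date +%F emits YYYY-MM-DD, matching the naming convention above.
  cp "$latest" "_experiment/runs/$(date +%F)-$slug.jsonl"
}
```

Usage would be, e.g., `capture_latest_log ~/.claude/projects/-Users-alex-src-bmad-quick-flow-redesign add-plan-review-to-task01`.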
## 2. Summarize the Run
Create a matching `.md` file with the same name. Use this structure:
```markdown
# QD2 Run: <Title>
**Date:** YYYY-MM-DD
**Workflow:** quick-dev2 (experimental)
**Branch:** <branch name>
---
## Intent
What was requested? Quote the user's actual words or paraphrase closely.
## Routing
- **Route chosen:** One-shot / Plan-code-review / Full BMM
- **Rationale:** Why did the agent pick this route?
## What Happened
Narrative of what the agent actually did. Focus on:
- Did it follow the workflow plumbing? (config loading, step transitions)
- Did it capture intent correctly?
- Where did it drift, assume, or get things wrong?
- What clarifying questions did it ask (or fail to ask)?
## Diff Produced
Summarize the actual changes made (or note if no changes were made).
## Human Notes
<LEAVE BLANK: the human fills this in>
## Observations
Bullet list of patterns, surprises, or insights worth tracking.
```
## 3. Get Human Notes
Ask the human: **"Any notes on this run? How did it feel, what went wrong, what surprised you?"**
Write their response into the **Human Notes** section verbatim or lightly edited for clarity. Do not summarize away their meaning.
## 4. Verify
Confirm both files exist:
- `_experiment/runs/YYYY-MM-DD-<slug>.jsonl`
- `_experiment/runs/YYYY-MM-DD-<slug>.md`
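If you want this check to be mechanical rather than eyeballed, a small function like the following works. `verify_run_files` is a hypothetical helper name, assuming the naming convention from step 1; it takes the `YYYY-MM-DD-<slug>` base name and fails loudly if either file is missing.

```shell
# Sketch (assumed helper): confirm both run artifacts exist for a given
# base name, e.g. verify_run_files 2026-02-22-add-plan-review-to-task01
verify_run_files() {
  local base="_experiment/runs/$1"
  local status=0
  for ext in jsonl md; do
    if [ ! -f "$base.$ext" ]; then
      echo "missing: $base.$ext" >&2
      status=1
    fi
  done
  return $status
}
```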