docs: add QD2 run logs and reusable capture-run prompt

Second QD2 skeleton test run (add plan-review step to task-01) with
raw JSONL log and analysis. Add CAPTURE-RUN.md as a repeatable prompt
for post-run log capture and summarization.
Alex Verkhovsky 2026-02-22 18:28:55 -07:00
parent 7dcd260067
commit 1f33bd3395
4 changed files with 262 additions and 0 deletions

View File

@@ -16,9 +16,13 @@ Run QD2 as-is (bare one-liner prompts + BMM plumbing) on a real small task. Docu
- What questions did it ask that it shouldn't have?
- What did it fail to do that it should have?
5. Run an adversarial review of the test findings against the plan file (`_experiment/planning/redesign-plan.md`). For each gap or plumbing issue, trace whether the plan specified the behavior that was missing — classify as **Plan Gap** (plan didn't cover it) or **Execution Gap** (plan covered it but the step file didn't deliver).
## Output
A findings document: `_experiment/results/skeleton-test-findings.md` with per-step observations classified as:
- **Works** — training handled it fine, no tightening needed
- **Gap** — specific behavior missing or wrong, needs prompt tightening
- **Plumbing** — structural issue with the BMM infrastructure itself
- **Plan Gap** — plan didn't specify the expected behavior
- **Execution Gap** — plan specified it but step file failed to deliver

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,46 @@
# QD2 Run: Add Plan Review Step to Task-01
**Date:** 2026-02-22
**Workflow:** quick-dev2 (experimental)
**Branch:** exp/quick-flow-redesign
---
## Intent
User invoked `/bmad-bmm-quick-dev2` then pointed at `_experiment/planning/roadmap/task-01-test-skeleton.md` and requested: "Add an instruction to this step to run a review of the test result against the plan file under experiment directory."
## Routing
- **Route chosen:** One-shot (implicit — agent acted immediately without explicit routing)
- **Rationale:** Single file edit, clear intent.
## What Happened
Another one-shot. The agent captured the intent — add a review step that compares run results against the plan — but **implemented it wrong** in two ways:
### Error 1: "Adversarial review of test findings against the plan file"
The agent wrote: _"Run an adversarial review of the test findings against the plan file."_
This is incoherent. If it's an **adversarial review**, the target should be an artifact produced by the run — the diff, the code changes, the spec. An adversarial review operates on deliverables, not on findings (which are themselves review output).
### Error 2: Wrong framing of what gets compared to the plan
If the goal is to compare something **against the plan file**, the right input is the **run results** — what the agent actually did (its routing decisions, its intent capture, its behavior at each step) — not the "test findings." The plan describes intended workflow behavior; you'd check whether the agent's behavior matched the plan's design intent.
### What Was Actually Requested
A step that reviews the **results of the QD2 test run** (what happened, what the agent did) against the **plan file** (`_experiment/planning/redesign-plan.md`) to identify where behavior diverged from design. This is a conformance check, not an adversarial review.
## Diff Produced
- Added method step 5: adversarial review of findings against plan (wrong framing)
- Added two output classifications: **Plan Gap** and **Execution Gap** (reasonable categories, but derived from the wrong framing)
## Observations
- Intent capture succeeded directionally — the agent understood "review against plan file" and correctly located the plan path.
- The implementation conflated two distinct review types: adversarial review (attacks an artifact for flaws) vs. conformance review (checks behavior against a specification).
- The agent did not ask any clarifying questions about what "review of the test result against the plan file" meant — it assumed and got it wrong.
- This is the second consecutive one-shot where the agent captured the gist but mangled the specifics. Pattern: one-shot route works for mechanical changes but fails when the intent requires domain understanding of review methodology.

View File

@@ -0,0 +1,74 @@
# Capture QD2 Run Log
**Read and follow these instructions after completing a QD2 test run.**
---
## 1. Save the Raw Log
Copy the most recent JSONL conversation log to this directory:
```bash
# Find the most recent conversation log
ls -lt ~/.claude/projects/-Users-alex-src-bmad-quick-flow-redesign/*.jsonl | head -1
# Copy it with the naming convention: YYYY-MM-DD-<short-slug>.jsonl
cp <path-from-above> _experiment/runs/YYYY-MM-DD-<short-slug>.jsonl
```
The slug should be a few hyphenated words capturing what the run attempted (e.g., `eliminate-apc-menu-gates`, `add-plan-review-to-task01`).
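The two commands above can be folded into one helper so the find-and-copy step is harder to fumble. This is a sketch, not part of the workflow: the function name `capture_latest_log` is an assumption, and the log directory is passed as an argument rather than hard-coded.

```shell
# Sketch (assumed helper, not in CAPTURE-RUN.md): copy the newest .jsonl
# from a log directory into _experiment/runs/ with the dated-slug name.
capture_latest_log() {
  local log_dir="$1" slug="$2"
  local latest
  # Newest log first by modification time; empty if none exist.
  latest=$(ls -t "$log_dir"/*.jsonl 2>/dev/null | head -1)
  [ -n "$latest" ] || { echo "no .jsonl logs in $log_dir" >&2; return 1; }
  mkdir -p _experiment/runs
  # date +%F emits YYYY-MM-DD, matching the naming convention above.
  cp "$latest" "_experiment/runs/$(date +%F)-$slug.jsonl"
}
```

Usage would be, e.g., `capture_latest_log ~/.claude/projects/-Users-alex-src-bmad-quick-flow-redesign add-plan-review-to-task01`.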
## 2. Summarize the Run
Create a matching `.md` file with the same name. Use this structure:
```markdown
# QD2 Run: <Title>
**Date:** YYYY-MM-DD
**Workflow:** quick-dev2 (experimental)
**Branch:** <branch name>
---
## Intent
What was requested? Quote the user's actual words or paraphrase closely.
## Routing
- **Route chosen:** One-shot / Plan-code-review / Full BMM
- **Rationale:** Why did the agent pick this route?
## What Happened
Narrative of what the agent actually did. Focus on:
- Did it follow the workflow plumbing? (config loading, step transitions)
- Did it capture intent correctly?
- Where did it drift, assume, or get things wrong?
- What clarifying questions did it ask (or fail to ask)?
## Diff Produced
Summarize the actual changes made (or note if no changes were made).
## Human Notes
<LEAVE BLANK: the human fills this in>
## Observations
Bullet list of patterns, surprises, or insights worth tracking.
```
## 3. Get Human Notes
Ask the human: **"Any notes on this run? How did it feel, what went wrong, what surprised you?"**
Write their response into the **Human Notes** section verbatim or lightly edited for clarity. Do not summarize away their meaning.
## 4. Verify
Confirm both files exist:
- `_experiment/runs/YYYY-MM-DD-<slug>.jsonl`
- `_experiment/runs/YYYY-MM-DD-<slug>.md`
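If you want this check to be mechanical rather than eyeballed, a small function like the following works. `verify_run_files` is a hypothetical helper name, assuming the naming convention from step 1; it takes the `YYYY-MM-DD-<slug>` base name and fails loudly if either file is missing.

```shell
# Sketch (assumed helper): confirm both run artifacts exist for a given
# base name, e.g. verify_run_files 2026-02-22-add-plan-review-to-task01
verify_run_files() {
  local base="_experiment/runs/$1"
  local status=0
  for ext in jsonl md; do
    if [ ! -f "$base.$ext" ]; then
      echo "missing: $base.$ext" >&2
      status=1
    fi
  done
  return $status
}
```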