refactor(workflows): simplify QD2 steps and enforce ask-dont-fantasize

Strip task sharding, rigid approval menu, and VC backfill. Add intent_gap-first classification cascade to review. Use slug-based spec naming in plan step.
2026-02-22 20:42:53 -07:00 · 2026-02-22 20:42:53 -07:00 · 3126b9c4ee
parent f152121761
commit 3126b9c4ee
7 changed files with 759 additions and 26 deletions
--- a/_experiment/runs/2026-02-22-remove-with-argument-help-chaining.jsonl
+++ b/_experiment/runs/2026-02-22-remove-with-argument-help-chaining.jsonl
--- a/_experiment/runs/2026-02-22-remove-with-argument-help-chaining.md
+++ b/_experiment/runs/2026-02-22-remove-with-argument-help-chaining.md
@ -0,0 +1,51 @@
+# QD2 Run: Remove ambiguous "with argument" from help task chaining
+
+**Date:** 2026-02-22
+**Workflow:** quick-dev2 (experimental)
+**Branch:** exp/quick-flow-redesign
+
+---
+
+## Intent
+
+User provided GitHub issue #1637: remove the `with argument` clause from 7 workflow completion steps that chain to `help.md`. The clause caused LLMs to interpret the file-read instruction as a skill/function invocation.
+
+## Routing
+
+- **Route chosen:** Plan-code-review (after correction)
+- **Rationale:** Initially routed as one-shot. User challenged this ("One shot, really?") — the issue says "figure out the right approach," which requires a design decision, plus there are two sub-bugs to investigate (File #1 wrong argument, File #7 casing mismatch). Re-routed to plan-code-review.
+
+## What Happened
+
+**Step 1 (Clarify & Route):** Loaded issue #1637 via `gh issue view`. Captured intent correctly. Initially routed as one-shot — user corrected this. Re-routed to plan-code-review.
+
+**Step 2 (Plan):** Investigated all 7 files, `help.md`, and the CSV catalog. Generated spec proposing to drop `with argument` entirely since the preceding "[Workflow] complete." text provides sufficient context for `help.md`'s inference. User questioned whether dropping it was truly consequence-free. Agent initially said "yes, really" with a confident explanation. User pushed back: "I don't really believe that. I'd like to see where in git history these were added." Agent traced the full commit history:
+- `37346079` created `whats-after.md` with `Execute task ... with argument` as a deliberate input mechanism
+- `ec8ab0c6` renamed to `bmad-help.md`, preserved `with argument`
+- `def8da0a` changed `Execute` → `Read fully and follow:` but left `with argument` dangling
+
+This revealed it was a legacy input channel from `whats-after.md`, not meaningless decoration. Agent corrected its earlier dismissive framing. User approved the spec.
+
+**Step 3 (Implement):** Applied 7 identical single-line edits. Validated with grep — zero remaining instances.
+
+**Step 4 (Review):** Ran adversarial code review in context-free subagent. 10 findings returned. Classified: 0 intent_gap, 0 bad_spec, 0 patch, 3 defer (pre-existing structural issues), 5 reject (out of scope or already documented).
+
+**Step 5 (Present):** Summarized work, committed, user pushed, PR #1739 created against upstream main.
+
+## Diff Produced
+
+7 files, each with a single-line change removing ` with argument \`[workflow name]\`` from the help.md chaining line. Net: 7 insertions, 7 deletions.
+
+## Human Notes
+
+This flowed reasonably well. Took four minutes give or take and tens of thousands of tokens to complete an adversarial review run. Against what is a fairly simple, trivial change.
+
+## Observations
+
+- Agent initially over-confidently dismissed the user's skepticism about dropping the argument. The git history investigation was the user's idea, not the agent's. The agent should have investigated provenance before asserting "no consequences."
+- Routing correction worked well — user challenged one-shot, agent accepted and re-routed without friction.
+- The workflow plumbing (config loading, step transitions, checkpoint halts) was followed correctly through all 5 steps.
+- This is the first QD2 run to complete the full plan-code-review path (previous runs were one-shots).
+- The adversarial review found real pre-existing issues (duplicate next-steps sections, missing step numbers) that are worth separate tickets but correctly classified as out of scope for this change.
+- The spec was approved on first presentation (no edit cycles at Checkpoint 1).
+- No spec-loop iterations were needed (no bad_spec findings from review).
--- a/src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-01-clarify-and-route.md
+++ b/src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-01-clarify-and-route.md
@ -24,13 +24,12 @@ wipFile: '{implementation_artifacts}/tech-spec-wip.md'

 ## INSTRUCTIONS

-1. Clarify intent until: problem unambiguous, scope clear, no contradictions, you can explain back what you'll do.
-2. Backfill VC conventions to project-context if unknown.
-3. Route:
+1. Clarify intent. Do not fantasize, do not leave open questions Keep asking the human until clear enough to implement. 
+2. Route:
   - **One-shot** — trivial (~3 files). `{execution_mode}` = "one-shot". → Step 3.
   - **Plan-code-review** — normal. → Step 2.
   - **Full BMM** — too big. Recommend and exit.
-   - Ambiguous? Default plan-code-review.
+   - Ambiguous? Default to plan-code-review.

 ---

--- a/src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-02-plan.md
+++ b/src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-02-plan.md
@ -2,7 +2,8 @@
 name: 'step-02-plan'
 description: 'Investigate, generate spec, present for approval'

-wipFile: '{implementation_artifacts}/tech-spec-wip.md'
+slug: kebab-cased strings are valid as a file name, based on the intent 
+wipFile: '{implementation_artifacts}/tech-spec-[slug].md'
 templateFile: '{installed_path}/tech-spec-template.md'
 ---

@ -20,8 +21,9 @@ templateFile: '{installed_path}/tech-spec-template.md'
 ## INSTRUCTIONS

 1. Investigate codebase.
-2. Generate spec from `{templateFile}` → `{wipFile}`.
+2. Generate spec from `{templateFile}` into `{wipFile}`.
 3. Self-review against READY FOR DEVELOPMENT standard.
+4. If intent gaps exist, do not fantasize, do not leave open questions, ask the human.

 ### CHECKPOINT 1

--- a/src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-03-implement.md
+++ b/src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-03-implement.md
@ -1,9 +1,6 @@
 ---
 name: 'step-03-implement'
 description: 'Branch, shard tasks, execute, commit. Local only.'
-
-tasksDir: '{implementation_artifacts}/tasks'
-sequenceFile: '{implementation_artifacts}/tasks/sequence.md'
 ---

 # Step 3: Implement
@ -20,14 +17,8 @@ sequenceFile: '{implementation_artifacts}/tasks/sequence.md'

 ## INSTRUCTIONS

-1. Baseline commit. Branch. Assert clean tree.
-2. Shard spec tasks → `{tasksDir}/task-NN.md`. Track in `{sequenceFile}`.
-3. Execute sequentially: read task fresh → implement → verify AC → mark complete → next.
-4. Self-check. Commit.
-
-One-shot: skip sharding, work from mental plan.
-
-Halt after 3 failures on same task, or blocking ambiguity.
+Implement the spec.
+One-shot: implement the intent.

 ---

--- a/src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-04-review.md
+++ b/src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-04-review.md
@ -3,6 +3,7 @@ name: 'step-04-review'
 description: 'Adversarial review, classify findings, optional spec loop'

 adversarial_review_task: '{project-root}/_bmad/core/tasks/review-adversarial-general.xml'
+deferred_findings_file: '{output_dir}/deferred-findings.md'
 specLoopCap: 5
 ---

@ -19,10 +20,16 @@ specLoopCap: 5

 ## INSTRUCTIONS

-1. Diff from `{baseline_commit}`. Review in context-free subagents: intent audit (skip for one-shot) + adversarial code review via `{adversarial_review_task}`.
-2. Classify findings: intent > spec > patch > defer > reject.
-3. Spec-class? Amend spec, re-derive, re-review. Max `{specLoopCap}` iterations.
-4. Auto-fix patches. Commit.
+1. Review in context-free subagents: intent audit (skip for one-shot) + adversarial code review via `{adversarial_review_task}`.
+2. deduplicate and classify their findings as one of (in cascading order): 
+- intent_gap - finding caused by the change and wouldn't happen if intent was clear,
+- bad_spec - all other findings caused by the change, including direct deviations from spec; the spec had to be clear enough to prevent that 
+- patch - all other findings that are real and can be trivially fixed
+- defer - all other findings that are real or uncertain
+- reject - all other findings that are noise
+3. have intent_gap findings? Do not fantasize, ask the user.
+4. have bad_spec findings? See if any of them still stand after intent_gap findings are resolved. If yes, discard lower level findings, amend spec, re-implement, re-review. Max `{specLoopCap}` iterations.
+4. Auto-fix patches. Write deferred findings to `{deferred_findings_file}`. Forget about rejected findings. Commit.

 ---

--- a/src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-05-present.md
+++ b/src/bmm/workflows/bmad-quick-flow/quick-dev2/steps/step-05-present.md
@ -16,11 +16,7 @@ description: 'Present findings, get approval, create PR'

 ## INSTRUCTIONS

-Present classified findings. `[A] Approve  [E] Edit  [R] Reject`. HALT.
-
- **A**: Commit patches. Print push command. Wait. Create PR.
- **E**: Apply changes, re-present.
- **R**: Route back.
+Display summary of your work to the user. Advice on how to review the changes. Offer to make a plan for manual testing, or to push and/or create a pull request.

 ---