refactor(workflows): simplify QD2 steps and enforce ask-dont-fantasize

Strip task sharding, rigid approval menu, and VC backfill. Add
intent_gap-first classification cascade to review. Use slug-based
spec naming in plan step.
This commit is contained in:
Alex Verkhovsky 2026-02-22 20:42:53 -07:00
parent f152121761
commit 3126b9c4ee
7 changed files with 759 additions and 26 deletions

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,51 @@
# QD2 Run: Remove ambiguous "with argument" from help task chaining
**Date:** 2026-02-22
**Workflow:** quick-dev2 (experimental)
**Branch:** exp/quick-flow-redesign
---
## Intent
User provided GitHub issue #1637: remove the `with argument` clause from 7 workflow completion steps that chain to `help.md`. The clause caused LLMs to interpret the file-read instruction as a skill/function invocation.
## Routing
- **Route chosen:** Plan-code-review (after correction)
- **Rationale:** Initially routed as one-shot. User challenged this ("One shot, really?") — the issue says "figure out the right approach," which requires a design decision, plus there are two sub-bugs to investigate (File #1 wrong argument, File #7 casing mismatch). Re-routed to plan-code-review.
## What Happened
**Step 1 (Clarify & Route):** Loaded issue #1637 via `gh issue view`. Captured intent correctly. Initially routed as one-shot — user corrected this. Re-routed to plan-code-review.
**Step 2 (Plan):** Investigated all 7 files, `help.md`, and the CSV catalog. Generated spec proposing to drop `with argument` entirely since the preceding "[Workflow] complete." text provides sufficient context for `help.md`'s inference. User questioned whether dropping it was truly consequence-free. Agent initially said "yes, really" with a confident explanation. User pushed back: "I don't really believe that. I'd like to see where in git history these were added." Agent traced the full commit history:
- `37346079` created `whats-after.md` with `Execute task ... with argument` as a deliberate input mechanism
- `ec8ab0c6` renamed to `bmad-help.md`, preserved `with argument`
- `def8da0a` changed `Execute``Read fully and follow:` but left `with argument` dangling
This revealed it was a legacy input channel from `whats-after.md`, not meaningless decoration. Agent corrected its earlier dismissive framing. User approved the spec.
**Step 3 (Implement):** Applied 7 identical single-line edits. Validated with grep — zero remaining instances.
**Step 4 (Review):** Ran adversarial code review in context-free subagent. 10 findings returned. Classified: 0 intent_gap, 0 bad_spec, 0 patch, 3 defer (pre-existing structural issues), 5 reject (out of scope or already documented).
**Step 5 (Present):** Summarized work, committed, user pushed, PR #1739 created against upstream main.
## Diff Produced
7 files, each with a single-line change removing ` with argument \`[workflow name]\`` from the help.md chaining line. Net: 7 insertions, 7 deletions.
## Human Notes
This flowed reasonably well. Took four minutes give or take and tens of thousands of tokens to complete an adversarial review run. Against what is a fairly simple, trivial change.
## Observations
- Agent initially over-confidently dismissed the user's skepticism about dropping the argument. The git history investigation was the user's idea, not the agent's. The agent should have investigated provenance before asserting "no consequences."
- Routing correction worked well — user challenged one-shot, agent accepted and re-routed without friction.
- The workflow plumbing (config loading, step transitions, checkpoint halts) was followed correctly through all 5 steps.
- This is the first QD2 run to complete the full plan-code-review path (previous runs were one-shots).
- The adversarial review found real pre-existing issues (duplicate next-steps sections, missing step numbers) that are worth separate tickets but correctly classified as out of scope for this change.
- The spec was approved on first presentation (no edit cycles at Checkpoint 1).
- No spec-loop iterations were needed (no bad_spec findings from review).

View File

@ -24,13 +24,12 @@ wipFile: '{implementation_artifacts}/tech-spec-wip.md'
## INSTRUCTIONS ## INSTRUCTIONS
1. Clarify intent until: problem unambiguous, scope clear, no contradictions, you can explain back what you'll do. 1. Clarify intent. Do not fantasize, do not leave open questions Keep asking the human until clear enough to implement.
2. Backfill VC conventions to project-context if unknown. 2. Route:
3. Route:
- **One-shot** — trivial (~3 files). `{execution_mode}` = "one-shot". → Step 3. - **One-shot** — trivial (~3 files). `{execution_mode}` = "one-shot". → Step 3.
- **Plan-code-review** — normal. → Step 2. - **Plan-code-review** — normal. → Step 2.
- **Full BMM** — too big. Recommend and exit. - **Full BMM** — too big. Recommend and exit.
- Ambiguous? Default plan-code-review. - Ambiguous? Default to plan-code-review.
--- ---

View File

@ -2,7 +2,8 @@
name: 'step-02-plan' name: 'step-02-plan'
description: 'Investigate, generate spec, present for approval' description: 'Investigate, generate spec, present for approval'
wipFile: '{implementation_artifacts}/tech-spec-wip.md' slug: kebab-cased strings are valid as a file name, based on the intent
wipFile: '{implementation_artifacts}/tech-spec-[slug].md'
templateFile: '{installed_path}/tech-spec-template.md' templateFile: '{installed_path}/tech-spec-template.md'
--- ---
@ -20,8 +21,9 @@ templateFile: '{installed_path}/tech-spec-template.md'
## INSTRUCTIONS ## INSTRUCTIONS
1. Investigate codebase. 1. Investigate codebase.
2. Generate spec from `{templateFile}` `{wipFile}`. 2. Generate spec from `{templateFile}` into `{wipFile}`.
3. Self-review against READY FOR DEVELOPMENT standard. 3. Self-review against READY FOR DEVELOPMENT standard.
4. If intent gaps exist, do not fantasize, do not leave open questions, ask the human.
### CHECKPOINT 1 ### CHECKPOINT 1

View File

@ -1,9 +1,6 @@
--- ---
name: 'step-03-implement' name: 'step-03-implement'
description: 'Branch, shard tasks, execute, commit. Local only.' description: 'Branch, shard tasks, execute, commit. Local only.'
tasksDir: '{implementation_artifacts}/tasks'
sequenceFile: '{implementation_artifacts}/tasks/sequence.md'
--- ---
# Step 3: Implement # Step 3: Implement
@ -20,14 +17,8 @@ sequenceFile: '{implementation_artifacts}/tasks/sequence.md'
## INSTRUCTIONS ## INSTRUCTIONS
1. Baseline commit. Branch. Assert clean tree. Implement the spec.
2. Shard spec tasks → `{tasksDir}/task-NN.md`. Track in `{sequenceFile}`. One-shot: implement the intent.
3. Execute sequentially: read task fresh → implement → verify AC → mark complete → next.
4. Self-check. Commit.
One-shot: skip sharding, work from mental plan.
Halt after 3 failures on same task, or blocking ambiguity.
--- ---

View File

@ -3,6 +3,7 @@ name: 'step-04-review'
description: 'Adversarial review, classify findings, optional spec loop' description: 'Adversarial review, classify findings, optional spec loop'
adversarial_review_task: '{project-root}/_bmad/core/tasks/review-adversarial-general.xml' adversarial_review_task: '{project-root}/_bmad/core/tasks/review-adversarial-general.xml'
deferred_findings_file: '{output_dir}/deferred-findings.md'
specLoopCap: 5 specLoopCap: 5
--- ---
@ -19,10 +20,16 @@ specLoopCap: 5
## INSTRUCTIONS ## INSTRUCTIONS
1. Diff from `{baseline_commit}`. Review in context-free subagents: intent audit (skip for one-shot) + adversarial code review via `{adversarial_review_task}`. 1. Review in context-free subagents: intent audit (skip for one-shot) + adversarial code review via `{adversarial_review_task}`.
2. Classify findings: intent > spec > patch > defer > reject. 2. deduplicate and classify their findings as one of (in cascading order):
3. Spec-class? Amend spec, re-derive, re-review. Max `{specLoopCap}` iterations. - intent_gap - finding caused by the change and wouldn't happen if intent was clear,
4. Auto-fix patches. Commit. - bad_spec - all other findings caused by the change, including direct deviations from spec; the spec had to be clear enough to prevent that
- patch - all other findings that are real and can be trivially fixed
- defer - all other findings that are real or uncertain
- reject - all other findings that are noise
3. have intent_gap findings? Do not fantasize, ask the user.
4. have bad_spec findings? See if any of them still stand after intent_gap findings are resolved. If yes, discard lower level findings, amend spec, re-implement, re-review. Max `{specLoopCap}` iterations.
4. Auto-fix patches. Write deferred findings to `{deferred_findings_file}`. Forget about rejected findings. Commit.
--- ---

View File

@ -16,11 +16,7 @@ description: 'Present findings, get approval, create PR'
## INSTRUCTIONS ## INSTRUCTIONS
Present classified findings. `[A] Approve [E] Edit [R] Reject`. HALT. Display summary of your work to the user. Advice on how to review the changes. Offer to make a plan for manual testing, or to push and/or create a pull request.
- **A**: Commit patches. Print push command. Wait. Create PR.
- **E**: Apply changes, re-present.
- **R**: Route back.
--- ---