refactor(workflows): simplify QD2 steps and enforce ask-dont-fantasize

Strip task sharding, rigid approval menu, and VC backfill. Add
intent_gap-first classification cascade to review. Use slug-based
spec naming in plan step.
This commit is contained in:
Alex Verkhovsky 2026-02-22 20:42:53 -07:00
parent f152121761
commit 3126b9c4ee
7 changed files with 759 additions and 26 deletions

File diff suppressed because one or more lines are too long

View File

@ -0,0 +1,51 @@
# QD2 Run: Remove ambiguous "with argument" from help task chaining
**Date:** 2026-02-22
**Workflow:** quick-dev2 (experimental)
**Branch:** exp/quick-flow-redesign
---
## Intent
User provided GitHub issue #1637: remove the `with argument` clause from 7 workflow completion steps that chain to `help.md`. The clause caused LLMs to interpret the file-read instruction as a skill/function invocation.
## Routing
- **Route chosen:** Plan-code-review (after correction)
- **Rationale:** Initially routed as one-shot. User challenged this ("One shot, really?") — the issue says "figure out the right approach," which requires a design decision, plus there are two sub-bugs to investigate (File #1 wrong argument, File #7 casing mismatch). Re-routed to plan-code-review.
## What Happened
**Step 1 (Clarify & Route):** Loaded issue #1637 via `gh issue view`. Captured intent correctly. Initially routed as one-shot — user corrected this. Re-routed to plan-code-review.
**Step 2 (Plan):** Investigated all 7 files, `help.md`, and the CSV catalog. Generated spec proposing to drop `with argument` entirely since the preceding "[Workflow] complete." text provides sufficient context for `help.md`'s inference. User questioned whether dropping it was truly consequence-free. Agent initially said "yes, really" with a confident explanation. User pushed back: "I don't really believe that. I'd like to see where in git history these were added." Agent traced the full commit history:
- `37346079` created `whats-after.md` with `Execute task ... with argument` as a deliberate input mechanism
- `ec8ab0c6` renamed to `bmad-help.md`, preserved `with argument`
- `def8da0a` changed `Execute``Read fully and follow:` but left `with argument` dangling
This revealed it was a legacy input channel from `whats-after.md`, not meaningless decoration. Agent corrected its earlier dismissive framing. User approved the spec.
**Step 3 (Implement):** Applied 7 identical single-line edits. Validated with grep — zero remaining instances.
**Step 4 (Review):** Ran adversarial code review in context-free subagent. 10 findings returned. Classified: 0 intent_gap, 0 bad_spec, 0 patch, 3 defer (pre-existing structural issues), 5 reject (out of scope or already documented).
**Step 5 (Present):** Summarized work, committed, user pushed, PR #1739 created against upstream main.
## Diff Produced
7 files, each with a single-line change removing ` with argument \`[workflow name]\`` from the help.md chaining line. Net: 7 insertions, 7 deletions.
## Human Notes
This flowed reasonably well. Took four minutes give or take and tens of thousands of tokens to complete an adversarial review run. Against what is a fairly simple, trivial change.
## Observations
- Agent initially over-confidently dismissed the user's skepticism about dropping the argument. The git history investigation was the user's idea, not the agent's. The agent should have investigated provenance before asserting "no consequences."
- Routing correction worked well — user challenged one-shot, agent accepted and re-routed without friction.
- The workflow plumbing (config loading, step transitions, checkpoint halts) was followed correctly through all 5 steps.
- This is the first QD2 run to complete the full plan-code-review path (previous runs were one-shots).
- The adversarial review found real pre-existing issues (duplicate next-steps sections, missing step numbers) that are worth separate tickets but correctly classified as out of scope for this change.
- The spec was approved on first presentation (no edit cycles at Checkpoint 1).
- No spec-loop iterations were needed (no bad_spec findings from review).

View File

@ -24,13 +24,12 @@ wipFile: '{implementation_artifacts}/tech-spec-wip.md'
## INSTRUCTIONS
1. Clarify intent until: problem unambiguous, scope clear, no contradictions, you can explain back what you'll do.
2. Backfill VC conventions to project-context if unknown.
3. Route:
1. Clarify intent. Do not fantasize, do not leave open questions Keep asking the human until clear enough to implement.
2. Route:
- **One-shot** — trivial (~3 files). `{execution_mode}` = "one-shot". → Step 3.
- **Plan-code-review** — normal. → Step 2.
- **Full BMM** — too big. Recommend and exit.
- Ambiguous? Default plan-code-review.
- Ambiguous? Default to plan-code-review.
---

View File

@ -2,7 +2,8 @@
name: 'step-02-plan'
description: 'Investigate, generate spec, present for approval'
wipFile: '{implementation_artifacts}/tech-spec-wip.md'
slug: kebab-cased strings are valid as a file name, based on the intent
wipFile: '{implementation_artifacts}/tech-spec-[slug].md'
templateFile: '{installed_path}/tech-spec-template.md'
---
@ -20,8 +21,9 @@ templateFile: '{installed_path}/tech-spec-template.md'
## INSTRUCTIONS
1. Investigate codebase.
2. Generate spec from `{templateFile}` `{wipFile}`.
2. Generate spec from `{templateFile}` into `{wipFile}`.
3. Self-review against READY FOR DEVELOPMENT standard.
4. If intent gaps exist, do not fantasize, do not leave open questions, ask the human.
### CHECKPOINT 1

View File

@ -1,9 +1,6 @@
---
name: 'step-03-implement'
description: 'Branch, shard tasks, execute, commit. Local only.'
tasksDir: '{implementation_artifacts}/tasks'
sequenceFile: '{implementation_artifacts}/tasks/sequence.md'
---
# Step 3: Implement
@ -20,14 +17,8 @@ sequenceFile: '{implementation_artifacts}/tasks/sequence.md'
## INSTRUCTIONS
1. Baseline commit. Branch. Assert clean tree.
2. Shard spec tasks → `{tasksDir}/task-NN.md`. Track in `{sequenceFile}`.
3. Execute sequentially: read task fresh → implement → verify AC → mark complete → next.
4. Self-check. Commit.
One-shot: skip sharding, work from mental plan.
Halt after 3 failures on same task, or blocking ambiguity.
Implement the spec.
One-shot: implement the intent.
---

View File

@ -3,6 +3,7 @@ name: 'step-04-review'
description: 'Adversarial review, classify findings, optional spec loop'
adversarial_review_task: '{project-root}/_bmad/core/tasks/review-adversarial-general.xml'
deferred_findings_file: '{output_dir}/deferred-findings.md'
specLoopCap: 5
---
@ -19,10 +20,16 @@ specLoopCap: 5
## INSTRUCTIONS
1. Diff from `{baseline_commit}`. Review in context-free subagents: intent audit (skip for one-shot) + adversarial code review via `{adversarial_review_task}`.
2. Classify findings: intent > spec > patch > defer > reject.
3. Spec-class? Amend spec, re-derive, re-review. Max `{specLoopCap}` iterations.
4. Auto-fix patches. Commit.
1. Review in context-free subagents: intent audit (skip for one-shot) + adversarial code review via `{adversarial_review_task}`.
2. deduplicate and classify their findings as one of (in cascading order):
- intent_gap - finding caused by the change and wouldn't happen if intent was clear,
- bad_spec - all other findings caused by the change, including direct deviations from spec; the spec had to be clear enough to prevent that
- patch - all other findings that are real and can be trivially fixed
- defer - all other findings that are real or uncertain
- reject - all other findings that are noise
3. have intent_gap findings? Do not fantasize, ask the user.
4. have bad_spec findings? See if any of them still stand after intent_gap findings are resolved. If yes, discard lower level findings, amend spec, re-implement, re-review. Max `{specLoopCap}` iterations.
4. Auto-fix patches. Write deferred findings to `{deferred_findings_file}`. Forget about rejected findings. Commit.
---

View File

@ -16,11 +16,7 @@ description: 'Present findings, get approval, create PR'
## INSTRUCTIONS
Present classified findings. `[A] Approve [E] Edit [R] Reject`. HALT.
- **A**: Commit patches. Print push command. Wait. Create PR.
- **E**: Apply changes, re-present.
- **R**: Route back.
Display summary of your work to the user. Advice on how to review the changes. Offer to make a plan for manual testing, or to push and/or create a pull request.
---