refine workflow contracts for review findings, halt protocol, and sprint tracking

Dicky Moore 2026-02-08 15:01:58 +00:00
parent e7d7bbc3ea
commit a1c054006a
6 changed files with 195 additions and 78 deletions

@@ -12,9 +12,10 @@ reviewFindingsFile: '{story_dir}/review-findings.json'
<action>Initialize findings artifacts:
- Set {{review_findings}} = [] (in-memory array)
- Set {{review_findings_file}} = {reviewFindingsFile}
- Set {{review_findings_schema}} = "id,severity,type,summary,detail,file_line,proof,suggested_fix,reviewer,timestamp"
- Each finding record MUST contain:
id, severity, type, summary, detail, file_line, proof, suggested_fix, reviewer, timestamp
- `file_line` format MUST be `path/to/file:line`
- `file_line` is the required `file:line` locator field and MUST use `path/to/file:line` format
- `reviewer` value MUST be `senior-dev-review`
- `timestamp` MUST use system ISO datetime
</action>
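The record contract above can be sketched as a small Python constructor. This is an illustrative model only: `make_finding` and `REQUIRED_FIELDS` are hypothetical names; the schema fields, the fixed `reviewer` value, and the ISO timestamp rule come from the contract itself.

```python
from datetime import datetime, timezone

# Required fields from the declared findings schema.
REQUIRED_FIELDS = (
    "id", "severity", "type", "summary", "detail",
    "file_line", "proof", "suggested_fix", "reviewer", "timestamp",
)

def make_finding(fid, severity, ftype, summary, detail,
                 file_line, proof, suggested_fix):
    """Build one finding record matching the declared schema."""
    record = {
        "id": fid,
        "severity": severity,
        "type": ftype,
        "summary": summary,
        "detail": detail,
        "file_line": file_line,            # must be "path/to/file:line"
        "proof": proof,
        "suggested_fix": suggested_fix,
        "reviewer": "senior-dev-review",   # fixed by the contract
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Every required field must be present and non-empty.
    missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
    if missing:
        raise ValueError(f"finding missing fields: {missing}")
    return record
```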
@@ -35,28 +36,67 @@ reviewFindingsFile: '{story_dir}/review-findings.json'
1. Read the AC requirement
2. Search implementation files for evidence
3. Determine: IMPLEMENTED, PARTIAL, or MISSING using this algorithm:
- Parse each AC into explicit clauses (single requirement statements).
- Evaluate each clause independently, then derive overall AC status from clause outcomes.
- IMPLEMENTED:
- Direct code evidence exists for ALL AC clauses, and
- At least one corroborating test OR deterministic runtime verification exists, and
- Any docs/comments are supported by code/test evidence.
- EVERY clause has direct code evidence tied to the story execution path, and
- Evidence includes at least one strong corroborator for AC behavior (automated test, integration test, or reproducible runtime proof), and
- Weak evidence (docs/comments/README) is only supplemental.
- PARTIAL:
- Some AC clauses have direct implementation evidence but one or more clauses are missing OR only indirectly covered, or
- Evidence is helper/utility code not clearly wired to the story path, or
- Evidence is docs/comments only without strong corroboration.
- One or more clauses have direct evidence, but at least one clause lacks direct evidence, OR
- Coverage is only indirect (helper/generic utility not proven wired), OR
- Evidence is mostly weak and not corroborated by code/tests.
- MISSING:
- No credible code/test/docs evidence addresses the AC clauses.
4. Evidence-strength rules:
- Code + tests = strong evidence
- Code only = medium evidence
- Docs/comments/README only = weak evidence (cannot justify IMPLEMENTED alone)
- No clause has credible direct implementation evidence, and
- No test/runtime proof demonstrates AC behavior.
4. Evidence-type rules:
- Strong evidence: implementation code plus validating tests/runtime proof.
- Medium evidence: implementation code without validating tests.
- Weak evidence: comments, README/docs, design notes, screenshots, or unverifiable logs.
- Weak evidence alone cannot qualify an AC as IMPLEMENTED.
5. Indirect evidence rules:
- Generic helpers/utilities count as PARTIAL unless explicitly wired by call sites OR integration tests.
- Helper functions/utilities count as indirect until explicit call sites or integration coverage prove the AC path.
- Generic capability not wired to this story remains PARTIAL.
6. Severity mapping for AC gaps:
- MISSING critical-path AC → HIGH
- MISSING non-critical AC → MEDIUM
- PARTIAL critical-path AC → HIGH
- PARTIAL non-critical AC → MEDIUM
7. If AC is PARTIAL or MISSING, append a finding object to {{review_findings}}.
- MISSING + security/data-loss/compliance/core user flow risk → HIGH.
- MISSING + non-core behavior or secondary UX/documentation requirement → MEDIUM.
- PARTIAL + security/data-integrity/compliance risk → HIGH.
- PARTIAL + degraded core behavior → MEDIUM.
- PARTIAL + optional/non-critical behavior gap with safe fallback → LOW.
7. Classification examples:
- IMPLEMENTED example: AC requires validation + error response, and code path plus passing test covers all clauses.
- PARTIAL example: helper exists and one clause passes, but integration path for another clause is unproven.
- MISSING example: AC text exists, but no matching code path or tests are found.
8. If AC is PARTIAL or MISSING, append a finding object to {{review_findings}} with status, severity, and clause-level proof.
</action>
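The clause-level status derivation above can be sketched as a pure function. This is a simplified model, not workflow code: `direct_evidence` and `strong_corroborator` are assumed per-clause boolean flags standing in for the evidence rules.

```python
def ac_status(clauses):
    """Derive overall AC status from per-clause evidence flags.

    clauses: list of dicts with boolean keys
      - direct_evidence: code evidence tied to the story execution path
      - strong_corroborator: automated/integration test or reproducible runtime proof
    """
    if not any(c["direct_evidence"] for c in clauses):
        return "MISSING"          # no clause has credible direct evidence
    if (all(c["direct_evidence"] for c in clauses)
            and any(c["strong_corroborator"] for c in clauses)):
        return "IMPLEMENTED"      # every clause covered, behavior corroborated
    return "PARTIAL"              # some direct evidence, but gaps remain
```

Note that code-only coverage (no corroborating test) falls through to PARTIAL, matching the rule that medium/weak evidence alone cannot justify IMPLEMENTED.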
<action>When creating findings from any action above, populate fields using this mapping:
- id:
- Git discrepancy: `GIT-DIFF-{{index}}`
- AC gap: `AC-{{ac_id}}-{{status}}-{{index}}`
- Task mismatch: `TASK-{{task_id}}-MISMATCH-{{index}}`
- Code-quality issue: `CQ-{{category}}-{{index}}`
- severity:
- Use explicit severity rule from the originating action block
- type:
- `story-sync` for git/story discrepancies
- `acceptance-criteria` for AC gaps
- `task-audit` for task completion mismatches
- `code-quality` for quality/security/performance/test issues
- summary:
- One-line, user-facing issue statement
- detail:
- Include violated expectation plus observed behavior
- file_line:
- `path/to/file:line` evidence anchor (use most relevant file and line)
- proof:
- Concrete evidence snippet (code, test output, or git command result)
- suggested_fix:
- Actionable implementation guidance
- reviewer:
- `senior-dev-review`
- timestamp:
- System ISO datetime at finding creation time
</action>
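The id portion of the field mapping can be sketched as one dispatch function (hypothetical name; the `{{...}}` placeholders above become keyword arguments here):

```python
def finding_id(ftype, index, ac_id=None, status=None, task_id=None, category=None):
    """Build a finding id from its originating action, per the mapping above."""
    if ftype == "story-sync":
        return f"GIT-DIFF-{index}"
    if ftype == "acceptance-criteria":
        return f"AC-{ac_id}-{status}-{index}"
    if ftype == "task-audit":
        return f"TASK-{task_id}-MISMATCH-{index}"
    if ftype == "code-quality":
        return f"CQ-{category}-{index}"
    raise ValueError(f"unknown finding type: {ftype}")
```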
<!-- Task Completion Audit -->
@@ -94,7 +134,8 @@ reviewFindingsFile: '{story_dir}/review-findings.json'
<action>Persist findings contract for downstream step:
- Save {{review_findings}} as JSON array to {{review_findings_file}}
- Ensure JSON is valid and each finding includes all required fields
- Set {{findings_contract}} = "JSON array at {{review_findings_file}}"
- Set {{findings_contract}} = "JSON array at {{review_findings_file}} with schema {{review_findings_schema}}"
- Step 4 MUST load findings from {{review_findings_file}} and validate against {{review_findings_schema}} before presenting or resolving
</action>
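The persistence contract can be sketched in Python (hypothetical helper; the point is that an invalid array is rejected before it is written, so downstream step 4 never loads a malformed artifact):

```python
import json

def persist_findings(findings, path, required_fields):
    """Save findings as a JSON array, refusing to write an invalid contract."""
    for finding in findings:
        missing = [k for k in required_fields if k not in finding]
        if missing:
            raise ValueError(f"finding {finding.get('id')!r} missing {missing}")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(findings, fh, indent=2)
```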
<action>Example finding record (must match real records):

@@ -6,11 +6,20 @@ reviewFindingsFile: '{story_dir}/review-findings.json'
---
<step n="4" goal="Present findings and fix them">
<action>Load structured findings from {reviewFindingsFile}</action>
<action>Resolve findings artifact input:
- Use {{review_findings_file}} from step 3 when present
- Otherwise fallback to {reviewFindingsFile}
- Set {{review_findings_schema}} = "id,severity,type,summary,detail,file_line,proof,suggested_fix,reviewer,timestamp" if not already set
</action>
<action>Load structured findings JSON array from {{review_findings_file}}</action>
<action>Validate findings schema for each entry:
id, severity, type, summary, detail, file_line, proof, suggested_fix, reviewer, timestamp
</action>
<action>If findings file missing or malformed: HALT with explicit error and return to step 3 generation</action>
<action>Validation contract:
- `file_line` is the required `file:line` locator in `path/to/file:line` format
- Reject non-array JSON, missing required keys, or invalid file_line formatting
- If findings file missing/unreadable/malformed: HALT with explicit error and return to step 3 generation
</action>
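The validation contract can be sketched as follows. The regex for `path/to/file:line` is an assumption about the intended format (path segments followed by a numeric line); the raised `ValueError` stands in for the HALT branch.

```python
import re

# "path/to/file:line" — one or more path segments, then a numeric line.
FILE_LINE_RE = re.compile(r"^[^\s:]+(?:/[^\s:]+)*:\d+$")

def validate_findings(payload, required_fields):
    """Enforce the validation contract; raise to model the HALT branch."""
    if not isinstance(payload, list):
        raise ValueError("HALT: findings artifact is not a JSON array")
    for entry in payload:
        missing = [k for k in required_fields if k not in entry]
        if missing:
            raise ValueError(f"HALT: finding missing keys {missing}")
        if not FILE_LINE_RE.match(entry.get("file_line", "")):
            raise ValueError(f"HALT: invalid file_line {entry.get('file_line')!r}")
    return payload
```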
<action>Categorize findings: HIGH (must fix), MEDIUM (should fix), LOW (nice to fix)</action>
<action>Set {{fixed_count}} = 0</action>
<action>Set {{action_count}} = 0</action>

@@ -18,17 +18,24 @@ web_bundle: false
- `sprint_status` = `{implementation_artifacts}/sprint-status.yaml`
- `date` (system-generated)
- `installed_path` = `{project-root}/_bmad/bmm/workflows/4-implementation/correct-course`
- Note: `installed_path` targets the installed runtime tree under `_bmad/...`; source authoring files are in `src/bmm/workflows/4-implementation/correct-course/...`.
- `source_path` = `{project-root}/src/bmm/workflows/4-implementation/correct-course`
- Note: `installed_path` targets the installed runtime tree under `_bmad/...`; `source_path` is the repository authoring path.
- `default_output_file` = `{planning_artifacts}/sprint-change-proposal-{date}.md`
<workflow>
<critical>Communicate all responses in {communication_language} and generate all documents in {document_output_language}</critical>
<step n="1" goal="Analyze changes and propose corrective actions">
<action>Read and follow instructions at: {installed_path}/instructions.md</action>
<action>Resolve workflow content path:
- If `{installed_path}/instructions.md` exists and is readable, set {{workflow_path}} = `{installed_path}`
- Else if `{source_path}/instructions.md` exists and is readable, set {{workflow_path}} = `{source_path}`
- Else emit an error listing both paths and HALT
</action>
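The installed-then-source fallback can be sketched as a small resolver (hypothetical function; the probe file defaults to `instructions.md` but step 2 would pass `checklist.md`):

```python
from pathlib import Path

def resolve_workflow_path(installed_path, source_path, probe="instructions.md"):
    """Prefer the installed runtime tree, fall back to the source authoring tree."""
    for root in (installed_path, source_path):
        if (Path(root) / probe).is_file():
            return root
    raise FileNotFoundError(
        f"HALT: {probe} not found under {installed_path} or {source_path}"
    )
```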
<action>Read and follow instructions at: {{workflow_path}}/instructions.md</action>
</step>
<step n="2" goal="Validate proposal quality">
<invoke-task>Validate against checklist at {installed_path}/checklist.md using {project-root}/_bmad/core/tasks/validate-workflow.md</invoke-task>
<action>If {{workflow_path}} was not set in step 1, repeat the same path resolution using checklist.md as the probe file</action>
<invoke-task>Validate against checklist at {{workflow_path}}/checklist.md using {project-root}/_bmad/core/tasks/validate-workflow.md</invoke-task>
</step>
</workflow>

@@ -10,8 +10,14 @@ nextStepFile: './step-09-mark-review-ready.md'
<action>Initialize review-tracking variables before checks:
- If {{resolved_review_items}} is undefined: set {{resolved_review_items}} = []
- If {{unresolved_review_items}} is undefined: set {{unresolved_review_items}} = []
- Set {{review_continuation}} by checking current task title/original task list for prefix "[AI-Review]"
- Set {{date}} from system-generated timestamp in project date format
- Set {{review_continuation}} = false
- If current {{task_title}} starts with "[AI-Review]", set {{review_continuation}} = true
- Else scan {{original_task_list}}; if any item starts with "[AI-Review]", set {{review_continuation}} = true
- Set {{date}} from system-generated timestamp formatted for project change log entries
- Set {{resolved_count}} = length({{resolved_review_items}})
- Set {{review_match_threshold}} = 0.60
- Define normalize(text): lowercase, trim, remove "[AI-Review]" prefix and punctuation, collapse whitespace
- Define token_set(text): unique whitespace-separated normalized tokens
</action>
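The normalize/token_set definitions above can be sketched directly in Python (a minimal model of the stated rules; function names match the definitions, everything else is illustrative):

```python
import re
import string

def normalize(text):
    """Lowercase, trim, drop the [AI-Review] prefix and punctuation, collapse whitespace."""
    text = text.lower().strip().removeprefix("[ai-review]")
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def token_set(text):
    """Unique whitespace-separated tokens of the normalized text."""
    return set(normalize(text).split())
```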
<!-- VALIDATION GATES -->
@@ -23,28 +29,58 @@ nextStepFile: './step-09-mark-review-ready.md'
<!-- REVIEW FOLLOW-UP HANDLING -->
<check if="task is review follow-up (has [AI-Review] prefix)">
<action>Extract review item details (severity, description, related AC/file)</action>
<action>Add current review task to resolution tracking list: append structured entry to {{resolved_review_items}}</action>
<action>Load all items from "Senior Developer Review (AI) → Action Items" as candidate list {{review_action_items}}</action>
<action>Set {{task_text_norm}} = normalize(current review follow-up task description)</action>
<action>Initialize {{best_match}} = null, {{best_score}} = 0, {{best_shared_tokens}} = 0, {{tie_candidates}} = []</action>
<action>For each candidate action item:
1. Set {{candidate_text_norm}} = normalize(candidate text)
2. If {{task_text_norm}} == {{candidate_text_norm}} OR either contains the other:
- set {{candidate_score}} = 1.0 and mark as strong match
3. Else:
- compute Jaccard score = |token_set(task) ∩ token_set(candidate)| / |token_set(task) ∪ token_set(candidate)|
- set {{candidate_score}} to computed score
4. Track shared-token count for tie-breaking
5. Keep highest score candidate; if same score, keep candidate with more shared tokens
6. If score and shared-token count both tie, add candidate to {{tie_candidates}}
</action>
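The candidate-scoring loop above can be sketched as one self-contained function (hypothetical names; `_norm` inlines the normalize rule so the sketch stands alone):

```python
import re
import string

def _norm(text):
    """Lowercase, trim, drop the [AI-Review] prefix and punctuation, collapse whitespace."""
    text = text.lower().strip().removeprefix("[ai-review]")
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def best_action_item(task_text, candidates, threshold=0.60):
    """Return (best_match, best_score, tie_candidates) per the loop above."""
    task_norm = _norm(task_text)
    task_tokens = set(task_norm.split())
    best, best_score, best_shared, ties = None, 0.0, -1, []
    for cand in candidates:
        cand_norm = _norm(cand)
        cand_tokens = set(cand_norm.split())
        if task_norm == cand_norm or task_norm in cand_norm or cand_norm in task_norm:
            score = 1.0                               # exact/substring: strong match
        else:
            union = task_tokens | cand_tokens
            score = len(task_tokens & cand_tokens) / len(union) if union else 0.0
        shared = len(task_tokens & cand_tokens)       # shared tokens break score ties
        if score > best_score or (score == best_score and shared > best_shared):
            best, best_score, best_shared, ties = cand, score, shared, []
        elif score == best_score and shared == best_shared:
            ties.append(cand)                         # ambiguity to log
    if best_score >= threshold:
        return best, best_score, ties
    return None, best_score, ties
```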
<action>Set {{match_found}} = true only if {{best_score}} >= {{review_match_threshold}}</action>
<!-- Mark task in Review Follow-ups section -->
<!-- Mark task in Review Follow-ups section (always, regardless of action-item match result) -->
<action>Mark task checkbox [x] in "Tasks/Subtasks → Review Follow-ups (AI)" section</action>
<!-- CRITICAL: Also mark corresponding action item in review section -->
<action>Find matching action item in "Senior Developer Review (AI) → Action Items" using fuzzy matching:
1. Normalize strings (lowercase, trim, remove "[AI-Review]" prefix/punctuation)
2. Try exact and substring matches first
3. If none, compute token-overlap/Jaccard score per candidate
4. Select highest-scoring candidate when score >= 0.60
5. If tie at best score, prefer the candidate with more shared tokens; log ambiguity
</action>
<check if="matching action item found">
<action>Mark that action item checkbox [x] as resolved</action>
</check>
<check if="no candidate meets threshold">
<action>Log warning and append task to {{unresolved_review_items}}</action>
<action>Add resolution note in Dev Agent Record that no corresponding action item was found</action>
<check if="{{match_found}} == true">
<action>Mark matched action item checkbox [x] in "Senior Developer Review (AI) → Action Items"</action>
<action>Append structured entry to {{resolved_review_items}}:
- task: current review follow-up task
- matched_action_item: {{best_match}}
- match_score: {{best_score}}
- resolved_at: {{date}}
- status: "matched"
</action>
<check if="{{tie_candidates}} is not empty">
<action>Log ambiguity warning with tied candidates and selected best_match</action>
</check>
<action>Add to Dev Agent Record → Completion Notes: "✅ Resolved review finding [{{severity}}]: {{description}} (matched action item, score {{best_score}})"</action>
</check>
<action>Add to Dev Agent Record → Completion Notes: "✅ Resolved review finding [{{severity}}]: {{description}}"</action>
<check if="{{match_found}} == false">
<action>Log warning: no candidate met threshold {{review_match_threshold}} for task "{{task_text_norm}}"</action>
<action>Append structured entry to {{resolved_review_items}}:
- task: current review follow-up task
- matched_action_item: null
- match_score: {{best_score}}
- resolved_at: {{date}}
- status: "unmatched"
</action>
<action>Append structured entry to {{unresolved_review_items}}:
- task: current review follow-up task
- reason: "No corresponding action item met fuzzy-match threshold"
- best_candidate: {{best_match}}
- best_score: {{best_score}}
- recorded_at: {{date}}
</action>
<action>Add a resolution note in the Dev Agent Record stating that no corresponding action item was found, even though the follow-up checkbox was marked complete</action>
</check>
</check>
<!-- ONLY MARK COMPLETE IF ALL VALIDATION PASS -->
@@ -56,7 +92,12 @@ nextStepFile: './step-09-mark-review-ready.md'
<check if="ANY validation fails">
<action>DO NOT mark task complete - fix issues first</action>
<action>HALT if unable to fix validation failures</action>
<action>If unable to fix validation failures, invoke HALT protocol from dev-story/workflow.md with:
- reason_code: DEV-STORY-STEP-08-VALIDATION-FAIL
- step_id: step-08-mark-task-complete
- message: "Task completion validation failed and remediation was unsuccessful."
- required_action: "Fix failing validations/tests, then resume."
</action>
</check>
<check if="review_continuation == true and {{resolved_review_items}} is not empty">

@@ -10,9 +10,14 @@ nextStepFile: './step-10-closeout.md'
<action>Confirm File List includes every changed file</action>
<action>Execute enhanced definition-of-done validation</action>
<action>Update the story Status to: "review"</action>
<action>Initialize sprint tracking state:
- If {sprint_status} exists and is readable, load file and set {{current_sprint_status}} from tracking mode/content
- If file does not exist, unreadable, or indicates no sprint tracking, set {{current_sprint_status}} = "no-sprint-tracking"
<action>Initialize sprint tracking state deterministically before any sprint-status check:
- Set {{current_sprint_status}} = "no-sprint-tracking"
- Set {{sprint_tracking_enabled}} = false
- If {sprint_status} exists and is readable:
- Load the FULL file: {sprint_status}
- If file content indicates tracking disabled OR development_status section is missing, keep "no-sprint-tracking"
- Else set {{current_sprint_status}} = "enabled" and {{sprint_tracking_enabled}} = true
- If file missing/unreadable, keep defaults and continue with story-only status update
</action>
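The deterministic defaults-first initialization above can be sketched as follows. The content checks are deliberately naive assumptions (a real implementation would parse the YAML and inspect the development_status section and any tracking-disabled marker):

```python
from pathlib import Path

def init_sprint_tracking(sprint_status_path):
    """Set safe defaults first, then upgrade only if the file proves tracking is on."""
    current_sprint_status = "no-sprint-tracking"
    sprint_tracking_enabled = False
    try:
        text = Path(sprint_status_path).read_text(encoding="utf-8")
    except OSError:
        # Missing/unreadable file: keep defaults, continue story-only.
        return current_sprint_status, sprint_tracking_enabled
    # Hypothetical markers; real logic would parse the YAML structure.
    if "development_status" in text and "tracking: disabled" not in text:
        current_sprint_status = "enabled"
        sprint_tracking_enabled = True
    return current_sprint_status, sprint_tracking_enabled
```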
<!-- Enhanced Definition of Done Validation -->
@@ -31,31 +36,31 @@ nextStepFile: './step-10-closeout.md'
</action>
<!-- Mark story ready for review - sprint status conditional -->
<check if="{sprint_status} file exists AND {{current_sprint_status}} != 'no-sprint-tracking'">
<action>Load the FULL file: {sprint_status}</action>
<check if="{{sprint_tracking_enabled}} == true">
<action>Find development_status key matching {{story_key}}</action>
<action>Verify current status is "in-progress" (expected previous state)</action>
<action>Update development_status[{{story_key}}] = "review"</action>
<action>Save file, preserving ALL comments and structure including STATUS DEFINITIONS</action>
<output>✅ Story status updated to "review" in sprint-status.yaml</output>
<check if="story key found in sprint status">
<action>Verify current status is "in-progress" (expected previous state)</action>
<action>Update development_status[{{story_key}}] = "review"</action>
<action>Save file, preserving ALL comments and structure including STATUS DEFINITIONS</action>
<output>✅ Story status updated to "review" in sprint-status.yaml</output>
</check>
<check if="story key not found in sprint status">
<output>⚠️ Story file updated, but sprint-status update failed: {{story_key}} not found
Story status is set to "review" in file, but sprint-status.yaml may be out of sync.
</output>
</check>
</check>
<check if="{sprint_status} file does NOT exist OR {{current_sprint_status}} == 'no-sprint-tracking'">
<check if="{{sprint_tracking_enabled}} == false">
<output>Story status updated to "review" in story file (no sprint tracking configured)</output>
</check>
<check if="story key not found in sprint status">
<output>⚠️ Story file updated, but sprint-status update failed: {{story_key}} not found
Story status is set to "review" in file, but sprint-status.yaml may be out of sync.
</output>
</check>
<!-- Final validation gates -->
<action if="any task is incomplete">HALT - Complete remaining tasks before marking ready for review</action>
<action if="regression failures exist">HALT - Fix regression issues before completing</action>
<action if="File List is incomplete">HALT - Update File List with all changed files</action>
<action if="definition-of-done validation fails">HALT - Address DoD failures before completing</action>
<action if="any task is incomplete">Invoke HALT protocol (reason_code: DEV-STORY-STEP-09-INCOMPLETE-TASKS, step_id: step-09-mark-review-ready, message: "Incomplete tasks remain before review-ready transition.", required_action: "Complete all tasks/subtasks and rerun validations.")</action>
<action if="regression failures exist">Invoke HALT protocol (reason_code: DEV-STORY-STEP-09-REGRESSION-FAIL, step_id: step-09-mark-review-ready, message: "Regression suite has failures.", required_action: "Fix failing tests and rerun full regression suite.")</action>
<action if="File List is incomplete">Invoke HALT protocol (reason_code: DEV-STORY-STEP-09-FILE-LIST-INCOMPLETE, step_id: step-09-mark-review-ready, message: "File List does not include all changed files.", required_action: "Update File List with all added/modified/deleted paths.")</action>
<action if="definition-of-done validation fails">Invoke HALT protocol (reason_code: DEV-STORY-STEP-09-DOD-FAIL, step_id: step-09-mark-review-ready, message: "Definition-of-done checks failed.", required_action: "Address DoD failures and rerun validation.")</action>
</step>
## Next

@@ -45,22 +45,36 @@ Implement a ready story end-to-end with strict validation gates, accurate progre
- Change Log
- Status
- Execute steps in order and do not skip validation gates.
- Continue until the story is complete unless a defined HALT condition triggers.
- Continue until the story is complete unless the HALT protocol below is triggered.
## HALT Definition
- HALT triggers:
- Required inputs/files are missing or unreadable.
- Validation gates fail and cannot be remediated in current step.
- Test/regression failures persist after fix attempts.
- Story state becomes inconsistent (e.g., malformed task structure preventing safe updates).
## HALT Protocol (Normative)
- Scope:
- Every `HALT` instruction in this workflow and all `steps/*.md` files MUST use this protocol.
- Operational definition:
- HALT is a deterministic hard-stop event raised when execution cannot safely continue.
- A HALT event MUST include:
- `reason_code` (stable machine-readable code)
- `step_id` (current step file + step number)
- `message` (human-readable failure summary)
- `required_action` (what user/operator must do before resume)
- Trigger criteria:
- Required inputs/files are missing, unreadable, or malformed.
- Validation gates fail and cannot be remediated in the current step.
- Test/regression failures persist after attempted fixes.
- Story state is inconsistent (for example malformed task structure preventing safe updates).
- HALT behavior:
- Stop executing further steps immediately.
- Persist current story-file edits and workflow state safely.
- Emit explicit user-facing error message describing trigger and remediation needed.
- Do not apply partial completion marks after HALT.
- Stop execution immediately and skip all downstream steps.
- Persist workflow checkpoint: current step id, resolved variables, and pending task context.
- Persist only already-applied safe edits; do not apply new partial completion marks after HALT.
- Emit logger event exactly in this format:
- `HALT[{reason_code}] step={step_id} story={story_key|unknown} detail="{message}"`
- Emit user-facing prompt exactly in this format:
- `Workflow HALTED at {step_id} ({reason_code}): {message}. Required action: {required_action}. Reply RESUME after remediation.`
- Resume semantics:
- Manual resume only after user confirms the blocking issue is resolved.
- Resume from the last incomplete step checkpoint, re-running validations before progressing.
- Manual resume only (no automatic retry loop).
- Resume is checkpoint-based: restart from the halted step after user confirms remediation.
- Re-run the failed validation/input check before executing further actions.
- If the same HALT condition repeats, stop again with updated evidence.
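The two normative message formats can be sketched as a single formatter (hypothetical helper; the output strings follow the exact formats specified above):

```python
def format_halt(reason_code, step_id, message, required_action, story_key=None):
    """Render the normative HALT logger event and user-facing prompt."""
    log_event = (
        f'HALT[{reason_code}] step={step_id} '
        f'story={story_key or "unknown"} detail="{message}"'
    )
    user_prompt = (
        f"Workflow HALTED at {step_id} ({reason_code}): {message}. "
        f"Required action: {required_action}. Reply RESUME after remediation."
    )
    return log_event, user_prompt
```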
## Execution
Read fully and follow: `steps/step-01-find-story.md`.