refine workflow contracts for review findings, halt protocol, and sprint tracking

Dicky Moore 2026-02-08 15:01:58 +00:00
parent e7d7bbc3ea
commit a1c054006a
6 changed files with 195 additions and 78 deletions

@@ -12,9 +12,10 @@ reviewFindingsFile: '{story_dir}/review-findings.json'
<action>Initialize findings artifacts:
- Set {{review_findings}} = [] (in-memory array)
- Set {{review_findings_file}} = {reviewFindingsFile}
- Set {{review_findings_schema}} = "id,severity,type,summary,detail,file_line,proof,suggested_fix,reviewer,timestamp"
- Each finding record MUST contain:
id, severity, type, summary, detail, file_line, proof, suggested_fix, reviewer, timestamp
- `file_line` format MUST be `path/to/file:line`
- `file_line` is the required `file:line` locator field and MUST use `path/to/file:line` format
- `reviewer` value MUST be `senior-dev-review`
- `timestamp` MUST use system ISO datetime
</action>
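The record contract above can be sketched as a small Python constructor. This is an illustrative model only: `make_finding` and `REQUIRED_FIELDS` are hypothetical names; the schema fields, the fixed `reviewer` value, and the ISO timestamp rule come from the contract itself.

```python
from datetime import datetime, timezone

# Required fields from the declared findings schema.
REQUIRED_FIELDS = (
    "id", "severity", "type", "summary", "detail",
    "file_line", "proof", "suggested_fix", "reviewer", "timestamp",
)

def make_finding(fid, severity, ftype, summary, detail,
                 file_line, proof, suggested_fix):
    """Build one finding record matching the declared schema."""
    record = {
        "id": fid,
        "severity": severity,
        "type": ftype,
        "summary": summary,
        "detail": detail,
        "file_line": file_line,            # must be "path/to/file:line"
        "proof": proof,
        "suggested_fix": suggested_fix,
        "reviewer": "senior-dev-review",   # fixed by the contract
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Every required field must be present and non-empty.
    missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
    if missing:
        raise ValueError(f"finding missing fields: {missing}")
    return record
```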
@@ -35,28 +36,67 @@ reviewFindingsFile: '{story_dir}/review-findings.json'
1. Read the AC requirement
2. Search implementation files for evidence
3. Determine: IMPLEMENTED, PARTIAL, or MISSING using this algorithm:
- Parse each AC into explicit clauses (single requirement statements).
- Evaluate each clause independently, then derive overall AC status from clause outcomes.
- IMPLEMENTED:
- Direct code evidence exists for ALL AC clauses, and
- At least one corroborating test OR deterministic runtime verification exists, and
- Any docs/comments are supported by code/test evidence.
- EVERY clause has direct code evidence tied to the story execution path, and
- Evidence includes at least one strong corroborator for AC behavior (automated test, integration test, or reproducible runtime proof), and
- Weak evidence (docs/comments/README) is only supplemental.
- PARTIAL:
- Some AC clauses have direct implementation evidence but one or more clauses are missing OR only indirectly covered, or
- Evidence is helper/utility code not clearly wired to the story path, or
- Evidence is docs/comments only without strong corroboration.
- One or more clauses have direct evidence, but at least one clause lacks direct evidence, OR
- Coverage is only indirect (helper/generic utility not proven wired), OR
- Evidence is mostly weak and not corroborated by code/tests.
- MISSING:
- No credible code/test/docs evidence addresses the AC clauses.
4. Evidence-strength rules:
- Code + tests = strong evidence
- Code only = medium evidence
- Docs/comments/README only = weak evidence (cannot justify IMPLEMENTED alone)
- No clause has credible direct implementation evidence, and
- No test/runtime proof demonstrates AC behavior.
4. Evidence-type rules:
- Strong evidence: implementation code plus validating tests/runtime proof.
- Medium evidence: implementation code without validating tests.
- Weak evidence: comments, README/docs, design notes, screenshots, or unverifiable logs.
- Weak evidence alone cannot qualify an AC as IMPLEMENTED.
5. Indirect evidence rules:
- Generic helpers/utilities count as PARTIAL unless explicitly wired by call sites OR integration tests.
- Helper functions/utilities count as indirect until explicit call sites or integration coverage prove the AC path.
- Generic capability not wired to this story remains PARTIAL.
6. Severity mapping for AC gaps:
- MISSING critical-path AC → HIGH
- MISSING non-critical AC → MEDIUM
- PARTIAL critical-path AC → HIGH
- PARTIAL non-critical AC → MEDIUM
7. If AC is PARTIAL or MISSING, append a finding object to {{review_findings}}.
- MISSING + security/data-loss/compliance/core user flow risk → HIGH.
- MISSING + non-core behavior or secondary UX/documentation requirement → MEDIUM.
- PARTIAL + security/data-integrity/compliance risk → HIGH.
- PARTIAL + degraded core behavior → MEDIUM.
- PARTIAL + optional/non-critical behavior gap with safe fallback → LOW.
7. Classification examples:
- IMPLEMENTED example: AC requires validation + error response, and code path plus passing test covers all clauses.
- PARTIAL example: helper exists and one clause passes, but integration path for another clause is unproven.
- MISSING example: AC text exists, but no matching code path or tests are found.
8. If AC is PARTIAL or MISSING, append a finding object to {{review_findings}} with status, severity, and clause-level proof.
</action>
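The clause-level status derivation above can be sketched as a pure function. This is a simplified model, not workflow code: `direct_evidence` and `strong_corroborator` are assumed per-clause boolean flags standing in for the evidence rules.

```python
def ac_status(clauses):
    """Derive overall AC status from per-clause evidence flags.

    clauses: list of dicts with boolean keys
      - direct_evidence: code evidence tied to the story execution path
      - strong_corroborator: automated/integration test or reproducible runtime proof
    """
    if not any(c["direct_evidence"] for c in clauses):
        return "MISSING"          # no clause has credible direct evidence
    if (all(c["direct_evidence"] for c in clauses)
            and any(c["strong_corroborator"] for c in clauses)):
        return "IMPLEMENTED"      # every clause covered, behavior corroborated
    return "PARTIAL"              # some direct evidence, but gaps remain
```

Note that code-only coverage (no corroborating test) falls through to PARTIAL, matching the rule that medium/weak evidence alone cannot justify IMPLEMENTED.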
<action>When creating findings from any action above, populate fields using this mapping:
- id:
- Git discrepancy: `GIT-DIFF-{{index}}`
- AC gap: `AC-{{ac_id}}-{{status}}-{{index}}`
- Task mismatch: `TASK-{{task_id}}-MISMATCH-{{index}}`
- Code-quality issue: `CQ-{{category}}-{{index}}`
- severity:
- Use explicit severity rule from the originating action block
- type:
- `story-sync` for git/story discrepancies
- `acceptance-criteria` for AC gaps
- `task-audit` for task completion mismatches
- `code-quality` for quality/security/performance/test issues
- summary:
- One-line, user-facing issue statement
- detail:
- Include violated expectation plus observed behavior
- file_line:
- `path/to/file:line` evidence anchor (use most relevant file and line)
- proof:
- Concrete evidence snippet (code, test output, or git command result)
- suggested_fix:
- Actionable implementation guidance
- reviewer:
- `senior-dev-review`
- timestamp:
- System ISO datetime at finding creation time
</action>
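The id portion of the field mapping can be sketched as one dispatch function (hypothetical name; the `{{...}}` placeholders above become keyword arguments here):

```python
def finding_id(ftype, index, ac_id=None, status=None, task_id=None, category=None):
    """Build a finding id from its originating action, per the mapping above."""
    if ftype == "story-sync":
        return f"GIT-DIFF-{index}"
    if ftype == "acceptance-criteria":
        return f"AC-{ac_id}-{status}-{index}"
    if ftype == "task-audit":
        return f"TASK-{task_id}-MISMATCH-{index}"
    if ftype == "code-quality":
        return f"CQ-{category}-{index}"
    raise ValueError(f"unknown finding type: {ftype}")
```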
<!-- Task Completion Audit -->
@@ -94,7 +134,8 @@ reviewFindingsFile: '{story_dir}/review-findings.json'
<action>Persist findings contract for downstream step:
- Save {{review_findings}} as JSON array to {{review_findings_file}}
- Ensure JSON is valid and each finding includes all required fields
- Set {{findings_contract}} = "JSON array at {{review_findings_file}}"
- Set {{findings_contract}} = "JSON array at {{review_findings_file}} with schema {{review_findings_schema}}"
- Step 4 MUST load findings from {{review_findings_file}} and validate against {{review_findings_schema}} before presenting or resolving
</action>
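The persistence contract can be sketched in Python (hypothetical helper; the point is that an invalid array is rejected before it is written, so downstream step 4 never loads a malformed artifact):

```python
import json

def persist_findings(findings, path, required_fields):
    """Save findings as a JSON array, refusing to write an invalid contract."""
    for finding in findings:
        missing = [k for k in required_fields if k not in finding]
        if missing:
            raise ValueError(f"finding {finding.get('id')!r} missing {missing}")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(findings, fh, indent=2)
```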
<action>Example finding record (must match real records):

@@ -6,11 +6,20 @@ reviewFindingsFile: '{story_dir}/review-findings.json'
---
<step n="4" goal="Present findings and fix them">
<action>Load structured findings from {reviewFindingsFile}</action>
<action>Resolve findings artifact input:
- Use {{review_findings_file}} from step 3 when present
- Otherwise fallback to {reviewFindingsFile}
- Set {{review_findings_schema}} = "id,severity,type,summary,detail,file_line,proof,suggested_fix,reviewer,timestamp" if not already set
</action>
<action>Load structured findings JSON array from {{review_findings_file}}</action>
<action>Validate findings schema for each entry:
id, severity, type, summary, detail, file_line, proof, suggested_fix, reviewer, timestamp
</action>
<action>If findings file missing or malformed: HALT with explicit error and return to step 3 generation</action>
<action>Validation contract:
- `file_line` is the required `file:line` locator in `path/to/file:line` format
- Reject non-array JSON, missing required keys, or invalid file_line formatting
- If findings file missing/unreadable/malformed: HALT with explicit error and return to step 3 generation
</action>
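The validation contract can be sketched as follows. The regex for `path/to/file:line` is an assumption about the intended format (path segments followed by a numeric line); the raised `ValueError` stands in for the HALT branch.

```python
import re

# "path/to/file:line" — one or more path segments, then a numeric line.
FILE_LINE_RE = re.compile(r"^[^\s:]+(?:/[^\s:]+)*:\d+$")

def validate_findings(payload, required_fields):
    """Enforce the validation contract; raise to model the HALT branch."""
    if not isinstance(payload, list):
        raise ValueError("HALT: findings artifact is not a JSON array")
    for entry in payload:
        missing = [k for k in required_fields if k not in entry]
        if missing:
            raise ValueError(f"HALT: finding missing keys {missing}")
        if not FILE_LINE_RE.match(entry.get("file_line", "")):
            raise ValueError(f"HALT: invalid file_line {entry.get('file_line')!r}")
    return payload
```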
<action>Categorize findings: HIGH (must fix), MEDIUM (should fix), LOW (nice to fix)</action>
<action>Set {{fixed_count}} = 0</action>
<action>Set {{action_count}} = 0</action>

@@ -18,17 +18,24 @@ web_bundle: false
- `sprint_status` = `{implementation_artifacts}/sprint-status.yaml`
- `date` (system-generated)
- `installed_path` = `{project-root}/_bmad/bmm/workflows/4-implementation/correct-course`
- Note: `installed_path` targets the installed runtime tree under `_bmad/...`; source authoring files are in `src/bmm/workflows/4-implementation/correct-course/...`.
- `source_path` = `{project-root}/src/bmm/workflows/4-implementation/correct-course`
- Note: `installed_path` targets the installed runtime tree under `_bmad/...`; `source_path` is the repository authoring path.
- `default_output_file` = `{planning_artifacts}/sprint-change-proposal-{date}.md`
<workflow>
<critical>Communicate all responses in {communication_language} and generate all documents in {document_output_language}</critical>
<step n="1" goal="Analyze changes and propose corrective actions">
<action>Read and follow instructions at: {installed_path}/instructions.md</action>
<action>Resolve workflow content path:
- If `{installed_path}/instructions.md` exists and is readable, set {{workflow_path}} = `{installed_path}`
- Else if `{source_path}/instructions.md` exists and is readable, set {{workflow_path}} = `{source_path}`
- Else emit an error listing both paths and HALT
</action>
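The installed-then-source fallback can be sketched as a small resolver (hypothetical function; the probe file defaults to `instructions.md` but step 2 would pass `checklist.md`):

```python
from pathlib import Path

def resolve_workflow_path(installed_path, source_path, probe="instructions.md"):
    """Prefer the installed runtime tree, fall back to the source authoring tree."""
    for root in (installed_path, source_path):
        if (Path(root) / probe).is_file():
            return root
    raise FileNotFoundError(
        f"HALT: {probe} not found under {installed_path} or {source_path}"
    )
```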
<action>Read and follow instructions at: {{workflow_path}}/instructions.md</action>
</step>
<step n="2" goal="Validate proposal quality">
<invoke-task>Validate against checklist at {installed_path}/checklist.md using {project-root}/_bmad/core/tasks/validate-workflow.md</invoke-task>
<action>If {{workflow_path}} was not set in step 1, repeat the same path resolution using checklist.md as the probe file</action>
<invoke-task>Validate against checklist at {{workflow_path}}/checklist.md using {project-root}/_bmad/core/tasks/validate-workflow.md</invoke-task>
</step>
</workflow>

@@ -10,8 +10,14 @@ nextStepFile: './step-09-mark-review-ready.md'
<action>Initialize review-tracking variables before checks:
- If {{resolved_review_items}} is undefined: set {{resolved_review_items}} = []
- If {{unresolved_review_items}} is undefined: set {{unresolved_review_items}} = []
- Set {{review_continuation}} by checking current task title/original task list for prefix "[AI-Review]"
- Set {{date}} from system-generated timestamp in project date format
- Set {{review_continuation}} = false
- If current {{task_title}} starts with "[AI-Review]", set {{review_continuation}} = true
- Else scan {{original_task_list}}; if any item starts with "[AI-Review]", set {{review_continuation}} = true
- Set {{date}} from system-generated timestamp formatted for project change log entries
- Set {{resolved_count}} = length({{resolved_review_items}})
- Set {{review_match_threshold}} = 0.60
- Define normalize(text): lowercase, trim, remove "[AI-Review]" prefix and punctuation, collapse whitespace
- Define token_set(text): unique whitespace-separated normalized tokens
</action>
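The normalize/token_set definitions above can be sketched directly in Python (a minimal model of the stated rules; function names match the definitions, everything else is illustrative):

```python
import re
import string

def normalize(text):
    """Lowercase, trim, drop the [AI-Review] prefix and punctuation, collapse whitespace."""
    text = text.lower().strip().removeprefix("[ai-review]")
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def token_set(text):
    """Unique whitespace-separated tokens of the normalized text."""
    return set(normalize(text).split())
```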
<!-- VALIDATION GATES -->
@@ -23,28 +29,58 @@ nextStepFile: './step-09-mark-review-ready.md'
<!-- REVIEW FOLLOW-UP HANDLING -->
<check if="task is review follow-up (has [AI-Review] prefix)">
<action>Extract review item details (severity, description, related AC/file)</action>
<action>Add current review task to resolution tracking list: append structured entry to {{resolved_review_items}}</action>
<action>Load all items from "Senior Developer Review (AI) → Action Items" as candidate list {{review_action_items}}</action>
<action>Set {{task_text_norm}} = normalize(current review follow-up task description)</action>
<action>Initialize {{best_match}} = null, {{best_score}} = 0, {{best_shared_tokens}} = 0, {{tie_candidates}} = []</action>
<action>For each candidate action item:
1. Set {{candidate_text_norm}} = normalize(candidate text)
2. If {{task_text_norm}} == {{candidate_text_norm}} OR either contains the other:
- set {{candidate_score}} = 1.0 and mark as strong match
3. Else:
- compute Jaccard score = |token_set(task) ∩ token_set(candidate)| / |token_set(task) ∪ token_set(candidate)|
- set {{candidate_score}} to computed score
4. Track shared-token count for tie-breaking
5. Keep highest score candidate; if same score, keep candidate with more shared tokens
6. If score and shared-token count both tie, add candidate to {{tie_candidates}}
</action>
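The candidate-scoring loop above can be sketched as one self-contained function (hypothetical names; `_norm` inlines the normalize rule so the sketch stands alone):

```python
import re
import string

def _norm(text):
    """Lowercase, trim, drop the [AI-Review] prefix and punctuation, collapse whitespace."""
    text = text.lower().strip().removeprefix("[ai-review]")
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def best_action_item(task_text, candidates, threshold=0.60):
    """Return (best_match, best_score, tie_candidates) per the loop above."""
    task_norm = _norm(task_text)
    task_tokens = set(task_norm.split())
    best, best_score, best_shared, ties = None, 0.0, -1, []
    for cand in candidates:
        cand_norm = _norm(cand)
        cand_tokens = set(cand_norm.split())
        if task_norm == cand_norm or task_norm in cand_norm or cand_norm in task_norm:
            score = 1.0                               # exact/substring: strong match
        else:
            union = task_tokens | cand_tokens
            score = len(task_tokens & cand_tokens) / len(union) if union else 0.0
        shared = len(task_tokens & cand_tokens)       # shared tokens break score ties
        if score > best_score or (score == best_score and shared > best_shared):
            best, best_score, best_shared, ties = cand, score, shared, []
        elif score == best_score and shared == best_shared:
            ties.append(cand)                         # ambiguity to log
    if best_score >= threshold:
        return best, best_score, ties
    return None, best_score, ties
```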
<action>Set {{match_found}} = true only if {{best_score}} >= {{review_match_threshold}}</action>
<!-- Mark task in Review Follow-ups section -->
<!-- Mark task in Review Follow-ups section (always, regardless of action-item match result) -->
<action>Mark task checkbox [x] in "Tasks/Subtasks → Review Follow-ups (AI)" section</action>
<!-- CRITICAL: Also mark corresponding action item in review section -->
<action>Find matching action item in "Senior Developer Review (AI) → Action Items" using fuzzy matching:
1. Normalize strings (lowercase, trim, remove "[AI-Review]" prefix/punctuation)
2. Try exact and substring matches first
3. If none, compute token-overlap/Jaccard score per candidate
4. Select highest-scoring candidate when score >= 0.60
5. If tie at best score, prefer the candidate with more shared tokens; log ambiguity
</action>
<check if="matching action item found">
<action>Mark that action item checkbox [x] as resolved</action>
</check>
<check if="no candidate meets threshold">
<action>Log warning and append task to {{unresolved_review_items}}</action>
<action>Add resolution note in Dev Agent Record that no corresponding action item was found</action>
<check if="{{match_found}} == true">
<action>Mark matched action item checkbox [x] in "Senior Developer Review (AI) → Action Items"</action>
<action>Append structured entry to {{resolved_review_items}}:
- task: current review follow-up task
- matched_action_item: {{best_match}}
- match_score: {{best_score}}
- resolved_at: {{date}}
- status: "matched"
</action>
<check if="{{tie_candidates}} is not empty">
<action>Log ambiguity warning with tied candidates and selected best_match</action>
</check>
<action>Add to Dev Agent Record → Completion Notes: "✅ Resolved review finding [{{severity}}]: {{description}} (matched action item, score {{best_score}})"</action>
</check>
<action>Add to Dev Agent Record → Completion Notes: "✅ Resolved review finding [{{severity}}]: {{description}}"</action>
<check if="{{match_found}} == false">
<action>Log warning: no candidate met threshold {{review_match_threshold}} for task "{{task_text_norm}}"</action>
<action>Append structured entry to {{resolved_review_items}}:
- task: current review follow-up task
- matched_action_item: null
- match_score: {{best_score}}
- resolved_at: {{date}}
- status: "unmatched"
</action>
<action>Append structured entry to {{unresolved_review_items}}:
- task: current review follow-up task
- reason: "No corresponding action item met fuzzy-match threshold"
- best_candidate: {{best_match}}
- best_score: {{best_score}}
- recorded_at: {{date}}
</action>
<action>Add a resolution note in the Dev Agent Record stating that no corresponding action item was found, even though the follow-up checkbox was marked complete</action>
</check>
</check>
<!-- ONLY MARK COMPLETE IF ALL VALIDATION PASS -->
@@ -56,7 +92,12 @@ nextStepFile: './step-09-mark-review-ready.md'
<check if="ANY validation fails">
<action>DO NOT mark task complete - fix issues first</action>
<action>HALT if unable to fix validation failures</action>
<action>If unable to fix validation failures, invoke HALT protocol from dev-story/workflow.md with:
- reason_code: DEV-STORY-STEP-08-VALIDATION-FAIL
- step_id: step-08-mark-task-complete
- message: "Task completion validation failed and remediation was unsuccessful."
- required_action: "Fix failing validations/tests, then resume."
</action>
</check>
<check if="review_continuation == true and {{resolved_review_items}} is not empty">

@@ -10,9 +10,14 @@ nextStepFile: './step-10-closeout.md'
<action>Confirm File List includes every changed file</action>
<action>Execute enhanced definition-of-done validation</action>
<action>Update the story Status to: "review"</action>
<action>Initialize sprint tracking state:
- If {sprint_status} exists and is readable, load file and set {{current_sprint_status}} from tracking mode/content
- If file does not exist, unreadable, or indicates no sprint tracking, set {{current_sprint_status}} = "no-sprint-tracking"
<action>Initialize sprint tracking state deterministically before any sprint-status check:
- Set {{current_sprint_status}} = "no-sprint-tracking"
- Set {{sprint_tracking_enabled}} = false
- If {sprint_status} exists and is readable:
- Load the FULL file: {sprint_status}
- If file content indicates tracking disabled OR development_status section is missing, keep "no-sprint-tracking"
- Else set {{current_sprint_status}} = "enabled" and {{sprint_tracking_enabled}} = true
- If file missing/unreadable, keep defaults and continue with story-only status update
</action>
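The deterministic defaults-first initialization above can be sketched as follows. The content checks are deliberately naive assumptions (a real implementation would parse the YAML and inspect the development_status section and any tracking-disabled marker):

```python
from pathlib import Path

def init_sprint_tracking(sprint_status_path):
    """Set safe defaults first, then upgrade only if the file proves tracking is on."""
    current_sprint_status = "no-sprint-tracking"
    sprint_tracking_enabled = False
    try:
        text = Path(sprint_status_path).read_text(encoding="utf-8")
    except OSError:
        # Missing/unreadable file: keep defaults, continue story-only.
        return current_sprint_status, sprint_tracking_enabled
    # Hypothetical markers; real logic would parse the YAML structure.
    if "development_status" in text and "tracking: disabled" not in text:
        current_sprint_status = "enabled"
        sprint_tracking_enabled = True
    return current_sprint_status, sprint_tracking_enabled
```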
<!-- Enhanced Definition of Done Validation -->
@@ -31,31 +36,31 @@ nextStepFile: './step-10-closeout.md'
</action>
<!-- Mark story ready for review - sprint status conditional -->
<check if="{sprint_status} file exists AND {{current_sprint_status}} != 'no-sprint-tracking'">
<action>Load the FULL file: {sprint_status}</action>
<check if="{{sprint_tracking_enabled}} == true">
<action>Find development_status key matching {{story_key}}</action>
<action>Verify current status is "in-progress" (expected previous state)</action>
<action>Update development_status[{{story_key}}] = "review"</action>
<action>Save file, preserving ALL comments and structure including STATUS DEFINITIONS</action>
<output>✅ Story status updated to "review" in sprint-status.yaml</output>
<check if="story key found in sprint status">
<action>Verify current status is "in-progress" (expected previous state)</action>
<action>Update development_status[{{story_key}}] = "review"</action>
<action>Save file, preserving ALL comments and structure including STATUS DEFINITIONS</action>
<output>✅ Story status updated to "review" in sprint-status.yaml</output>
</check>
<check if="story key not found in sprint status">
<output>⚠️ Story file updated, but sprint-status update failed: {{story_key}} not found
Story status is set to "review" in file, but sprint-status.yaml may be out of sync.
</output>
</check>
</check>
<check if="{sprint_status} file does NOT exist OR {{current_sprint_status}} == 'no-sprint-tracking'">
<check if="{{sprint_tracking_enabled}} == false">
<output>Story status updated to "review" in story file (no sprint tracking configured)</output>
</check>
<check if="story key not found in sprint status">
<output>⚠️ Story file updated, but sprint-status update failed: {{story_key}} not found
Story status is set to "review" in file, but sprint-status.yaml may be out of sync.
</output>
</check>
<!-- Final validation gates -->
<action if="any task is incomplete">HALT - Complete remaining tasks before marking ready for review</action>
<action if="regression failures exist">HALT - Fix regression issues before completing</action>
<action if="File List is incomplete">HALT - Update File List with all changed files</action>
<action if="definition-of-done validation fails">HALT - Address DoD failures before completing</action>
<action if="any task is incomplete">Invoke HALT protocol (reason_code: DEV-STORY-STEP-09-INCOMPLETE-TASKS, step_id: step-09-mark-review-ready, message: "Incomplete tasks remain before review-ready transition.", required_action: "Complete all tasks/subtasks and rerun validations.")</action>
<action if="regression failures exist">Invoke HALT protocol (reason_code: DEV-STORY-STEP-09-REGRESSION-FAIL, step_id: step-09-mark-review-ready, message: "Regression suite has failures.", required_action: "Fix failing tests and rerun full regression suite.")</action>
<action if="File List is incomplete">Invoke HALT protocol (reason_code: DEV-STORY-STEP-09-FILE-LIST-INCOMPLETE, step_id: step-09-mark-review-ready, message: "File List does not include all changed files.", required_action: "Update File List with all added/modified/deleted paths.")</action>
<action if="definition-of-done validation fails">Invoke HALT protocol (reason_code: DEV-STORY-STEP-09-DOD-FAIL, step_id: step-09-mark-review-ready, message: "Definition-of-done checks failed.", required_action: "Address DoD failures and rerun validation.")</action>
</step>
## Next

@@ -45,22 +45,36 @@ Implement a ready story end-to-end with strict validation gates, accurate progre
- Change Log
- Status
- Execute steps in order and do not skip validation gates.
- Continue until the story is complete unless a defined HALT condition triggers.
- Continue until the story is complete unless the HALT protocol below is triggered.
## HALT Definition
- HALT triggers:
- Required inputs/files are missing or unreadable.
- Validation gates fail and cannot be remediated in current step.
- Test/regression failures persist after fix attempts.
- Story state becomes inconsistent (e.g., malformed task structure preventing safe updates).
## HALT Protocol (Normative)
- Scope:
- Every `HALT` instruction in this workflow and all `steps/*.md` files MUST use this protocol.
- Operational definition:
- HALT is a deterministic hard-stop event raised when execution cannot safely continue.
- A HALT event MUST include:
- `reason_code` (stable machine-readable code)
- `step_id` (current step file + step number)
- `message` (human-readable failure summary)
- `required_action` (what user/operator must do before resume)
- Trigger criteria:
- Required inputs/files are missing, unreadable, or malformed.
- Validation gates fail and cannot be remediated in the current step.
- Test/regression failures persist after attempted fixes.
- Story state is inconsistent (for example malformed task structure preventing safe updates).
- HALT behavior:
- Stop executing further steps immediately.
- Persist current story-file edits and workflow state safely.
- Emit explicit user-facing error message describing trigger and remediation needed.
- Do not apply partial completion marks after HALT.
- Stop execution immediately and skip all downstream steps.
- Persist workflow checkpoint: current step id, resolved variables, and pending task context.
- Persist only already-applied safe edits; do not apply new partial completion marks after HALT.
- Emit logger event exactly in this format:
- `HALT[{reason_code}] step={step_id} story={story_key|unknown} detail="{message}"`
- Emit user-facing prompt exactly in this format:
- `Workflow HALTED at {step_id} ({reason_code}): {message}. Required action: {required_action}. Reply RESUME after remediation.`
- Resume semantics:
- Manual resume only after user confirms the blocking issue is resolved.
- Resume from the last incomplete step checkpoint, re-running validations before progressing.
- Manual resume only (no automatic retry loop).
- Resume is checkpoint-based: restart from the halted step after user confirms remediation.
- Re-run the failed validation/input check before executing further actions.
- If the same HALT condition repeats, stop again with updated evidence.
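The two normative message formats can be sketched as a single formatter (hypothetical helper; the output strings follow the exact formats specified above):

```python
def format_halt(reason_code, step_id, message, required_action, story_key=None):
    """Render the normative HALT logger event and user-facing prompt."""
    log_event = (
        f'HALT[{reason_code}] step={step_id} '
        f'story={story_key or "unknown"} detail="{message}"'
    )
    user_prompt = (
        f"Workflow HALTED at {step_id} ({reason_code}): {message}. "
        f"Required action: {required_action}. Reply RESUME after remediation."
    )
    return log_event, user_prompt
```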
## Execution
Read fully and follow: `steps/step-01-find-story.md`.