feat: add triage step to code review workflow

External agents find ALL possible issues, but not all are relevant. This new step 4 evaluates findings against project context: - Actually Important: AC violations, bugs, real security issues - Context-Dependent: Performance, validation strictness (asks user) - Theoretical/Nitpicking: Micro-optimizations, style preferences (skips) This enables pragmatic code review that focuses on what matters.
2025-12-13 20:29:11 -06:00 · 2025-12-13 20:29:11 -06:00 · 195664029d
parent 57f59b2e5c
commit 195664029d
1 changed files with 121 additions and 24 deletions
--- a/src/modules/bmm/workflows/4-implementation/code-review/instructions.xml
+++ b/src/modules/bmm/workflows/4-implementation/code-review/instructions.xml
@ -288,38 +288,135 @@
    </check>
  </step>
-  <step n="4" goal="Present findings and fix them">
+  <!-- ================================================================ -->
-    <action>Categorize findings: HIGH (must fix), MEDIUM (should fix), LOW (nice to fix)</action>
+  <!-- STEP 4: TRIAGE FINDINGS - SEPARATE SIGNAL FROM NOISE            -->
  <!-- ================================================================ -->
  <!-- This step evaluates external agent findings with project context.
       External agents find ALL possible issues - but not all matter.
       A symlink bypass is critical for a web API, irrelevant for a game.
       The orchestrating agent applies pragmatic judgment here. -->
  <step n="4" goal="Triage findings - separate signal from noise">
    <critical>External agents are adversarial by design - they find EVERYTHING.</critical>
    <critical>Your job: Apply project context to determine what ACTUALLY matters.</critical>
    <critical>Do NOT blindly accept all findings. Do NOT dismiss valid concerns.</critical>
    <action>For EACH finding from {{all_findings}}, evaluate against these criteria:</action>
    <!-- Evaluation Framework -->
    <action>**ACTUALLY IMPORTANT** (always fix):
      - AC violations: Story claims something works but it doesn't
      - Task fraud: Checkbox marked [x] but code doesn't exist
      - Contract violations: Method signature doesn't match documented behavior
      - Real bugs: Code will fail at runtime in normal usage
      - Security issues for the ACTUAL threat model (game loading own files ≠ public API)
    </action>
    <action>**CONTEXT-DEPENDENT** (ask user):
      - Performance: Does it matter at this scale? Game data vs. million-record DB
      - Validation strictness: Nice-to-have vs. actually needed
      - Edge cases: Will they ever happen in practice?
      - Thread safety: Is this actually multi-threaded?
    </action>
    <action>**THEORETICAL/NITPICKING** (skip unless user insists):
      - "Could be exploited if attacker controls X" when attacker never controls X
      - Micro-optimizations that save nanoseconds
      - Style preferences disguised as bugs
      - "Best practice" violations that don't cause problems
      - DoS concerns for trusted internal data
    </action>
    <!-- Build categorized lists -->
    <set-var name="important_findings" value="[]" />
    <set-var name="contextual_findings" value="[]" />
    <set-var name="theoretical_findings" value="[]" />
    <action>Categorize each finding into one of the three lists with brief reasoning</action>
    <!-- Present triage to user via AskUserQuestion -->
    <output>**🔍 FINDINGS TRIAGE**
 I've reviewed {{external_agent_cmd}}'s findings against your project context.
 Here's what I think actually matters vs. theoretical concerns:
    </output>
    <!-- Important findings - recommend fixing -->
    <check if="{{important_findings}} is not empty">
      <output>
 ## ✅ ACTUALLY IMPORTANT (Recommend Fix)
 These are real issues that affect correctness or violate documented contracts:
 {{#each important_findings}}
 - **{{this.id}}**: {{this.summary}}
  - *Why it matters*: {{this.reasoning}}
 {{/each}}
      </output>
    </check>
    <!-- Context-dependent - ask user -->
    <check if="{{contextual_findings}} is not empty">
      <ask questions="[
        {
          'question': 'Which context-dependent issues should we address?',
          'header': 'Fix these?',
          'multiSelect': true,
          'options': [
            {{#each contextual_findings}}
            {
              'label': '{{this.id}}: {{this.short_summary}}',
              'description': '{{this.reasoning}}'
            },
            {{/each}}
            {
              'label': 'None of these',
              'description': 'Skip all context-dependent issues'
            }
          ]
        }
      ]" />
      <action>Add user-selected contextual findings to {{important_findings}}</action>
    </check>
    <!-- Theoretical findings - inform user but don't push -->
    <check if="{{theoretical_findings}} is not empty">
      <output>
 ## ⏭️ SKIPPING (Theoretical/Nitpicking)
 These findings are technically valid but don't matter for your use case:
 {{#each theoretical_findings}}
 - **{{this.id}}**: {{this.summary}} — *{{this.reasoning}}*
 {{/each}}
 *(Say "fix all" if you want these addressed anyway)*
      </output>
    </check>
    <!-- Final confirmation -->
    <set-var name="final_fix_list" value="{{important_findings}}" />
    <output>
 **📋 FINAL FIX LIST: {{final_fix_list.length}} issues**
    </output>
  </step>
  <step n="5" goal="Present findings and fix them">
    <action>Set {{fixed_count}} = 0</action>
    <action>Set {{action_count}} = 0</action>
-    <output>**🔥 CODE REVIEW FINDINGS, {user_name}!**
+    <output>**🔥 CODE REVIEW SUMMARY, {user_name}!**
      **Story:** {{story_file}}
      **Review Method:** {{#if external_agent_cmd}}{{external_agent_cmd}} CLI{{else}}built-in{{/if}}
-      **Git vs Story Discrepancies:** {{git_discrepancy_count}} found
+      **Raw Findings:** {{all_findings.length}} total
-      **Issues Found:** {{high_count}} High, {{medium_count}} Medium, {{low_count}} Low
+      **After Triage:** {{final_fix_list.length}} to address, {{theoretical_findings.length}} skipped
-      ## 🔴 CRITICAL ISSUES
+      ## Issues to Fix
-      - Tasks marked [x] but not actually implemented
+      {{#each final_fix_list}}
-      - Acceptance Criteria not implemented
+      - [{{this.severity}}] {{this.id}}: {{this.summary}}
-      - Story claims files changed but no git evidence
+      {{/each}}
      - Security vulnerabilities
      ## 🟡 MEDIUM ISSUES
      - Files changed but not documented in story File List
      - Uncommitted changes not tracked
      - Performance problems
      - Poor test coverage/quality
      - Code maintainability issues
      ## 🟢 LOW ISSUES
      - Code style improvements
      - Documentation gaps
      - Git commit message quality
    </output>
-    <ask>What should I do with these issues?
+    <ask>What should I do with these {{final_fix_list.length}} issues?
      1. **Fix them automatically** - I'll update the code and tests
      2. **Create action items** - Add to story Tasks/Subtasks for later
@ -349,7 +446,7 @@
    </check>
  </step>
-  <step n="5" goal="Update story status and sync sprint tracking">
+  <step n="6" goal="Update story status and sync sprint tracking">
    <!-- Determine new status based on review outcome -->
    <check if="all HIGH and MEDIUM issues fixed AND all ACs implemented">
      <action>Set {{new_status}} = "done"</action>