feat: add triage step to code review workflow

External agents find ALL possible issues, but not all of them are relevant.
The new step 4 evaluates each finding against project context:
- Actually Important: AC violations, bugs, real security issues
- Context-Dependent: Performance, validation strictness (asks user)
- Theoretical/Nitpicking: Micro-optimizations, style preferences (skips)

This enables pragmatic code review that focuses on what matters.
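The three-bucket triage described above can be sketched roughly as follows. This is an illustrative Python sketch, not code from the workflow: the `Finding` fields, category names, and example IDs are assumptions; in the actual workflow these lists live in the `{{important_findings}}`, `{{contextual_findings}}`, and `{{theoretical_findings}}` variables.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    id: str
    summary: str
    category: str   # "important" | "contextual" | "theoretical" (assumed labels)
    reasoning: str = ""

def triage(findings):
    """Bucket raw agent findings into the three triage lists."""
    buckets = {"important": [], "contextual": [], "theoretical": []}
    for f in findings:
        buckets[f.category].append(f)
    return buckets

# Hypothetical findings, mirroring the examples in the commit message
raw = [
    Finding("AC-1", "AC 3 claims retry logic exists; it does not", "important",
            "Story/implementation mismatch"),
    Finding("SEC-2", "Symlink bypass when loading save files", "theoretical",
            "Game loads its own files; attacker never controls the path"),
    Finding("PERF-3", "O(n^2) scan of entity list", "contextual",
            "May not matter at game-data scale; ask the user"),
]
buckets = triage(raw)
# "important" findings are always fixed; "contextual" ones go to the user
# prompt; "theoretical" ones are reported but skipped unless the user insists.
```

The orchestrating agent performs the categorization itself; the sketch only shows the resulting data flow into the three lists.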
Scott Jennings 2025-12-13 20:29:11 -06:00
parent 57f59b2e5c
commit 195664029d
1 changed file with 121 additions and 24 deletions

@@ -288,38 +288,135 @@
</check>
</step>
<step n="4" goal="Present findings and fix them">
<action>Categorize findings: HIGH (must fix), MEDIUM (should fix), LOW (nice to fix)</action>
<!-- ================================================================ -->
<!-- STEP 4: TRIAGE FINDINGS - SEPARATE SIGNAL FROM NOISE -->
<!-- ================================================================ -->
<!-- This step evaluates external agent findings with project context.
External agents find ALL possible issues - but not all matter.
A symlink bypass is critical for a web API, irrelevant for a game.
The orchestrating agent applies pragmatic judgment here. -->
<step n="4" goal="Triage findings - separate signal from noise">
<critical>External agents are adversarial by design - they find EVERYTHING.</critical>
<critical>Your job: Apply project context to determine what ACTUALLY matters.</critical>
<critical>Do NOT blindly accept all findings. Do NOT dismiss valid concerns.</critical>
<action>For EACH finding from {{all_findings}}, evaluate against these criteria:</action>
<!-- Evaluation Framework -->
<action>**ACTUALLY IMPORTANT** (always fix):
- AC violations: Story claims something works but it doesn't
- Task fraud: Checkbox marked [x] but code doesn't exist
- Contract violations: Method signature doesn't match documented behavior
- Real bugs: Code will fail at runtime in normal usage
- Security issues for the ACTUAL threat model (game loading own files ≠ public API)
</action>
<action>**CONTEXT-DEPENDENT** (ask user):
- Performance: Does it matter at this scale? Game data vs. million-record DB
- Validation strictness: Nice-to-have vs. actually needed
- Edge cases: Will they ever happen in practice?
- Thread safety: Is this actually multi-threaded?
</action>
<action>**THEORETICAL/NITPICKING** (skip unless user insists):
- "Could be exploited if attacker controls X" when attacker never controls X
- Micro-optimizations that save nanoseconds
- Style preferences disguised as bugs
- "Best practice" violations that don't cause problems
- DoS concerns for trusted internal data
</action>
<!-- Build categorized lists -->
<set-var name="important_findings" value="[]" />
<set-var name="contextual_findings" value="[]" />
<set-var name="theoretical_findings" value="[]" />
<action>Categorize each finding into one of the three lists with brief reasoning</action>
<!-- Present triage to user via AskUserQuestion -->
<output>**🔍 FINDINGS TRIAGE**
I've reviewed {{external_agent_cmd}}'s findings against your project context.
Here's what I think actually matters vs. theoretical concerns:
</output>
<!-- Important findings - recommend fixing -->
<check if="{{important_findings}} is not empty">
<output>
## ✅ ACTUALLY IMPORTANT (Recommend Fix)
These are real issues that affect correctness or violate documented contracts:
{{#each important_findings}}
- **{{this.id}}**: {{this.summary}}
- *Why it matters*: {{this.reasoning}}
{{/each}}
</output>
</check>
<!-- Context-dependent - ask user -->
<check if="{{contextual_findings}} is not empty">
<ask questions="[
{
'question': 'Which context-dependent issues should we address?',
'header': 'Fix these?',
'multiSelect': true,
'options': [
{{#each contextual_findings}}
{
'label': '{{this.id}}: {{this.short_summary}}',
'description': '{{this.reasoning}}'
},
{{/each}}
{
'label': 'None of these',
'description': 'Skip all context-dependent issues'
}
]
}
]" />
<action>Add user-selected contextual findings to {{important_findings}}</action>
</check>
<!-- Theoretical findings - inform user but don't push -->
<check if="{{theoretical_findings}} is not empty">
<output>
## ⏭️ SKIPPING (Theoretical/Nitpicking)
These findings are technically valid but don't matter for your use case:
{{#each theoretical_findings}}
- **{{this.id}}**: {{this.summary}} — *{{this.reasoning}}*
{{/each}}
*(Say "fix all" if you want these addressed anyway)*
</output>
</check>
<!-- Final confirmation -->
<set-var name="final_fix_list" value="{{important_findings}}" />
<output>
**📋 FINAL FIX LIST: {{final_fix_list.length}} issues**
</output>
</step>
<step n="5" goal="Present findings and fix them">
<action>Set {{fixed_count}} = 0</action>
<action>Set {{action_count}} = 0</action>
<output>**🔥 CODE REVIEW FINDINGS, {user_name}!**
<output>**🔥 CODE REVIEW SUMMARY, {user_name}!**
**Story:** {{story_file}}
**Review Method:** {{#if external_agent_cmd}}{{external_agent_cmd}} CLI{{else}}built-in{{/if}}
**Git vs Story Discrepancies:** {{git_discrepancy_count}} found
**Issues Found:** {{high_count}} High, {{medium_count}} Medium, {{low_count}} Low
**Raw Findings:** {{all_findings.length}} total
**After Triage:** {{final_fix_list.length}} to address, {{theoretical_findings.length}} skipped
## 🔴 CRITICAL ISSUES
- Tasks marked [x] but not actually implemented
- Acceptance Criteria not implemented
- Story claims files changed but no git evidence
- Security vulnerabilities
## 🟡 MEDIUM ISSUES
- Files changed but not documented in story File List
- Uncommitted changes not tracked
- Performance problems
- Poor test coverage/quality
- Code maintainability issues
## 🟢 LOW ISSUES
- Code style improvements
- Documentation gaps
- Git commit message quality
## Issues to Fix
{{#each final_fix_list}}
- [{{this.severity}}] {{this.id}}: {{this.summary}}
{{/each}}
</output>
<ask>What should I do with these issues?
<ask>What should I do with these {{final_fix_list.length}} issues?
1. **Fix them automatically** - I'll update the code and tests
2. **Create action items** - Add to story Tasks/Subtasks for later
@@ -349,7 +446,7 @@
</check>
</step>
<step n="5" goal="Update story status and sync sprint tracking">
<step n="6" goal="Update story status and sync sprint tracking">
<!-- Determine new status based on review outcome -->
<check if="all HIGH and MEDIUM issues fixed AND all ACs implemented">
<action>Set {{new_status}} = "done"</action>