Compare commits

...

5 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Alex Verkhovsky | f781ba6566 | Merge 0f708d0b89 into 35ae4fd024 | 2026-01-05 02:29:53 +00:00 |
| Alex Verkhovsky | 0f708d0b89 | refactor(core): shorten adversarial review task name | 2026-01-04 18:28:51 -08:00 |
| Alex Verkhovsky | 5fcdae02b5 | refactor(code-review): defer finding IDs until consolidation | 2026-01-04 05:33:21 -08:00 |
| Alex Verkhovsky | b8eeb78cff | refactor(adversarial-review): simplify severity/validity classification | 2026-01-04 04:13:46 -08:00 |
| Alex Verkhovsky | b628eec9fd | refactor(code-review): simplify adversarial review task invocation | 2026-01-04 04:07:23 -08:00 |
5 changed files with 53 additions and 115 deletions

View File

@@ -1,7 +1,7 @@
<!-- if possible, run this in a separate subagent or process with read access to the project,
but no context except the content to review -->
-<task id="_bmad/core/tasks/review-adversarial-general.xml" name="Adversarial Review (General)">
+<task id="_bmad/core/tasks/review-adversarial-general.xml" name="Adversarial Review">
<objective>Cynically review content and produce findings</objective>
<inputs>

View File

@@ -43,15 +42,14 @@ Review `{git_discrepancies}` and create findings:
| Discrepancy Type | Severity |
| --- | --- |
-| Files changed but not in story File List | MEDIUM |
-| Story lists files but no git changes | HIGH |
-| Uncommitted changes not documented | MEDIUM |
+| Files changed but not in story File List | Medium |
+| Story lists files but no git changes | High |
+| Uncommitted changes not documented | Medium |
-For each discrepancy, add to `{context_aware_findings}`:
+For each discrepancy, add to `{context_aware_findings}` (no IDs yet - assigned after merge):
```
{
-id: "CAF-{n}",
source: "git-discrepancy",
severity: "...",
description: "...",
@@ -66,15 +65,14 @@ For EACH AC in `{acceptance_criteria}`:
1. Read the AC requirement
2. Search implementation files in `{comprehensive_file_list}` for evidence
3. Determine status: IMPLEMENTED, PARTIAL, or MISSING
-4. If PARTIAL or MISSING → add HIGH severity finding
+4. If PARTIAL or MISSING → add High severity finding
Add to `{context_aware_findings}`:
```
{
-id: "CAF-{n}",
source: "ac-validation",
-severity: "HIGH",
+severity: "High",
description: "AC {id} not fully implemented: {details}",
evidence: "Expected: {ac}, Found: {what_was_found}"
}
@@ -86,16 +84,15 @@ For EACH task marked [x] in `{tasks_with_status}`:
1. Read the task description
2. Search files for evidence it was actually done
-3. **CRITICAL**: If marked [x] but NOT DONE → CRITICAL finding
+3. **Critical**: If marked [x] but NOT DONE → Critical finding
4. Record specific proof (file:line) if done
Add to `{context_aware_findings}` if false:
```
{
-id: "CAF-{n}",
source: "task-audit",
-severity: "CRITICAL",
+severity: "Critical",
description: "Task marked complete but not implemented: {task}",
evidence: "Searched: {files}, Found: no evidence of {expected}"
}
@@ -137,10 +134,10 @@ Present context-aware findings:
**Phase 1: Context-Aware Review Complete**
**Findings:** {count}
-- CRITICAL: {count}
-- HIGH: {count}
-- MEDIUM: {count}
-- LOW: {count}
+- Critical: {count}
+- High: {count}
+- Medium: {count}
+- Low: {count}
Proceeding to Phase 2: Adversarial Review...
```
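Across the three checks above, the Phase 1 summary block is just a per-severity tally over `{context_aware_findings}`. A minimal Python sketch of that tally (the finding dicts follow the templates above; the helper name is invented for illustration):

```python
from collections import Counter

def phase1_summary(context_aware_findings):
    """Tally findings by severity and render the Phase 1 report lines."""
    counts = Counter(f["severity"] for f in context_aware_findings)
    lines = [f"**Findings:** {len(context_aware_findings)}"]
    # Fixed order matches the report template: Critical, High, Medium, Low.
    for severity in ("Critical", "High", "Medium", "Low"):
        lines.append(f"- {severity}: {counts.get(severity, 0)}")
    return "\n".join(lines)
```

For example, one Critical task-audit finding plus one High ac-validation finding renders as Critical: 1, High: 1, Medium: 0, Low: 0.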

View File

@@ -75,74 +75,29 @@ If no baseline available, review current state of files in `{file_list}`:
### 2. Invoke Adversarial Review
<critical>Use information asymmetry: separate context from review</critical>
-**Execution Hierarchy (try in order):**
-**Option A: Subagent (Preferred)**
-If Task tool available with subagent capability:
+With `{diff_output}` constructed, invoke the review task. If possible, use information asymmetry: run this step, and only it, in a separate subagent or process with read access to the project, but no context except the `{diff_output}`.
```xml
-<invoke-task subagent="true">
-Review {diff_output} using {project-root}/_bmad/core/tasks/review-adversarial-general.xml
-</invoke-task>
+<invoke-task>Review {diff_output} using {project-root}/_bmad/core/tasks/review-adversarial-general.xml</invoke-task>
```
-The subagent:
+**Platform fallback:** If task invocation not available, load the task file and execute its instructions inline, passing `{diff_output}` as the content.
-- Has FULL read access to the repository
-- Receives ONLY `{diff_output}` as context
-- Does NOT know story requirements, ACs, or intent
-- Reviews code purely on technical merit
-**Option B: CLI Fallback**
-If subagent not available but CLI available:
-```bash
-# Pipe diff to adversarial review task
-cat {diff_file} | claude --task {adversarial_review_task}
-```
-**Option C: Inline Execution**
-If neither available, load `review-adversarial-general.xml` and execute inline:
-1. Load task file
-2. Adopt adversarial persona
-3. Review `{diff_output}` with zero story context
-4. Generate findings
+The task should: review `{diff_output}` and return a list of findings.
### 3. Process Adversarial Findings
Capture findings from adversarial review.
-**If zero findings returned:**
-<critical>HALT - Zero findings is suspicious. Re-analyze or ask for guidance.</critical>
+**If zero findings:** HALT - this is suspicious. Re-analyze or ask for guidance.
+Evaluate severity (Critical, High, Medium, Low) and validity (Real, Noise, Undecided).
-**For each finding:**
-Assign severity:
-- CRITICAL: Security vulnerabilities, data loss risks
-- HIGH: Logic errors, missing error handling
-- MEDIUM: Performance issues, code smells
-- LOW: Style, documentation
-Assign validity:
-- REAL: Genuine issue to address
-- NOISE: False positive (explain why)
-- UNDECIDED: Needs human judgment
-Create `{asymmetric_findings}` list:
+Add each finding to `{asymmetric_findings}` (no IDs yet - assigned after merge):
```
{
-id: "AAF-{n}",
-source: "adversarial-review",
+source: "adversarial",
severity: "...",
validity: "...",
description: "...",
@@ -159,15 +114,15 @@ Present adversarial findings:
**Reviewer Context:** Pure diff review (no story knowledge)
**Findings:** {count}
-- CRITICAL: {count}
-- HIGH: {count}
-- MEDIUM: {count}
-- LOW: {count}
+- Critical: {count}
+- High: {count}
+- Medium: {count}
+- Low: {count}
**Validity Assessment:**
-- Real issues: {count}
-- Noise/false positives: {count}
-- Needs judgment: {count}
+- Real: {count}
+- Noise: {count}
+- Undecided: {count}
Proceeding to findings consolidation...
```
@@ -186,7 +141,7 @@ Proceeding to findings consolidation...
- Diff constructed from correct source (uncommitted or commits)
- Story file excluded from diff
-- Subagent invoked with proper isolation (or fallback used)
+- Task invoked with diff as input
- Adversarial review executed
- Findings captured with severity and validity
- `{asymmetric_findings}` populated
@@ -198,7 +153,7 @@ Proceeding to findings consolidation...
- Including story file in diff (breaks asymmetry)
- Skipping adversarial review entirely
- Accepting zero findings without halt
-- Not using subagent when available
+- Invoking task without providing diff input
- Missing severity/validity classification
- Not storing findings for consolidation
- No explicit NEXT directive at step completion
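The checklist and anti-pattern lists above both hinge on two invariants: halt on zero findings, and classify every captured finding by severity and validity. A hedged Python sketch of those two checks (the function name and exception choices are illustrative, not part of the task file):

```python
SEVERITIES = ("Critical", "High", "Medium", "Low")
VALIDITIES = ("Real", "Noise", "Undecided")

def process_adversarial_findings(findings):
    """Enforce the zero-findings halt, then require severity/validity labels."""
    if not findings:
        # Zero findings is suspicious: halt and re-analyze or ask for guidance.
        raise RuntimeError("zero findings - halt and re-analyze")
    for f in findings:
        if f.get("severity") not in SEVERITIES:
            raise ValueError(f"missing or invalid severity: {f}")
        if f.get("validity") not in VALIDITIES:
            raise ValueError(f"missing or invalid validity: {f}")
    return findings
```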

View File

@@ -57,53 +57,39 @@ Keep the MORE DETAILED version:
- If adversarial finding has better technical detail → keep that
- When in doubt, keep context-aware (has more context)
-Mark duplicates as merged:
-```
-{
-id: "CF-{n}",
-merged_from: ["CAF-3", "AAF-2"],
-kept_version: "CAF-3",
-reason: "Context-aware version includes AC reference"
-}
-```
+Note which findings were merged (for transparency in the summary).
### 3. Normalize Severity
-Apply consistent severity scale:
-| Severity | Icon | Criteria |
-| --- | --- | --- |
-| CRITICAL | RED | Security vuln, data loss, tasks marked done but not, broken core functionality |
-| HIGH | ORANGE | Missing AC implementation, logic errors, missing critical error handling |
-| MEDIUM | YELLOW | Performance issues, incomplete features, documentation gaps |
-| LOW | GREEN | Code style, minor improvements, suggestions |
+Apply consistent severity scale (Critical, High, Medium, Low).
### 4. Filter Noise
-Review adversarial findings marked as NOISE:
+Review adversarial findings marked as Noise:
- If clearly false positive (e.g., style preference, not actual issue) → exclude
-- If questionable → keep with UNDECIDED validity
-- If context reveals it's actually valid → upgrade to REAL
+- If questionable → keep with Undecided validity
+- If context reveals it's actually valid → upgrade to Real
**Do NOT filter:**
-- Any CRITICAL or HIGH severity
+- Any Critical or High severity
- Any context-aware findings (they have story context)
-### 5. Create Consolidated Table
+### 5. Sort and Number Findings
+Sort by severity (Critical → High → Medium → Low), then assign IDs: F1, F2, F3, etc.
Build `{consolidated_findings}`:
```markdown
| ID | Severity | Source | Description | Location |
|----|----------|--------|-------------|----------|
-| CF-1 | CRITICAL | task-audit | Task 3 marked [x] but not implemented | src/auth.ts |
-| CF-2 | HIGH | ac-validation | AC2 partially implemented | src/api/*.ts |
-| CF-3 | HIGH | adversarial | Missing error handling in API calls | src/api/client.ts:45 |
-| CF-4 | MEDIUM | git-discrepancy | File changed but not in story | src/utils.ts |
-| CF-5 | LOW | adversarial | Magic number should be constant | src/config.ts:12 |
+| F1 | Critical | task-audit | Task 3 marked [x] but not implemented | src/auth.ts |
+| F2 | High | ac-validation | AC2 partially implemented | src/api/*.ts |
+| F3 | High | adversarial | Missing error handling in API calls | src/api/client.ts:45 |
+| F4 | Medium | git-discrepancy | File changed but not in story | src/utils.ts |
+| F5 | Low | adversarial | Magic number should be constant | src/config.ts:12 |
```
### 6. Present Consolidated Findings
@@ -115,10 +101,10 @@ Build `{consolidated_findings}`:
**Summary:**
- Total findings: {count}
-- CRITICAL: {count}
-- HIGH: {count}
-- MEDIUM: {count}
-- LOW: {count}
+- Critical: {count}
+- High: {count}
+- Medium: {count}
+- Low: {count}
**Deduplication:** {merged_count} duplicate findings merged
@@ -126,16 +112,16 @@ Build `{consolidated_findings}`:
## Findings by Severity
-### CRITICAL (Must Fix)
+### Critical (Must Fix)
{list critical findings with full details}
-### HIGH (Should Fix)
+### High (Should Fix)
{list high findings with full details}
-### MEDIUM (Consider Fixing)
+### Medium (Consider Fixing)
{list medium findings}
-### LOW (Nice to Fix)
+### Low (Nice to Fix)
{list low findings}
---
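The noise-filter rules in section 4 reduce to a small predicate: only adversarial findings explicitly judged Noise are eligible for exclusion, and Critical/High or context-aware findings are never dropped. A simplified Python sketch under those assumptions (the actual exclude/keep/upgrade call is judgment-based; this only encodes the guardrails):

```python
def filter_noise(findings):
    """Drop adversarial Noise findings, honoring the two do-not-filter rules."""
    kept = []
    for f in findings:
        if f.get("validity") != "Noise":
            kept.append(f)      # only Noise is eligible for exclusion
        elif f["severity"] in ("Critical", "High"):
            kept.append(f)      # never filter Critical or High severity
        elif f["source"] != "adversarial":
            kept.append(f)      # context-aware findings are never filtered
    return kept
```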

View File

@@ -79,7 +79,7 @@ The task should: review `{diff_output}` and return a list of findings.
Capture the findings from the task output.
**If zero findings:** HALT - this is suspicious. Re-analyze or request user guidance.
-Evaluate severity (Critical, High, Medium, Low) and validity (real, noise, undecided).
+Evaluate severity (Critical, High, Medium, Low) and validity (Real, Noise, Undecided).
DO NOT exclude findings based on severity or validity unless explicitly asked to do so.
Order findings by severity.
Number the ordered findings (F1, F2, F3, etc.).
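The ordering and numbering rule above (sort Critical → High → Medium → Low, then label F1, F2, F3, …) can be sketched in Python as:

```python
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

def sort_and_number(findings):
    """Order findings by severity, then assign sequential IDs F1, F2, ..."""
    # sorted() is stable, so findings of equal severity keep their input order.
    ordered = sorted(findings, key=lambda f: SEVERITY_RANK[f["severity"]])
    for n, f in enumerate(ordered, start=1):
        f["id"] = f"F{n}"
    return ordered
```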