BMAD-METHOD/test/adversarial-review-tests
Alex Verkhovsky 1cbaeae643 fix: update stale task terminology to skill after format conversion
Address review findings from PR #1857: replace remaining "task"
references with "skill" in workflow steps and test documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 06:51:53 -06:00
README.md fix: update stale task terminology to skill after format conversion 2026-03-08 06:51:53 -06:00
sample-content.md feat: add optional also_consider input to adversarial review task (#1371) 2026-01-22 22:26:25 -06:00
test-cases.yaml fix: update stale task terminology to skill after format conversion 2026-03-08 06:51:53 -06:00

README.md

Adversarial Review Test Suite

Tests for the also_consider optional input in the bmad-review-adversarial-general skill.

Purpose

Evaluate whether the also_consider input gently nudges the reviewer toward specific areas without overriding normal adversarial analysis.

Test Content

All tests use sample-content.md, a deliberately imperfect User Authentication API doc with these flaws:

  • Vague error handling section
  • Missing rate limit details
  • No token expiration info
  • Plain-text password in an example
  • Missing authentication headers
  • No error response examples

Running Tests

For each test case in test-cases.yaml, invoke the adversarial review skill on sample-content.md, passing that test case's also_consider items, if any.

Manual Test Invocation

Review this content using the adversarial review skill:

<content>
[paste sample-content.md]
</content>

<also_consider>
[paste items from test case, or omit for TC01]
</also_consider>
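
The invocation text above can also be assembled programmatically. The following is a minimal sketch, not part of the test suite: the helper name build_review_prompt is hypothetical, the one-item-per-line "- item" layout is an assumption, and so is the choice to leave out the whole also_consider block (rather than send an empty one) for the baseline case.

```python
from typing import Optional, Sequence


def build_review_prompt(content: str, also_consider: Optional[Sequence[str]] = None) -> str:
    """Assemble the manual invocation text for one test case (hypothetical helper)."""
    parts = [
        "Review this content using the adversarial review skill:",
        "",
        "<content>",
        content.strip(),
        "</content>",
    ]
    # Baseline case (TC01): no items, so the <also_consider> block is omitted entirely.
    # Assumption: items are rendered one per line with a leading "- ".
    if also_consider:
        parts += ["", "<also_consider>"]
        parts += [f"- {item}" for item in also_consider]
        parts.append("</also_consider>")
    return "\n".join(parts)
```

Calling build_review_prompt(text) yields the TC01 form, and build_review_prompt(text, items) yields the form used by the other test cases.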

Evaluation Criteria

For each test, note:

  1. Total findings - Does the review still produce about 10 issues?
  2. Distribution - Are findings spread across concerns or clustered?
  3. Relevance - Do findings relate to also_consider items when provided?
  4. Balance - Are also_consider findings elevated over others, or naturally mixed?
  5. Quality - Are findings actionable regardless of source?
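
The first few criteria can be roughly tallied once the findings are copied out of a review. This is a sketch under loud assumptions: findings are plain strings, "relates to an also_consider item" is approximated by a case-insensitive substring match, and the names ReviewTally and tally_findings are hypothetical. It does not replace reading the findings, and Quality in particular still needs human judgment.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ReviewTally:
    total: int            # criterion 1: number of findings
    related: int          # criterion 3: findings that mention an also_consider item
    related_share: float  # related / total, or 0.0 when there are no findings
    top_heavy: bool       # criterion 4: all related findings are listed before the rest


def tally_findings(findings: Sequence[str], also_consider: Sequence[str]) -> ReviewTally:
    """Crude tally; substring matching is an assumed stand-in for real relevance."""
    items = [item.lower() for item in also_consider]
    flags: List[bool] = [any(item in f.lower() for item in items) for f in findings]
    total = len(flags)
    related = sum(flags)
    # "Elevated" heuristic: some but not all findings are related,
    # and the related ones occupy the first positions in the list.
    top_heavy = 0 < related < total and all(flags[:related])
    share = related / total if total else 0.0
    return ReviewTally(total, related, share, top_heavy)
```

A top_heavy result of True suggests the also_consider items may be dominating the ordering, which is worth checking by hand against the Balance criterion.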

Expected Outcomes

  • TC01 (baseline): Generic spread of findings
  • TC02-TC05 (domain-focused): Some findings align with domain, others still organic
  • TC06 (single item): Light influence, not dominant
  • TC07 (vague items): Minimal change from baseline
  • TC08 (specific items): Findings directly address the listed items where the content has gaps
  • TC09 (mixed): Balanced across domains
  • TC10 (contradictory): Graceful handling