Alex Verkhovsky 1cbaeae643 fix: update stale task terminology to skill after format conversion Address review findings from PR #1857: replace remaining "task" references with "skill" in workflow steps and test documentation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>		2026-03-08 06:51:53 -06:00
README.md	fix: update stale task terminology to skill after format conversion	2026-03-08 06:51:53 -06:00
sample-content.md	feat: add optional also_consider input to adversarial review task (#1371)	2026-01-22 22:26:25 -06:00
test-cases.yaml	fix: update stale task terminology to skill after format conversion	2026-03-08 06:51:53 -06:00

README.md

Adversarial Review Test Suite

Tests for the also_consider optional input in the bmad-review-adversarial-general skill.

Purpose

Evaluate whether the also_consider input gently nudges the reviewer toward specific areas without overriding normal adversarial analysis.

Test Content

All tests use sample-content.md, a deliberately imperfect User Authentication API doc with these flaws:

Vague error handling section
Missing rate limit details
No token expiration info
Password shown in plain text in an example
Missing authentication headers
No error response examples

Running Tests

For each test case in test-cases.yaml, invoke the adversarial review skill on sample-content.md, passing that test case's also_consider items (TC01 passes none).

Manual Test Invocation

Review this content using the adversarial review skill:

<content>
[paste sample-content.md]
</content>

<also_consider>
[paste items from test case, or omit for TC01]
</also_consider>
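
If you run many cases by hand, the invocation text above can be assembled with a small helper. This is a minimal sketch, not part of the suite: the function name `build_review_prompt` is hypothetical, and rendering each also_consider item as a `- ` bullet is an assumption about how the items are pasted.

```python
from typing import Optional, Sequence


def build_review_prompt(content: str, also_consider: Optional[Sequence[str]] = None) -> str:
    """Assemble the manual invocation text for one test case.

    The <also_consider> block is left out entirely when no items are
    given, which corresponds to the TC01 baseline.
    """
    parts = [
        "Review this content using the adversarial review skill:",
        "",
        "<content>",
        content.strip(),
        "</content>",
    ]
    if also_consider:
        parts += ["", "<also_consider>"]
        # Assumption: one item per line, written as a dash bullet.
        parts += [f"- {item}" for item in also_consider]
        parts.append("</also_consider>")
    return "\n".join(parts)


# Stand-in text; in practice this would be the contents of sample-content.md.
sample = "POST /login accepts a username and password."
print(build_review_prompt(sample))
print(build_review_prompt(sample, ["rate limiting", "token expiration"]))
```

Omitting the block for an empty list, rather than emitting empty tags, keeps the baseline run free of any also_consider signal.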

Evaluation Criteria

For each test, note:

Total findings - Does the review still produce roughly 10 issues?
Distribution - Are findings spread across concerns or clustered?
Relevance - Do findings relate to also_consider items when provided?
Balance - Are also_consider findings elevated over others, or naturally mixed?
Quality - Are findings actionable regardless of source?
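
The countable criteria above (total findings, distribution, relevance) can be tallied once each finding has been labeled by hand. The sketch below is illustrative only: the `Finding` record, the `summarize` function, and the substring-based relevance check are assumptions, not part of the skill or its output format. Balance and quality still need a human read.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Finding:
    text: str     # the finding as written by the reviewer
    concern: str  # label assigned by hand, e.g. "error handling"


def summarize(findings: Sequence[Finding], also_consider: Sequence[str] = ()) -> Dict[str, object]:
    """Tally total findings, their distribution, and also_consider relevance."""
    distribution = Counter(f.concern for f in findings)
    # Crude relevance check (an assumption): a finding counts as related
    # when its text contains an also_consider item, ignoring case.
    needles = [item.lower() for item in also_consider]
    related = [f for f in findings if any(n in f.text.lower() for n in needles)]
    total = len(findings)
    return {
        "total": total,
        "distribution": dict(distribution),
        "related": len(related),
        "related_share": (len(related) / total) if total else 0.0,
    }


# Made-up findings for illustration; real ones come from a review run.
example = [
    Finding("No rate limiting on login", "rate limits"),
    Finding("Password shown in plain text", "security"),
    Finding("Rate limiting window undefined", "rate limits"),
    Finding("Errors are vague", "error handling"),
]
print(summarize(example, ["rate limiting"]))
```

Comparing `related_share` for a test case against the TC01 baseline gives a rough sense of how strongly the also_consider items pulled the review.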

Expected Outcomes

TC01 (baseline): Generic spread of findings
TC02-TC05 (domain-focused): Some findings align with domain, others still organic
TC06 (single item): Light influence, not dominant
TC07 (vague items): Minimal change from baseline
TC08 (specific items): Direct answers if gaps exist
TC09 (mixed): Balanced across domains
TC10 (contradictory): Graceful handling