BMAD-METHOD

Commit Graph

Author	SHA1	Message	Date
Brian Madison	b410f2c436	test(bmm): overhaul product-brief evals into A/B/C pattern split	2026-05-09 17:10:59 -05:00
Brian Madison	9ef0b7f574	test(bmm): scaffold evals for bmad-product-brief	2026-05-09 10:03:56 -05:00

Author: Brian Madison
SHA1: b410f2c436

test(bmm): overhaul product-brief evals into A/B/C pattern split

Refactor evals from 8 numbered single-shot tests into 16 typed evals:
- Pattern A (A1-A8): artifact-correctness tests with headless prompts and
  precise, falsifiable expectations (no invented facts, right-sized output,
  file boundary enforcement)
- Pattern B (B1-B8): process-discipline tests verifying decision-log fidelity,
  polish phase ordering, contradiction detection, and distillate generation
- Pattern C (C1): config-compliance test for custom output paths and
  document_output_language
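The A/B/C split can be pictured as a small grouping helper that sorts eval IDs into their patterns. This is a hypothetical illustration, not part of the skill-creator schema: it assumes IDs are plain strings such as `A3` or `C1`, and the names `PATTERNS`, `pattern_of`, and `group_by_pattern` are invented here.

```python
import re
from collections import defaultdict

# Hypothetical mapping from ID prefix to the pattern names used in the commit.
PATTERNS = {
    "A": "artifact-correctness",
    "B": "process-discipline",
    "C": "config-compliance",
}

# A pattern letter followed by a positive integer, e.g. "A1" or "B8".
_EVAL_ID = re.compile(r"([ABC])([1-9][0-9]*)")


def pattern_of(eval_id: str) -> str:
    """Return the pattern name for an ID such as "A3"; reject anything else."""
    match = _EVAL_ID.fullmatch(eval_id)
    if match is None:
        raise ValueError(f"unrecognized eval id: {eval_id!r}")
    return PATTERNS[match.group(1)]


def group_by_pattern(eval_ids):
    """Group IDs by pattern name, keeping input order within each group."""
    groups = defaultdict(list)
    for eval_id in eval_ids:
        groups[pattern_of(eval_id)].append(eval_id)
    return dict(groups)
```

A malformed ID such as `D1` raises `ValueError`, so a typo in an eval file fails loudly instead of silently landing in the wrong pattern.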

Also tighten SKILL.md: add dependencies frontmatter, clarify headless override
autonomy in Update (proceed when intent is clear; block when ambiguous), and
make distillate skip explicit when bmad-distillator is not installed.
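The two behavioral rules tightened in SKILL.md reduce to simple decisions. A minimal sketch, assuming the caller already knows whether an override's intent is clear and which skills are installed; `headless_update_action`, `finishing_steps`, and the step names are hypothetical and do not come from the skill itself.

```python
from __future__ import annotations

from collections.abc import Iterable

DISTILLATOR = "bmad-distillator"


def headless_update_action(intent_is_clear: bool) -> str:
    """Headless Update has no user to ask, so the choice is binary:
    apply the override when its intent is clear, otherwise block rather than guess."""
    return "proceed" if intent_is_clear else "block"


def finishing_steps(installed_skills: Iterable[str]) -> list[str]:
    """Return hypothetical post-brief steps. When bmad-distillator is missing,
    the distillate is skipped explicitly, with the reason in the step name,
    instead of being attempted and failing."""
    steps = ["write_brief"]
    if DISTILLATOR in set(installed_skills):
        steps.append("generate_distillate")
    else:
        steps.append(f"skip_distillate ({DISTILLATOR} not installed)")
    return steps
```

Recording the skip as its own step keeps it visible in a run's output, which is what makes it testable.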

Date: 2026-05-09 17:10:59 -05:00

Author: Brian Madison
SHA1: 9ef0b7f574

test(bmm): scaffold evals for bmad-product-brief

Adds an eval suite for bmad-product-brief following the Anthropic skill-creator
schema, plus a new "Extract, don't ingest" constraint on the skill itself.

Skill change:
- New constraint: source artifacts (user-provided or run-discovered) enter the
  parent conversation as relevance-filtered extracts via subagents, not loaded
  wholesale. Keeps the parent context lean against transcripts, brainstorms,
  research reports, code, web results, and prior briefs.
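One way to picture "Extract, don't ingest": each source passes through a filter that returns only the passages relevant to the user's stated focus, and only those extracts reach the parent conversation. In the skill that filter is a subagent; the keyword-overlap scoring below is a deliberately crude stand-in, and `extract_relevant`, `Extract`, and the stopword list are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical stopword list so that common words do not count toward relevance.
STOPWORDS = frozenset(
    {"a", "an", "and", "are", "for", "in", "is", "of", "on", "or", "the", "to", "with"}
)


@dataclass(frozen=True)
class Extract:
    source: str  # where the passage came from
    index: int   # paragraph position in the original source
    text: str    # the passage handed to the parent instead of the whole file


def _content_words(text: str) -> set[str]:
    """Lowercased words with stopwords removed."""
    return set(re.findall(r"[a-z0-9']+", text.lower())) - STOPWORDS


def extract_relevant(source: str, document: str, focus: str, min_overlap: int = 2) -> list[Extract]:
    """Keep paragraphs sharing at least `min_overlap` content words with the focus.
    Every paragraph is scored, so where the relevant one sits does not matter."""
    focus_words = _content_words(focus)
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    return [
        Extract(source, i, p)
        for i, p in enumerate(paragraphs)
        if len(_content_words(p) & focus_words) >= min_overlap
    ]
```

Given a three-paragraph interview file whose only relevant passage is in the middle, the parent receives one `Extract` with `index=1` rather than the whole file.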

Evals (evals/bmm-skills/bmad-product-brief/):
- evals.json: 8 artifact/behavioral evals covering Create one-shot,
  source-memo ingest, Update with contradiction surfacing, Validate inline,
  Headless mode, brainstorm filtering, research-report filtering, persona
  filtering. All scenarios use fictional entities (InsuLens, Branfield CC,
  Forkbird Kitchen, Mossridge Library, Sproutkeeper, Hatchet & Loop Studio,
  Brightway, Pantry Bridge).
- triggers.json: 15 description-firing checks (7 should-fire, 8 should-not-fire)
  to catch under-triggering and adjacent-skill poaching.
- files/: realistic fixtures including a brainstorm with the relevant idea
  buried at the end, a 3000-word market research report with the relevant
  section in the middle, and customer interviews with the target persona in
  position 3 of 4 — each shaped to test that filtering happens against the
  user's stated focus regardless of where the relevant material sits.
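The trigger checks separate two failure modes: under-triggering (a should-fire query that did not fire) and poaching (a should-not-fire query that did). A minimal scoring sketch, assuming each check records a query and a boolean expectation; `TriggerCheck`, `score_triggers`, and the field names are hypothetical and may not match the actual triggers.json layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerCheck:
    query: str         # a user prompt
    should_fire: bool  # whether bmad-product-brief ought to activate for it


def score_triggers(checks, fired):
    """Compare expectations with observed activations (one bool per check)."""
    if len(checks) != len(fired):
        raise ValueError("need exactly one observed result per check")
    under_triggered = [c.query for c, f in zip(checks, fired) if c.should_fire and not f]
    poached = [c.query for c, f in zip(checks, fired) if not c.should_fire and f]
    return {
        "under_triggered": under_triggered,  # should have fired, did not
        "poached": poached,                  # fired on a query meant for another skill
        "passed": len(checks) - len(under_triggered) - len(poached),
    }
```

Reporting the two lists separately matters because the fixes differ: under-triggering calls for a broader description, while poaching calls for a narrower one.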

Eval directory placement: top-level evals/ outside src/, matching the convention
in anthropics/skills (zero of 17 production skills include an evals/ subdir;
their skill-creator places dev workspaces as siblings to skill folders). Keeps
evals out of any installer or marketplace.json distribution path.
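The placement argument assumes that packaging only ever looks under src/. The sketch below shows that kind of filter, assuming a distribution step that selects files by top-level directory. `distributable` and the src/ path in the test are hypothetical, and the real installer and marketplace.json logic may select files differently.

```python
from pathlib import PurePosixPath


def distributable(paths, roots=("src",)):
    """Keep only paths whose first component is one of `roots`.

    Hypothetical packaging rule: anything outside src/, such as evals/,
    is never considered for distribution."""
    kept = []
    for raw in paths:
        parts = PurePosixPath(raw).parts
        if parts and parts[0] in roots:
            kept.append(raw)
    return kept
```

Under such a rule, the eval suite never reaches a distribution path, and no per-file exclude list is needed.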

Date: 2026-05-09 10:03:56 -05:00

2 Commits