Refactor evals from 8 numbered single-shot tests into 16 typed evals:
- Pattern A (A1-A8): artifact-correctness tests with headless prompts and
precise, falsifiable expectations (no invented facts, right-sized output,
file boundary enforcement)
- Pattern B (B1-B8): process-discipline tests verifying decision-log fidelity,
polish phase ordering, contradiction detection, and distillate generation
- Pattern C (C1): config-compliance test for custom output paths and
document_output_language
Also tighten SKILL.md: add dependencies frontmatter, clarify headless override
autonomy in Update (proceed when intent is clear; block when ambiguous), and
make distillate skip explicit when bmad-distillator is not installed.