18 tasks: skeleton test/eval first, then a tighten/test/eval cycle for each step, plus an end-to-end eval. Each task file is a self-contained intent expression that can be fed to QD2.