Task 02: Eval Bare Skeleton
Prerequisite
Task 01 test cycle is clean (all gaps and plumbing issues resolved).
Intent
Evaluate the bare skeleton's efficiency against the existing QD workflow as baseline.
Method
- Run the same task through both QD (old) and QD2 (skeleton) if possible, or compare against a recent QD session log.
- Measure:
  - Total human turns (the north star metric)
  - Total agent turns / API round-trips
  - Approximate token usage (context window utilization)
  - Time to completion
  - Quality of output (subjective: did it produce what was asked for?)
- Note where QD2 is better, where it's worse, where it's equivalent.
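The comparison above could be tabulated with something like the following minimal sketch. The `SessionMetrics` fields, the `compare` helper, and the idea of reducing quality to a yes/no flag are all assumptions made for illustration; nothing here reflects an existing QD or QD2 log format, and the numbers in the usage example are placeholders, not measurements.

```python
from dataclasses import dataclass, fields


@dataclass
class SessionMetrics:
    """Hypothetical record of one session; field names are illustrative only."""
    human_turns: int              # north star metric
    agent_turns: int              # agent turns / API round-trips
    approx_tokens: int            # approximate token usage
    seconds_to_completion: float  # wall-clock time
    quality_ok: bool              # subjective: did it produce what was asked for?


# For every metric except quality, a lower value counts as better.
HIGHER_IS_BETTER = {"quality_ok"}


def compare(baseline: SessionMetrics, candidate: SessionMetrics) -> dict:
    """Label each metric 'better', 'worse', or 'equivalent' for the candidate."""
    verdicts = {}
    for f in fields(SessionMetrics):
        old, new = getattr(baseline, f.name), getattr(candidate, f.name)
        if new == old:
            verdicts[f.name] = "equivalent"
        elif f.name in HIGHER_IS_BETTER:
            verdicts[f.name] = "better" if new > old else "worse"
        else:
            verdicts[f.name] = "better" if new < old else "worse"
    return verdicts


if __name__ == "__main__":
    # Placeholder values for illustration only; not measured results.
    qd = SessionMetrics(10, 40, 50_000, 600.0, True)
    qd2 = SessionMetrics(7, 45, 50_000, 540.0, True)
    for metric, verdict in compare(qd, qd2).items():
        print(f"{metric}: QD2 is {verdict}")
```

Keeping the verdict per metric, rather than collapsing everything into one score, matches the instruction to note separately where QD2 is better, worse, or equivalent.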
Output
An eval report at _experiment/results/skeleton-eval-report.md, containing the metrics comparison and recommendations for which steps need tightening first (prioritized by impact on the north star metric).
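One possible way to produce the prioritized recommendations is sketched below. It assumes that human turns can be attributed to individual workflow steps and uses that count as a proxy for impact on the north star metric. The step names and the `prioritize` and `render_recommendations` helpers are hypothetical, and the counts in the usage example are placeholders.

```python
from __future__ import annotations


def prioritize(step_human_turns: dict[str, int]) -> list[tuple[str, int]]:
    """Order steps by human turns spent, highest first; ties break by step name."""
    return sorted(step_human_turns.items(), key=lambda kv: (-kv[1], kv[0]))


def render_recommendations(step_human_turns: dict[str, int]) -> str:
    """Render the prioritized steps as a Markdown table for the eval report."""
    lines = ["| Priority | Step | Human turns |", "| --- | --- | --- |"]
    for rank, (step, turns) in enumerate(prioritize(step_human_turns), start=1):
        lines.append(f"| {rank} | {step} | {turns} |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical step names and placeholder counts, for illustration only.
    print(render_recommendations({"plan": 1, "implement": 4, "review": 2}))
```

Sorting on human turns alone is a deliberate simplification; a fuller report might also weigh token usage or time to completion when two steps tie.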