# Task 02: Eval Bare Skeleton

## Prerequisite

The Task 01 test cycle is clean (all gaps and plumbing issues resolved).

## Intent

Evaluate the efficiency of the bare skeleton (QD2), using the existing QD workflow as the baseline.

## Method

1. Run the same task through both QD (old) and QD2 (skeleton). If a side-by-side run is not possible, compare the QD2 run against a recent QD session log.
2. Measure:
   - Total human turns (the north star metric)
   - Total agent turns / API round-trips
   - Approximate token usage (context window utilization)
   - Time to completion
   - Quality of output (subjective: did it produce what was asked for?)
3. Note where QD2 is better, where it's worse, where it's equivalent.
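
The comparison in steps 2 and 3 could be recorded along these lines. This is a minimal sketch, not an existing tool: `RunMetrics`, its field names, and `compare` are hypothetical, and it assumes that lower is better for every numeric metric. Output quality stays a manual judgment and is only stored, not scored.

```python
from dataclasses import dataclass, asdict


@dataclass
class RunMetrics:
    """Metrics for one workflow run. Field names are illustrative assumptions."""
    human_turns: int        # the north star metric
    agent_turns: int        # agent turns / API round-trips
    approx_tokens: int      # approximate token usage
    minutes: float          # time to completion
    produced_request: bool  # subjective: did it produce what was asked for?


# Numeric metrics where a lower value is assumed to be better.
NUMERIC_METRICS = ("human_turns", "agent_turns", "approx_tokens", "minutes")


def compare(baseline: RunMetrics, candidate: RunMetrics) -> dict[str, str]:
    """Label each numeric metric as better, worse, or equivalent for the candidate."""
    base, cand = asdict(baseline), asdict(candidate)
    verdicts = {}
    for name in NUMERIC_METRICS:
        if cand[name] < base[name]:
            verdicts[name] = "better"
        elif cand[name] > base[name]:
            verdicts[name] = "worse"
        else:
            verdicts[name] = "equivalent"
    return verdicts
```

Calling `compare(qd_run, qd2_run)` with the QD run as the baseline returns one verdict per metric, which maps directly onto the better / worse / equivalent notes in step 3.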

## Output

An eval report at `_experiment/results/skeleton-eval-report.md`, containing a metrics comparison and recommendations for which steps to tighten first, prioritized by impact on the north star metric.
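
One way to derive the prioritized recommendations is to rank steps by the human turns each one consumed. The sketch below assumes per-step human turn counts are available from the session log, which this task does not specify. The function names and the step names in the usage note are hypothetical.

```python
def prioritize_steps(human_turns_by_step: dict[str, int]) -> list[str]:
    """Return step names ordered by human turns consumed, highest first.

    Ties keep their original order because Python's sort is stable.
    """
    return sorted(
        human_turns_by_step,
        key=lambda step: human_turns_by_step[step],
        reverse=True,
    )


def render_recommendations(human_turns_by_step: dict[str, int]) -> str:
    """Render a numbered Markdown list of steps to tighten, highest impact first."""
    lines = []
    for rank, step in enumerate(prioritize_steps(human_turns_by_step), start=1):
        turns = human_turns_by_step[step]
        lines.append(f"{rank}. {step}: {turns} human turns")
    return "\n".join(lines)
```

For example, `render_recommendations({"plan": 1, "implement": 4, "review": 2})` lists `implement` first, then `review`, then `plan`. Raw human turn count is only a proxy for impact on the north star metric; a step with few turns that blocks everything else may still deserve attention first.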