Task 02: Eval Bare Skeleton

Prerequisite

Task 01 test cycle is clean (all gaps and plumbing issues resolved).

Intent

Evaluate the bare skeleton's efficiency against the existing QD workflow as baseline.

Method

Run the same task through both QD (old) and QD2 (skeleton). If a side-by-side run is not possible, compare the QD2 run against a recent QD session log.
Measure:
- Total human turns (the north star metric)
- Total agent turns / API round-trips
- Approximate token usage (context window utilization)
- Time to completion
- Quality of output (subjective: did it produce what was asked for?)
Note where QD2 is better, where it's worse, where it's equivalent.
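
The comparison above could be recorded along these lines. This is a minimal sketch, not part of the QD tooling: `RunMetrics`, `compare`, the 1-5 quality scale, and every number shown are hypothetical placeholders, and it assumes lower is better for every metric except quality.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RunMetrics:
    """One run's measurements (hypothetical container, not an existing QD type)."""
    human_turns: int   # the north star metric
    agent_turns: int   # agent turns / API round-trips
    tokens: int        # approximate token usage
    seconds: float     # time to completion
    quality: int       # subjective score; the 1-5 scale is an assumption


# Assumption: every metric except quality improves as it gets smaller.
HIGHER_IS_BETTER = {"quality"}


def compare(baseline: RunMetrics, candidate: RunMetrics) -> dict[str, str]:
    """Label each metric as better, worse, or equivalent for the candidate."""
    verdicts = {}
    for f in fields(RunMetrics):
        old = getattr(baseline, f.name)
        new = getattr(candidate, f.name)
        if new == old:
            verdicts[f.name] = "equivalent"
        else:
            improved = new > old if f.name in HIGHER_IS_BETTER else new < old
            verdicts[f.name] = "better" if improved else "worse"
    return verdicts


# Placeholder values for illustration only, not measured results.
qd = RunMetrics(human_turns=6, agent_turns=40, tokens=120_000, seconds=900.0, quality=4)
qd2 = RunMetrics(human_turns=4, agent_turns=45, tokens=120_000, seconds=700.0, quality=4)
result = compare(qd, qd2)
```

Exact equality is a strict test for "equivalent"; a real eval would probably want a tolerance for noisy metrics such as time and token usage.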

Output

An eval report at _experiment/results/skeleton-eval-report.md containing the metrics comparison and recommendations for which steps to tighten first, prioritized by impact on the north star metric.
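
One possible way to order the recommendations, sketched under the assumption that "impact on the north star metric" is approximated by the human turns each step consumed. The step names, counts, and both functions are hypothetical.

```python
def prioritize(step_human_turns: dict[str, int]) -> list[str]:
    """Order steps by human turns, highest first; ties break alphabetically."""
    ranked = sorted(step_human_turns.items(), key=lambda kv: (-kv[1], kv[0]))
    # Steps that needed no human turns have nothing to tighten on this metric.
    return [name for name, turns in ranked if turns > 0]


def render_recommendations(step_human_turns: dict[str, int]) -> str:
    """Render the ordered steps as a plain numbered list for the report."""
    lines = ["Recommendations (most human turns first):"]
    for i, name in enumerate(prioritize(step_human_turns), start=1):
        lines.append(f"{i}. {name}: {step_human_turns[name]} human turns")
    return "\n".join(lines)


# Hypothetical step names and counts, for illustration only.
example = {"plan": 1, "implement": 3, "review": 0, "spec": 3}
report_section = render_recommendations(example)
```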
