Capture the telemetry the claude CLI already emits (session id, tokens,
cost, latency, context window) as OTel-shaped trace spans and roll them
up into deterministic metrics. Gated behind BMAD_TRACE=1; the legacy
text path is unchanged when tracing is off.
- New scripts/epic-execute-lib/observability.sh: span recording, rollup,
jq dep enforcement, and an intra-phase heartbeat for crash forensics
- epic-execute.sh: stream-json capture in run_claude_to_file with clean
.result extraction, per-phase set_span_context calls, rollup in cleanup
- epic-chain.sh: measured (non-fabricated) telemetry section in reports
- Guard set -e aborts on malformed stream lines so crash/timeout paths
degrade gracefully instead of killing the run
- Docs: gap analysis + observability implementation plan
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>