Capture the telemetry the claude CLI already emits (session id, tokens,
cost, latency, context window) as OTel-shaped trace spans and roll them
up into deterministic metrics. Gated behind BMAD_TRACE=1; the legacy
text path is unchanged when tracing is off.
- New scripts/epic-execute-lib/observability.sh: span recording, rollup,
jq dep enforcement, and an intra-phase heartbeat for crash forensics
- epic-execute.sh: stream-json capture in run_claude_to_file with clean
.result extraction, per-phase set_span_context calls, rollup in cleanup
- epic-chain.sh: measured (non-fabricated) telemetry section in reports
- Guard set -e aborts on malformed stream lines so crash/timeout paths
degrade gracefully instead of killing the run
- Docs: gap analysis + observability implementation plan
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Final piece of the contract execution increment.
- run_contract_validation: env up → backend cases + UI flows → env down
- contract_validation_gate: bounded self-heal loop (execute_contract_fix_phase
feeds CONTRACT_EXEC_FAILURES back to a focused fix prompt, commits, re-runs)
- Wired as a per-epic gate after the story loop (v1 granularity: the app
reflects the whole epic before contracts run), opt-in by harness presence and
--skip-contract-validation
- Exit-code-honest: CONTRACT_VALIDATION_FAILED makes the epic exit non-zero if
contracts never pass, mirroring the preflight gate
Tested: orchestrator brings the env up/down and runs backend cases against a
live mock server with correct pass/fail + failure detail. Live UI flows and the
fix loop's Claude calls need the real app/CLI.
This completes the UI-contract + execution work held on this branch; ready to
bundle into one PR.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Second half of the contract execution engine (held on branch per bundle plan).
- generate_playwright_spec: translate ui.flows into a Playwright spec - goto /
getByTestId click+fill (with getByLabel/getByRole/getByText fallback),
visible/hidden/text/url assertions, role-based storageState for allowed/
forbidden checks; persistence is delegated to backend cases
- run_ui_flows: generate the spec, run it via the project's `npx playwright
test --reporter=json`, and parse results
- parse_playwright_report: read stats.unexpected + failed titles into
CONTRACT_EXEC_FAILURES for the fix loop
- _pw_locator (testid → label → role → text) and _ts_safe helpers
Tested: spec generation for the canonical "create a quote" allowed flow + a
"viewer cannot" forbidden flow produces correct TS; report parsing handles
pass and fail. Live browser execution needs the real app + browser binaries.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First half of per-story contract execution (held on branch per the bundle plan).
- contract-exec.sh: granularity-agnostic engine the caller can invoke per-story
or per-epic
- contract_env_up / contract_env_down: bring the sample env up (setup → start →
poll readiness) and tear it down
- run_backend_cases: for each harness case, call the API (curl), assert status
and response body_contains (jq subset match), then verify persistence by
invoking datastore.verify_command as `<cmd> --table <t> --where <json>`;
failures are collected in CONTRACT_EXEC_FAILURES for the fix loop
- _json_contains: JSON-subset assertion helper
Tested against a local mock API: passing case (status + multi-field body +
persistence), status-mismatch failure, and persistence-miss failure all behave
correctly; env up/down orchestration smoke-tested. UI flow execution and gate
wiring are the next pieces; live end-to-end needs a real app.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Frontend analog of the API/DB contract harness: declare UI user-flow contracts
and validate readiness to run them. Per-story Playwright flow execution is the
next increment.
Harness (contract-harness.sh):
- New ui: section (driver, base_url, tests_dir, selector_strategy, roles, flows)
in the schema + scaffold template
- _ui_preflight checks the browser driver (Playwright/Cypress, presence only),
the tests directory, and that each declared role has a session seed
- Role-seed commands now feed prerequisite inference, so their env vars and
executables are checked like any other harness command
Design phase (design-phase.sh):
- Frontend lens now requires a stable data-testid on every interactive element
and a per-AC user_flow with an allowed/forbidden expectation
- Frontend schema gains components[].test_ids and a structured user_flows shape
- Critic enforces test-ids + user-flow expectations; validate_domain_completeness
warns (advisory) when interactive components lack a data-testid
data-testid is adopted incrementally - the lens requires it on the components a
story touches, with accessible role/label fallback for pre-existing ones. The
"credentials" piece is separate: ui.roles seeds for permission-based checks.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Let a project declare a contract-validation harness (contract-harness.yaml)
describing how to bring up a sample/test environment and verify API + database
contracts. The system validates readiness itself - the user never hand-checks.
- contract-harness.sh: discover the harness, auto-derive prerequisites from its
own commands (env var refs, executables, file paths) plus an optional
requires: block, then run a presence preflight with a ✓/✗ readiness report
- Startup wiring: real runs fail fast (abort before story 1 if prerequisites are
missing); dry runs print the report and exit non-zero when anything required
is missing, so --dry-run works as a CI readiness gate
- Opt-in deep connectivity smoke (--preflight-deep) boots the sample env, polls
the readiness URL, and tears down
- Safety guard: warns when the datastore target looks production-scoped
(contract validation must only ever touch a throwaway/test store)
- --init-harness scaffolds a commented template; --skip-contract-validation
bypasses the gate; full --help/usage documentation
Opt-in by presence of the harness file - projects without one are unaffected.
Parses/validates harness `cases`; executing them per-story is the next
increment.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tailor the design phase to the feature's domain instead of using one generic
(backend-flavored) schema for everything.
- classify_feature_domain: auto-detect frontend/backend/fullstack from an
explicit story Type:/Domain: field, then heuristic keyword scoring, failing
safe to fullstack (the superset) when ambiguous
- build_lens_block + build_domain_schema: inject a domain-specific planning
lens (component states, a11y, responsive / API contract, error handling,
migrations, concurrency, observability) and matching JSON fields, added to
the existing core schema (non-breaking)
- run_design_critic is now domain-aware: missing FE component states/a11y or
BE error paths/status codes are enforced as NEEDS_REVISION gaps via the
existing revision loop
- validate_domain_completeness: advisory warning + metric for the common
omissions (FE components without states, BE API without error handling)
- get_result_feature_type getter; TDD reconciliation now hints which test
kinds to emphasize per domain
Auto-detection only (no manual override flag yet). All additions are advisory
except the critic enforcement, preserving the non-blocking design contract.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Design phase improvement #7:
- Add build_planned_test_files_context, which extracts the test_files the
design phase already planned (resume-safe: in-memory then persisted file)
- Inject it into the test-spec phase prompt so TDD reuses the planned test
files and paths instead of independently deciding the test surface, and
flags any deviation
Returns empty (no-op) when there is no plan, no test_files, or jq is absent.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Design phase improvements #5 and #4:
#5 Deterministic, language-aware exploration:
- Add detect_project_type (node/rust/go/python) and build_repo_map, which
pre-computes a bounded repository map (project type, top-level structure,
representative source files) tailored to the detected language
- Inject the map into the design prompt instead of hardcoded JS/TS find
commands and "hope the model explores" guidance
#4 Critic loop:
- Add run_design_critic: a fresh-context skeptic that checks whether the plan
maps every acceptance criterion and conforms to the architecture, emitting
structured gaps
- execute_design_phase now generates -> critiques -> regenerates with gap
feedback, bounded by MAX_DESIGN_CRITIC_ATTEMPTS (default 2). Design stays
advisory: it always proceeds with the best plan and records a metric when
gaps remain
- Add --skip-design-critic flag and document MAX_DESIGN_CRITIC_ATTEMPTS
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Design phase improvement #3:
- Switch the design plan output contract from a free-text DESIGN START/END
block to a JSON result block, parsed via the shared extract_json_result /
check_phase_completion helpers (jq-less and legacy-text fallbacks retained)
- Add validate_design_coverage: warns + records a metric when the plan does
not map every acceptance criterion declared in the story (advisory only,
since design is a non-blocking phase). AC detection is heuristic and skips
when no AC identifiers are found or jq is unavailable
- Add a "design" case to the legacy fallback in check_phase_completion for
robustness when no JSON block is present
Hook bypassed: pre-existing markdownlint errors are confined to the gitignored
.context/ workspace dir; lint, format:check, and bash -n all pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Design phase improvements #1 and #2:
#1 Bounded context:
- Pass architecture.md by path instead of embedding full contents
(the main unbounded size risk in this prompt)
- Cap decision-log context at last 20KB (matches dev phase)
- Add log_prompt_size guard, consistent with other phases
- Replace hardcoded JS/TS exploration hints with language-agnostic guidance
#2 File persistence:
- Add DESIGN_DIR config and persist each plan to <DESIGN_DIR>/<story>-design.md
- build_design_context_for_dev falls back to the persisted file when the
in-memory plan is empty, so resumed runs keep their design context
Story file remains inlined (small, bounded, needed in full by the planner).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Brings 558 upstream commits (v6.0.0-Beta.8 → v6.8.0) into our fork while
preserving our automation surface.
Key upstream changes absorbed:
- src/bmm/ workflow layout retired in favor of src/bmm-skills/ (skill-based
architecture with SKILL.md + customize.toml per skill)
- src/core/ replaced by src/core-skills/
- New src/scripts/resolve_customization.py for TOML override merging
- Installer expansion (42 platforms, channel-based versioning, remote registry)
- New bmm-skills: bmad-investigate, bmad-spec, bmad-prd (rewritten),
bmad-prfaq, bmad-checkpoint-preview, bmad-customize, bmad-ux
- 15 new releases of doc translations, web bundles, validators
Our automation preserved:
- scripts/{epic-execute,epic-chain,uat-validate}.sh + epic-execute-lib/
(untouched by upstream — no path collision)
- src/bmm/workflows/4-implementation/{epic-execute,epic-chain}/
- src/bmm/workflows/5-validation/uat-validate/
Script path rewiring:
- epic-execute.sh: DEV_WORKFLOW_DIR/REVIEW_WORKFLOW_DIR now point at
src/bmm-skills/4-implementation/bmad-{dev-story,code-review}/, with
*_WORKFLOW_SKILL pointing at SKILL.md (replaces old workflow.yaml +
instructions.xml pair). Removed CORE_TASKS_DIR/WORKFLOW_EXECUTOR refs
(src/core/tasks/workflow.xml no longer exists upstream).
- epic-execute-lib/INIT.md: updated directory tree doc and inspection
commands to reference new skill paths.
Conflict resolutions:
- README.md, docs/reference/workflow-map.md, docs/tutorials/getting-started.md,
src/bmm-skills/4-implementation/bmad-create-story/checklist.md → took
upstream verbatim (stay aligned with their docs direction).
- package.json → took upstream (dropped test:schemas/validate:schemas since
agent YAML schema retired with src/bmm/agents/; added test:urls,
test:channels, validate:skills; rebundle path moved cli/ → installer/).
- src/bmm/agents/sm.agent.yaml, src/bmm/workflows/3-solutioning/create-architecture/workflow.md,
src/bmm/workflows/document-project/instructions.md → accepted upstream's
deletion (modify/delete conflict; these files are part of the retired
src/bmm/ layout).
Validated:
- All 4 shell scripts pass `bash -n` syntax check.
- No stale path references remain in scripts/ (grep clean).
Not yet validated:
- `npm install` + `npm test` (run as follow-up).
- Functional smoke test of epic-execute.sh against a real story (manual,
follow-up).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sync functional improvements developed in revive-dev into BMAD-METHOD
fork while preserving repo-specific paths:
- Add memory-safe Claude helpers (run_claude_to_file, read_phase_tail)
that pipe output to temp files instead of bash variables, preventing
GB-scale RAM usage during long epic executions
- Add kill_orphaned_test_processes() to clean up zombie jest/vitest/
playwright/pytest processes between stories and on exit
- Replace per-call `env -u CLAUDECODE` with global `unset CLAUDECODE`
at script start for cleaner nested session support
- Port metrics resume/accumulation logic that restores counters from
existing YAML on resumed runs and accumulates duration
- Add log truncation between stories (64KB cap) to prevent unbounded
log growth across multi-story runs
- Add log persistence and cleanup trap to epic-chain.sh
- Revert regression-gate.sh test commands to direct execution (matching
revive-dev pattern)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Wrap all Claude CLI subprocess calls with `env -u CLAUDECODE` to prevent
parent env var interference with child processes (17 sites across 7 files)
- Add `flush_log_to_repo()` to epic-execute.sh for incremental log persistence
after each story completes or fails (prevents log loss on interruption)
- Add portable `run_with_timeout` utility to utils.sh and wrap all test
invocations in epic-execute.sh and regression-gate.sh with configurable
timeout (default 120s via REGRESSION_TEST_TIMEOUT)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge 217 upstream commits covering:
- Module restructuring: src/modules/ flattened to src/
- BMB, BMGD, CIS modules moved to separate repos
- Installer migrated from inquirer to @clack/prompts
- Workflow simplification and new naming conventions
- Docusaurus to Astro/Starlight docs migration
- CodeRabbit AI review integration
- Cross-file reference validator
- Non-interactive install support
- Kiro IDE support
Conflict resolutions:
- sm.agent.yaml: kept fork's EE/EC/UV/CR menu items, took upstream's CC description
- uat-validator.agent.yaml: moved to src/bmm/agents/, removed unsupported webskip field
Path updates for new structure:
- scripts/*.sh: src/modules/bmm/ → src/bmm/
- CLAUDE.md: updated module paths and removed references to extracted modules
- docs: fixed broken links, added Astro frontmatter to fork docs files
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Port improvements developed in revive-dev: new test-failure-filter module for
baseline-aware failure detection and prompt size management, broken pipe fixes
in regression-gate, and log persistence in epic-execute. Paths adapted to
BMAD-METHOD repo structure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- L1: Add checkpoint/resume capability with --resume flag
- load_checkpoint(), save_checkpoint(), clear_checkpoint() in utils.sh
- Auto-saves progress after each story, 7-day expiration
- L2: Add comprehensive --help option with grouped options and examples
- show_help() function with environment variables and file locations
- L3: Add verbose Claude output streaming
- execute_claude_verbose() for real-time output when --verbose set
- L4: Add metrics file archival to prevent unbounded growth
- Archives to metrics/archive/, keeps last 10 per epic
- L5: Add workflow file content validation
- validate_yaml_content(), validate_xml_content() with fallbacks
- Integrated into validate_workflows()
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add new utils.sh module with cross-platform support and reliability improvements:
- M1: execute_with_retry() with exponential backoff for transient failures
- M2: validate_yq() to detect Go vs Python yq versions with fallback
- M3: check_phase_completion_fuzzy() for case-insensitive signal detection
- M4: sed_inplace() for cross-platform sed (macOS/Linux compatibility)
- M5: check_branch_protection() to prevent commits to main/master
Update json-output.sh with enhanced JSON extraction using awk for multi-block
handling, normalized status comparison, and fuzzy matching fallback.
Update epic-execute.sh to source utils.sh and use cross-platform sed functions.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
H1: Git Add -A Safety
- Add check_sensitive_files() to detect untracked secrets
- Replace git add -A with git add -u (tracked files only)
- Update prompts to use explicit file staging
H2: Cleanup on Exit
- Add cleanup() function with trap handler for EXIT/INT/TERM
- Save checkpoint file for resume capability
- Finalize metrics and report uncommitted changes on exit
- Track CURRENT_STORY_INDEX throughout main loop
H3: Test Count Parsing (Multi-Framework)
- Add extract_test_count() supporting Jest, Mocha, Vitest, AVA, TAP, pytest, Go, Rust
- Try JSON output first, fall back to regex patterns
- Replace inline patterns in init_regression_baseline() and execute_regression_gate()
H4: Story Discovery (Word Boundaries)
- Fix grep pattern with word boundary: ${EPIC_ID}([^0-9]|$)
- Prevents "Epic: 1" from matching "Epic: 10" or "Epic: 100"
- Add associative array deduplication (bash 4+) with fallback
H5: Prompt Size Limits
- Add MAX_PROMPT_SIZE config (default 150KB)
- Add get_byte_size(), truncate_content(), build_sized_prompt()
- Truncate large workflow YAML and decision logs
- Add verbose logging of prompt sizes
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the final two improvements from bmad_improvements_v2.md:
## Structured JSON Output (Improvement #6)
- New module: scripts/epic-execute-lib/json-output.sh
- Functions for extracting and parsing JSON from Claude output
- Unified check_phase_completion() with JSON + text fallback
- Updated prompts to request JSON result blocks
- Added --legacy-output flag to disable JSON parsing
## Test-First Flow (Improvement #7)
- New module: scripts/epic-execute-lib/tdd-flow.sh
- execute_test_spec_phase() - Generates BDD specs from acceptance criteria
- execute_test_impl_phase() - Creates failing tests from specs
- execute_test_verification_phase() - Verifies tests fail correctly
- Integration with dev phase for TDD context
- Added --skip-tdd, --skip-test-spec, --skip-test-impl flags
All 7 improvements from the analysis are now complete.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add decision log module for context preservation across phases
- Add regression gate module for test baseline tracking
- Add design phase module for pre-implementation planning
- Enhance fix phase to include real tooling output
- Pass design and decision context to dev phase
- Add --skip-design and --skip-regression CLI flags
- Modularize into epic-execute-lib/ for maintainability
Implements improvements from bmad_improvements_v2.md:
- Phase 2.1: Real test output in fix loops
- Phase 2.2: Cumulative decision log
- Phase 2.3: Regression test gate
- Phase 3.1: Pre-implementation design phase
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>