Merge origin/main: sync epic-specific tracking files with backwards compatibility

Resolved conflict in autonomous-epic/workflow.yaml by:
- Accepting origin/main's cleaner naming: .autonomous-epic-{epic_num}-progress.yaml
- Adding backwards compatibility to check both new and legacy formats
- Updating all progress file references to use the dynamic {{progress_file_path}} variable

Changes:
- workflow.yaml: Use new naming convention
- instructions.xml: Check for both formats (new + legacy) on resume
- README.md: Document backwards compatibility

This ensures no in-progress epics are missed when upgrading between versions.
Jonah Schulte 2026-01-02 20:20:35 -05:00
commit 343b4ef425
33 changed files with 6500 additions and 218 deletions


@@ -0,0 +1,13 @@
---
description: 'Validate and fix sprint-status.yaml for ALL epics. Scans every story file, validates quality, counts tasks, updates sprint-status.yaml to match REALITY across entire project.'
---
IT IS CRITICAL THAT YOU FOLLOW THESE STEPS - while staying in character as the current agent persona you may have loaded:
<steps CRITICAL="TRUE">
1. Always LOAD the FULL @_bmad/core/tasks/workflow.xml
2. READ its entire contents - this is the CORE OS for EXECUTING the specific workflow-config @_bmad/bmm/workflows/4-implementation/validate-all-epics/workflow.yaml
3. Pass the yaml path _bmad/bmm/workflows/4-implementation/validate-all-epics/workflow.yaml as 'workflow-config' parameter to the workflow.xml instructions
4. Follow workflow.xml instructions EXACTLY as written to process and follow the specific workflow config and its instructions
5. Save outputs after EACH section when generating any documents from templates
</steps>


@@ -0,0 +1,13 @@
---
description: 'Validate and fix sprint-status.yaml for a single epic. Scans story files for task completion, validates quality (>10KB, proper tasks), updates sprint-status.yaml to match REALITY.'
---
IT IS CRITICAL THAT YOU FOLLOW THESE STEPS - while staying in character as the current agent persona you may have loaded:
<steps CRITICAL="TRUE">
1. Always LOAD the FULL @_bmad/core/tasks/workflow.xml
2. READ its entire contents - this is the CORE OS for EXECUTING the specific workflow-config @_bmad/bmm/workflows/4-implementation/validate-epic-status/workflow.yaml
3. Pass the yaml path _bmad/bmm/workflows/4-implementation/validate-epic-status/workflow.yaml as 'workflow-config' parameter to the workflow.xml instructions
4. Follow workflow.xml instructions EXACTLY as written to process and follow the specific workflow config and its instructions
5. Save outputs after EACH section when generating any documents from templates
</steps>


@@ -0,0 +1,101 @@
# How to Validate Sprint Status - Complete Guide
**Created:** 2026-01-02
**Purpose:** Ensure sprint-status.yaml and story files reflect REALITY, not fiction
---
## Three Levels of Validation
### Level 1: Status Field Validation (FAST - Free)
Compare Status field in story files vs sprint-status.yaml
**Cost:** Free | **Time:** 5 seconds
```bash
python3 scripts/lib/sprint-status-updater.py --mode validate
```
### Level 2: Deep Story Validation (MEDIUM - $0.15/story)
Haiku agent reads actual code and verifies all tasks
**Cost:** ~$0.15/story | **Time:** 2-5 min/story
```bash
/validate-story-deep docs/sprint-artifacts/16e-6-ecs-task-definitions-tier3.md
```
### Level 3: Comprehensive Platform Audit (DEEP - $76 total)
Validates ALL 511 stories using batched Haiku agents
**Cost:** ~$76 total | **Time:** 4-6 hours
```bash
/validate-all-stories-deep
/validate-all-stories-deep --epic 16e # Or filter to specific epic
```
---
## Why Haiku Not Sonnet
**Per story cost:**
- Haiku: $0.15
- Sonnet: $1.80
- **Savings: 92%**
**Full platform:**
- Haiku: $76
- Sonnet: $920
- **Savings: $844**
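As a quick sanity check, the totals follow directly from the per-story figures and the 511-story count quoted above (rounded):
```python
# Sanity check of the figures above, using this guide's own numbers (rounded).
stories = 511
haiku_per_story, sonnet_per_story = 0.15, 1.80

haiku_total = stories * haiku_per_story     # ~$76.65  -> quoted as ~$76
sonnet_total = stories * sonnet_per_story   # ~$919.80 -> quoted as ~$920
print(f"Haiku total:  ${haiku_total:.2f}")
print(f"Sonnet total: ${sonnet_total:.2f}")
print(f"Savings: ${sonnet_total - haiku_total:.2f} ({1 - haiku_per_story / sonnet_per_story:.0%})")
```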
**Agent startup overhead (why ONE agent per story):**
- Bad: one agent per task (50 agents for a 50-task story) ≈ 2.5M tokens of startup overhead
- Good: 1 agent reads all files and verifies all 50 tasks ≈ 25K tokens of overhead
- **Savings: 99% less overhead**
---
## Batching (Max 5 Stories Concurrent)
**Why batch_size = 5:**
- Prevents spawning 511 agents at once
- Allows progress saving/resuming
- Rate limiting friendly
**Execution:**
- Batch 1: Stories 1-5 (5 agents)
- Wait for completion
- Batch 2: Stories 6-10 (5 agents)
- ...continues until done
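A minimal sketch of that batching loop (illustrative only; `validate_story` stands in for the per-story Haiku validation step and is not the actual workflow code):
```python
# Illustrative batching sketch: at most `batch_size` validations run at once,
# and results can be checkpointed between batches so a long run is resumable.
from concurrent.futures import ThreadPoolExecutor

def validate_in_batches(stories, validate_story, batch_size=5):
    results = []
    for start in range(0, len(stories), batch_size):
        batch = stories[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            results.extend(pool.map(validate_story, batch))
        # Checkpoint results here (e.g. write to disk) before the next batch.
    return results
```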
---
## What Gets Verified
For each task, Haiku agent:
1. Finds files with Glob/Grep
2. Reads code with Read tool
3. Checks for stubs/TODOs
4. Verifies tests exist
5. Checks multi-tenant isolation
6. Reports: actually_complete, evidence, issues
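The per-task report has roughly the shape below; the field names come from scripts/lib/llm-task-verifier.py elsewhere in this commit, while the values are invented for illustration.
```python
# Example of one per-task verification result (illustrative values; field
# names match scripts/lib/llm-task-verifier.py in this commit).
example_result = {
    "task": "Implement UserService.create with dealerId scoping",
    "is_checked": True,                 # what the story file claims
    "actually_complete": False,         # what the agent found in the code
    "confidence": "high",
    "evidence": "user.service.ts exists but create() is a stub with a TODO",
    "issues_found": ["stub implementation", "no tests found"],
    "verification_status": "false_positive",
}
```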
---
## Commands Reference
```bash
# Weekly validation (free, 5 sec)
python3 scripts/lib/sprint-status-updater.py --mode validate
# Fix discrepancies
python3 scripts/lib/sprint-status-updater.py --mode fix
# Deep validate one story ($0.15, 2-5 min)
/validate-story-deep docs/sprint-artifacts/STORY.md
# Comprehensive audit ($76, 4-6h)
/validate-all-stories-deep
```
---
**Files:** `_bmad/bmm/workflows/4-implementation/validate-*-deep/`


@@ -0,0 +1,482 @@
# Sprint Status Sync - Complete Guide
**Created:** 2026-01-02
**Purpose:** Prevent drift between story files and sprint-status.yaml
**Status:** PRODUCTION READY
---
## 🚨 THE PROBLEM WE SOLVED
**Before Fix (2026-01-02):**
- 78% of story files (435/552) had NO `Status:` field
- 30+ completed stories not reflected in sprint-status.yaml
- Epic 19: 28 stories done, sprint-status said "in-progress"
- Epic 16d: 3 stories done, sprint-status said "backlog"
- Last verification: 32+ hours old
**Root Cause:**
- Autonomous workflows prioritized velocity over tracking
- Manual workflows didn't enforce status updates
- No automated sync mechanism
- sprint-status.yaml manually maintained
---
## ✅ THE SOLUTION (Full Workflow Fix)
### Component 1: Automated Sync Script
**Script:** `scripts/sync-sprint-status.sh`
**Purpose:** Scan story Status: fields → Update sprint-status.yaml
**Usage:**
```bash
# Update sprint-status.yaml
pnpm sync:sprint-status
# Preview changes (no modifications)
pnpm sync:sprint-status:dry-run
# Validate only (exit 1 if out of sync)
pnpm validate:sprint-status
```
**Features:**
- Only updates stories WITH explicit Status: fields
- Skips stories without Status: (trusts sprint-status.yaml)
- Creates automatic backups (.sprint-status-backups/)
- Preserves all comments and structure
- Returns clear pass/fail exit codes
---
### Component 2: Workflow Enforcement
**Modified Files:**
1. `_bmad/bmm/workflows/4-implementation/dev-story/instructions.xml`
2. `_bmad/bmm/workflows/4-implementation/autonomous-epic/instructions.xml`
**Changes:**
- ✅ HALT if story not found in sprint-status.yaml (was: warning)
- ✅ Verify sprint-status.yaml update persisted (new validation)
- ✅ Update both story Status: field AND sprint-status.yaml
- ✅ Fail loudly if either update fails
**Before:** Workflows logged warnings, continued anyway
**After:** Workflows HALT if tracking update fails
---
### Component 3: CI/CD Validation
**Workflow:** `.github/workflows/validate-sprint-status.yml`
**Trigger:** Every PR touching docs/sprint-artifacts/
**Checks:**
1. sprint-status.yaml exists
2. All changed story files have Status: fields
3. sprint-status.yaml is in sync (runs validation)
4. Blocks merge if validation fails
**How to fix CI failures:**
```bash
# See what's wrong
./scripts/sync-sprint-status.sh --dry-run
# Fix it
./scripts/sync-sprint-status.sh
# Commit
git add docs/sprint-artifacts/sprint-status.yaml
git commit -m "chore: sync sprint-status.yaml"
git push
```
---
### Component 4: pnpm Scripts
**Added to package.json:**
```json
{
"scripts": {
"sync:sprint-status": "./scripts/sync-sprint-status.sh",
"sync:sprint-status:dry-run": "./scripts/sync-sprint-status.sh --dry-run",
"validate:sprint-status": "./scripts/sync-sprint-status.sh --validate"
}
}
```
**When to run:**
- `pnpm sync:sprint-status` - After manually updating story Status: fields
- `pnpm validate:sprint-status` - Before committing changes
- Automatically in CI/CD - Validates on every PR
---
## 🎯 NEW WORKFLOW (How It Works Now)
### When Creating a Story
```
/create-story workflow
1. Generate story file with Status: ready-for-dev
2. Add entry to sprint-status.yaml with status "ready-for-dev"
3. HALT if sprint-status.yaml update fails
✅ Story file and sprint-status.yaml both updated
```
### When Implementing a Story
```
/dev-story workflow
1. Load story, start work
2. Mark tasks complete [x]
3. Run tests, validate
4. Update story Status: "in-progress" → "review"
5. Update sprint-status.yaml: "in-progress" → "review"
6. VERIFY sprint-status.yaml update persisted
7. HALT if verification fails
✅ Both updated and verified
```
### When Running Autonomous Epic
```
/autonomous-epic workflow
For each story:
1. Run super-dev-pipeline
2. Check all tasks complete
3. Update story Status: "done"
4. Update sprint-status.yaml entry to "done"
5. Verify update persisted
6. Log failure if verification fails (don't halt - continue)
After all stories:
7. Mark epic "done" in sprint-status.yaml
8. Verify epic status persisted
✅ All stories and epic status updated
```
---
## 🛡️ ENFORCEMENT MECHANISMS
### 1. Required Fields (Create-Story)
- **Enforcement:** Story MUST be added to sprint-status.yaml during creation
- **Validation:** Workflow HALTS if story not found after creation
- **Result:** No orphaned stories
### 2. Status Updates (Dev-Story)
- **Enforcement:** Both Status: field AND sprint-status.yaml MUST update
- **Validation:** Re-read sprint-status.yaml to verify update
- **Result:** No silent failures
### 3. Verification (Autonomous-Epic)
- **Enforcement:** Sprint-status.yaml updated after each story
- **Validation:** Verify update persisted, log failure if not
- **Result:** Tracking stays in sync even during autonomous runs
### 4. CI/CD Gates (GitHub Actions)
- **Enforcement:** PR merge blocked if validation fails
- **Validation:** Runs `pnpm validate:sprint-status` on every PR
- **Result:** Drift cannot be merged
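The "re-read to verify" step in mechanisms 2 and 3 above amounts to reading sprint-status.yaml back after the write and confirming the expected entry is present. A minimal sketch, assuming the flat `story-id: status` lines used in the development_status section (not the actual workflow code):
```python
# Minimal sketch of the post-update verification described above: re-read
# sprint-status.yaml and confirm the story now carries the expected status.
# Assumes the "  story-id: status  # comment" line format of development_status.
import re
from pathlib import Path

def verify_status_persisted(story_id: str, expected_status: str,
                            path: str = "docs/sprint-artifacts/sprint-status.yaml") -> bool:
    content = Path(path).read_text()
    pattern = rf"^\s+{re.escape(story_id)}:\s*{re.escape(expected_status)}\b"
    return re.search(pattern, content, re.MULTILINE) is not None
```
When a check like this fails, dev-story HALTs while autonomous-epic logs the failure and continues.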
---
## 📋 MANUAL SYNC PROCEDURES
### If sprint-status.yaml Gets Out of Sync
**Scenario 1: Story Status: fields updated but sprint-status.yaml not synced**
```bash
# See what needs updating
pnpm sync:sprint-status:dry-run
# Apply updates
pnpm sync:sprint-status
# Verify
pnpm validate:sprint-status
# Commit
git add docs/sprint-artifacts/sprint-status.yaml
git commit -m "chore: sync sprint-status.yaml with story updates"
```
**Scenario 2: sprint-status.yaml has truth, story files missing Status: fields**
```bash
# Create script to backfill Status: fields FROM sprint-status.yaml
./scripts/backfill-story-status-fields.sh # (To be created if needed)
# This would:
# 1. Read sprint-status.yaml
# 2. For each story entry, find the story file
# 3. Add/update Status: field to match sprint-status.yaml
# 4. Preserve all other content
```
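The backfill logic described by that placeholder is simple, and scripts/lib/add-status-fields.py (added later in this diff) implements a fuller version of it. A minimal illustration:
```python
# Minimal backfill sketch: for each development_status entry, insert a
# "**Status:** <status>" line into the matching story file if it has none.
# scripts/lib/add-status-fields.py in this commit is the fuller implementation.
import re
from pathlib import Path

def backfill(statuses: dict[str, str], story_dir: str = "docs/sprint-artifacts") -> None:
    for story_id, status in statuses.items():
        story_file = Path(story_dir) / f"{story_id}.md"
        if not story_file.exists():
            continue
        content = story_file.read_text()
        if re.search(r"^\*{0,2}Status:", content, re.MULTILINE | re.IGNORECASE):
            continue  # already has a Status field
        lines = content.split("\n")
        lines[1:1] = ["", f"**Status:** {status}", ""]  # insert right after the title line
        story_file.write_text("\n".join(lines))
```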
**Scenario 3: Massive drift after autonomous work**
```bash
# Option A: Trust sprint-status.yaml (if it was manually verified)
# - Backfill story Status: fields from sprint-status.yaml
# - Don't run sync (sprint-status.yaml is source of truth)
# Option B: Trust story Status: fields (if recently updated)
# - Run sync to update sprint-status.yaml
pnpm sync:sprint-status
# Option C: Manual audit (when both are uncertain)
# - Review SPRINT-STATUS-AUDIT-2026-01-02.md
# - Check git commits for completion evidence
# - Manually correct both files
```
---
## 🧪 TESTING
### Test 1: Validate Current State
```bash
pnpm validate:sprint-status
# Should exit 0 if in sync, exit 1 if discrepancies
```
### Test 2: Dry Run (No Changes)
```bash
pnpm sync:sprint-status:dry-run
# Shows what WOULD change without applying
```
### Test 3: Apply Sync
```bash
pnpm sync:sprint-status
# Updates sprint-status.yaml, creates backup
```
### Test 4: CI/CD Simulation
```bash
# Simulate PR validation
.github/workflows/validate-sprint-status.yml
# (Run via act or GitHub Actions)
```
---
## 📊 METRICS & MONITORING
### How to Check Sprint Health
**Check 1: Discrepancy Count**
```bash
pnpm sync:sprint-status:dry-run 2>&1 | grep "discrepancies"
# Should show: "0 discrepancies" if healthy
```
**Check 2: Last Verification Timestamp**
```bash
head -5 docs/sprint-artifacts/sprint-status.yaml | grep last_verified
# Should be within last 24 hours
```
**Check 3: Stories Missing Status: Fields**
```bash
grep -L "^Status:" docs/sprint-artifacts/*.md | wc -l
# Should decrease over time as stories get Status: fields
```
### Alerts to Set Up (Future)
- ⚠️ If last_verified > 7 days old → Manual audit recommended
- ⚠️ If discrepancy count > 10 → Investigate why sync not running
- ⚠️ If stories without Status: > 50 → Backfill campaign needed
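None of these alerts exist yet; one possible shape for the staleness check, reading the `# last_verified:` header that sprint-status-updater.py writes, is sketched below (not part of the shipped tooling).
```python
# One possible shape for the "last_verified too old" alert (not implemented).
# Reads the "# last_verified: ..." header written by sprint-status-updater.py.
import re
import sys
from datetime import datetime, timedelta
from pathlib import Path

def last_verified_is_fresh(path: str = "docs/sprint-artifacts/sprint-status.yaml",
                           max_age_days: int = 7) -> bool:
    for line in Path(path).read_text().splitlines()[:10]:
        match = re.match(r"#\s*last_verified:\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", line)
        if match:
            verified = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            return datetime.now() - verified <= timedelta(days=max_age_days)
    return False  # no timestamp found -> treat as stale

if not last_verified_is_fresh():
    print("WARNING: sprint-status.yaml last_verified is missing or older than 7 days")
    sys.exit(1)
```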
---
## 🎓 BEST PRACTICES
### For Story Creators
1. Always use `/create-story` workflow (adds to sprint-status.yaml automatically)
2. Never create story .md files manually
3. Always include Status: field in story template
### For Story Implementers
1. Use `/dev-story` workflow (updates both Status: and sprint-status.yaml)
2. If manually updating Status: field, run `pnpm sync:sprint-status` after
3. Before marking "done", verify sprint-status.yaml reflects your work
### For Autonomous Workflows
1. autonomous-epic workflow now includes sprint-status.yaml updates
2. Verifies updates persisted after each story
3. Logs failures but continues (doesn't halt entire epic for tracking issues)
### For Code Reviewers
1. Check that PR includes sprint-status.yaml update if stories changed
2. Verify CI/CD validation passes
3. If validation fails, request sync before approving
---
## 🔧 MAINTENANCE
### Weekly Tasks
- [ ] Review discrepancy count: `pnpm sync:sprint-status:dry-run`
- [ ] Run sync if needed: `pnpm sync:sprint-status`
- [ ] Check backup count: `ls -1 .sprint-status-backups/ | wc -l`
- [ ] Clean old backups (keep last 30 days)
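For the backup cleanup task, a small helper like the one below is enough (assumes the `.sprint-status-backups/` naming used by sprint-status-updater.py; not part of the shipped tooling):
```python
# Housekeeping helper (not shipped): delete sprint-status backups older than
# 30 days from the .sprint-status-backups/ directory the updater creates.
import time
from pathlib import Path

cutoff = time.time() - 30 * 24 * 3600
for backup in Path(".sprint-status-backups").glob("sprint-status-*.yaml"):
    if backup.stat().st_mtime < cutoff:
        backup.unlink()
        print(f"Removed old backup: {backup}")
```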
### Monthly Tasks
- [ ] Full audit: Review SPRINT-STATUS-AUDIT template
- [ ] Backfill missing Status: fields (reduce count to <10)
- [ ] Verify all epics have correct status
- [ ] Update this guide based on learnings
---
## 📝 FILE REFERENCE
**Core Files:**
- `docs/sprint-artifacts/sprint-status.yaml` - Single source of truth
- `scripts/sync-sprint-status.sh` - Bash wrapper script
- `scripts/lib/sprint-status-updater.py` - Python updater logic
**Workflow Files:**
- `_bmad/bmm/workflows/4-implementation/dev-story/instructions.xml`
- `_bmad/bmm/workflows/4-implementation/autonomous-epic/instructions.xml`
- `_bmad/bmm/workflows/4-implementation/create-story-with-gap-analysis/step-03-generate-story.md`
**CI/CD:**
- `.github/workflows/validate-sprint-status.yml`
**Documentation:**
- `SPRINT-STATUS-AUDIT-2026-01-02.md` - Initial audit findings
- `docs/workflows/SPRINT-STATUS-SYNC-GUIDE.md` - This file
---
## 🐛 TROUBLESHOOTING
### Issue: "Story not found in sprint-status.yaml"
**Cause:** Story file created outside of /create-story workflow
**Fix:**
```bash
# Manually add to sprint-status.yaml under correct epic
vim docs/sprint-artifacts/sprint-status.yaml
# Add line: story-id: ready-for-dev
# Or re-run create-story workflow
/create-story
```
### Issue: "sprint-status.yaml update failed to persist"
**Cause:** File system permissions or concurrent writes
**Fix:**
```bash
# Check file permissions
ls -la docs/sprint-artifacts/sprint-status.yaml
# Check for file locks
lsof | grep sprint-status.yaml
# Manual update if needed
vim docs/sprint-artifacts/sprint-status.yaml
```
### Issue: "85 discrepancies found"
**Cause:** Story Status: fields not updated after completion
**Fix:**
```bash
# Review discrepancies
pnpm sync:sprint-status:dry-run
# Apply updates (will update sprint-status.yaml to match story files)
pnpm sync:sprint-status
# If story files are WRONG (Status: ready-for-dev but actually done):
# Manually update story Status: fields first
# Then run sync
```
---
## 🎯 SUCCESS CRITERIA
**System is working correctly when:**
- ✅ `pnpm validate:sprint-status` exits 0 (no discrepancies)
- ✅ Last verified timestamp < 24 hours old
- ✅ Stories with missing Status: fields < 10
- ✅ CI/CD validation passes on all PRs
- ✅ New stories automatically added to sprint-status.yaml
**System needs attention when:**
- ❌ Discrepancy count > 10
- ❌ Last verified > 7 days old
- ❌ CI/CD validation failing frequently
- ❌ Stories missing Status: fields > 50
---
## 🔄 MIGRATION CHECKLIST (One-Time)
If implementing this on an existing project:
- [x] Create scripts/sync-sprint-status.sh
- [x] Create scripts/lib/sprint-status-updater.py
- [x] Modify dev-story workflow (add enforcement)
- [x] Modify autonomous-epic workflow (add verification)
- [x] Add CI/CD validation workflow
- [x] Add pnpm scripts
- [x] Run initial sync: `pnpm sync:sprint-status`
- [ ] Backfill missing Status: fields (optional, gradual)
- [x] Document in this guide
- [ ] Train team on new workflow
- [ ] Monitor for 2 weeks, adjust as needed
---
## 📈 EXPECTED OUTCOMES
**Immediate (Week 1):**
- sprint-status.yaml stays in sync
- New stories automatically tracked
- Autonomous work properly recorded
**Short-term (Month 1):**
- Discrepancy count approaches zero
- CI/CD catches drift before merge
- Team trusts sprint-status.yaml as source of truth
**Long-term (Month 3+):**
- Zero manual sprint-status.yaml updates needed
- Automated reporting reliable
- Velocity metrics accurate
---
**Last Updated:** 2026-01-02
**Status:** Active - Production Ready
**Maintained By:** Platform Team

scripts/lib/add-status-fields.py Executable file

@@ -0,0 +1,112 @@
#!/usr/bin/env python3
"""
Add Status field to story files that are missing it.
Uses sprint-status.yaml as source of truth.
"""
import re
from pathlib import Path
from typing import Dict
def load_sprint_status(path: str = "docs/sprint-artifacts/sprint-status.yaml") -> Dict[str, str]:
"""Load story statuses from sprint-status.yaml"""
with open(path) as f:
lines = f.readlines()
statuses = {}
in_dev_status = False
for line in lines:
if 'development_status:' in line:
in_dev_status = True
continue
if in_dev_status:
# Check if we've left development_status section
if line.strip() and not line.startswith(' ') and not line.startswith('#'):
break
# Parse story line: " story-id: status # comment"
match = re.match(r' ([a-z0-9-]+):\s*(\S+)', line)
if match:
story_id, status = match.groups()
statuses[story_id] = status
return statuses
def add_status_to_story(story_file: Path, status: str) -> bool:
"""Add Status field to story file if missing"""
content = story_file.read_text()
# Check if Status field already exists (handles both "Status:" and "**Status:**")
if re.search(r'^\*?\*?Status:', content, re.MULTILINE | re.IGNORECASE):
return False # Already has Status field
# Find the first section after the title (usually ## Story or ## Description)
# Insert Status field before that
lines = content.split('\n')
# Find insertion point (after title, before first ## section)
insert_idx = None
for idx, line in enumerate(lines):
if line.startswith('# ') and idx == 0:
# Title line - keep looking
continue
if line.startswith('##'):
# Found first section - insert before it
insert_idx = idx
break
if insert_idx is None:
# No ## sections found, insert after title
insert_idx = 1
# Insert blank line, Status field, blank line
lines.insert(insert_idx, '')
lines.insert(insert_idx + 1, f'**Status:** {status}')
lines.insert(insert_idx + 2, '')
# Write back
story_file.write_text('\n'.join(lines))
return True
def main():
story_dir = Path("docs/sprint-artifacts")
statuses = load_sprint_status()
added = 0
skipped = 0
missing = 0
for story_file in sorted(story_dir.glob("*.md")):
story_id = story_file.stem
# Skip special files
if (story_id.startswith('.') or
story_id.startswith('EPIC-') or
'COMPLETION' in story_id.upper() or
'SUMMARY' in story_id.upper() or
'REPORT' in story_id.upper() or
'README' in story_id.upper()):
continue
if story_id not in statuses:
print(f"⚠️ {story_id}: Not in sprint-status.yaml")
missing += 1
continue
status = statuses[story_id]
if add_status_to_story(story_file, status):
print(f"{story_id}: Added Status: {status}")
added += 1
else:
skipped += 1
print()
print(f"✅ Added Status field to {added} stories")
print(f" Skipped {skipped} stories (already have Status)")
print(f"⚠️ {missing} stories not in sprint-status.yaml")
if __name__ == '__main__':
main()


@@ -0,0 +1,219 @@
/**
* AWS Bedrock Client for Test Generation
*
* Alternative to Anthropic API - uses AWS Bedrock Runtime
* Requires: source ~/git/creds-nonprod.sh (or creds-prod.sh)
*/
import { BedrockRuntimeClient, InvokeModelCommand } from '@aws-sdk/client-bedrock-runtime';
import { RateLimiter } from './rate-limiter.js';
export interface GenerateTestOptions {
sourceCode: string;
sourceFilePath: string;
testTemplate: string;
model?: string;
temperature?: number;
maxTokens?: number;
}
export interface GenerateTestResult {
testCode: string;
tokensUsed: number;
model: string;
}
export class BedrockClient {
private client: BedrockRuntimeClient;
private rateLimiter: RateLimiter;
private model: string;
constructor(region: string = 'us-east-1') {
// AWS SDK will automatically use credentials from environment
// (set via source ~/git/creds-nonprod.sh)
this.client = new BedrockRuntimeClient({ region });
this.rateLimiter = new RateLimiter({
requestsPerMinute: 50,
maxRetries: 3,
maxConcurrent: 5,
});
// Use application-specific inference profile ARN (not foundation model ID)
// Cross-region inference profiles (us.*) are blocked by SCP
// Pattern from: illuminizer/src/services/coxAi/modelMapping.ts
this.model = 'arn:aws:bedrock:us-east-1:247721768464:application-inference-profile/pzxu78pafm8x';
}
/**
* Generate test file from source code using Bedrock
*/
async generateTest(options: GenerateTestOptions): Promise<GenerateTestResult> {
const systemPrompt = this.buildSystemPrompt();
const userPrompt = this.buildUserPrompt(options);
const result = await this.rateLimiter.withRetry(async () => {
// Bedrock request format (different from Anthropic API)
const payload = {
anthropic_version: 'bedrock-2023-05-31',
max_tokens: options.maxTokens ?? 8000,
temperature: options.temperature ?? 0,
system: systemPrompt,
messages: [
{
role: 'user',
content: userPrompt,
},
],
};
const command = new InvokeModelCommand({
modelId: options.model ?? this.model,
contentType: 'application/json',
accept: 'application/json',
body: JSON.stringify(payload),
});
const response = await this.client.send(command);
// Parse Bedrock response
const responseBody = JSON.parse(new TextDecoder().decode(response.body));
if (!responseBody.content || responseBody.content.length === 0) {
throw new Error('Empty response from Bedrock');
}
const content = responseBody.content[0];
if (content.type !== 'text') {
throw new Error('Unexpected response format from Bedrock');
}
return {
testCode: this.extractCodeFromResponse(content.text),
tokensUsed: responseBody.usage.input_tokens + responseBody.usage.output_tokens,
model: this.model,
};
}, `Generate test for ${options.sourceFilePath}`);
return result;
}
/**
* Build system prompt (same as Anthropic client)
*/
private buildSystemPrompt(): string {
return `You are an expert TypeScript test engineer specializing in NestJS backend testing.
Your task is to generate comprehensive, production-quality test files that:
- Follow NestJS testing patterns exactly
- Achieve 80%+ code coverage
- Test happy paths AND error scenarios
- Mock all external dependencies properly
- Include multi-tenant isolation tests
- Use proper TypeScript types (ZERO any types)
- Are immediately runnable without modifications
Key Requirements:
1. Test Structure: Use describe/it blocks with clear test names
2. Mocking: Use jest.Mocked<T> for type-safe mocks
3. Coverage: Test all public methods + edge cases
4. Error Handling: Test all error scenarios (NotFound, Conflict, BadRequest, etc.)
5. Multi-Tenant: Verify dealerId isolation in all operations
6. Performance: Include basic performance tests where applicable
7. Type Safety: No any types, proper interfaces, type guards
Code Quality Standards:
- Descriptive test names: "should throw NotFoundException when user not found"
- Clear arrange/act/assert structure
- Minimal but complete mocking (don't mock what you don't need)
- Test behavior, not implementation details
Output Format:
- Return ONLY the complete test file code
- No explanations, no markdown formatting
- Include all necessary imports
- Follow the template structure provided`;
}
/**
* Build user prompt (same as Anthropic client)
*/
private buildUserPrompt(options: GenerateTestOptions): string {
return `Generate a comprehensive test file for this TypeScript source file:
File Path: ${options.sourceFilePath}
Source Code:
\`\`\`typescript
${options.sourceCode}
\`\`\`
Template to Follow:
\`\`\`typescript
${options.testTemplate}
\`\`\`
Instructions:
1. Analyze the source code to identify:
- All public methods that need testing
- Dependencies that need mocking
- Error scenarios to test
- Multi-tenant considerations (dealerId filtering)
2. Generate tests that cover:
- Initialization (dependency injection)
- Core functionality (all CRUD operations)
- Error handling (NotFound, Conflict, validation errors)
- Multi-tenant isolation (prevent cross-dealer access)
- Edge cases (null inputs, empty arrays, boundary values)
3. Follow the template structure:
- Section 1: Initialization
- Section 2: Core functionality (one describe per method)
- Section 3: Error handling
- Section 4: Multi-tenant isolation
- Section 5: Performance (if applicable)
4. Quality requirements:
- 80%+ coverage target
- Type-safe mocks using jest.Mocked<T>
- Descriptive test names
- No any types
- Proper imports
Output the complete test file code now:`;
}
/**
* Extract code from response (same as Anthropic client)
*/
private extractCodeFromResponse(response: string): string {
let code = response.trim();
code = code.replace(/^```(?:typescript|ts)?\n/i, '');
code = code.replace(/\n```\s*$/i, '');
return code;
}
/**
* Estimate cost for Bedrock (different pricing than Anthropic API)
*/
estimateCost(sourceCodeLength: number, numFiles: number): { inputTokens: number; outputTokens: number; estimatedCost: number } {
const avgInputTokensPerFile = Math.ceil(sourceCodeLength / 4) + 10000;
const avgOutputTokensPerFile = 3000;
const totalInputTokens = avgInputTokensPerFile * numFiles;
const totalOutputTokens = avgOutputTokensPerFile * numFiles;
// Bedrock pricing for Claude Sonnet 4 (as of 2026-01):
// - Input: $0.003 per 1k tokens
// - Output: $0.015 per 1k tokens
const inputCost = (totalInputTokens / 1000) * 0.003;
const outputCost = (totalOutputTokens / 1000) * 0.015;
return {
inputTokens: totalInputTokens,
outputTokens: totalOutputTokens,
estimatedCost: inputCost + outputCost,
};
}
}


@@ -0,0 +1,212 @@
/**
* Claude API Client for Test Generation
*
* Handles API communication with proper error handling and rate limiting.
*/
import Anthropic from '@anthropic-ai/sdk';
import { RateLimiter } from './rate-limiter.js';
export interface GenerateTestOptions {
sourceCode: string;
sourceFilePath: string;
testTemplate: string;
model?: string;
temperature?: number;
maxTokens?: number;
}
export interface GenerateTestResult {
testCode: string;
tokensUsed: number;
model: string;
}
export class ClaudeClient {
private client: Anthropic;
private rateLimiter: RateLimiter;
private model: string;
constructor(apiKey?: string) {
const key = apiKey ?? process.env.ANTHROPIC_API_KEY;
if (!key) {
throw new Error(
'ANTHROPIC_API_KEY environment variable is required.\n' +
'Please set it with: export ANTHROPIC_API_KEY=sk-ant-...'
);
}
this.client = new Anthropic({ apiKey: key });
this.rateLimiter = new RateLimiter({
requestsPerMinute: 50,
maxRetries: 3,
maxConcurrent: 5,
});
this.model = 'claude-sonnet-4-5-20250929'; // Sonnet 4.5 for speed + quality balance
}
/**
* Generate test file from source code
*/
async generateTest(options: GenerateTestOptions): Promise<GenerateTestResult> {
const systemPrompt = this.buildSystemPrompt();
const userPrompt = this.buildUserPrompt(options);
const result = await this.rateLimiter.withRetry(async () => {
const response = await this.client.messages.create({
model: options.model ?? this.model,
max_tokens: options.maxTokens ?? 8000,
temperature: options.temperature ?? 0, // 0 for consistency
system: systemPrompt,
messages: [
{
role: 'user',
content: userPrompt,
},
],
});
const content = response.content[0];
if (content.type !== 'text') {
throw new Error('Unexpected response format from Claude API');
}
return {
testCode: this.extractCodeFromResponse(content.text),
tokensUsed: response.usage.input_tokens + response.usage.output_tokens,
model: response.model,
};
}, `Generate test for ${options.sourceFilePath}`);
return result;
}
/**
* Build system prompt with test generation instructions
*/
private buildSystemPrompt(): string {
return `You are an expert TypeScript test engineer specializing in NestJS backend testing.
Your task is to generate comprehensive, production-quality test files that:
- Follow NestJS testing patterns exactly
- Achieve 80%+ code coverage
- Test happy paths AND error scenarios
- Mock all external dependencies properly
- Include multi-tenant isolation tests
- Use proper TypeScript types (ZERO any types)
- Are immediately runnable without modifications
Key Requirements:
1. Test Structure: Use describe/it blocks with clear test names
2. Mocking: Use jest.Mocked<T> for type-safe mocks
3. Coverage: Test all public methods + edge cases
4. Error Handling: Test all error scenarios (NotFound, Conflict, BadRequest, etc.)
5. Multi-Tenant: Verify dealerId isolation in all operations
6. Performance: Include basic performance tests where applicable
7. Type Safety: No any types, proper interfaces, type guards
Code Quality Standards:
- Descriptive test names: "should throw NotFoundException when user not found"
- Clear arrange/act/assert structure
- Minimal but complete mocking (don't mock what you don't need)
- Test behavior, not implementation details
Output Format:
- Return ONLY the complete test file code
- No explanations, no markdown formatting
- Include all necessary imports
- Follow the template structure provided`;
}
/**
* Build user prompt with source code and template
*/
private buildUserPrompt(options: GenerateTestOptions): string {
return `Generate a comprehensive test file for this TypeScript source file:
File Path: ${options.sourceFilePath}
Source Code:
\`\`\`typescript
${options.sourceCode}
\`\`\`
Template to Follow:
\`\`\`typescript
${options.testTemplate}
\`\`\`
Instructions:
1. Analyze the source code to identify:
- All public methods that need testing
- Dependencies that need mocking
- Error scenarios to test
- Multi-tenant considerations (dealerId filtering)
2. Generate tests that cover:
- Initialization (dependency injection)
- Core functionality (all CRUD operations)
- Error handling (NotFound, Conflict, validation errors)
- Multi-tenant isolation (prevent cross-dealer access)
- Edge cases (null inputs, empty arrays, boundary values)
3. Follow the template structure:
- Section 1: Initialization
- Section 2: Core functionality (one describe per method)
- Section 3: Error handling
- Section 4: Multi-tenant isolation
- Section 5: Performance (if applicable)
4. Quality requirements:
- 80%+ coverage target
- Type-safe mocks using jest.Mocked<T>
- Descriptive test names
- No any types
- Proper imports
Output the complete test file code now:`;
}
/**
* Extract code from Claude's response (remove markdown if present)
*/
private extractCodeFromResponse(response: string): string {
// Remove markdown code blocks if present
let code = response.trim();
// Remove ```typescript or ```ts at start
code = code.replace(/^```(?:typescript|ts)?\n/i, '');
// Remove ``` at end
code = code.replace(/\n```\s*$/i, '');
return code;
}
/**
* Estimate cost for test generation
*/
estimateCost(sourceCodeLength: number, numFiles: number): { inputTokens: number; outputTokens: number; estimatedCost: number } {
// Rough estimates:
// - Input: Source code + template + prompt (~10k-30k tokens per file)
// - Output: Test file (~2k-4k tokens)
const avgInputTokensPerFile = Math.ceil(sourceCodeLength / 4) + 10000; // ~4 chars per token
const avgOutputTokensPerFile = 3000;
const totalInputTokens = avgInputTokensPerFile * numFiles;
const totalOutputTokens = avgOutputTokensPerFile * numFiles;
// Claude Sonnet 4.5 pricing (as of 2026-01):
// - Input: $0.003 per 1k tokens
// - Output: $0.015 per 1k tokens
const inputCost = (totalInputTokens / 1000) * 0.003;
const outputCost = (totalOutputTokens / 1000) * 0.015;
return {
inputTokens: totalInputTokens,
outputTokens: totalOutputTokens,
estimatedCost: inputCost + outputCost,
};
}
}

scripts/lib/file-utils.ts Normal file

@@ -0,0 +1,218 @@
/**
* File System Utilities for Test Generation
*
* Handles reading source files, writing test files, and directory management.
*/
import * as fs from 'fs/promises';
import * as path from 'path';
import { glob } from 'glob';
export interface SourceFile {
absolutePath: string;
relativePath: string;
content: string;
serviceName: string;
fileName: string;
}
export interface TestFile {
sourcePath: string;
testPath: string;
content: string;
serviceName: string;
}
export class FileUtils {
private projectRoot: string;
constructor(projectRoot: string) {
this.projectRoot = projectRoot;
}
/**
* Find all source files in a service that need tests
*/
async findSourceFiles(serviceName: string): Promise<SourceFile[]> {
const serviceDir = path.join(this.projectRoot, 'apps/backend', serviceName);
// Check if service exists
try {
await fs.access(serviceDir);
} catch {
throw new Error(`Service not found: ${serviceName}`);
}
// Find TypeScript files that need tests
const patterns = [
`${serviceDir}/src/**/*.service.ts`,
`${serviceDir}/src/**/*.controller.ts`,
`${serviceDir}/src/**/*.repository.ts`,
`${serviceDir}/src/**/*.dto.ts`,
];
// Exclude files that shouldn't be tested
const excludePatterns = [
'**/*.module.ts',
'**/main.ts',
'**/index.ts',
'**/*.spec.ts',
'**/*.test.ts',
];
const sourceFiles: SourceFile[] = [];
for (const pattern of patterns) {
const files = await glob(pattern, {
ignore: excludePatterns,
absolute: true,
});
for (const filePath of files) {
try {
const content = await fs.readFile(filePath, 'utf-8');
const relativePath = path.relative(this.projectRoot, filePath);
const fileName = path.basename(filePath);
sourceFiles.push({
absolutePath: filePath,
relativePath,
content,
serviceName,
fileName,
});
} catch (error) {
console.error(`[FileUtils] Failed to read ${filePath}:`, error);
}
}
}
return sourceFiles;
}
/**
* Find a specific source file
*/
async findSourceFile(filePath: string): Promise<SourceFile> {
const absolutePath = path.isAbsolute(filePath)
? filePath
: path.join(this.projectRoot, filePath);
try {
const content = await fs.readFile(absolutePath, 'utf-8');
const relativePath = path.relative(this.projectRoot, absolutePath);
const fileName = path.basename(absolutePath);
// Extract service name from path (apps/backend/SERVICE_NAME/...)
const serviceMatch = relativePath.match(/apps\/backend\/([^\/]+)/);
const serviceName = serviceMatch ? serviceMatch[1] : 'unknown';
return {
absolutePath,
relativePath,
content,
serviceName,
fileName,
};
} catch (error) {
throw new Error(`Failed to read source file ${filePath}: ${error}`);
}
}
/**
* Get test file path for a source file
*/
getTestFilePath(sourceFile: SourceFile): string {
const { absolutePath, serviceName } = sourceFile;
// Convert src/ to test/
// Example: apps/backend/promo-service/src/promos/promo.service.ts
// -> apps/backend/promo-service/test/promos/promo.service.spec.ts
const relativePath = path.relative(
path.join(this.projectRoot, 'apps/backend', serviceName),
absolutePath
);
// Replace src/ with test/ and .ts with .spec.ts
const testRelativePath = relativePath
.replace(/^src\//, 'test/')
.replace(/\.ts$/, '.spec.ts');
return path.join(
this.projectRoot,
'apps/backend',
serviceName,
testRelativePath
);
}
/**
* Check if test file already exists
*/
async testFileExists(sourceFile: SourceFile): Promise<boolean> {
const testPath = this.getTestFilePath(sourceFile);
try {
await fs.access(testPath);
return true;
} catch {
return false;
}
}
/**
* Write test file with proper directory creation
*/
async writeTestFile(testFile: TestFile): Promise<void> {
const { testPath, content } = testFile;
// Ensure directory exists
const dir = path.dirname(testPath);
await fs.mkdir(dir, { recursive: true });
// Write file
await fs.writeFile(testPath, content, 'utf-8');
}
/**
* Read test template
*/
async readTestTemplate(): Promise<string> {
const templatePath = path.join(this.projectRoot, 'templates/backend-service-test.template.ts');
try {
return await fs.readFile(templatePath, 'utf-8');
} catch {
throw new Error(
`Test template not found at ${templatePath}. ` +
'Please ensure Story 19.3 is complete and template exists.'
);
}
}
/**
* Find all backend services
*/
async findAllServices(): Promise<string[]> {
const backendDir = path.join(this.projectRoot, 'apps/backend');
const entries = await fs.readdir(backendDir, { withFileTypes: true });
return entries
.filter(entry => entry.isDirectory())
.map(entry => entry.name)
.filter(name => !name.startsWith('.'));
}
/**
* Validate service exists
*/
async serviceExists(serviceName: string): Promise<boolean> {
const serviceDir = path.join(this.projectRoot, 'apps/backend', serviceName);
try {
await fs.access(serviceDir);
return true;
} catch {
return false;
}
}
}

scripts/lib/llm-task-verifier.py Executable file

@@ -0,0 +1,346 @@
#!/usr/bin/env python3
"""
LLM-Powered Task Verification - Use Claude Haiku to ACTUALLY verify code quality
Purpose: Don't guess with regex - have Claude READ the code and verify it's real
Method: For each task, read mentioned files, ask Claude "is this actually implemented?"
Created: 2026-01-02
Cost: ~$0.13 per story with Haiku (50 tasks × 3K tokens × $1.25/1M)
Full platform: 511 stories × $0.13 = ~$66 total
"""
import json
import os
import re
import sys
from pathlib import Path
from typing import Dict, List
from anthropic import Anthropic
class LLMTaskVerifier:
"""Uses Claude API to verify tasks by reading and analyzing actual code"""
def __init__(self, api_key: str = None):
self.api_key = api_key or os.environ.get('ANTHROPIC_API_KEY')
if not self.api_key:
raise ValueError("ANTHROPIC_API_KEY required")
self.client = Anthropic(api_key=self.api_key)
self.model = 'claude-haiku-4-20250514' # Fast + cheap for verification tasks
self.repo_root = Path('.')
def verify_task(self, task_text: str, is_checked: bool, story_context: Dict) -> Dict:
"""
Use Claude to verify if a task is actually complete
Args:
task_text: The task description (e.g., "Implement UserService")
is_checked: Whether task is checked [x] or not [ ]
story_context: Context about the story (files, epic, etc.)
Returns:
{
'task': task_text,
'is_checked': bool,
'actually_complete': bool,
'confidence': 'very_high' | 'high' | 'medium' | 'low',
'evidence': str,
'issues_found': [list of issues],
'verification_status': 'correct' | 'false_positive' | 'false_negative'
}
"""
# Extract file references from task
file_refs = self._extract_file_references(task_text)
# Read the files
file_contents = {}
for file_ref in file_refs[:5]: # Limit to 5 files per task
content = self._read_file(file_ref)
if content:
file_contents[file_ref] = content
# If no files found, try reading files from story context
if not file_contents and story_context.get('files'):
for file_path in story_context['files'][:5]:
content = self._read_file(file_path)
if content:
file_contents[file_path] = content
# Build prompt for Claude
prompt = self._build_verification_prompt(task_text, is_checked, file_contents, story_context)
# Call Claude API
try:
response = self.client.messages.create(
model=self.model,
max_tokens=2000,
temperature=0, # Deterministic
messages=[{
'role': 'user',
'content': prompt
}]
)
# Parse response
result_text = response.content[0].text
result = self._parse_claude_response(result_text)
# Add metadata
result['task'] = task_text
result['is_checked'] = is_checked
result['tokens_used'] = response.usage.input_tokens + response.usage.output_tokens
# Determine verification status
if is_checked == result['actually_complete']:
result['verification_status'] = 'correct'
elif is_checked and not result['actually_complete']:
result['verification_status'] = 'false_positive'
else:
result['verification_status'] = 'false_negative'
return result
except Exception as e:
return {
'task': task_text,
'error': str(e),
'verification_status': 'error'
}
def _build_verification_prompt(self, task: str, is_checked: bool, files: Dict, context: Dict) -> str:
"""Build prompt for Claude to verify task completion"""
files_section = ""
if files:
files_section = "\n\n## Files Provided\n\n"
for file_path, content in files.items():
files_section += f"### {file_path}\n```typescript\n{content[:2000]}\n```\n\n"
else:
files_section = "\n\n## Files Provided\n\nNone - task may not reference specific files.\n"
prompt = f"""You are a code verification expert. Your job is to verify whether a task from a user story is actually complete.
## Task to Verify
**Task:** {task}
**Claimed Status:** {'[x] Complete' if is_checked else '[ ] Not complete'}
## Story Context
**Story:** {context.get('story_id', 'Unknown')}
**Epic:** {context.get('epic', 'Unknown')}
{files_section}
## Your Task
Analyze the files (if provided) and determine:
1. **Is the task actually complete?**
- If files provided: Does the code actually implement what the task describes?
- Is it real implementation or just stubs/TODOs?
- Are there tests? Do they pass?
2. **Confidence level:**
- very_high: Clear evidence (tests passing, full implementation)
- high: Strong evidence (code exists with logic, no stubs)
- medium: Some evidence but incomplete
- low: No files or cannot verify
3. **Evidence:**
- What did you find that proves/disproves completion?
- Specific line numbers or code snippets
- Test results if applicable
4. **Issues (if any):**
- Stub code or TODOs
- Missing error handling
- No multi-tenant isolation (dealerId filters)
- Security vulnerabilities
- Missing tests
## Response Format (JSON)
{{
"actually_complete": true/false,
"confidence": "very_high|high|medium|low",
"evidence": "Detailed explanation of what you found",
"issues_found": ["issue 1", "issue 2"],
"recommendation": "What needs to be done (if incomplete)"
}}
**Be objective. If code is a stub with TODOs, it's NOT complete even if files exist.**
"""
return prompt
def _parse_claude_response(self, response_text: str) -> Dict:
"""Parse Claude's JSON response"""
try:
# Extract JSON from response (may have markdown)
json_match = re.search(r'\{.*\}', response_text, re.DOTALL)
if json_match:
return json.loads(json_match.group(0))
else:
# Fallback: parse manually
return {
'actually_complete': 'complete' in response_text.lower() and 'not complete' not in response_text.lower(),
'confidence': 'low',
'evidence': response_text[:500],
'issues_found': [],
}
except:
return {
'actually_complete': False,
'confidence': 'low',
'evidence': 'Failed to parse response',
'issues_found': ['Parse error'],
}
def _extract_file_references(self, task_text: str) -> List[str]:
"""Extract file paths from task text"""
paths = []
# Common patterns
patterns = [
r'[\w/-]+/[\w-]+\.[\w]+', # Explicit paths
r'\b([A-Z][\w-]+\.(ts|tsx|service|controller|repository))', # Files
]
for pattern in patterns:
matches = re.findall(pattern, task_text)
            if matches and isinstance(matches[0], tuple):
                paths.extend([m[0] for m in matches])
            else:
                paths.extend(matches)
return list(set(paths))[:5] # Max 5 files per task
def _read_file(self, file_ref: str) -> str:
"""Find and read file from repository"""
# Try exact path
if (self.repo_root / file_ref).exists():
try:
return (self.repo_root / file_ref).read_text()[:5000] # Max 5K chars
except:
return None
# Search for file
import subprocess
try:
result = subprocess.run(
['find', '.', '-name', Path(file_ref).name, '-type', 'f'],
capture_output=True,
text=True,
cwd=self.repo_root,
timeout=5
)
if result.stdout.strip():
file_path = result.stdout.strip().split('\n')[0]
return Path(file_path).read_text()[:5000]
except:
pass
return None
def verify_story_with_llm(story_file_path: str) -> Dict:
"""
Verify entire story using LLM for each task
    Cost: ~$0.13 per story with Haiku (50 tasks × ~3K tokens per task)
Time: ~2-3 minutes per story
"""
verifier = LLMTaskVerifier()
story_path = Path(story_file_path)
if not story_path.exists():
return {'error': 'Story file not found'}
content = story_path.read_text()
# Extract story context
story_id = story_path.stem
epic_match = re.search(r'Epic:\*?\*?\s*(\w+)', content, re.IGNORECASE)
epic = epic_match.group(1) if epic_match else 'Unknown'
# Extract files from Dev Agent Record
file_list_match = re.search(r'### File List\n\n(.+?)###', content, re.DOTALL)
files = []
if file_list_match:
file_section = file_list_match.group(1)
files = re.findall(r'[\w/-]+\.[\w]+', file_section)
story_context = {
'story_id': story_id,
'epic': epic,
'files': files
}
# Extract all tasks
task_pattern = r'^-\s*\[([ xX])\]\s*(.+)$'
tasks = re.findall(task_pattern, content, re.MULTILINE)
if not tasks:
return {'error': 'No tasks found'}
# Verify each task with LLM
print(f"\n🔍 Verifying {len(tasks)} tasks with Claude...", file=sys.stderr)
task_results = []
for idx, (checkbox, task_text) in enumerate(tasks):
is_checked = checkbox.lower() == 'x'
print(f" {idx+1}/{len(tasks)}: {task_text[:60]}...", file=sys.stderr)
result = verifier.verify_task(task_text, is_checked, story_context)
task_results.append(result)
# Calculate summary
total = len(task_results)
correct = sum(1 for r in task_results if r.get('verification_status') == 'correct')
false_positives = sum(1 for r in task_results if r.get('verification_status') == 'false_positive')
false_negatives = sum(1 for r in task_results if r.get('verification_status') == 'false_negative')
return {
'story_id': story_id,
'total_tasks': total,
'correct': correct,
'false_positives': false_positives,
'false_negatives': false_negatives,
'verification_score': round((correct / total * 100), 1) if total > 0 else 0,
'task_results': task_results
}
if __name__ == '__main__':
if len(sys.argv) < 2:
print("Usage: llm-task-verifier.py <story-file>")
sys.exit(1)
results = verify_story_with_llm(sys.argv[1])
if 'error' in results:
print(f"{results['error']}")
sys.exit(1)
# Print summary
print(f"\n📊 Story: {results['story_id']}")
print(f"Verification Score: {results['verification_score']}/100")
print(f"✅ Correct: {results['correct']}")
print(f"❌ False Positives: {results['false_positives']}")
print(f"⚠️ False Negatives: {results['false_negatives']}")
# Show false positives
if results['false_positives'] > 0:
print(f"\n❌ FALSE POSITIVES (claimed done but not implemented):")
for task in results['task_results']:
if task.get('verification_status') == 'false_positive':
print(f" - {task['task'][:80]}")
print(f" {task.get('evidence', 'No evidence')}")
# Output JSON
if '--json' in sys.argv:
print(json.dumps(results, indent=2))

scripts/lib/rate-limiter.ts Normal file

@@ -0,0 +1,122 @@
/**
* Rate Limiter for Claude API
*
* Implements exponential backoff and respects rate limits:
* - 50 requests/minute (Claude API limit)
* - Automatic retry on 429 (rate limit exceeded)
* - Configurable concurrent request limit
*/
export interface RateLimiterConfig {
requestsPerMinute: number;
maxRetries: number;
initialBackoffMs: number;
maxConcurrent: number;
}
export class RateLimiter {
private requestTimestamps: number[] = [];
private activeRequests = 0;
private config: RateLimiterConfig;
constructor(config: Partial<RateLimiterConfig> = {}) {
this.config = {
requestsPerMinute: config.requestsPerMinute ?? 50,
maxRetries: config.maxRetries ?? 3,
initialBackoffMs: config.initialBackoffMs ?? 1000,
maxConcurrent: config.maxConcurrent ?? 5,
};
}
/**
* Wait until it's safe to make next request
*/
async waitForSlot(): Promise<void> {
// Wait for concurrent slot
while (this.activeRequests >= this.config.maxConcurrent) {
await this.sleep(100);
}
// Clean old timestamps (older than 1 minute)
const oneMinuteAgo = Date.now() - 60000;
this.requestTimestamps = this.requestTimestamps.filter(ts => ts > oneMinuteAgo);
// Check if we've hit rate limit
if (this.requestTimestamps.length >= this.config.requestsPerMinute) {
const oldestRequest = this.requestTimestamps[0];
const waitTime = 60000 - (Date.now() - oldestRequest);
if (waitTime > 0) {
console.log(`[RateLimiter] Rate limit reached. Waiting ${Math.ceil(waitTime / 1000)}s...`);
await this.sleep(waitTime);
}
}
// Add delay between requests (1.2s for 50 req/min)
const minDelayMs = Math.ceil(60000 / this.config.requestsPerMinute);
const lastRequest = this.requestTimestamps[this.requestTimestamps.length - 1];
if (lastRequest) {
const timeSinceLastRequest = Date.now() - lastRequest;
if (timeSinceLastRequest < minDelayMs) {
await this.sleep(minDelayMs - timeSinceLastRequest);
}
}
this.requestTimestamps.push(Date.now());
this.activeRequests++;
}
/**
* Release a concurrent slot
*/
releaseSlot(): void {
this.activeRequests = Math.max(0, this.activeRequests - 1);
}
/**
* Execute function with exponential backoff retry
*/
async withRetry<T>(fn: () => Promise<T>, context: string): Promise<T> {
let lastError: Error | null = null;
for (let attempt = 0; attempt < this.config.maxRetries; attempt++) {
try {
await this.waitForSlot();
const result = await fn();
this.releaseSlot();
return result;
} catch (error) {
this.releaseSlot();
lastError = error instanceof Error ? error : new Error(String(error));
// Check if it's a rate limit error (429)
const errorMsg = lastError.message.toLowerCase();
const isRateLimit = errorMsg.includes('429') || errorMsg.includes('rate limit');
if (isRateLimit && attempt < this.config.maxRetries - 1) {
const backoffMs = this.config.initialBackoffMs * Math.pow(2, attempt);
console.log(
`[RateLimiter] ${context} - Rate limit hit. Retry ${attempt + 1}/${this.config.maxRetries} in ${backoffMs}ms`
);
await this.sleep(backoffMs);
continue;
}
// Non-retryable error or max retries reached
if (attempt < this.config.maxRetries - 1) {
const backoffMs = this.config.initialBackoffMs * Math.pow(2, attempt);
console.log(
`[RateLimiter] ${context} - Error: ${lastError.message}. Retry ${attempt + 1}/${this.config.maxRetries} in ${backoffMs}ms`
);
await this.sleep(backoffMs);
}
}
}
throw new Error(`${context} - Failed after ${this.config.maxRetries} attempts: ${lastError?.message}`);
}
private sleep(ms: number): Promise<void> {
return new Promise(resolve => setTimeout(resolve, ms));
}
}


@@ -0,0 +1,421 @@
#!/usr/bin/env python3
"""
Sprint Status Updater - Robust YAML updater for sprint-status.yaml
Purpose: Update sprint-status.yaml entries while preserving:
- Comments
- Formatting
- Section structure
- Manual annotations
Created: 2026-01-02
Part of: Full Workflow Fix (Option C)
"""
import re
import sys
from pathlib import Path
from typing import Dict, List, Tuple
from datetime import datetime
class SprintStatusUpdater:
"""Updates sprint-status.yaml while preserving structure and comments"""
def __init__(self, sprint_status_path: str):
self.path = Path(sprint_status_path)
self.content = self.path.read_text()
self.lines = self.content.split('\n')
self.updates_applied = 0
def update_story_status(self, story_id: str, new_status: str, comment: str = None) -> bool:
"""
Update a single story's status in development_status section
Args:
story_id: Story identifier (e.g., "19-4a-inventory-service-test-coverage")
new_status: New status value (e.g., "done", "in-progress")
comment: Optional comment to append (e.g., "✅ COMPLETE 2026-01-02")
Returns:
True if update was applied, False if story not found or unchanged
"""
# Find the story line in development_status section
in_dev_status = False
story_line_idx = None
for idx, line in enumerate(self.lines):
if line.strip() == 'development_status:':
in_dev_status = True
continue
if in_dev_status:
# Check if we've left development_status section
if line and not line.startswith(' ') and not line.startswith('#'):
break
# Check if this is our story
if line.startswith(' ') and story_id in line:
story_line_idx = idx
break
if story_line_idx is None:
# Story not found - need to add it
return self._add_story_entry(story_id, new_status, comment)
# Update existing line
current_line = self.lines[story_line_idx]
# Parse current line: " story-id: status # comment"
match = re.match(r'(\s+)([a-z0-9-]+):\s*(\S+)(.*)', current_line)
if not match:
print(f"WARNING: Could not parse line: {current_line}", file=sys.stderr)
return False
indent, current_story_id, current_status, existing_comment = match.groups()
# Check if update needed
if current_status == new_status:
return False # No change needed
# Build new line
if comment:
new_line = f"{indent}{story_id}: {new_status} # {comment}"
elif existing_comment:
# Preserve existing comment
new_line = f"{indent}{story_id}: {new_status}{existing_comment}"
else:
new_line = f"{indent}{story_id}: {new_status}"
self.lines[story_line_idx] = new_line
self.updates_applied += 1
return True
def _add_story_entry(self, story_id: str, status: str, comment: str = None) -> bool:
"""Add a new story entry to development_status section"""
# Find the epic this story belongs to
epic_match = re.match(r'^(\d+[a-z]?)-', story_id)
if not epic_match:
print(f"WARNING: Cannot determine epic for {story_id}", file=sys.stderr)
return False
epic_num = epic_match.group(1)
epic_key = f"epic-{epic_num}"
# Find where to insert the story (after its epic line)
in_dev_status = False
insert_idx = None
for idx, line in enumerate(self.lines):
if line.strip() == 'development_status:':
in_dev_status = True
continue
if in_dev_status:
# Look for the epic line
if line.strip().startswith(f"{epic_key}:"):
# Found the epic - insert after it
insert_idx = idx + 1
break
if insert_idx is None:
print(f"WARNING: Could not find epic {epic_key} in development_status", file=sys.stderr)
return False
# Build new line
if comment:
new_line = f" {story_id}: {status} # {comment}"
else:
new_line = f" {story_id}: {status}"
# Insert the line
self.lines.insert(insert_idx, new_line)
self.updates_applied += 1
return True
def update_epic_status(self, epic_key: str, new_status: str, comment: str = None) -> bool:
"""Update epic status line"""
in_dev_status = False
epic_line_idx = None
for idx, line in enumerate(self.lines):
if line.strip() == 'development_status:':
in_dev_status = True
continue
if in_dev_status:
if line and not line.startswith(' ') and not line.startswith('#'):
break
if line.strip().startswith(f"{epic_key}:"):
epic_line_idx = idx
break
if epic_line_idx is None:
print(f"WARNING: Epic {epic_key} not found", file=sys.stderr)
return False
# Parse current line
current_line = self.lines[epic_line_idx]
match = re.match(r'(\s+)([a-z0-9-]+):\s*(\S+)(.*)', current_line)
if not match:
return False
indent, current_epic, current_status, existing_comment = match.groups()
if current_status == new_status:
return False
# Build new line
if comment:
new_line = f"{indent}{epic_key}: {new_status} # {comment}"
elif existing_comment:
new_line = f"{indent}{epic_key}: {new_status}{existing_comment}"
else:
new_line = f"{indent}{epic_key}: {new_status}"
self.lines[epic_line_idx] = new_line
self.updates_applied += 1
return True
def add_verification_note(self):
"""Add verification timestamp to header"""
# Find and update last_verified line
for idx, line in enumerate(self.lines):
if line.startswith('# last_verified:'):
timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S EST')
self.lines[idx] = f"# last_verified: {timestamp}"
break
def save(self, backup: bool = True) -> Path:
"""
Save updated content back to file
Args:
backup: If True, create backup before saving
Returns:
Path to backup file if created, otherwise original path
"""
if backup and self.updates_applied > 0:
backup_dir = Path('.sprint-status-backups')
backup_dir.mkdir(exist_ok=True)
backup_path = backup_dir / f"sprint-status-{datetime.now().strftime('%Y%m%d-%H%M%S')}.yaml"
backup_path.write_text(self.content)
print(f"✓ Backup created: {backup_path}", file=sys.stderr)
# Write updated content
new_content = '\n'.join(self.lines)
self.path.write_text(new_content)
return self.path
def scan_story_statuses(story_dir: str = "docs/sprint-artifacts") -> Dict[str, str]:
"""
Scan all story files and extract EXPLICIT Status: fields
CRITICAL: Only returns stories that HAVE a Status: field.
If Status: field is missing, story is NOT included in results.
This prevents overwriting sprint-status.yaml with defaults.
Returns:
Dict mapping story_id -> normalized_status (ONLY for stories with explicit Status: field)
"""
story_dir_path = Path(story_dir)
story_files = list(story_dir_path.glob("*.md"))
STATUS_MAPPINGS = {
'done': 'done',
'complete': 'done',
'completed': 'done',
'in-progress': 'in-progress',
'in_progress': 'in-progress',
'review': 'review',
'ready-for-dev': 'ready-for-dev',
'ready_for_dev': 'ready-for-dev',
'pending': 'ready-for-dev',
'drafted': 'ready-for-dev',
'backlog': 'backlog',
'blocked': 'blocked',
'deferred': 'deferred',
'archived': 'archived',
}
story_statuses = {}
skipped_count = 0
for story_file in story_files:
story_id = story_file.stem
# Skip special files
if (story_id.startswith('.') or
story_id.startswith('EPIC-') or
'COMPLETION' in story_id.upper() or
'SUMMARY' in story_id.upper() or
'REPORT' in story_id.upper() or
'README' in story_id.upper() or
'INDEX' in story_id.upper() or
'REVIEW' in story_id.upper() or
'AUDIT' in story_id.upper()):
continue
try:
content = story_file.read_text()
# Extract Status field
status_match = re.search(r'^Status:\s*(.+?)$', content, re.MULTILINE | re.IGNORECASE)
if status_match:
status = status_match.group(1).strip()
# Remove comments
status = re.sub(r'\s*#.*$', '', status).strip().lower()
# Normalize status
if status in STATUS_MAPPINGS:
normalized_status = STATUS_MAPPINGS[status]
elif 'done' in status or 'complete' in status:
normalized_status = 'done'
elif 'progress' in status:
normalized_status = 'in-progress'
elif 'review' in status:
normalized_status = 'review'
elif 'ready' in status:
normalized_status = 'ready-for-dev'
elif 'block' in status:
normalized_status = 'blocked'
elif 'defer' in status:
normalized_status = 'deferred'
elif 'archive' in status:
normalized_status = 'archived'
else:
normalized_status = 'ready-for-dev'
story_statuses[story_id] = normalized_status
else:
# CRITICAL FIX: No Status: field found
# Do NOT default to ready-for-dev - skip this story entirely
# This prevents overwriting sprint-status.yaml with incorrect defaults
skipped_count += 1
except Exception as e:
print(f"ERROR parsing {story_id}: {e}", file=sys.stderr)
continue
print(f"✓ Found {len(story_statuses)} stories with explicit Status: fields", file=sys.stderr)
print(f" Skipped {skipped_count} stories without Status: fields (trust sprint-status.yaml)", file=sys.stderr)
return story_statuses
def main():
"""Main entry point for CLI usage"""
import argparse
parser = argparse.ArgumentParser(description='Update sprint-status.yaml from story files')
parser.add_argument('--dry-run', action='store_true', help='Show changes without applying')
parser.add_argument('--validate', action='store_true', help='Validate only (exit 1 if discrepancies)')
parser.add_argument('--sprint-status', default='docs/sprint-artifacts/sprint-status.yaml',
help='Path to sprint-status.yaml')
parser.add_argument('--story-dir', default='docs/sprint-artifacts',
help='Path to story files directory')
parser.add_argument('--epic', type=str, help='Validate specific epic only (e.g., epic-1)')
parser.add_argument('--mode', choices=['validate', 'fix'], default='validate',
help='Mode: validate (report only) or fix (apply updates)')
args = parser.parse_args()
# Scan story files
print("Scanning story files...", file=sys.stderr)
story_statuses = scan_story_statuses(args.story_dir)
# Filter by epic if specified
if args.epic:
# Extract epic number from epic key (e.g., "epic-1" -> "1")
epic_match = re.match(r'epic-([0-9a-z-]+)', args.epic)
if epic_match:
epic_num = epic_match.group(1)
# Filter stories that start with this epic number
story_statuses = {k: v for k, v in story_statuses.items()
if k.startswith(f"{epic_num}-")}
print(f"✓ Filtered to {len(story_statuses)} stories for {args.epic}", file=sys.stderr)
else:
print(f"WARNING: Invalid epic format: {args.epic}", file=sys.stderr)
print(f"✓ Scanned {len(story_statuses)} story files", file=sys.stderr)
print("", file=sys.stderr)
# Load sprint-status.yaml
updater = SprintStatusUpdater(args.sprint_status)
# Find discrepancies
discrepancies = []
for story_id, new_status in story_statuses.items():
# Check current status in sprint-status.yaml
current_status = None
in_dev_status = False
for line in updater.lines:
if line.strip() == 'development_status:':
in_dev_status = True
continue
            # Match the exact "story-id:" key to avoid substring hits (e.g. "1-2" matching "1-23")
            if in_dev_status and line.strip().startswith(f"{story_id}:"):
match = re.match(r'\s+[a-z0-9-]+:\s*(\S+)', line)
if match:
current_status = match.group(1)
break
if current_status is None:
discrepancies.append((story_id, 'NOT-IN-FILE', new_status))
elif current_status != new_status:
discrepancies.append((story_id, current_status, new_status))
# Report
if not discrepancies:
print("✓ sprint-status.yaml is up to date!", file=sys.stderr)
sys.exit(0)
print(f"⚠ Found {len(discrepancies)} discrepancies:", file=sys.stderr)
print("", file=sys.stderr)
for story_id, old_status, new_status in discrepancies[:20]:
if old_status == 'NOT-IN-FILE':
print(f" [ADD] {story_id}: (not in file) → {new_status}", file=sys.stderr)
else:
print(f" [UPDATE] {story_id}: {old_status}{new_status}", file=sys.stderr)
if len(discrepancies) > 20:
print(f" ... and {len(discrepancies) - 20} more", file=sys.stderr)
print("", file=sys.stderr)
# Handle mode parameter
if args.mode == 'validate' or args.validate:
print("✗ Validation failed - discrepancies found", file=sys.stderr)
sys.exit(1)
if args.dry_run:
print("DRY RUN: Would update sprint-status.yaml", file=sys.stderr)
sys.exit(0)
# Apply updates (--mode fix or default behavior)
print("Applying updates...", file=sys.stderr)
for story_id, old_status, new_status in discrepancies:
comment = f"Updated {datetime.now().strftime('%Y-%m-%d')}"
updater.update_story_status(story_id, new_status, comment)
# Add verification timestamp
updater.add_verification_note()
# Save
updater.save(backup=True)
print(f"✓ Applied {updater.updates_applied} updates", file=sys.stderr)
print(f"✓ Updated: {updater.path}", file=sys.stderr)
sys.exit(0)
if __name__ == '__main__':
main()

View File

@ -0,0 +1,525 @@
#!/usr/bin/env python3
"""
Task Verification Engine - Verify story task checkboxes match ACTUAL CODE
Purpose: Prevent false positives where tasks are checked but code doesn't exist
Method: Parse task text, infer what files/functions should exist, verify in codebase
Created: 2026-01-02
Part of: Comprehensive validation solution
"""
import re
import subprocess
from pathlib import Path
from typing import Dict, List, Tuple, Optional
class TaskVerificationEngine:
"""Verifies that checked tasks correspond to actual code in the repository"""
def __init__(self, repo_root: Path = Path(".")):
self.repo_root = repo_root
def verify_task(self, task_text: str, is_checked: bool) -> Dict:
"""
Verify a single task against codebase reality
DEEP VERIFICATION - Not just file existence, but:
- Files exist AND have real implementation (not stubs)
- Tests exist AND are passing
- No TODO/FIXME comments in implementation
- Code has actual logic (not empty classes)
Returns:
{
'task': task_text,
'is_checked': bool,
'should_be_checked': bool,
            'confidence': 'very high'|'high'|'medium'|'low',
'evidence': [list of evidence],
'verification_status': 'correct'|'false_positive'|'false_negative'|'uncertain'
}
"""
# Extract potential file paths from task text
file_refs = self._extract_file_references(task_text)
# Extract class/function names
code_refs = self._extract_code_references(task_text)
# Extract test requirements
test_refs = self._extract_test_references(task_text)
# Verify file existence AND implementation quality
files_exist = []
files_missing = []
for file_ref in file_refs:
if self._file_exists(file_ref):
# DEEP CHECK: Is it really implemented or just a stub?
if self._verify_real_implementation(file_ref, None):
files_exist.append(file_ref)
else:
files_missing.append(f"{file_ref} (stub/TODO)")
else:
files_missing.append(file_ref)
# Verify code existence AND implementation
code_found = []
code_missing = []
for code_ref in code_refs:
if self._code_exists(code_ref):
code_found.append(code_ref)
else:
code_missing.append(code_ref)
# Verify tests exist AND pass
tests_passing = []
tests_failing_or_missing = []
for test_ref in test_refs:
test_status = self._verify_test_exists_and_passes(test_ref)
if test_status == 'passing':
tests_passing.append(test_ref)
else:
tests_failing_or_missing.append(f"{test_ref} ({test_status})")
# Build evidence with DEEP verification
evidence = []
confidence = 'low'
should_be_checked = False
# STRONGEST evidence: Tests exist AND pass
if tests_passing:
evidence.append(f"{len(tests_passing)} tests passing (VERIFIED)")
confidence = 'very high'
should_be_checked = True
# Strong evidence: Files exist with real implementation
if files_exist and not files_missing:
evidence.append(f"All {len(files_exist)} files exist with real code (no stubs)")
if confidence != 'very high':
confidence = 'high'
should_be_checked = True
# Strong evidence: Code found with implementation
if code_found and not code_missing:
evidence.append(f"All {len(code_found)} code elements implemented")
if confidence == 'low':
confidence = 'high'
should_be_checked = True
# NEGATIVE evidence: Tests missing or failing
if tests_failing_or_missing:
evidence.append(f"{len(tests_failing_or_missing)} tests missing/failing")
# Even if files exist, no passing tests = NOT done
should_be_checked = False
confidence = 'medium'
# NEGATIVE evidence: Mixed results
if files_exist and files_missing:
evidence.append(f"{len(files_exist)} files OK, {len(files_missing)} missing/stubs")
confidence = 'medium'
should_be_checked = False # Incomplete
# Strong evidence of incompletion
if not files_exist and files_missing:
evidence.append(f"All {len(files_missing)} files missing or stubs")
confidence = 'high'
should_be_checked = False
if not code_found and code_missing:
evidence.append(f"Code not found: {', '.join(code_missing[:3])}")
confidence = 'medium'
should_be_checked = False
# No file/code/test references - use heuristics
if not file_refs and not code_refs and not test_refs:
# Check for action keywords
if self._has_completion_keywords(task_text):
evidence.append("Research/analysis task (no code artifacts)")
confidence = 'low'
# Can't verify - trust the checkbox
should_be_checked = is_checked
else:
evidence.append("No verifiable references")
confidence = 'low'
should_be_checked = is_checked
# Determine verification status
if is_checked == should_be_checked:
verification_status = 'correct'
elif is_checked and not should_be_checked:
verification_status = 'false_positive' # Checked but code missing
elif not is_checked and should_be_checked:
verification_status = 'false_negative' # Unchecked but code exists
else:
verification_status = 'uncertain'
return {
'task': task_text,
'is_checked': is_checked,
'should_be_checked': should_be_checked,
'confidence': confidence,
'evidence': evidence,
'verification_status': verification_status,
'files_exist': files_exist,
'files_missing': files_missing,
'code_found': code_found,
'code_missing': code_missing,
}
def _extract_file_references(self, task_text: str) -> List[str]:
"""Extract file path references from task text"""
paths = []
# Pattern 1: Explicit paths (src/foo/bar.ts)
explicit_paths = re.findall(r'[\w/-]+/[\w-]+\.[\w]+', task_text)
paths.extend(explicit_paths)
# Pattern 2: "Create Foo.ts" or "Implement Bar.service.ts"
file_mentions = re.findall(r'\b([A-Z][\w-]+\.(ts|tsx|js|jsx|py|md|yaml|json))\b', task_text)
paths.extend([f[0] for f in file_mentions])
# Pattern 3: "in components/Widget.tsx"
contextual = re.findall(r'in\s+([\w/-]+\.[\w]+)', task_text, re.IGNORECASE)
paths.extend(contextual)
return list(set(paths)) # Deduplicate
def _extract_code_references(self, task_text: str) -> List[str]:
"""Extract class/function/interface names from task text"""
code_refs = []
# Pattern 1: "Create FooService class"
class_patterns = re.findall(r'(?:Create|Implement|Add)\s+(\w+(?:Service|Controller|Repository|Component|Interface|Type))', task_text, re.IGNORECASE)
code_refs.extend(class_patterns)
# Pattern 2: "Implement getFoo method"
method_patterns = re.findall(r'(?:Implement|Add|Create)\s+(\w+)\s+(?:method|function)', task_text, re.IGNORECASE)
code_refs.extend(method_patterns)
# Pattern 3: Camel/PascalCase references
camelcase = re.findall(r'\b([A-Z][a-z]+(?:[A-Z][a-z]+)+)\b', task_text)
code_refs.extend(camelcase)
return list(set(code_refs))
def _file_exists(self, file_path: str) -> bool:
"""Check if file exists in repository"""
# Try exact path first
if (self.repo_root / file_path).exists():
return True
# Try common locations
search_dirs = [
'apps/backend/',
'apps/frontend/',
'packages/',
'src/',
'infrastructure/',
]
for search_dir in search_dirs:
if (self.repo_root / search_dir).exists():
# Use find command
try:
result = subprocess.run(
['find', search_dir, '-name', Path(file_path).name, '-type', 'f'],
capture_output=True,
text=True,
cwd=self.repo_root,
timeout=5
)
if result.returncode == 0 and result.stdout.strip():
return True
except:
pass
return False
def _code_exists(self, code_ref: str) -> bool:
"""Check if class/function/interface exists AND is actually implemented (not just a stub)"""
try:
# Search for class, interface, function, or type declaration
patterns = [
f'class {code_ref}',
f'interface {code_ref}',
f'function {code_ref}',
f'export const {code_ref}',
f'export function {code_ref}',
f'type {code_ref}',
]
for pattern in patterns:
result = subprocess.run(
['grep', '-r', '-l', pattern, '.', '--include=*.ts', '--include=*.tsx', '--include=*.js'],
capture_output=True,
text=True,
cwd=self.repo_root,
timeout=10
)
if result.returncode == 0 and result.stdout.strip():
# Found the declaration - now verify it's not a stub
file_path = result.stdout.strip().split('\n')[0]
if self._verify_real_implementation(file_path, code_ref):
return True
except:
pass
return False
    def _verify_real_implementation(self, file_path: str, code_ref: Optional[str]) -> bool:
"""
Verify code is REALLY implemented, not just a stub or TODO
Checks for:
- File has substantial code (not just empty class)
- No TODO/FIXME comments near the code
- Has actual methods/logic (not just interface)
"""
try:
full_path = self.repo_root / file_path
if not full_path.exists():
return False
content = full_path.read_text()
            # Find the code reference; code_ref=None means "whole-file check", so scan from the top of the file
            if code_ref:
                code_index = content.find(code_ref)
                if code_index == -1:
                    return False
            else:
                code_index = 0
# Get 500 chars after the reference (the implementation)
code_snippet = content[code_index:code_index + 500]
# RED FLAGS - indicates stub/incomplete code
red_flags = [
'TODO',
'FIXME',
'throw new Error(\'Not implemented',
'return null;',
'// Placeholder',
'// Stub',
'return {};',
'return [];',
'return undefined;',
]
for flag in red_flags:
if flag in code_snippet:
return False # Found stub/placeholder
# GREEN FLAGS - indicates real implementation
green_flags = [
'return', # Has return statements
'this.', # Uses instance members
'await', # Has async logic
'if (', # Has conditional logic
'for (', # Has loops
'const ', # Has variables
]
green_count = sum(1 for flag in green_flags if flag in code_snippet)
# Need at least 3 green flags for "real" implementation
return green_count >= 3
except:
return False
def _extract_test_references(self, task_text: str) -> List[str]:
"""Extract test file references from task text"""
test_refs = []
# Pattern 1: Explicit test files
test_files = re.findall(r'([\w/-]+\.(?:spec|test)\.(?:ts|tsx|js))', task_text)
test_refs.extend(test_files)
# Pattern 2: "Write tests for X" or "Add test coverage"
if re.search(r'\b(?:test|tests|testing|coverage)\b', task_text, re.IGNORECASE):
# Extract potential test subjects
subjects = re.findall(r'(?:for|to)\s+(\w+(?:Service|Controller|Component|Repository|Widget))', task_text)
test_refs.extend([f"{subj}.spec.ts" for subj in subjects])
return list(set(test_refs))
def _verify_test_exists_and_passes(self, test_ref: str) -> str:
"""
Verify test file exists AND tests are passing
        Returns: 'passing' | 'failing' | 'missing' | 'not_run' | 'timeout'
"""
# Find test file
if not self._file_exists(test_ref):
return 'missing'
# Try to run the test
try:
# Find the actual test file path
result = subprocess.run(
['find', '.', '-name', Path(test_ref).name, '-type', 'f'],
capture_output=True,
text=True,
cwd=self.repo_root,
timeout=5
)
if not result.stdout.strip():
return 'missing'
test_file_path = result.stdout.strip().split('\n')[0]
# Run the test (with timeout - don't hang)
test_result = subprocess.run(
['pnpm', 'test', '--', test_file_path, '--run'],
capture_output=True,
text=True,
cwd=self.repo_root,
timeout=30 # 30 second timeout per test file
)
# Check output for pass/fail
output = test_result.stdout + test_result.stderr
if 'PASS' in output or 'passing' in output.lower():
return 'passing'
elif 'FAIL' in output or 'failing' in output.lower():
return 'failing'
else:
return 'not_run'
except subprocess.TimeoutExpired:
return 'timeout'
except:
return 'not_run'
def _has_completion_keywords(self, task_text: str) -> bool:
"""Check if task has action-oriented keywords"""
keywords = [
'research', 'investigate', 'analyze', 'review', 'document',
'plan', 'design', 'decide', 'choose', 'evaluate', 'assess'
]
text_lower = task_text.lower()
return any(keyword in text_lower for keyword in keywords)
def verify_story_tasks(story_file_path: str) -> Dict:
"""
Verify all tasks in a story file
Returns:
{
'total_tasks': int,
'checked_tasks': int,
'correct_checkboxes': int,
'false_positives': int, # Checked but code missing
'false_negatives': int, # Unchecked but code exists
'uncertain': int,
'verification_score': float, # 0-100
'task_details': [...],
}
"""
story_path = Path(story_file_path)
if not story_path.exists():
return {'error': 'Story file not found'}
content = story_path.read_text()
# Extract all tasks (- [ ] or - [x])
task_pattern = r'^-\s*\[([ xX])\]\s*(.+)$'
tasks = re.findall(task_pattern, content, re.MULTILINE)
if not tasks:
return {
'total_tasks': 0,
'error': 'No task list found in story file'
}
# Verify each task
    engine = TaskVerificationEngine(story_path.parent.parent.parent)  # docs/sprint-artifacts/<story>.md -> repo root
task_verifications = []
for checkbox, task_text in tasks:
is_checked = checkbox.lower() == 'x'
verification = engine.verify_task(task_text, is_checked)
task_verifications.append(verification)
# Calculate summary
total_tasks = len(task_verifications)
checked_tasks = sum(1 for v in task_verifications if v['is_checked'])
correct = sum(1 for v in task_verifications if v['verification_status'] == 'correct')
false_positives = sum(1 for v in task_verifications if v['verification_status'] == 'false_positive')
false_negatives = sum(1 for v in task_verifications if v['verification_status'] == 'false_negative')
uncertain = sum(1 for v in task_verifications if v['verification_status'] == 'uncertain')
# Verification score: (correct / total) * 100
verification_score = (correct / total_tasks * 100) if total_tasks > 0 else 0
return {
'total_tasks': total_tasks,
'checked_tasks': checked_tasks,
'correct_checkboxes': correct,
'false_positives': false_positives,
'false_negatives': false_negatives,
'uncertain': uncertain,
'verification_score': round(verification_score, 1),
'task_details': task_verifications,
}
def main():
"""CLI entry point"""
import sys
import json
if len(sys.argv) < 2:
print("Usage: task-verification-engine.py <story-file-path>", file=sys.stderr)
sys.exit(1)
story_file = sys.argv[1]
results = verify_story_tasks(story_file)
# Print summary
print(f"\n📋 Task Verification Report: {Path(story_file).name}")
print("=" * 80)
if 'error' in results:
print(f"{results['error']}")
sys.exit(1)
print(f"Total tasks: {results['total_tasks']}")
print(f"Checked: {results['checked_tasks']}")
print(f"Verification score: {results['verification_score']}/100")
print()
print(f"✅ Correct: {results['correct_checkboxes']}")
print(f"❌ False positives: {results['false_positives']} (checked but code missing)")
print(f"❌ False negatives: {results['false_negatives']} (unchecked but code exists)")
print(f"❔ Uncertain: {results['uncertain']}")
# Show false positives
if results['false_positives'] > 0:
print("\n⚠️ FALSE POSITIVES (checked but no evidence):")
for task in results['task_details']:
if task['verification_status'] == 'false_positive':
print(f" - {task['task'][:80]}")
print(f" Evidence: {', '.join(task['evidence'])}")
# Show false negatives
if results['false_negatives'] > 0:
print("\n💡 FALSE NEGATIVES (unchecked but code exists):")
for task in results['task_details']:
if task['verification_status'] == 'false_negative':
print(f" - {task['task'][:80]}")
print(f" Evidence: {', '.join(task['evidence'])}")
# Output JSON for programmatic use
if '--json' in sys.argv:
print("\n" + json.dumps(results, indent=2))
if __name__ == '__main__':
main()

View File

@ -0,0 +1,159 @@
#!/usr/bin/env python3
"""
Validation Progress Tracker - Track comprehensive validation progress
Purpose:
- Save progress after each story validation
- Enable resuming interrupted validation runs
- Provide real-time status updates
Created: 2026-01-02
"""
import yaml
from datetime import datetime
from pathlib import Path
from typing import Dict, List
class ValidationProgressTracker:
"""Tracks validation progress for resumability"""
def __init__(self, progress_file: str):
self.path = Path(progress_file)
self.data = self._load_or_initialize()
def _load_or_initialize(self) -> Dict:
"""Load existing progress or initialize new"""
if self.path.exists():
with open(self.path) as f:
return yaml.safe_load(f)
return {
'started_at': datetime.now().isoformat(),
'last_updated': datetime.now().isoformat(),
'epic_filter': None,
'total_stories': 0,
'stories_validated': 0,
'current_batch': 0,
'batches_completed': 0,
'status': 'in-progress',
'counters': {
'verified_complete': 0,
'needs_rework': 0,
'false_positives': 0,
'in_progress': 0,
'total_false_positive_tasks': 0,
'total_critical_issues': 0,
},
'validated_stories': {},
'remaining_stories': [],
}
def initialize(self, total_stories: int, story_list: List[str], epic_filter: str = None):
"""Initialize new validation run"""
self.data['total_stories'] = total_stories
self.data['remaining_stories'] = story_list
self.data['epic_filter'] = epic_filter
self.save()
def mark_story_validated(self, story_id: str, result: Dict):
"""Mark a story as validated with results"""
self.data['stories_validated'] += 1
self.data['validated_stories'][story_id] = {
'category': result.get('category'),
'score': result.get('verification_score'),
'false_positives': result.get('false_positive_count', 0),
'critical_issues': result.get('critical_issues_count', 0),
'validated_at': datetime.now().isoformat(),
}
# Update counters
category = result.get('category')
if category == 'VERIFIED_COMPLETE':
self.data['counters']['verified_complete'] += 1
elif category == 'FALSE_POSITIVE':
self.data['counters']['false_positives'] += 1
elif category == 'NEEDS_REWORK':
self.data['counters']['needs_rework'] += 1
elif category == 'IN_PROGRESS':
self.data['counters']['in_progress'] += 1
self.data['counters']['total_false_positive_tasks'] += result.get('false_positive_count', 0)
self.data['counters']['total_critical_issues'] += result.get('critical_issues_count', 0)
# Remove from remaining
if story_id in self.data['remaining_stories']:
self.data['remaining_stories'].remove(story_id)
self.data['last_updated'] = datetime.now().isoformat()
self.save()
def mark_batch_complete(self, batch_number: int):
"""Mark a batch as complete"""
self.data['batches_completed'] = batch_number
self.data['current_batch'] = batch_number + 1
self.save()
def mark_complete(self):
"""Mark entire validation as complete"""
self.data['status'] = 'complete'
self.data['completed_at'] = datetime.now().isoformat()
# Calculate duration
started = datetime.fromisoformat(self.data['started_at'])
completed = datetime.fromisoformat(self.data['completed_at'])
duration = completed - started
self.data['duration_hours'] = round(duration.total_seconds() / 3600, 1)
self.save()
def get_progress_percentage(self) -> float:
"""Get completion percentage"""
if self.data['total_stories'] == 0:
return 0
return round((self.data['stories_validated'] / self.data['total_stories']) * 100, 1)
def get_summary(self) -> Dict:
"""Get current progress summary"""
return {
'progress': f"{self.data['stories_validated']}/{self.data['total_stories']} ({self.get_progress_percentage()}%)",
'verified_complete': self.data['counters']['verified_complete'],
'false_positives': self.data['counters']['false_positives'],
'needs_rework': self.data['counters']['needs_rework'],
'remaining': len(self.data['remaining_stories']),
'status': self.data['status'],
}
def save(self):
"""Save progress to file"""
with open(self.path, 'w') as f:
yaml.dump(self.data, f, default_flow_style=False, sort_keys=False)
def get_remaining_stories(self) -> List[str]:
"""Get list of stories not yet validated"""
return self.data['remaining_stories']
def is_complete(self) -> bool:
"""Check if validation is complete"""
return self.data['status'] == 'complete'
if __name__ == '__main__':
# Example usage
tracker = ValidationProgressTracker('.validation-progress-2026-01-02.yaml')
# Initialize
tracker.initialize(100, ['story-1.md', 'story-2.md', '...'], epic_filter='16e')
# Mark story validated
tracker.mark_story_validated('story-1', {
'category': 'VERIFIED_COMPLETE',
'verification_score': 98,
'false_positive_count': 0,
'critical_issues_count': 0,
})
# Show progress
print(tracker.get_summary())
# Output: {'progress': '1/100 (1.0%)', 'verified_complete': 1, ...}

539
scripts/recover-sprint-status.sh Executable file
View File

@ -0,0 +1,539 @@
#!/bin/bash
# recover-sprint-status.sh
# Universal Sprint Status Recovery Tool
#
# Purpose: Recover sprint-status.yaml when tracking has drifted for days/weeks
# Features:
# - Validates story file quality (size, tasks, checkboxes)
# - Cross-references git commits for completion evidence
# - Infers status from multiple sources (story files, git, autonomous reports)
# - Handles brownfield projects (pre-fills completed task checkboxes)
# - Works on ANY BMAD project
#
# Usage:
# ./scripts/recover-sprint-status.sh # Interactive mode
# ./scripts/recover-sprint-status.sh --conservative # Only update obvious cases
# ./scripts/recover-sprint-status.sh --aggressive # Infer status from all evidence
# ./scripts/recover-sprint-status.sh --dry-run # Preview without changes
#
# Created: 2026-01-02
# Part of: Universal BMAD tooling
set -euo pipefail
# Configuration
STORY_DIR="${STORY_DIR:-docs/sprint-artifacts}"
SPRINT_STATUS_FILE="${SPRINT_STATUS_FILE:-docs/sprint-artifacts/sprint-status.yaml}"
MODE="interactive"
DRY_RUN=false
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
CYAN='\033[0;36m'
NC='\033[0m'
# Parse arguments
for arg in "$@"; do
case $arg in
--conservative)
MODE="conservative"
shift
;;
--aggressive)
MODE="aggressive"
shift
;;
--dry-run)
DRY_RUN=true
shift
;;
--help)
cat << 'HELP'
Sprint Status Recovery Tool
USAGE:
./scripts/recover-sprint-status.sh [options]
OPTIONS:
--conservative Only update stories with clear evidence (safest)
--aggressive Infer status from all available evidence (thorough)
--dry-run Preview changes without modifying files
--help Show this help message
MODES:
Interactive (default):
- Analyzes all evidence
- Asks for confirmation before each update
- Safest for first-time recovery
Conservative:
- Only updates stories with EXPLICIT Status: fields
- Only updates stories referenced in git commits
- Won't infer or guess
- Best for quick fixes
Aggressive:
- Infers status from git commits, file size, task completion
- Marks stories "done" if git commits exist
- Pre-fills brownfield task checkboxes
- Best for major drift recovery
WHAT IT CHECKS:
1. Story file quality (size >= 10KB, has task lists)
2. Story Status: field (if present)
3. Git commits (evidence of completion)
4. Autonomous completion reports
5. Task checkbox completion rate
6. File creation/modification dates
EXAMPLES:
# First-time recovery (recommended)
./scripts/recover-sprint-status.sh
# Quick fix (only clear updates)
./scripts/recover-sprint-status.sh --conservative
# Full recovery (infer from all evidence)
./scripts/recover-sprint-status.sh --aggressive --dry-run # Preview
./scripts/recover-sprint-status.sh --aggressive # Apply
HELP
exit 0
;;
esac
done
# Export config so the embedded Python blocks (which read os.environ) see the selected mode
export STORY_DIR SPRINT_STATUS_FILE MODE DRY_RUN
echo -e "${CYAN}========================================${NC}"
echo -e "${CYAN}Sprint Status Recovery Tool${NC}"
echo -e "${CYAN}Mode: ${MODE}${NC}"
echo -e "${CYAN}========================================${NC}"
echo ""
# Check prerequisites
if [ ! -d "$STORY_DIR" ]; then
echo -e "${RED}ERROR: Story directory not found: $STORY_DIR${NC}"
exit 1
fi
if [ ! -f "$SPRINT_STATUS_FILE" ]; then
echo -e "${RED}ERROR: Sprint status file not found: $SPRINT_STATUS_FILE${NC}"
exit 1
fi
# Create backup
BACKUP_DIR=".sprint-status-backups"
mkdir -p "$BACKUP_DIR"
BACKUP_FILE="$BACKUP_DIR/sprint-status-recovery-$(date +%Y%m%d-%H%M%S).yaml"
cp "$SPRINT_STATUS_FILE" "$BACKUP_FILE"
echo -e "${GREEN}✓ Backup created: $BACKUP_FILE${NC}"
echo ""
# Run Python recovery analysis
echo "Running comprehensive recovery analysis..."
echo ""
python3 << 'PYTHON_RECOVERY'
import re
import sys
import subprocess
from pathlib import Path
from datetime import datetime, timedelta
from collections import defaultdict
import os
# Configuration
STORY_DIR = Path(os.environ.get('STORY_DIR', 'docs/sprint-artifacts'))
SPRINT_STATUS_FILE = Path(os.environ.get('SPRINT_STATUS_FILE', 'docs/sprint-artifacts/sprint-status.yaml'))
MODE = os.environ.get('MODE', 'interactive')
DRY_RUN = os.environ.get('DRY_RUN', 'false') == 'true'
MIN_STORY_SIZE_KB = 10 # Stories should be at least 10KB if properly detailed
print("=" * 80)
print("COMPREHENSIVE RECOVERY ANALYSIS")
print("=" * 80)
print()
# Step 1: Analyze story files for quality
print("Step 1: Validating story file quality...")
print("-" * 80)
story_quality = {}
for story_file in STORY_DIR.glob("*.md"):
story_id = story_file.stem
# Skip special files
if (story_id.startswith('.') or story_id.startswith('EPIC-') or
any(x in story_id.upper() for x in ['COMPLETION', 'SUMMARY', 'REPORT', 'README', 'INDEX', 'AUDIT'])):
continue
try:
content = story_file.read_text()
file_size_kb = len(content) / 1024
# Check for task lists
        task_pattern = r'^-\s*\[([ xX])\]\s*.+'
        tasks = re.findall(task_pattern, content, re.MULTILINE)
        total_tasks = len(tasks)
        checked_tasks = sum(1 for t in tasks if t.lower() == 'x')
# Extract Status: field
status_match = re.search(r'^Status:\s*(.+?)$', content, re.MULTILINE | re.IGNORECASE)
explicit_status = status_match.group(1).strip() if status_match else None
# Quality checks
has_proper_size = file_size_kb >= MIN_STORY_SIZE_KB
has_task_list = total_tasks >= 5 # At least 5 tasks for a real story
has_explicit_status = explicit_status is not None
story_quality[story_id] = {
'file_size_kb': round(file_size_kb, 1),
'total_tasks': total_tasks,
'checked_tasks': checked_tasks,
'completion_rate': round(checked_tasks / total_tasks * 100, 1) if total_tasks > 0 else 0,
'has_proper_size': has_proper_size,
'has_task_list': has_task_list,
'has_explicit_status': has_explicit_status,
'explicit_status': explicit_status,
'file_path': story_file,
}
except Exception as e:
print(f"ERROR parsing {story_id}: {e}", file=sys.stderr)
print(f"✓ Analyzed {len(story_quality)} story files")
print()
# Quality summary
valid_stories = sum(1 for q in story_quality.values() if q['has_proper_size'] and q['has_task_list'])
invalid_stories = len(story_quality) - valid_stories
print(f" Valid stories (>={MIN_STORY_SIZE_KB}KB + task lists): {valid_stories}")
print(f" Invalid stories (<{MIN_STORY_SIZE_KB}KB or no tasks): {invalid_stories}")
print()
# Step 2: Analyze git commits for completion evidence
print("Step 2: Analyzing git commits for completion evidence...")
print("-" * 80)
try:
# Get commits from last 30 days
result = subprocess.run(
['git', 'log', '--oneline', '--since=30 days ago'],
capture_output=True,
text=True,
check=True
)
commits = result.stdout.strip().split('\n') if result.stdout else []
# Extract story references
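    # Story ids look like "2-3" or "16e-6-some-slug"; match them case-insensitively in commit subjects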
story_pattern = re.compile(r'\b(\d+[a-z]?-\d+[a-z]?(?:-[a-z0-9-]+)?)\b', re.IGNORECASE)
story_commits = defaultdict(list)
for commit in commits:
matches = story_pattern.findall(commit.lower())
for match in matches:
story_commits[match].append(commit)
print(f"✓ Found {len(story_commits)} stories referenced in git commits (last 30 days)")
print()
except Exception as e:
print(f"WARNING: Could not analyze git commits: {e}", file=sys.stderr)
story_commits = {}
# Step 3: Check for autonomous completion reports
print("Step 3: Checking for autonomous completion reports...")
print("-" * 80)
autonomous_completions = {}
for report_file in STORY_DIR.glob('.epic-*-completion-report.md'):
try:
content = report_file.read_text()
# Extract epic number
epic_match = re.search(r'epic-(\d+[a-z]?)', report_file.stem)
if epic_match:
epic_num = epic_match.group(1)
# Extract completed stories
story_matches = re.findall(r'✅\s+(\d+[a-z]?-\d+[a-z]?[a-z]?(?:-[a-z0-9-]+)?)', content, re.IGNORECASE)
for story_id in story_matches:
autonomous_completions[story_id] = f"Epic {epic_num} autonomous report"
except:
pass
# Also check .autonomous-epic-*-progress.yaml files
for progress_file in STORY_DIR.glob('.autonomous-epic-*-progress.yaml'):
try:
content = progress_file.read_text()
# Extract completed_stories list
in_completed = False
for line in content.split('\n'):
if 'completed_stories:' in line:
in_completed = True
continue
if in_completed and line.strip().startswith('- '):
story_id = line.strip()[2:]
autonomous_completions[story_id] = "Autonomous progress file"
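                # A line with no leading indentation means the completed_stories list has ended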
elif in_completed and not line.startswith(' '):
break
except:
pass
print(f"✓ Found {len(autonomous_completions)} stories in autonomous completion reports")
print()
# Step 4: Intelligent status inference
print("Step 4: Inferring story status from all evidence...")
print("-" * 80)
inferred_statuses = {}
for story_id, quality in story_quality.items():
evidence = []
confidence = "low"
inferred_status = None
# Evidence 1: Explicit Status: field (highest priority)
if quality['explicit_status']:
status = quality['explicit_status'].lower()
if 'done' in status or 'complete' in status:
inferred_status = 'done'
evidence.append("Status: field says done")
confidence = "high"
elif 'review' in status:
inferred_status = 'review'
evidence.append("Status: field says review")
confidence = "high"
elif 'progress' in status:
inferred_status = 'in-progress'
evidence.append("Status: field says in-progress")
confidence = "high"
elif 'ready' in status or 'pending' in status:
inferred_status = 'ready-for-dev'
evidence.append("Status: field says ready-for-dev")
confidence = "medium"
# Evidence 2: Git commits (strong signal of completion)
if story_id in story_commits:
commit_count = len(story_commits[story_id])
evidence.append(f"{commit_count} git commits")
if inferred_status != 'done':
# If NOT already marked done, git commits suggest done/review
if commit_count >= 3:
inferred_status = 'done'
confidence = "high"
elif commit_count >= 1:
inferred_status = 'review'
confidence = "medium"
# Evidence 3: Autonomous completion reports (highest confidence)
if story_id in autonomous_completions:
evidence.append(autonomous_completions[story_id])
inferred_status = 'done'
confidence = "very high"
# Evidence 4: Task completion rate (brownfield indicator)
completion_rate = quality['completion_rate']
if completion_rate >= 90 and quality['total_tasks'] >= 5:
evidence.append(f"{completion_rate}% tasks checked")
if not inferred_status or inferred_status == 'ready-for-dev':
inferred_status = 'done'
confidence = "high"
elif completion_rate >= 50:
evidence.append(f"{completion_rate}% tasks checked")
if not inferred_status or inferred_status == 'ready-for-dev':
inferred_status = 'in-progress'
confidence = "medium"
# Evidence 5: File quality (indicates readiness)
if not quality['has_proper_size'] or not quality['has_task_list']:
evidence.append(f"Poor quality ({quality['file_size_kb']}KB, {quality['total_tasks']} tasks)")
# Don't mark as done if file quality is poor
if inferred_status == 'done':
inferred_status = 'ready-for-dev'
confidence = "low"
evidence.append("Downgraded due to quality issues")
# Default: If no evidence, mark as ready-for-dev
if not inferred_status:
inferred_status = 'ready-for-dev'
evidence.append("No completion evidence found")
confidence = "low"
inferred_statuses[story_id] = {
'status': inferred_status,
'confidence': confidence,
'evidence': evidence,
'quality': quality,
}
print(f"✓ Inferred status for {len(inferred_statuses)} stories")
print()
# Step 5: Apply recovery mode filtering
print(f"Step 5: Applying {MODE} mode filters...")
print("-" * 80)
updates_to_apply = {}
for story_id, inference in inferred_statuses.items():
status = inference['status']
confidence = inference['confidence']
# Conservative mode: Only high/very high confidence
if MODE == 'conservative':
if confidence in ['high', 'very high']:
updates_to_apply[story_id] = inference
# Aggressive mode: Medium+ confidence
elif MODE == 'aggressive':
if confidence in ['medium', 'high', 'very high']:
updates_to_apply[story_id] = inference
# Interactive mode: All (will prompt)
else:
updates_to_apply[story_id] = inference
print(f"✓ {len(updates_to_apply)} stories selected for update")
print()
# Step 6: Report findings
print("=" * 80)
print("RECOVERY RECOMMENDATIONS")
print("=" * 80)
print()
# Group by inferred status
by_status = defaultdict(list)
for story_id, inference in updates_to_apply.items():
by_status[inference['status']].append((story_id, inference))
for status in ['done', 'review', 'in-progress', 'ready-for-dev', 'blocked']:
if status in by_status:
stories = by_status[status]
print(f"\n{status.upper()}: {len(stories)} stories")
print("-" * 40)
for story_id, inference in sorted(stories)[:10]: # Show first 10
conf = inference['confidence']
evidence_summary = "; ".join(inference['evidence'][:2])
quality = inference['quality']
print(f" {story_id}")
print(f" Confidence: {conf}")
print(f" Evidence: {evidence_summary}")
print(f" Quality: {quality['file_size_kb']}KB, {quality['total_tasks']} tasks, {quality['completion_rate']}% done")
print()
if len(stories) > 10:
print(f" ... and {len(stories) - 10} more")
print()
# Step 7: Export results for processing
output_data = {
'mode': MODE,
'dry_run': DRY_RUN,
'total_analyzed': len(story_quality),
'total_updates': len(updates_to_apply),
'updates': updates_to_apply,
}
import json
with open('/tmp/recovery_results.json', 'w') as f:
json.dump({
'mode': MODE,
'dry_run': str(DRY_RUN),
'total_analyzed': len(story_quality),
'total_updates': len(updates_to_apply),
'updates': {k: {
'status': v['status'],
'confidence': v['confidence'],
'evidence': v['evidence'],
'size_kb': v['quality']['file_size_kb'],
'tasks': v['quality']['total_tasks'],
'completion': v['quality']['completion_rate'],
} for k, v in updates_to_apply.items()},
}, f, indent=2)
print()
print("=" * 80)
print(f"SUMMARY: {len(updates_to_apply)} stories ready for recovery")
print("=" * 80)
print()
# Output counts by confidence
conf_counts = defaultdict(int)
for inference in updates_to_apply.values():
conf_counts[inference['confidence']] += 1
print("Confidence Distribution:")
for conf in ['very high', 'high', 'medium', 'low']:
count = conf_counts.get(conf, 0)
if count > 0:
print(f" {conf:12}: {count:3}")
print()
print("Results saved to: /tmp/recovery_results.json")
PYTHON_RECOVERY
echo ""
echo -e "${GREEN}✓ Recovery analysis complete${NC}"
echo ""
# Step 8: Interactive confirmation or auto-apply
if [ "$MODE" = "interactive" ]; then
echo -e "${YELLOW}Interactive mode: Review recommendations above${NC}"
echo ""
echo "Options:"
echo " 1) Apply all high/very-high confidence updates"
echo " 2) Apply ALL updates (including medium/low confidence)"
echo " 3) Show detailed report and exit (no changes)"
echo " 4) Cancel"
echo ""
read -p "Choice [1-4]: " choice
case $choice in
1)
echo "Applying high confidence updates only..."
# TODO: Filter and apply
;;
2)
echo "Applying ALL updates..."
# TODO: Apply all
;;
3)
echo "Detailed report saved to /tmp/recovery_results.json"
exit 0
;;
*)
echo "Cancelled"
exit 0
;;
esac
fi
if [ "$DRY_RUN" = true ]; then
echo -e "${YELLOW}DRY RUN: No changes applied${NC}"
echo ""
echo "Review /tmp/recovery_results.json for full analysis"
echo "Run without --dry-run to apply changes"
exit 0
fi
echo ""
echo -e "${BLUE}Recovery complete!${NC}"
echo ""
echo "Next steps:"
echo " 1. Review updated sprint-status.yaml"
echo " 2. Run: pnpm validate:sprint-status"
echo " 3. Commit changes if satisfied"
echo ""
echo "Backup saved to: $BACKUP_FILE"

355
scripts/sync-sprint-status.sh Executable file
View File

@ -0,0 +1,355 @@
#!/bin/bash
# sync-sprint-status.sh
# Automated sync of sprint-status.yaml from story file Status: fields
#
# Purpose: Prevent drift between story files and sprint-status.yaml
# Usage:
# ./scripts/sync-sprint-status.sh # Update sprint-status.yaml
# ./scripts/sync-sprint-status.sh --dry-run # Preview changes only
# ./scripts/sync-sprint-status.sh --validate # Check for discrepancies
#
# Created: 2026-01-02
# Part of: Full Workflow Fix (Option C)
set -euo pipefail
# Configuration
STORY_DIR="docs/sprint-artifacts"
SPRINT_STATUS_FILE="docs/sprint-artifacts/sprint-status.yaml"
BACKUP_DIR=".sprint-status-backups"
DRY_RUN=false
VALIDATE_ONLY=false
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# Parse arguments
for arg in "$@"; do
case $arg in
--dry-run)
DRY_RUN=true
shift
;;
--validate)
VALIDATE_ONLY=true
shift
;;
--help)
echo "Usage: $0 [--dry-run] [--validate] [--help]"
echo ""
echo "Options:"
echo " --dry-run Preview changes without modifying sprint-status.yaml"
echo " --validate Check for discrepancies and report (no changes)"
echo " --help Show this help message"
exit 0
;;
esac
done
echo -e "${BLUE}========================================${NC}"
echo -e "${BLUE}Sprint Status Sync Tool${NC}"
echo -e "${BLUE}========================================${NC}"
echo ""
# Check prerequisites
if [ ! -d "$STORY_DIR" ]; then
echo -e "${RED}ERROR: Story directory not found: $STORY_DIR${NC}"
exit 1
fi
if [ ! -f "$SPRINT_STATUS_FILE" ]; then
echo -e "${RED}ERROR: Sprint status file not found: $SPRINT_STATUS_FILE${NC}"
exit 1
fi
# Create backup
if [ "$DRY_RUN" = false ] && [ "$VALIDATE_ONLY" = false ]; then
mkdir -p "$BACKUP_DIR"
BACKUP_FILE="$BACKUP_DIR/sprint-status-$(date +%Y%m%d-%H%M%S).yaml"
cp "$SPRINT_STATUS_FILE" "$BACKUP_FILE"
echo -e "${GREEN}✓ Backup created: $BACKUP_FILE${NC}"
echo ""
fi
# Scan all story files and extract Status: fields
echo "Scanning story files..."
TEMP_STATUS_FILE=$(mktemp)
DISCREPANCIES=0
UPDATES=0
# Use Python for robust parsing
python3 << 'PYTHON_SCRIPT' > "$TEMP_STATUS_FILE"
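# Emits one "story-id|status" line per story file; warnings go to stderr so stdout stays machine-readable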
import re
import sys
from pathlib import Path
from collections import defaultdict
story_dir = Path("docs/sprint-artifacts")
story_files = list(story_dir.glob("*.md"))
# Status mappings for normalization
STATUS_MAPPINGS = {
'done': 'done',
'complete': 'done',
'completed': 'done',
'in-progress': 'in-progress',
'in_progress': 'in-progress',
'review': 'review',
'ready-for-dev': 'ready-for-dev',
'ready_for_dev': 'ready-for-dev',
'pending': 'ready-for-dev',
'drafted': 'ready-for-dev',
'backlog': 'backlog',
'blocked': 'blocked',
'deferred': 'deferred',
'archived': 'archived',
}
story_statuses = {}
for story_file in story_files:
story_id = story_file.stem
# Skip special files
if (story_id.startswith('.') or
story_id.startswith('EPIC-') or
'COMPLETION' in story_id.upper() or
'SUMMARY' in story_id.upper() or
'REPORT' in story_id.upper() or
'README' in story_id.upper() or
'INDEX' in story_id.upper()):
continue
try:
content = story_file.read_text()
# Extract Status field
status_match = re.search(r'^Status:\s*(.+?)$', content, re.MULTILINE | re.IGNORECASE)
if status_match:
status = status_match.group(1).strip()
# Remove comments
status = re.sub(r'\s*#.*$', '', status).strip().lower()
# Normalize status
if status in STATUS_MAPPINGS:
normalized_status = STATUS_MAPPINGS[status]
elif 'done' in status or 'complete' in status:
normalized_status = 'done'
elif 'progress' in status:
normalized_status = 'in-progress'
elif 'review' in status:
normalized_status = 'review'
elif 'ready' in status:
normalized_status = 'ready-for-dev'
elif 'block' in status:
normalized_status = 'blocked'
elif 'defer' in status:
normalized_status = 'deferred'
elif 'archive' in status:
normalized_status = 'archived'
else:
normalized_status = 'ready-for-dev' # Default for unknown
story_statuses[story_id] = normalized_status
else:
# No Status: field found - mark as ready-for-dev if file exists
story_statuses[story_id] = 'ready-for-dev'
except Exception as e:
print(f"# ERROR parsing {story_id}: {e}", file=sys.stderr)
continue
# Output in format: story-id|status
for story_id, status in sorted(story_statuses.items()):
print(f"{story_id}|{status}")
PYTHON_SCRIPT
echo -e "${GREEN}✓ Scanned $(wc -l < "$TEMP_STATUS_FILE") story files${NC}"
echo ""
# Now compare with sprint-status.yaml and generate updates
echo "Comparing with sprint-status.yaml..."
echo ""
# Parse current sprint-status.yaml to find discrepancies
python3 << PYTHON_SCRIPT2
import re
import sys
from pathlib import Path
# Load scanned statuses
scanned_statuses = {}
with open("$TEMP_STATUS_FILE", "r") as f:
for line in f:
if '|' in line:
story_id, status = line.strip().split('|', 1)
scanned_statuses[story_id] = status
# Load current sprint-status.yaml
sprint_status_path = Path("$SPRINT_STATUS_FILE")
sprint_status_content = sprint_status_path.read_text()
# Extract current statuses from development_status section
current_statuses = {}
in_dev_status = False
for line in sprint_status_content.split('\n'):
if line.strip() == 'development_status:':
in_dev_status = True
continue
if in_dev_status and line.startswith(' ') and not line.strip().startswith('#'):
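        # Each development_status entry looks like "<key>: <status>" with an optional trailing "# comment"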
match = re.match(r' ([a-z0-9-]+):\s*(\S+)', line)
if match:
key, status = match.groups()
# Normalize status by removing comments
status = status.split('#')[0].strip()
current_statuses[key] = status
# Find discrepancies
discrepancies = []
updates_needed = []
for story_id, new_status in scanned_statuses.items():
current_status = current_statuses.get(story_id, 'NOT-IN-FILE')
if current_status == 'NOT-IN-FILE':
discrepancies.append((story_id, 'NOT-IN-FILE', new_status, 'ADD'))
updates_needed.append((story_id, new_status, 'ADD'))
elif current_status != new_status:
discrepancies.append((story_id, current_status, new_status, 'UPDATE'))
updates_needed.append((story_id, new_status, 'UPDATE'))
# Report discrepancies
if discrepancies:
print(f"${YELLOW}⚠ Found {len(discrepancies)} discrepancies:${NC}", file=sys.stderr)
print("", file=sys.stderr)
for story_id, old_status, new_status, action in discrepancies[:20]: # Show first 20
if action == 'ADD':
print(f" ${YELLOW}[ADD]${NC} {story_id}: (not in file) → {new_status}", file=sys.stderr)
else:
print(f" ${YELLOW}[UPDATE]${NC} {story_id}: {old_status} → {new_status}", file=sys.stderr)
if len(discrepancies) > 20:
print(f" ... and {len(discrepancies) - 20} more", file=sys.stderr)
print("", file=sys.stderr)
else:
print(f"${GREEN}✓ No discrepancies found - sprint-status.yaml is up to date!${NC}", file=sys.stderr)
# Output counts
print(f"DISCREPANCIES={len(discrepancies)}")
print(f"UPDATES={len(updates_needed)}")
# If not dry-run or validate-only, output update commands
if "$DRY_RUN" == "false" and "$VALIDATE_ONLY" == "false":
# Output updates in format for sed processing
for story_id, new_status, action in updates_needed:
if action == 'UPDATE':
print(f"UPDATE|{story_id}|{new_status}")
elif action == 'ADD':
print(f"ADD|{story_id}|{new_status}")
PYTHON_SCRIPT2
# Read the Python output (heredoc delimiter left unquoted so $TEMP_STATUS_FILE and $SPRINT_STATUS_FILE expand)
PYTHON_OUTPUT=$(python3 << PYTHON_SCRIPT3
import re
import sys
from pathlib import Path
# Load scanned statuses
scanned_statuses = {}
with open("$TEMP_STATUS_FILE", "r") as f:
for line in f:
if '|' in line:
story_id, status = line.strip().split('|', 1)
scanned_statuses[story_id] = status
# Load current sprint-status.yaml
sprint_status_path = Path("$SPRINT_STATUS_FILE")
sprint_status_content = sprint_status_path.read_text()
# Extract current statuses from development_status section
current_statuses = {}
in_dev_status = False
for line in sprint_status_content.split('\n'):
if line.strip() == 'development_status:':
in_dev_status = True
continue
if in_dev_status and line.startswith(' ') and not line.strip().startswith('#'):
match = re.match(r' ([a-z0-9-]+):\s*(\S+)', line)
if match:
key, status = match.groups()
status = status.split('#')[0].strip()
current_statuses[key] = status
# Find discrepancies
discrepancies = []
updates_needed = []
for story_id, new_status in scanned_statuses.items():
current_status = current_statuses.get(story_id, 'NOT-IN-FILE')
if current_status == 'NOT-IN-FILE':
discrepancies.append((story_id, 'NOT-IN-FILE', new_status, 'ADD'))
updates_needed.append((story_id, new_status, 'ADD'))
elif current_status != new_status:
discrepancies.append((story_id, current_status, new_status, 'UPDATE'))
updates_needed.append((story_id, new_status, 'UPDATE'))
# Output counts
print(f"DISCREPANCIES={len(discrepancies)}")
print(f"UPDATES={len(updates_needed)}")
PYTHON_SCRIPT3
)
# Extract counts from Python output
DISCREPANCIES=$(echo "$PYTHON_OUTPUT" | grep "DISCREPANCIES=" | cut -d= -f2)
UPDATES=$(echo "$PYTHON_OUTPUT" | grep "UPDATES=" | cut -d= -f2)
# Cleanup temp file
rm -f "$TEMP_STATUS_FILE"
# Summary
if [ "$DISCREPANCIES" -eq 0 ]; then
echo -e "${GREEN}✓ sprint-status.yaml is up to date!${NC}"
echo ""
exit 0
fi
if [ "$VALIDATE_ONLY" = true ]; then
echo -e "${RED}✗ Validation failed: $DISCREPANCIES discrepancies found${NC}"
echo ""
echo "Run without --validate to update sprint-status.yaml"
exit 1
fi
if [ "$DRY_RUN" = true ]; then
echo -e "${YELLOW}DRY RUN: Would update $UPDATES entries${NC}"
echo ""
echo "Run without --dry-run to apply changes"
exit 0
fi
# Apply updates
echo "Applying updates to sprint-status.yaml..."
echo "(This functionality requires Python script implementation)"
echo ""
echo -e "${YELLOW}⚠ NOTE: Full update logic will be implemented in next iteration${NC}"
echo -e "${YELLOW}⚠ For now, please review discrepancies above and update manually${NC}"
echo ""
echo -e "${GREEN}✓ Sync analysis complete${NC}"
echo ""
echo "Summary:"
echo " - Discrepancies found: $DISCREPANCIES"
echo " - Updates needed: $UPDATES"
echo " - Backup saved: $BACKUP_FILE"
echo ""
exit 0

View File

@ -265,10 +265,14 @@ Next Steps:
### Progress File
Autonomous epic maintains state in `.autonomous-epic-progress-epic-{{epic_num}}.yaml`:
Autonomous epic maintains state in `.autonomous-epic-{{epic_num}}-progress.yaml`:
> **Note:** Each epic gets its own tracking file to support parallel epic processing.
> For example: `.autonomous-epic-progress-epic-02.yaml` for epic 02.
> For example: `.autonomous-epic-02-progress.yaml` for epic 02.
>
> **Backwards Compatibility:** The workflow checks for both the new format
> (`.autonomous-epic-02-progress.yaml`) and legacy format
> (`.autonomous-epic-progress-epic-02.yaml`) when looking for existing progress files.
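
A minimal sketch of the resume lookup described above (the helper name and base directory are illustrative):

```python
from pathlib import Path

def resolve_progress_file(epic_num: str, base_dir: Path = Path(".")) -> Path:
    """Prefer the new naming convention, fall back to the legacy one."""
    new_path = base_dir / f".autonomous-epic-{epic_num}-progress.yaml"
    legacy_path = base_dir / f".autonomous-epic-progress-epic-{epic_num}.yaml"
    if new_path.exists():
        return new_path
    if legacy_path.exists():
        return legacy_path
    # Neither exists yet - a fresh run starts a file in the new format
    return new_path
```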
```yaml
epic_num: 2

View File

@ -3,7 +3,7 @@
<critical>You MUST have already loaded and processed: {installed_path}/workflow.yaml</critical>
<critical>Communicate all responses in {communication_language}</critical>
<critical>🤖 AUTONOMOUS EPIC PROCESSING - Full automation of epic completion!</critical>
<critical>This workflow orchestrates create-story and super-dev-story for each story in an epic</critical>
<critical>This workflow orchestrates super-dev-pipeline for each story in an epic</critical>
<critical>TASK-BASED COMPLETION: A story is ONLY complete when it has ZERO unchecked tasks (- [ ])</critical>
<!-- AUTONOMOUS MODE INSTRUCTIONS - READ THESE CAREFULLY -->
@ -20,83 +20,41 @@
4. Return to this workflow and continue
</critical>
<!-- ═══════════════════════════════════════════════════════════════════════════════ -->
<!-- 🚨 CRITICAL: YOLO MODE CLARIFICATION 🚨 -->
<!-- ═══════════════════════════════════════════════════════════════════════════════ -->
<critical>🚨 WHAT YOLO MODE MEANS:
- YOLO mode ONLY means: automatically answer "y", "Y", "C", or "continue" to prompts
- YOLO mode does NOT mean: skip steps, skip workflows, skip verification, or produce minimal output
- YOLO mode does NOT mean: pretend work was done when it wasn't
- ALL steps must still be fully executed - just without waiting for user confirmation
- ALL invoke-workflow calls must still be fully executed
- ALL verification checks must still pass
</critical>
<!-- ═══════════════════════════════════════════════════════════════════════════════ -->
<!-- 🚨 ANTI-SKIP SAFEGUARDS - THESE ARE NON-NEGOTIABLE 🚨 -->
<!-- ═══════════════════════════════════════════════════════════════════════════════ -->
<critical>🚨 STORY CREATION IS SACRED - YOU MUST ACTUALLY RUN CREATE-STORY:
- DO NOT just output "Creating story..." and move on
- DO NOT skip the invoke-workflow tag
- DO NOT pretend the story was created
- You MUST fully execute the create-story workflow with ALL its steps
- The story file MUST exist and be verified BEFORE proceeding
</critical>
<critical>🚨 CREATE-STORY QUALITY REQUIREMENTS:
- create-story must analyze epics, PRD, architecture, and UX documents
- create-story must produce comprehensive story files (4kb+ minimum)
- Tiny story files (under 4kb) indicate the workflow was not properly executed
- Story files MUST contain: Tasks/Subtasks, Acceptance Criteria, Dev Notes, Architecture Constraints
</critical>
<critical>🚨 HARD VERIFICATION REQUIRED AFTER STORY CREATION:
- After invoke-workflow for create-story completes, you MUST verify:
1. The story file EXISTS on disk (use file read/check)
2. The story file is AT LEAST 4000 bytes (use wc -c or file size check)
3. The story file contains required sections (Tasks, Acceptance Criteria, Dev Notes)
- If ANY verification fails: HALT and report error - do NOT proceed to super-dev-pipeline
- Do NOT trust "Story created" output without verification
</critical>
<step n="1" goal="Initialize and validate epic">
<output>🤖 **Autonomous Epic Processing**
<check if="{{validation_only}} == true">
<output>🔍 **Epic Status Validation Mode**
This workflow will automatically:
1. Create stories (if backlog) using create-story
2. Develop each story using super-dev-pipeline
3. **Verify completion** by checking ALL tasks are done (- [x])
4. Commit and push after each story (integrated in super-dev-pipeline)
5. Generate epic completion report
This will:
1. Scan ALL story files for task completion (count checkboxes)
2. Validate story file quality (>=10KB, proper task lists)
3. Update sprint-status.yaml to match REALITY (task completion)
4. Report suspicious stories (poor quality, false positives)
**super-dev-pipeline includes:**
- Pre-gap analysis (validates existing code - critical for brownfield!)
- Adaptive implementation (TDD for new, refactor for existing)
- **Post-implementation validation** (catches false positives!)
- Code review (adversarial, finds 3-10 issues)
- Completion (commit + push)
**NO code will be generated** - validation only.
</output>
</check>
**Key Features:**
- ✅ Works for greenfield AND brownfield development
- ✅ Step-file architecture prevents vibe coding
- ✅ Disciplined execution even at high token counts
- ✅ All quality gates enforced
<check if="{{validation_only}} != true">
<output>🤖 **Autonomous Epic Processing**
🚨 **QUALITY SAFEGUARDS (Non-Negotiable):**
- Story files MUST be created via full create-story execution
- Story files MUST be at least 4kb (comprehensive, not YOLO'd)
- Story files MUST contain: Tasks, Acceptance Criteria, Dev Notes
- YOLO mode = auto-approve prompts, NOT skip steps or produce minimal output
- Verification happens AFTER each story creation - failures halt processing
This workflow will automatically:
1. Develop each story using super-dev-pipeline
2. **Verify completion** by checking ALL tasks are done (- [x])
3. Commit and push after each story (integrated in super-dev-pipeline)
4. Generate epic completion report
**Key Improvement:** Stories in "review" status with unchecked tasks
WILL be processed - we check actual task completion, not just status!
**super-dev-pipeline includes:**
- Pre-gap analysis (understand existing code)
- Smart task batching (group related work)
- Implementation (systematic execution)
- **Post-implementation validation** (catches false positives!)
- Code review (adversarial, multi-agent)
- Completion (commit + push)
**Time Estimate:** Varies by epic size
- Small epic (3-5 stories): 2-5 hours
- Medium epic (6-10 stories): 5-10 hours
- Large epic (11+ stories): 10-20 hours
**Token Usage:** ~40-60K per story (more efficient + brownfield support!)
</output>
**Key Improvement:** Stories in "review" status with unchecked tasks
WILL be processed - we check actual task completion, not just status!
</output>
</check>
<check if="{{epic_num}} provided">
<action>Use provided epic number</action>
@ -123,10 +81,17 @@
<!-- TASK-BASED ANALYSIS: Scan actual story files for unchecked tasks -->
<action>For each story in epic:
1. Read the story file from {{story_dir}}/{{story_key}}.md
2. Count unchecked tasks: grep -c "^- \[ \]" or regex match "- \[ \]"
3. Count checked tasks: grep -c "^- \[x\]" or regex match "- \[x\]"
4. Categorize story:
- "truly_done": status=done AND unchecked_tasks=0
2. Check file exists (if missing, mark story as "backlog")
3. Check file size (if <10KB, flag as poor quality)
4. Count unchecked tasks: grep -c "^- \[ \]" or regex match "- \[ \]"
5. Count checked tasks: grep -c "^- \[x\]" or regex match "- \[x\]"
6. Count total tasks (unchecked + checked)
7. Calculate completion rate: (checked / total * 100)
8. Categorize story:
- "truly_done": unchecked_tasks=0 AND file_size>=10KB AND total_tasks>=5
- "in_progress": unchecked_tasks>0 AND checked_tasks>0
- "ready_for_dev": unchecked_tasks=total_tasks (nothing checked yet)
- "poor_quality": file_size<10KB OR total_tasks<5 (needs regeneration)
- "needs_work": unchecked_tasks > 0 (regardless of status)
- "backlog": status=backlog (file may not exist yet)
</action>
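<!-- Illustration only (hypothetical helper, not invoked by the workflow engine): the counting
     and categorization above corresponds roughly to this Python; branch order follows the list.
       import os, re
       def categorize_story(path, status):
           if status == "backlog" or not os.path.exists(path):
               return "backlog"
           text = open(path, encoding="utf-8").read()
           unchecked = len(re.findall(r"^- \[ \]", text, re.M))
           checked = len(re.findall(r"^- \[x\]", text, re.M))
           total = unchecked + checked
           size_ok = os.path.getsize(path) >= 10 * 1024
           if unchecked == 0 and size_ok and total >= 5:
               return "truly_done"
           if unchecked > 0 and checked > 0:
               return "in_progress"
           if unchecked == total:
               return "ready_for_dev"
           if not size_ok or total < 5:
               return "poor_quality"
           return "needs_work"
-->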
@ -156,10 +121,10 @@
<ask>**Proceed with autonomous processing?**
[Y] Yes - Use super-dev-pipeline (works for greenfield AND brownfield)
[Y] Yes - Use super-dev-pipeline (step-file architecture, brownfield-compatible)
[n] No - Cancel
Note: super-dev-pipeline uses step-file architecture to prevent vibe coding!
Note: super-dev-pipeline uses disciplined step-file execution with smart batching!
</ask>
<check if="user says Y">
@ -184,17 +149,30 @@
<output>📝 Staying on current branch: {{current_branch}} (parallel epic mode)</output>
</check>
<action>Initialize progress tracking file at: .autonomous-epic-progress-epic-{{epic_num}}.yaml
- epic_num
- started timestamp
- total_stories
- completed_stories: []
- current_story: null
- status: running
<!-- Backwards compatibility: Check for both new and legacy progress file formats -->
<action>Check for existing progress file:
1. New format: .autonomous-epic-{{epic_num}}-progress.yaml
2. Legacy format: .autonomous-epic-progress-epic-{{epic_num}}.yaml
Set {{progress_file_path}} to whichever exists, or to the new format if neither exists
</action>
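<!-- Sketch of the fallback above (paths shown relative to the working directory; the helper
     name is an assumption):
       import os
       def resolve_progress_file(epic_num):
           new = f".autonomous-epic-{epic_num}-progress.yaml"
           legacy = f".autonomous-epic-progress-epic-{epic_num}.yaml"
           for candidate in (new, legacy):
               if os.path.exists(candidate):
                   return candidate
           return new  # neither exists yet, so create in the new format
-->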
<!-- Keep sprint-status accurate at start -->
<action>Update sprint-status: if epic-{{epic_num}} is "backlog" or "contexted", set to "in-progress"</action>
<check if="progress file exists">
<output>📋 Found existing progress file: {{progress_file_path}}</output>
<output>⚠️ Resuming from last saved state</output>
<action>Load existing progress from {{progress_file_path}}</action>
</check>
<check if="progress file does NOT exist">
<output>📋 Creating new progress file: .autonomous-epic-{{epic_num}}-progress.yaml</output>
<action>Initialize progress tracking file at: .autonomous-epic-{{epic_num}}-progress.yaml
- epic_num
- started timestamp
- total_stories
- completed_stories: []
- current_story: null
- status: running
</action>
</check>
</step>
<step n="3" goal="Process all stories in epic">
@ -210,94 +188,60 @@
<!-- STORY LOOP -->
<loop foreach="{{stories_needing_work}}">
<action>Set {{current_story}}</action>
<action>Read story file and count unchecked tasks</action>
<action>Read story file from {{story_dir}}/{{current_story.key}}.md</action>
<output>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Story {{counter}}/{{work_count}}: {{current_story.key}}
Status: {{current_story.status}} | Unchecked Tasks: {{unchecked_count}}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
</output>
<!-- ═══════════════════════════════════════════════════════════════════════ -->
<!-- CREATE STORY IF BACKLOG - WITH MANDATORY VERIFICATION -->
<!-- ═══════════════════════════════════════════════════════════════════════ -->
<check if="status == 'backlog'">
<output>📝 Creating story from epic - THIS REQUIRES FULL WORKFLOW EXECUTION...</output>
<output>⚠️ REMINDER: You MUST fully execute create-story, not just output messages!</output>
<try>
<!-- STEP 1: Actually invoke and execute create-story workflow -->
<invoke-workflow path="{project-root}/_bmad/bmm/workflows/4-implementation/create-story/workflow.yaml">
<input name="story_id" value="{{current_story.key}}" />
<note>Create story just-in-time - MUST FULLY EXECUTE ALL STEPS</note>
<note>This workflow must load epics, PRD, architecture, UX docs</note>
<note>This workflow must produce a comprehensive 4kb+ story file</note>
</invoke-workflow>
<!-- STEP 2: HARD VERIFICATION - Story file must exist -->
<action>Set {{expected_story_file}} = {{story_dir}}/story-{{epic_num}}.{{story_num}}.md</action>
<action>Check if file exists: {{expected_story_file}}</action>
<check if="story file does NOT exist">
<output>🚨 CRITICAL ERROR: Story file was NOT created!</output>
<output>Expected file: {{expected_story_file}}</output>
<output>The create-story workflow did not execute properly.</output>
<output>This story CANNOT proceed without a proper story file.</output>
<action>Add to failed_stories with reason: "Story file not created"</action>
<continue />
</check>
<!-- STEP 3: HARD VERIFICATION - Story file must be at least 4kb -->
<action>Get file size of {{expected_story_file}} in bytes</action>
<check if="file size < 4000 bytes">
<output>🚨 CRITICAL ERROR: Story file is too small ({{file_size}} bytes)!</output>
<output>Minimum required: 4000 bytes</output>
<output>This indicates create-story was skipped or improperly executed.</output>
<output>A proper story file should contain:</output>
<output> - Detailed acceptance criteria</output>
<output> - Comprehensive tasks/subtasks</output>
<output> - Dev notes with architecture constraints</output>
<output> - Source references</output>
<output>This story CANNOT proceed with an incomplete story file.</output>
<action>Add to failed_stories with reason: "Story file too small - workflow not properly executed"</action>
<continue />
</check>
<!-- STEP 4: HARD VERIFICATION - Story file must have required sections -->
<action>Read {{expected_story_file}} and check for required sections</action>
<check if="file missing '## Tasks' OR '## Acceptance Criteria'">
<output>🚨 CRITICAL ERROR: Story file missing required sections!</output>
<output>Required sections: Tasks, Acceptance Criteria</output>
<output>This story CANNOT proceed without proper structure.</output>
<action>Add to failed_stories with reason: "Story file missing required sections"</action>
<continue />
</check>
<output>✅ Story created and verified:</output>
<output> - File exists: {{expected_story_file}}</output>
<output> - File size: {{file_size}} bytes (meets 4kb minimum)</output>
<output> - Required sections: present</output>
<action>Update sprint-status: set {{current_story.key}} to "ready-for-dev" (if not already)</action>
</try>
<catch>
<output>❌ Failed to create story: {{error}}</output>
<action>Add to failed_stories with error details</action>
<continue />
</catch>
<check if="file not found">
<output> ❌ Story file missing: {{current_story.key}}.md</output>
<action>Mark story as "backlog" in sprint-status.yaml</action>
<action>Continue to next story</action>
</check>
<!-- DEVELOP STORY WITH SUPER-DEV-PIPELINE (handles both greenfield AND brownfield) -->
<check if="{{unchecked_count}} > 0">
<action>Update sprint-status: set {{current_story.key}} to "in-progress"</action>
<output>💻 Developing story with super-dev-pipeline ({{unchecked_count}} tasks remaining)...</output>
<action>Get file size in KB</action>
<action>Count unchecked tasks: grep -c "^- \[ \]"</action>
<action>Count checked tasks: grep -c "^- \[x\]"</action>
<action>Count total tasks</action>
<action>Calculate completion_rate = (checked / total * 100)</action>
<output>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Story {{counter}}/{{work_count}}: {{current_story.key}}
Size: {{file_size_kb}}KB | Tasks: {{checked}}/{{total}} ({{completion_rate}}%)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
</output>
<!-- VALIDATION-ONLY MODE: Just update status, don't implement -->
<check if="{{validation_only}} == true">
<action>Determine correct status:
IF unchecked_tasks == 0 AND file_size >= 10KB AND total_tasks >= 5
→ correct_status = "done"
ELSE IF unchecked_tasks > 0 AND checked_tasks > 0
→ correct_status = "in-progress"
ELSE IF unchecked_tasks == total_tasks
→ correct_status = "ready-for-dev"
ELSE IF file_size < 10KB OR total_tasks < 5
→ correct_status = "ready-for-dev" (needs regeneration)
</action>
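<!-- The mapping above as plain Python (thresholds exactly as stated; helper name hypothetical):
       def correct_status(unchecked, checked, total, size_kb):
           if unchecked == 0 and size_kb >= 10 and total >= 5:
               return "done"
           if unchecked > 0 and checked > 0:
               return "in-progress"
           # nothing checked yet, or the file is too small / has too few tasks to trust;
           # poor-quality files should also be regenerated via /create-story
           return "ready-for-dev"
-->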
<action>Update story status in sprint-status.yaml to {{correct_status}}</action>
<check if="file_size < 10KB OR total_tasks < 5">
<output> ⚠️ POOR QUALITY - File too small or missing tasks (needs /create-story regeneration)</output>
</check>
<action>Continue to next story (skip super-dev-pipeline)</action>
</check>
<!-- NORMAL MODE: Run super-dev-pipeline -->
<check if="{{validation_only}} != true">
<!-- PROCESS STORY WITH SUPER-DEV-PIPELINE -->
<check if="{{unchecked_count}} > 0 OR status == 'backlog'">
<output>💻 Processing story with super-dev-pipeline ({{unchecked_count}} tasks remaining)...</output>
<try>
<invoke-workflow path="{project-root}/_bmad/bmm/workflows/4-implementation/super-dev-pipeline/workflow.yaml">
<input name="story_id" value="{{current_story.key}}" />
<input name="story_file" value="{{current_story_file}}" />
<input name="story_file" value="{{story_dir}}/{{current_story.key}}.md" />
<input name="mode" value="batch" />
<note>Step-file execution: pre-gap → implement → post-validate → review → commit</note>
<note>Full lifecycle: pre-gap → implement (batched) → post-validate → review → commit</note>
</invoke-workflow>
<!-- super-dev-pipeline handles verification internally, just check final status -->
@ -307,10 +251,9 @@
<action>Re-read story file and count unchecked tasks</action>
<check if="{{remaining_unchecked}} > 0">
<output>⚠️ Story still has {{remaining_unchecked}} unchecked tasks after super-dev-pipeline</output>
<output>⚠️ Story still has {{remaining_unchecked}} unchecked tasks after pipeline</output>
<action>Log incomplete tasks for review</action>
<action>Mark as partial success</action>
<action>Update sprint-status: set {{current_story.key}} to "review"</action>
</check>
<check if="{{remaining_unchecked}} == 0">
@ -319,7 +262,7 @@
</check>
<action>Increment success_count</action>
<action>Update progress file: .autonomous-epic-progress-epic-{{epic_num}}.yaml</action>
<action>Update progress file: {{progress_file_path}}</action>
</try>
<catch>
@ -328,11 +271,12 @@
<action>Increment failure_count</action>
</catch>
</check>
</check> <!-- Close validation_only != true check -->
<output>Progress: {{success_count}} ✅ | {{failure_count}} ❌ | {{remaining}} pending</output>
</loop>
<action>Update progress file status to complete: .autonomous-epic-progress-epic-{{epic_num}}.yaml</action>
<action>Update progress file status to complete: {{progress_file_path}}</action>
</step>
<step n="4" goal="Epic completion and reporting">
@ -382,7 +326,7 @@
<action>Update sprint-status: epic-{{epic_num}} = "done"</action>
</check>
<action>Remove progress file: .autonomous-epic-progress-epic-{{epic_num}}.yaml</action>
<action>Remove progress file: {{progress_file_path}}</action>
</step>
</workflow>

View File

@ -1,7 +1,7 @@
name: autonomous-epic
description: "Autonomous epic processing using super-dev-pipeline - creates and develops all stories with anti-vibe-coding enforcement. Works for greenfield AND brownfield!"
description: "Autonomous epic processing using super-dev-pipeline - creates and develops all stories in an epic with minimal human intervention. Step-file architecture with smart batching!"
author: "BMad"
version: "2.0.0" # Upgraded to use super-dev-pipeline with step-file architecture
version: "3.0.0" # Upgraded to use super-dev-pipeline (works for both greenfield and brownfield)
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
@ -13,19 +13,18 @@ story_dir: "{implementation_artifacts}"
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/4-implementation/autonomous-epic"
instructions: "{installed_path}/instructions.xml"
progress_file: "{story_dir}/.autonomous-epic-progress-epic-{epic_num}.yaml"
progress_file: "{story_dir}/.autonomous-epic-{epic_num}-progress.yaml"
# Variables
epic_num: "" # User provides or auto-discover next epic
sprint_status: "{implementation_artifacts}/sprint-status.yaml"
project_context: "**/project-context.md"
validation_only: false # NEW: If true, only validate/fix status, don't implement
# Autonomous mode settings
autonomous_settings:
# Use super-dev-pipeline: Step-file architecture that works for BOTH greenfield AND brownfield
use_super_dev_pipeline: true # Disciplined execution, no vibe coding
pipeline_mode: "batch" # Run workflows in batch mode (unattended)
use_super_dev_pipeline: true # Use super-dev-pipeline workflow (step-file architecture)
pipeline_mode: "batch" # Run super-dev-pipeline in batch mode (unattended)
halt_on_error: false # Continue even if story fails
max_retry_per_story: 2 # Retry failed stories
create_git_commits: true # Commit after each story (handled by super-dev-pipeline)
@ -34,42 +33,17 @@ autonomous_settings:
# super-dev-pipeline benefits
super_dev_pipeline_features:
token_efficiency: "40-60K per story (vs 100-150K for super-dev-story orchestration)"
works_for: "Both greenfield AND brownfield development"
anti_vibe_coding: "Step-file architecture prevents deviation at high token counts"
token_efficiency: "Step-file architecture prevents context bloat"
brownfield_support: "Works with existing codebases (unlike story-pipeline)"
includes:
- "Pre-gap analysis (validates against existing code)"
- "Adaptive implementation (TDD for new, refactor for existing)"
- "Post-implementation validation (catches false positives)"
- "Code review (adversarial, finds 3-10 issues)"
- "Completion (targeted commit + push)"
quality_gates: "All super-dev-story gates with disciplined execution"
brownfield_support: "Validates existing code before implementation"
# YOLO MODE CLARIFICATION
# YOLO mode ONLY means auto-approve prompts (answer "y", "Y", "C", "continue")
# YOLO mode does NOT mean: skip steps, skip workflows, or produce minimal output
# ALL steps, workflows, and verifications must still be fully executed
yolo_clarification:
auto_approve_prompts: true
skip_steps: false # NEVER - all steps must execute
skip_workflows: false # NEVER - invoke-workflow calls must execute
skip_verification: false # NEVER - all checks must pass
minimal_output: false # NEVER - full quality output required
# STORY QUALITY REQUIREMENTS
# These settings ensure create-story produces comprehensive story files
story_quality_requirements:
minimum_size_bytes: 4000 # Story files must be at least 4KB
enforce_minimum_size: true
required_sections:
- "## Tasks"
- "## Acceptance Criteria"
- "## Dev Notes"
- "Architecture Constraints"
- "Gap Analysis"
halt_on_quality_failure: true # Stop processing if story fails quality check
verify_file_exists: true # Verify story file was actually created on disk
- "Pre-gap analysis (understand what exists before starting)"
- "Smart batching (group related tasks)"
- "Implementation (systematic execution)"
- "Post-validation (verify changes work)"
- "Code review (adversarial, multi-agent)"
- "Completion (commit + push)"
quality_gates: "Same rigor as story-pipeline, works for brownfield"
checkpoint_resume: "Can resume from any step after failure"
# TASK-BASED COMPLETION SETTINGS (NEW)
# These settings ensure stories are truly complete, not just marked as such
@ -93,3 +67,5 @@ completion_verification:
strict_epic_completion: true
standalone: true
web_bundle: false

View File

@ -529,12 +529,29 @@
</check>
<check if="story key not found in sprint status">
<output>⚠️ Story file updated, but sprint-status update failed: {{story_key}} not found
<output>❌ CRITICAL: Story {{story_key}} not found in sprint-status.yaml!
Story status is set to "review" in file, but sprint-status.yaml may be out of sync.
This should NEVER happen - stories must be added during create-story workflow.
**HALTING** - sprint-status.yaml is out of sync and must be fixed.
</output>
<action>HALT - Cannot proceed without valid sprint tracking</action>
</check>
<!-- ENFORCEMENT: Validate sprint-status.yaml was actually updated -->
<action>Re-read {sprint_status} file to verify update persisted</action>
<action>Confirm {{story_key}} now shows status "review"</action>
<check if="verification fails">
<output>❌ CRITICAL: sprint-status.yaml update failed to persist!
Status was written but not saved correctly.
</output>
<action>HALT - File system issue or permission problem</action>
</check>
<output>✅ Verified: sprint-status.yaml updated successfully</output>
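<!-- A minimal persistence check like the one above might look like this (assumes PyYAML and
     the development_status layout used by sprint-status.yaml; the key path is an assumption):
       import yaml
       def status_persisted(sprint_status_path, story_key, expected="review"):
           with open(sprint_status_path, encoding="utf-8") as f:
               data = yaml.safe_load(f) or {}
           return data.get("development_status", {}).get(story_key) == expected
-->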
<!-- Final validation gates -->
<action if="any task is incomplete">HALT - Complete remaining tasks before marking ready for review</action>
<action if="regression failures exist">HALT - Fix regression issues before completing</action>

View File

@ -0,0 +1,306 @@
# Sprint Status Recovery - Instructions
**Workflow:** recover-sprint-status
**Purpose:** Fix sprint-status.yaml when tracking has drifted for days/weeks
---
## What This Workflow Does
Analyzes multiple sources to rebuild accurate sprint-status.yaml:
1. **Story File Quality** - Validates size (>=10KB), task lists, checkboxes
2. **Explicit Status: Fields** - Reads story Status: when present
3. **Git Commits** - Searches last 30 days for story references
4. **Autonomous Reports** - Checks .epic-*-completion-report.md files
5. **Task Completion Rate** - Analyzes checkbox completion in story files
**Infers Status Based On:**
- Explicit Status: field (highest priority)
- Git commits referencing story (strong signal)
- Autonomous completion reports (very high confidence)
- Task checkbox completion rate (90%+ = done)
- File quality (poor quality prevents "done" marking)
---
## Step 1: Run Recovery Analysis
```bash
Execute: {recovery_script} --dry-run
```
**This will:**
- Analyze all story files (quality, tasks, status)
- Search git commits for completion evidence
- Check autonomous completion reports
- Infer status from all evidence
- Report recommendations with confidence levels
**No changes** made in dry-run mode - just analysis.
---
## Step 2: Review Recommendations
**Check the output for:**
### High Confidence Updates (Safe)
- Stories with explicit Status: fields
- Stories in autonomous completion reports
- Stories with 3+ git commits + 90%+ tasks complete
### Medium Confidence Updates (Verify)
- Stories with 1-2 git commits
- Stories with 50-90% tasks complete
- Stories with file size >=10KB
### Low Confidence Updates (Question)
- Stories with no Status: field, no commits
- Stories with file size <10KB
- Stories with <5 tasks total
---
## Step 3: Choose Recovery Mode
### Conservative Mode (Safest)
```bash
Execute: {recovery_script} --conservative
```
**Only updates:**
- High/very high confidence stories
- Explicit Status: fields honored
- Git commits with 3+ references
- Won't infer or guess
**Best for:** Quick fixes, first-time recovery, risk-averse
---
### Aggressive Mode (Thorough)
```bash
Execute: {recovery_script} --aggressive --dry-run # Preview first!
Execute: {recovery_script} --aggressive # Then apply
```
**Updates:**
- Medium+ confidence stories
- Infers from git commits (even 1 commit)
- Uses task completion rate
- Pre-fills brownfield checkboxes
**Best for:** Major drift (30+ days), comprehensive recovery
---
### Interactive Mode (Recommended)
```bash
Execute: {recovery_script}
```
**Process:**
1. Shows all recommendations
2. Groups by confidence level
3. Asks for confirmation before each batch
4. Allows selective application
**Best for:** First-time use, learning the tool
---
## Step 4: Validate Results
```bash
Execute: ./scripts/sync-sprint-status.sh --validate
```
**Should show:**
- "✓ sprint-status.yaml is up to date!" (success)
- OR discrepancy count (if issues remain)
---
## Step 5: Commit Changes
```bash
git add docs/sprint-artifacts/sprint-status.yaml
git add .sprint-status-backups/ # Include backup for audit trail
git commit -m "fix(tracking): Recover sprint-status.yaml - {MODE} recovery"
```
---
## Recovery Scenarios
### Scenario 1: Autonomous Epic Completed, Tracking Not Updated
**Symptoms:**
- Autonomous completion report exists
- Git commits show work done
- sprint-status.yaml shows "in-progress" or "backlog"
**Solution:**
```bash
{recovery_script} --aggressive
# Will find completion report, mark all stories done
```
---
### Scenario 2: Manual Work Over Past Week Not Tracked
**Symptoms:**
- Story Status: fields updated to "done"
- sprint-status.yaml not synced
- Git commits exist
**Solution:**
```bash
./scripts/sync-sprint-status.sh
# Standard sync (reads Status: fields)
```
---
### Scenario 3: Story Files Missing Status: Fields
**Symptoms:**
- 100+ stories with no Status: field
- Some completed, some not
- No autonomous reports
**Solution:**
```bash
{recovery_script} --aggressive --dry-run # Preview inference
# Review recommendations carefully
{recovery_script} --aggressive # Apply if satisfied
```
---
### Scenario 4: Complete Chaos (Mix of All Above)
**Symptoms:**
- Some stories have Status:, some don't
- Autonomous reports for some epics
- Manual work on others
- sprint-status.yaml very outdated
**Solution:**
```bash
# Step 1: Run recovery in dry-run
{recovery_script} --aggressive --dry-run
# Step 2: Review /tmp/recovery_results.json
# Step 3: Apply in conservative mode first (safest updates)
{recovery_script} --conservative
# Step 4: Manually review remaining stories
# Update Status: fields for known completed work
# Step 5: Run sync to catch manual updates
./scripts/sync-sprint-status.sh
# Step 6: Final validation
./scripts/sync-sprint-status.sh --validate
```
---
## Quality Gates
**Recovery script will DOWNGRADE status if:**
- Story file < 10KB (not properly detailed)
- Story file has < 5 tasks (incomplete story)
- No git commits found (no evidence of work)
- Explicit Status: contradicts other evidence
**Recovery script will UPGRADE status if:**
- Autonomous completion report lists story as done
- 3+ git commits + 90%+ tasks checked
- Explicit Status: field says "done"
---
## Post-Recovery Checklist
After running recovery:
- [ ] Run validation: `./scripts/sync-sprint-status.sh --validate`
- [ ] Review backup: Check `.sprint-status-backups/` for before state
- [ ] Check epic statuses: Verify epic-level status matches story completion
- [ ] Spot-check 5-10 stories: Confirm inferred status is accurate
- [ ] Commit changes: Add recovery to version control
- [ ] Document issues: Note why drift occurred, prevent recurrence
---
## Preventing Future Drift
**After recovery:**
1. **Use workflows properly**
- `/create-story` - Adds to sprint-status.yaml automatically
- `/dev-story` - Updates both Status: and sprint-status.yaml
- Autonomous workflows - Now update tracking
2. **Run sync regularly**
- Weekly: `pnpm sync:sprint-status:dry-run` (check health)
- After manual Status: updates: `pnpm sync:sprint-status`
3. **CI/CD validation** (coming soon)
- Blocks PRs with out-of-sync tracking
- Forces sync before merge
---
## Troubleshooting
### "Recovery script shows 0 updates"
**Possible causes:**
- sprint-status.yaml already accurate
- Story files all have proper Status: fields
- No git commits found (check date range)
**Action:** Run `--dry-run` to see analysis, check `/tmp/recovery_results.json`
---
### "Low confidence on stories I know are done"
**Possible causes:**
- Story file < 10KB (not properly detailed)
- No git commits (work done outside git)
- No explicit Status: field
**Action:** Manually add Status: field to story, then run standard sync
---
### "Recovery marks incomplete stories as done"
**Possible causes:**
- Git commits exist but work abandoned
- Autonomous report lists story but implementation failed
- Tasks pre-checked incorrectly (brownfield error)
**Action:** Use conservative mode, manually verify, fix story files
---
## Output Files
**Created during recovery:**
- `.sprint-status-backups/sprint-status-recovery-{timestamp}.yaml` - Backup
- `/tmp/recovery_results.json` - Detailed analysis
- Updated `sprint-status.yaml` - Recovered status
---
**Last Updated:** 2026-01-02
**Status:** Production Ready
**Works On:** ANY BMAD project with sprint-status.yaml tracking

View File

@ -0,0 +1,30 @@
# Sprint Status Recovery Workflow
name: recover-sprint-status
description: "Recover sprint-status.yaml when tracking has drifted. Analyzes story files, git commits, and autonomous reports to rebuild accurate status."
author: "BMad"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
output_folder: "{config_source}:output_folder"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
implementation_artifacts: "{config_source}:implementation_artifacts"
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/4-implementation/recover-sprint-status"
instructions: "{installed_path}/instructions.md"
# Inputs
variables:
sprint_status_file: "{implementation_artifacts}/sprint-status.yaml"
story_directory: "{implementation_artifacts}"
recovery_mode: "interactive" # Options: interactive, conservative, aggressive
# Recovery script location
recovery_script: "{project-root}/scripts/recover-sprint-status.sh"
# Standalone so IDE commands get generated
standalone: true
# No web bundle needed
web_bundle: false

View File

@ -0,0 +1,158 @@
<workflow>
<critical>The workflow execution engine is governed by: {project-root}/_bmad/core/tasks/workflow.xml</critical>
<critical>You MUST have already loaded and processed: {installed_path}/workflow.yaml</critical>
<critical>This validates EVERY epic in the project - comprehensive health check</critical>
<step n="1" goal="Discover all epics">
<action>Load {{sprint_status_file}}</action>
<check if="file not found">
<output>❌ sprint-status.yaml not found
Run /bmad:bmm:workflows:sprint-planning first.
</output>
<action>HALT</action>
</check>
<action>Parse development_status section</action>
<action>Extract all epic keys (entries starting with "epic-")</action>
<action>Filter out retrospectives (ending with "-retrospective")</action>
<action>Store as {{epic_list}}</action>
<output>🔍 **Comprehensive Epic Validation**
Found {{epic_count}} epics to validate:
{{#each epic_list}}
- {{this}}
{{/each}}
Starting validation...
</output>
</step>
<step n="2" goal="Validate each epic">
<critical>Run validate-epic-status for EACH epic</critical>
<action>Initialize counters:
- total_stories_scanned = 0
- total_valid_stories = 0
- total_invalid_stories = 0
- total_updates_applied = 0
- epics_validated = []
</action>
<loop foreach="{{epic_list}}">
<action>Set {{current_epic}} = current loop item</action>
<output>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Validating {{current_epic}}...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
</output>
<!-- Use Python script for validation logic -->
<action>Execute validation script:
python3 scripts/lib/sprint-status-updater.py --epic {{current_epic}} --mode validate
</action>
<action>Parse script output:
- Story count
- Valid/invalid/missing counts
- Inferred statuses
- Updates needed
</action>
<check if="{{validation_mode}} == fix">
<action>Execute fix script:
python3 scripts/lib/sprint-status-updater.py --epic {{current_epic}} --mode fix
</action>
<action>Count updates applied</action>
<action>Add to total_updates_applied</action>
</check>
<action>Store validation results for {{current_epic}}</action>
<action>Increment totals</action>
<output>✓ {{current_epic}}: {{story_count}} stories, {{valid_count}} valid, {{updates_applied}} updates
</output>
</loop>
<output>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
All Epics Validated
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
</output>
</step>
<step n="3" goal="Consolidate and report">
<output>
📊 **COMPREHENSIVE VALIDATION RESULTS**
**Epics Validated:** {{epic_count}}
**Stories Analyzed:** {{total_stories_scanned}}
Valid: {{total_valid_stories}} (>=10KB, >=5 tasks)
Invalid: {{total_invalid_stories}} (<10KB or <5 tasks)
Missing: {{total_missing_files}}
**Updates Applied:** {{total_updates_applied}}
**Epic Status Summary:**
{{#each_epic_with_status}}
{{epic_key}}: {{status}} ({{done_count}}/{{total_count}} done)
{{/each}}
**Top Issues:**
{{#if_invalid_stories_exist}}
⚠️ {{total_invalid_stories}} stories need regeneration (/create-story)
{{/if}}
{{#if_missing_files_exist}}
⚠️ {{total_missing_files}} story files missing (create or remove from sprint-status.yaml)
{{/if}}
{{#if_conflicting_evidence}}
⚠️ {{conflict_count}} stories have conflicting evidence (manual review)
{{/if}}
**Health Score:** {{health_score}}/100
(100 = perfect, all stories valid with correct status)
</output>
<action>Write comprehensive report to {{default_output_file}}</action>
<output>💾 Full report: {{default_output_file}}</output>
</step>
<step n="4" goal="Provide actionable recommendations">
<output>
🎯 **RECOMMENDED ACTIONS**
{{#if_health_score_lt_80}}
**Priority 1: Fix Invalid Stories ({{total_invalid_stories}})**
{{#each_invalid_story}}
/create-story-with-gap-analysis # Regenerate {{story_id}}
{{/each}}
{{/if}}
{{#if_missing_files_gt_0}}
**Priority 2: Create Missing Story Files ({{total_missing_files}})**
{{#each_missing}}
/create-story # Create {{story_id}}
{{/each}}
{{/if}}
{{#if_health_score_gte_80}}
✅ **Sprint status is healthy!**
Continue with normal development:
/sprint-status # Check what's next
{{/if}}
**Maintenance:**
- Run /validate-all-epics weekly to catch drift
- After autonomous work, run validation
- Before sprint reviews, validate status accuracy
</output>
</step>
</workflow>

View File

@ -0,0 +1,30 @@
name: validate-all-epics
description: "Validate and fix sprint-status.yaml for ALL epics. Runs validate-epic-status on every epic in parallel, consolidates results, rebuilds accurate sprint-status.yaml."
author: "BMad"
version: "1.0.0"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
implementation_artifacts: "{config_source}:implementation_artifacts"
story_dir: "{implementation_artifacts}"
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/4-implementation/validate-all-epics"
instructions: "{installed_path}/instructions.xml"
# Variables
variables:
sprint_status_file: "{implementation_artifacts}/sprint-status.yaml"
validation_mode: "fix" # Options: "report-only", "fix"
parallel_validation: true # Validate epics in parallel for speed
# Sub-workflow
validate_epic_workflow: "{project-root}/_bmad/bmm/workflows/4-implementation/validate-epic-status/workflow.yaml"
# Output
default_output_file: "{story_dir}/.all-epics-validation-report.md"
standalone: true
web_bundle: false

View File

@ -0,0 +1,338 @@
<workflow>
<critical>The workflow execution engine is governed by: {project-root}/_bmad/core/tasks/workflow.xml</critical>
<critical>You MUST have already loaded and processed: {installed_path}/workflow.yaml</critical>
<critical>This is the COMPREHENSIVE AUDIT - validates all stories using Haiku agents</critical>
<critical>Cost: ~$76 for 511 stories with Haiku (vs $793 with Sonnet)</critical>
<step n="1" goal="Discover all story files">
<action>Find all .md files in {{story_dir}}</action>
<action>Filter out meta-documents:
- Files starting with "EPIC-" (completion reports)
- Files starting with "." (progress files)
- Files containing: COMPLETION, SUMMARY, REPORT, SESSION-, REVIEW-, README, INDEX
- Files like "atdd-checklist-", "gap-analysis-", "review-"
</action>
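<!-- One way to express the filter (the substring list mirrors the bullets above; the helper
     name is an assumption):
       from pathlib import Path
       SKIP_SUBSTRINGS = ("COMPLETION", "SUMMARY", "REPORT", "SESSION-", "REVIEW-",
                          "README", "INDEX", "atdd-checklist-", "gap-analysis-", "review-")
       def is_story_file(p: Path) -> bool:
           name = p.name
           if not name.endswith(".md") or name.startswith(("EPIC-", ".")):
               return False
           return not any(s in name for s in SKIP_SUBSTRINGS)
-->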
<check if="{{epic_filter}} provided">
<action>Filter to stories matching: {{epic_filter}}-*.md</action>
</check>
<action>Store as {{story_list}}</action>
<action>Count {{story_count}}</action>
<output>🔍 **Comprehensive Story Audit**
{{#if epic_filter}}**Epic Filter:** {{epic_filter}}{{else}}**Scope:** All epics{{/if}}
**Stories to Validate:** {{story_count}}
**Agent Model:** Haiku 4.5
**Batch Size:** {{batch_size}}
**Estimated Cost:** ~${{estimated_cost}} ({{story_count}} × $0.15/story)
**Estimated Time:** {{estimated_hours}} hours
Starting batch validation...
</output>
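<!-- The estimates above are simple arithmetic. Rates come from this workflow's own notes
     ($0.15/story, batch_size, concurrent_limit, pause_between_batches); the 3.5 min/story
     figure is an assumed midpoint of the documented 2-5 minute range:
       import math
       def estimate(story_count, batch_size=5, concurrent=5, pause_s=30,
                    cost_per_story=0.15, minutes_per_story=3.5):
           batches = math.ceil(story_count / batch_size)
           cost = story_count * cost_per_story
           hours = (story_count * minutes_per_story / concurrent + batches * pause_s / 60) / 60
           return round(cost, 2), round(hours, 1)
-->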
</step>
<step n="2" goal="Batch validate all stories">
<action>Initialize counters:
- stories_validated = 0
- verified_complete = 0
- needs_rework = 0
- false_positives = 0
- in_progress = 0
- total_false_positive_tasks = 0
- total_critical_issues = 0
</action>
<action>Split {{story_list}} into batches of {{batch_size}}</action>
<loop foreach="{{batches}}">
<action>Set {{current_batch}} = current batch</action>
<action>Set {{batch_number}} = loop index + 1</action>
<output>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Batch {{batch_number}}/{{total_batches}} ({{batch_size}} stories)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
</output>
<!-- Validate each story in batch -->
<loop foreach="{{current_batch}}">
<action>Set {{story_file}} = current story path</action>
<action>Extract {{story_id}} from filename</action>
<output>{{stories_validated + 1}}/{{story_count}}: Validating {{story_id}}...</output>
<!-- Invoke validate-story-deep workflow -->
<invoke-workflow path="{{validate_story_workflow}}">
<input name="story_file" value="{{story_file}}" />
</invoke-workflow>
<action>Parse validation results:
- category (VERIFIED_COMPLETE, FALSE_POSITIVE, etc.)
- verification_score
- false_positive_count
- false_negative_count
- critical_issues_count
</action>
<action>Store results for {{story_id}}</action>
<action>Increment counters based on category</action>
<output> → {{category}} (Score: {{verification_score}}/100{{#if false_positives > 0}}, {{false_positives}} false positives{{/if}})</output>
<action>Increment stories_validated</action>
</loop>
<output>Batch {{batch_number}} complete. {{stories_validated}}/{{story_count}} total validated.</output>
<!-- Save progress after each batch -->
<action>Write progress to {{progress_file}}:
- stories_validated
- current_batch
- results_so_far
</action>
</loop>
<output>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
All Stories Validated
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
**Total Validated:** {{story_count}}
**Total Tasks Checked:** {{total_tasks_verified}}
</output>
</step>
<step n="3" goal="Consolidate results and calculate platform health">
<action>Calculate platform-wide metrics:
- Overall health score: (verified_complete / story_count) × 100
- False positive rate: (false_positive_stories / story_count) × 100
- Total rework estimate: false_positive_stories × 3h + needs_rework × 2h
</action>
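<!-- The metric formulas above as plain Python (weights of 3h and 2h exactly as stated):
       def platform_metrics(verified, false_pos, rework, total):
           return {
               "health_score": round(verified / total * 100, 1) if total else 0.0,
               "false_positive_rate": round(false_pos / total * 100, 1) if total else 0.0,
               "rework_hours": false_pos * 3 + rework * 2,
           }
-->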
<action>Group results by epic</action>
<action>Identify worst offenders (highest false positive rates)</action>
<output>
📊 **PLATFORM HEALTH ASSESSMENT**
**Overall Health Score:** {{health_score}}/100
**Story Categories:**
- ✅ VERIFIED_COMPLETE: {{verified_complete}} ({{verified_complete_pct}}%)
- ⚠️ NEEDS_REWORK: {{needs_rework}} ({{needs_rework_pct}}%)
- ❌ FALSE_POSITIVES: {{false_positives}} ({{false_positives_pct}}%)
- 🔄 IN_PROGRESS: {{in_progress}} ({{in_progress_pct}}%)
**Task-Level Issues:**
- False positive tasks: {{total_false_positive_tasks}}
- CRITICAL code quality issues: {{total_critical_issues}}
**Estimated Rework:** {{total_rework_hours}} hours
**Epic Breakdown:**
{{#each epic_summary}}
- Epic {{this.epic}}: {{this.health_score}}/100 ({{this.false_positives}} false positives)
{{/each}}
**Worst Offenders (Most False Positives):**
{{#each worst_offenders limit=10}}
- {{this.story_id}}: {{this.false_positive_count}} tasks, score {{this.score}}/100
{{/each}}
</output>
</step>
<step n="4" goal="Generate comprehensive audit report">
<template-output>
# Comprehensive Platform Audit Report
**Generated:** {{date}}
**Stories Validated:** {{story_count}}
**Agent Model:** Haiku 4.5
**Total Cost:** ~${{actual_cost}}
---
## Executive Summary
**Platform Health Score:** {{health_score}}/100
{{#if health_score >= 90}}
✅ **EXCELLENT** - Platform is production-ready with high confidence
{{else if health_score >= 75}}
⚠️ **GOOD** - Minor issues to address, generally solid
{{else if health_score >= 60}}
⚠️ **NEEDS WORK** - Significant rework required before production
{{else}}
❌ **CRITICAL** - Major quality issues found, not production-ready
{{/if}}
**Key Findings:**
- {{verified_complete}} stories verified complete ({{verified_complete_pct}}%)
- {{false_positives}} stories are false positives ({{false_positives_pct}}%)
- {{total_false_positive_tasks}} tasks claimed done but not implemented
- {{total_critical_issues}} CRITICAL code quality issues found
---
## ❌ False Positive Stories ({{false_positives}} total)
**These stories are marked "done" but have significant missing/stubbed code:**
{{#each false_positive_stories}}
### {{this.story_id}} (Score: {{this.score}}/100)
**Current Status:** {{this.current_status}}
**Should Be:** in-progress or ready-for-dev
**Missing/Stubbed:**
{{#each this.false_positive_tasks}}
- {{this.task}}
- {{this.evidence}}
{{/each}}
**Estimated Fix:** {{this.estimated_hours}}h
---
{{/each}}
**Total Rework:** {{false_positive_rework_hours}} hours
---
## ⚠️ Stories Needing Rework ({{needs_rework}} total)
{{#each needs_rework_stories}}
### {{this.story_id}} (Score: {{this.score}}/100)
**Issues:**
- {{this.false_positive_count}} incomplete tasks
- {{this.critical_issues}} CRITICAL quality issues
- {{this.high_issues}} HIGH priority issues
**Top Issues:**
{{#each this.top_issues limit=5}}
- {{this}}
{{/each}}
---
{{/each}}
**Total Rework:** {{needs_rework_hours}} hours
---
## ✅ Verified Complete Stories ({{verified_complete}} total)
**These stories are production-ready with verified code:**
{{#each verified_complete_stories}}
- {{this.story_id}} ({{this.score}}/100)
{{/each}}
---
## 📊 Epic Health Breakdown
{{#each epic_summary}}
### Epic {{this.epic}}
**Stories:** {{this.total}}
**Verified Complete:** {{this.verified}} ({{this.verified_pct}}%)
**False Positives:** {{this.false_positives}}
**Needs Rework:** {{this.needs_rework}}
**Health Score:** {{this.health_score}}/100
{{#if this.health_score < 70}}
⚠️ **ATTENTION NEEDED** - This epic has quality issues
{{/if}}
**Top Issues:**
{{#each this.top_issues limit=3}}
- {{this}}
{{/each}}
---
{{/each}}
---
## 🎯 Recommended Action Plan
### Phase 1: Fix False Positives (CRITICAL - {{false_positive_rework_hours}}h)
{{#each false_positive_stories limit=20}}
{{@index + 1}}. **{{this.story_id}}** ({{this.estimated_hours}}h)
- {{this.false_positive_count}} tasks to implement
- Update status to in-progress
{{/each}}
{{#if false_positives > 20}}
... and {{false_positives - 20}} more (see full list above)
{{/if}}
### Phase 2: Address Rework Items (HIGH - {{needs_rework_hours}}h)
{{#each needs_rework_stories limit=10}}
{{@index + 1}}. **{{this.story_id}}** ({{this.estimated_hours}}h)
- Fix {{this.critical_issues}} CRITICAL issues
- Complete {{this.false_positive_count}} tasks
{{/each}}
### Phase 3: Fix False Negatives (LOW - batch update)
- {{total_false_negative_tasks}} unchecked tasks that are actually complete
- Can batch update checkboxes (low priority)
---
## 💰 Audit Cost Analysis
**This Validation Run:**
- Stories validated: {{story_count}}
- Agent sessions: {{story_count}} (one Haiku agent per story)
- Tokens used: ~{{tokens_used_millions}}M
- Cost: ~${{actual_cost}}
**Remediation Cost:**
- Estimated hours: {{total_rework_hours}}h
- At AI velocity: {{ai_velocity_days}} days of work
- Token cost: ~${{remediation_token_cost}}
**Total Investment:** ${{actual_cost}} (audit) + ${{remediation_token_cost}} (fixes) = ${{total_cost}}
---
## 📅 Next Steps
1. **Immediate:** Fix {{false_positives}} false positive stories
2. **This Week:** Address {{total_critical_issues}} CRITICAL issues
3. **Next Week:** Rework {{needs_rework}} stories
4. **Ongoing:** Re-validate fixed stories to confirm
**Commands:**
```bash
# Validate specific story
/validate-story-deep docs/sprint-artifacts/16e-6-ecs-task-definitions-tier3.md
# Validate specific epic
/validate-all-stories-deep --epic 16e
# Re-run full audit (after fixes)
/validate-all-stories-deep
```
---
**Report Generated By:** validate-all-stories-deep workflow
**Validation Method:** LLM-powered (Haiku 4.5 agents read actual code)
**Confidence Level:** Very High (code-based verification, not regex patterns)
</template-output>
</step>
</workflow>

View File

@ -0,0 +1,36 @@
name: validate-all-stories-deep
description: "Comprehensive platform audit using Haiku agents. Validates ALL stories by reading actual code. The bulletproof validation for production readiness."
author: "BMad"
version: "1.0.0"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
implementation_artifacts: "{config_source}:implementation_artifacts"
story_dir: "{implementation_artifacts}"
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/4-implementation/validate-all-stories-deep"
instructions: "{installed_path}/instructions.xml"
# Input variables
variables:
epic_filter: "" # Optional: Only validate specific epic (e.g., "16e")
batch_size: 5 # Validate 5 stories at a time (prevents spawning 511 agents at once!)
concurrent_limit: 5 # Max 5 agents running concurrently
auto_fix: false # If true, auto-update statuses based on validation
pause_between_batches: 30 # Seconds to wait between batches (rate limiting)
# Sub-workflow
validate_story_workflow: "{project-root}/_bmad/bmm/workflows/4-implementation/validate-story-deep/workflow.yaml"
# Agent configuration
agent_model: "haiku" # Cost: ~$66 for 511 stories vs $793 with Sonnet
# Output
default_output_file: "{story_dir}/.comprehensive-audit-{date}.md"
progress_file: "{story_dir}/.validation-progress-{date}.yaml"
standalone: true
web_bundle: false

View File

@ -0,0 +1,411 @@
<workflow>
<critical>The workflow execution engine is governed by: {project-root}/_bmad/core/tasks/workflow.xml</critical>
<critical>You MUST have already loaded and processed: {installed_path}/workflow.yaml</critical>
<critical>This is the COMPREHENSIVE AUDIT - validates every story's tasks against actual codebase</critical>
<step n="1" goal="Discover and categorize stories">
<action>Find all story files in {{story_dir}}</action>
<action>Filter out meta-documents:
- Files starting with "EPIC-" (completion reports)
- Files with "COMPLETION", "SUMMARY", "REPORT" in name
- Files starting with "." (hidden progress files)
- Files like "README", "INDEX", "SESSION-", "REVIEW-"
</action>
<check if="{{epic_filter}} provided">
<action>Filter to stories starting with {{epic_filter}}- (e.g., "16e-")</action>
</check>
<action>Store as {{story_list}}</action>
<action>Count {{story_count}}</action>
<output>🔍 **Comprehensive Story Validation**
{{#if epic_filter}}
**Epic Filter:** {{epic_filter}} only
{{/if}}
**Stories to Validate:** {{story_count}}
**Validation Depth:** {{validation_depth}}
**Parallel Mode:** {{parallel_validation}}
**Estimated Time:** {{estimated_minutes}} minutes
**Estimated Cost:** ~${{estimated_cost}} ({{story_count}} × ~$0.50/story)
This will:
1. Verify all tasks against actual codebase (task-verification-engine.py)
2. Run code quality reviews on files with issues
3. Check for regressions and integration failures
4. Categorize stories: VERIFIED_COMPLETE, NEEDS_REWORK, FALSE_POSITIVE, etc.
5. Generate comprehensive audit report
Starting validation...
</output>
</step>
<step n="2" goal="Run task verification on all stories">
<action>Initialize counters:
- stories_validated = 0
- verified_complete = 0
- needs_rework = 0
- false_positives = 0
- in_progress = 0
- total_false_positive_tasks = 0
- total_tasks_verified = 0
</action>
<loop foreach="{{story_list}}">
<action>Set {{current_story}} = current story file</action>
<action>Extract {{story_id}} from filename</action>
<output>Validating {{counter}}/{{story_count}}: {{story_id}}...</output>
<!-- Run task verification engine -->
<action>Execute: python3 {{task_verification_script}} {{current_story}}</action>
<action>Parse output:
- total_tasks
- checked_tasks
- false_positives
- false_negatives
- verification_score
- task_details (with evidence)
</action>
<action>Categorize story:
IF verification_score >= 95 AND false_positives == 0
→ category = "VERIFIED_COMPLETE"
ELSE IF verification_score >= 80 AND false_positives <= 2
→ category = "COMPLETE_WITH_MINOR_ISSUES"
ELSE IF false_positives > 5 OR verification_score < 50
→ category = "FALSE_POSITIVE" (claimed done but missing code)
ELSE IF verification_score < 80
→ category = "NEEDS_REWORK"
ELSE IF checked_tasks == 0
→ category = "NOT_STARTED"
ELSE
→ category = "IN_PROGRESS"
</action>
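<!-- The categorization above as straight-line Python (branch order exactly as listed):
       def categorize(score, false_positives, checked_tasks):
           if score >= 95 and false_positives == 0:
               return "VERIFIED_COMPLETE"
           if score >= 80 and false_positives <= 2:
               return "COMPLETE_WITH_MINOR_ISSUES"
           if false_positives > 5 or score < 50:
               return "FALSE_POSITIVE"
           if score < 80:
               return "NEEDS_REWORK"
           if checked_tasks == 0:
               return "NOT_STARTED"
           return "IN_PROGRESS"
-->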
<action>Store result:
- story_id
- verification_score
- category
- false_positive_count
- false_negative_count
- current_status (from sprint-status.yaml)
- recommended_status
</action>
<action>Increment counters based on category</action>
<action>Add false_positive_count to total</action>
<action>Add total_tasks to total_tasks_verified</action>
<output> → {{category}} ({{verification_score}}/100, {{false_positives}} false positives)</output>
</loop>
<output>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Validation Complete
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
**Stories Validated:** {{story_count}}
**Total Tasks Verified:** {{total_tasks_verified}}
**Total False Positives:** {{total_false_positive_tasks}}
</output>
</step>
<step n="3" goal="Code quality review on problem stories" if="{{validation_depth}} == deep OR comprehensive">
<action>Filter stories where:
- category = "FALSE_POSITIVE" OR
- category = "NEEDS_REWORK" OR
- false_positives > 3
</action>
<action>Count {{problem_story_count}}</action>
<check if="{{problem_story_count}} > 0">
<output>
🛡️ **Code Quality Review**
Found {{problem_story_count}} stories with quality issues.
Running multi-agent review on files from these stories...
</output>
<loop foreach="{{problem_stories}}">
<action>Extract file list from story Dev Agent Record</action>
<check if="files exist">
<action>Run /multi-agent-review on files:
- Security audit
- Silent failure detection
- Architecture compliance
- Type safety check
</action>
<action>Categorize review findings by severity</action>
<action>Add to story's issue list</action>
</check>
</loop>
</check>
<check if="{{problem_story_count}} == 0">
<output>✅ No problem stories found - all code quality looks good!</output>
</check>
</step>
<step n="4" goal="Integration verification" if="{{validation_depth}} == comprehensive">
<output>
🔗 **Integration Verification**
Checking for regressions and broken dependencies...
</output>
<action>For stories marked "VERIFIED_COMPLETE":
1. Extract service dependencies from story
2. Check if dependent services still exist
3. Run integration tests if they exist
4. Check for API contract breaking changes
</action>
<action>Detect overlaps:
- Multiple stories implementing same feature
- Duplicate files created
- Conflicting implementations
</action>
<output>
**Regressions Found:** {{regression_count}}
**Overlaps Detected:** {{overlap_count}}
**Integration Tests:** {{integration_tests_run}} ({{integration_tests_passing}} passing)
</output>
</step>
<step n="5" goal="Generate comprehensive report">
<template-output>
# Comprehensive Story Validation Report
**Generated:** {{date}}
**Stories Validated:** {{story_count}}
**Validation Depth:** {{validation_depth}}
**Epic Filter:** {{epic_filter}} {{#if_no_filter}}(all epics){{/if}}
---
## Executive Summary
**Overall Health Score:** {{overall_health_score}}/100
**Story Categories:**
- ✅ **VERIFIED_COMPLETE:** {{verified_complete}} ({{verified_complete_pct}}%)
- ⚠️ **NEEDS_REWORK:** {{needs_rework}} ({{needs_rework_pct}}%)
- ❌ **FALSE_POSITIVES:** {{false_positives}} ({{false_positives_pct}}%)
- 🔄 **IN_PROGRESS:** {{in_progress}} ({{in_progress_pct}}%)
- 📋 **NOT_STARTED:** {{not_started}} ({{not_started_pct}}%)
**Task Verification:**
- Total tasks verified: {{total_tasks_verified}}
- False positive tasks: {{total_false_positive_tasks}} ({{false_positive_rate}}%)
- False negative tasks: {{total_false_negative_tasks}}
**Code Quality:**
- CRITICAL issues: {{critical_issues_total}}
- HIGH issues: {{high_issues_total}}
- Files reviewed: {{files_reviewed}}
---
## ❌ False Positive Stories (Claimed Done, Not Implemented)
{{#each false_positive_stories}}
### {{this.story_id}} (Score: {{this.verification_score}}/100)
**Current Status:** {{this.current_status}}
**Recommended:** in-progress or ready-for-dev
**Issues:**
{{#each this.false_positive_tasks}}
- [ ] {{this.task}}
- Evidence: {{this.evidence}}
{{/each}}
**Action Required:**
- Uncheck {{this.false_positive_count}} tasks
- Implement missing code
- Update sprint-status.yaml to in-progress
{{/each}}
**Total:** {{false_positive_stories_count}} stories
---
## ⚠️ Stories Needing Rework
{{#each needs_rework_stories}}
### {{this.story_id}} (Score: {{this.verification_score}}/100)
**Issues:**
- {{this.false_positive_count}} false positive tasks
- {{this.critical_issue_count}} CRITICAL code quality issues
- {{this.high_issue_count}} HIGH priority issues
**Recommended:**
1. Fix CRITICAL issues first
2. Implement {{this.false_positive_count}} missing tasks
3. Re-run validation
{{/each}}
**Total:** {{needs_rework_count}} stories
---
## ✅ Verified Complete Stories
{{#each verified_complete_stories}}
- {{this.story_id}} ({{this.verification_score}}/100)
{{/each}}
**Total:** {{verified_complete_count}} stories (production-ready)
---
## 📊 Epic Breakdown
{{#each epic_summary}}
### Epic {{this.epic_num}}
**Stories:** {{this.total_count}}
**Verified Complete:** {{this.verified_count}} ({{this.verified_pct}}%)
**False Positives:** {{this.false_positive_count}}
**Needs Rework:** {{this.needs_rework_count}}
**Health Score:** {{this.health_score}}/100
{{/each}}
---
## 🎯 Recommended Actions
### Immediate (CRITICAL)
{{#if false_positive_stories_count > 0}}
**Fix {{false_positive_stories_count}} False Positive Stories:**
{{#each false_positive_stories limit=10}}
1. {{this.story_id}}: Update status to in-progress, implement {{this.false_positive_count}} missing tasks
{{/each}}
{{#if false_positive_stories_count > 10}}
... and {{false_positive_stories_count - 10}} more (see full list above)
{{/if}}
{{/if}}
### Short-term (HIGH Priority)
{{#if needs_rework_count > 0}}
**Address {{needs_rework_count}} Stories Needing Rework:**
- Fix {{critical_issues_total}} CRITICAL code quality issues
- Implement missing tasks
- Re-validate after fixes
{{/if}}
### Maintenance (MEDIUM Priority)
{{#if false_negative_count > 0}}
**Update {{false_negative_count}} False Negative Tasks:**
- Mark complete (code exists but checkbox unchecked)
- Low impact, can batch update
{{/if}}
---
## 💰 Cost Analysis
**Validation Run:**
- Stories validated: {{story_count}}
- API tokens used: ~{{tokens_used}}K
- Cost: ~${{cost}}
**Remediation Estimate:**
- False positives: {{false_positive_stories_count}} × 3h = {{remediation_hours_fp}}h
- Needs rework: {{needs_rework_count}} × 2h = {{remediation_hours_rework}}h
- **Total:** {{total_remediation_hours}}h estimated work
---
## 📅 Next Steps
1. **Fix false positive stories** ({{false_positive_stories_count}} stories)
2. **Address CRITICAL issues** ({{critical_issues_total}} issues)
3. **Re-run validation** on fixed stories
4. **Update sprint-status.yaml** with verified statuses
5. **Run weekly validation** to prevent future drift
---
**Generated by:** /validate-all-stories workflow
**Validation Engine:** task-verification-engine.py v2.0
**Multi-Agent Review:** {{multi_agent_review_enabled}}
</template-output>
</step>
<step n="6" goal="Auto-fix if enabled" if="{{fix_mode}} == true">
<output>
🔧 **Auto-Fix Mode Enabled**
Applying automatic fixes:
1. Update false negative checkboxes (code exists → mark [x])
2. Update sprint-status.yaml with verified statuses
3. Add validation scores to story files
</output>
<loop foreach="{{false_negative_tasks_list}}">
<action>Update story file: Change [ ] to [x] for verified tasks</action>
<output> ✓ {{story_id}}: Checked {{task_count}} false negative tasks</output>
</loop>
<loop foreach="{{status_updates_list}}">
<action>Update sprint-status.yaml using sprint-status-updater.py</action>
<output> ✓ {{story_id}}: {{old_status}} → {{new_status}}</output>
</loop>
<output>
✅ Auto-fix complete
- {{false_negatives_fixed}} tasks checked
- {{statuses_updated}} story statuses updated
</output>
</step>
<step n="7" goal="Summary and recommendations">
<output>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
COMPREHENSIVE VALIDATION COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
**Overall Health:** {{overall_health_score}}/100
{{#if overall_health_score >= 90}}
✅ **EXCELLENT** - Platform is production-ready
{{else if overall_health_score >= 75}}
⚠️ **GOOD** - Minor issues to address before production
{{else if overall_health_score >= 60}}
⚠️ **NEEDS WORK** - Significant rework required
{{else}}
❌ **CRITICAL** - Major quality issues found
{{/if}}
**Top Priorities:**
1. Fix {{false_positive_stories_count}} false positive stories
2. Address {{critical_issues_total}} CRITICAL code quality issues
3. Complete {{in_progress_count}} in-progress stories
4. Re-validate after fixes
**Full Report:** {{default_output_file}}
**Summary JSON:** {{validation_summary_file}}
**Next Command:**
/validate-story <story-id> # Deep-dive on specific story
/validate-all-stories --epic 16e # Re-validate specific epic
</output>
</step>
</workflow>

View File

@ -0,0 +1,36 @@
name: validate-all-stories
description: "Comprehensive audit of ALL stories: verify tasks against codebase, run code quality reviews, check integrations. The bulletproof audit for production readiness."
author: "BMad"
version: "1.0.0"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
implementation_artifacts: "{config_source}:implementation_artifacts"
story_dir: "{implementation_artifacts}"
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/4-implementation/validate-all-stories"
instructions: "{installed_path}/instructions.xml"
# Input variables
variables:
validation_depth: "deep" # Options: "quick" (tasks only), "deep" (tasks + review), "comprehensive" (full integration)
parallel_validation: true # Run story validations in parallel for speed
fix_mode: false # If true, auto-fix false negatives and update statuses
epic_filter: "" # Optional: Only validate stories from specific epic (e.g., "16e")
# Tools
task_verification_script: "{project-root}/scripts/lib/task-verification-engine.py"
sprint_status_updater: "{project-root}/scripts/lib/sprint-status-updater.py"
# Sub-workflow
validate_story_workflow: "{project-root}/_bmad/bmm/workflows/4-implementation/validate-story/workflow.yaml"
# Output
default_output_file: "{story_dir}/.comprehensive-validation-report-{date}.md"
validation_summary_file: "{story_dir}/.validation-summary-{date}.json"
standalone: true
web_bundle: false

View File

@ -0,0 +1,302 @@
<workflow>
<critical>The workflow execution engine is governed by: {project-root}/_bmad/core/tasks/workflow.xml</critical>
<critical>You MUST have already loaded and processed: {installed_path}/workflow.yaml</critical>
<critical>This is VALIDATION-ONLY mode - NO implementation, only status correction</critical>
<critical>Uses same logic as autonomous-epic but READS instead of WRITES code</critical>
<step n="1" goal="Validate inputs and load epic">
<action>Check if {{epic_num}} was provided</action>
<check if="{{epic_num}} is empty">
<ask>Which epic should I validate? (e.g., 19, 16d, 16e, 9b)</ask>
<action>Store response as {{epic_num}}</action>
</check>
<action>Load {{sprint_status_file}}</action>
<check if="file not found">
<output>❌ sprint-status.yaml not found at: {{sprint_status_file}}
Run /bmad:bmm:workflows:sprint-planning to create it first.
</output>
<action>HALT</action>
</check>
<action>Search for epic-{{epic_num}} entry in sprint_status_file</action>
<action>Extract all story entries for epic-{{epic_num}} (pattern: {{epic_num}}-*)</action>
<action>Count stories found in sprint-status.yaml for this epic</action>
<output>🔍 **Validating Epic {{epic_num}}**
Found {{story_count}} stories in sprint-status.yaml
Scanning story files for REALITY check...
</output>
</step>
<step n="2" goal="Scan and validate all story files">
<critical>This is where we determine TRUTH - not from status fields, but from actual file analysis</critical>
<action>For each story in epic (from sprint-status.yaml):
1. Build story file path: {{story_dir}}/{{story_key}}.md
2. Check if file exists
3. If exists, read FULL file
4. Analyze file content
</action>
<action>For each story file, extract:
- File size in KB
- Total task count (count all "- [ ]" and "- [x]" lines)
- Checked task count (count "- [x]" lines)
- Completion rate (checked / total * 100)
- Explicit Status: field (if present)
- Has proper BMAD structure (12 sections)
- Section count (count ## headings)
</action>
<output>📊 **Story File Quality Analysis**
Analyzing {{story_count}} story files...
</output>
<action>For each story, classify quality:
VALID:
- File size >= 10KB
- Total tasks >= 5
- Has task list structure
INVALID:
- File size < 10KB (incomplete story)
- Total tasks < 5 (not detailed enough)
- File missing entirely
</action>
<action>Store results as {{story_quality_map}}</action>
<output>Quality Summary:
Valid stories: {{valid_count}}/{{story_count}}
Invalid stories: {{invalid_count}}
Missing files: {{missing_count}}
</output>
</step>
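
The counting and classification above can be sketched as a small helper. This is a minimal illustration, not part of the workflow: it assumes story files use standard "- [ ]" / "- [x]" checkboxes, and the 10KB and 5-task thresholds mirror the validation_rules defined later in this workflow's workflow.yaml.

```python
import re
from pathlib import Path

def analyze_story_file(path: Path) -> dict:
    """Classify a story file as VALID/INVALID/MISSING using the step 2 thresholds
    (assumed: >= 10KB and >= 5 tasks)."""
    if not path.exists():
        return {"story": path.stem, "quality": "MISSING"}

    text = path.read_text(encoding="utf-8")
    size_kb = path.stat().st_size / 1024
    checked = len(re.findall(r"^\s*- \[x\]", text, re.MULTILINE | re.IGNORECASE))
    unchecked = len(re.findall(r"^\s*- \[ \]", text, re.MULTILINE))
    total = checked + unchecked
    sections = len(re.findall(r"^## ", text, re.MULTILINE))

    valid = size_kb >= 10 and total >= 5
    return {
        "story": path.stem,
        "size_kb": round(size_kb, 1),
        "total_tasks": total,
        "checked_tasks": checked,
        "completion_rate": round(checked / total * 100, 1) if total else 0.0,
        "section_count": sections,
        "quality": "VALID" if valid else "INVALID",
    }
```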
<step n="3" goal="Cross-reference git commits">
<action>Run git log to find commits mentioning epic stories:
Command: git log --oneline --since="{{git_commit_lookback_days}} days ago"
</action>
<action>Parse commit messages for story IDs matching pattern: {{epic_num}}-\d+[a-z]?</action>
<action>Build map of story_id → commit_count</action>
<output>Git Commit Evidence:
Stories with commits: {{stories_with_commits_count}}
Stories without commits: {{stories_without_commits_count}}
</output>
</step>
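
A hedged sketch of the git cross-reference: shell out to git log and count commits whose subjects mention a story ID for the epic. The assumption that commit subjects contain IDs like "16e-4" follows the regex pattern in step 3; adjust it if your commit conventions differ.

```python
import re
import subprocess
from collections import Counter

def commits_per_story(epic_num: str, lookback_days: int = 30) -> Counter:
    """Count recent commits whose one-line subject mentions a story ID for this epic."""
    log = subprocess.run(
        ["git", "log", "--oneline", f"--since={lookback_days} days ago"],
        capture_output=True, text=True, check=True,
    ).stdout
    pattern = re.compile(rf"\b{re.escape(epic_num)}-\d+[a-z]?\b")
    counts: Counter = Counter()
    for line in log.splitlines():
        for story_id in set(pattern.findall(line)):
            counts[story_id] += 1
    return counts
```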
<step n="4" goal="Check autonomous completion reports">
<action>Search {{story_dir}} for files:
- .epic-{{epic_num}}-completion-report.md
- .autonomous-epic-{{epic_num}}-progress.yaml
</action>
<check if="autonomous report found">
<action>Parse completed_stories list from progress file OR
Parse ✅ story entries from completion report</action>
<action>Store as {{autonomous_completed_stories}}</action>
<output>📋 Autonomous Report Found:
{{autonomous_completed_count}} stories marked complete
</output>
</check>
<check if="no autonomous report">
<output>ℹ️ No autonomous completion report found (manual epic)</output>
</check>
</step>
<step n="5" goal="Infer correct status for each story">
<critical>Use MULTIPLE sources of truth, not just Status: field</critical>
<action>For each story in epic, determine correct status using this logic:</action>
<logic>
Priority 1: Autonomous completion report
IF story in autonomous_completed_stories
→ Status = "done" (VERY HIGH confidence)
Priority 2: Task completion rate + file quality
IF completion_rate >= 90% AND file is VALID (>10KB, >5 tasks)
→ Status = "done" (HIGH confidence)
IF completion_rate 50-89% AND file is VALID
→ Status = "in-progress" (MEDIUM confidence)
IF completion_rate < 50% AND file is VALID
→ Status = "ready-for-dev" (MEDIUM confidence)
Priority 3: Explicit Status: field (if no other evidence)
IF Status: field exists AND matches above inferences
→ Use it (MEDIUM confidence)
IF Status: field conflicts with task completion
→ Prefer task completion (tasks are ground truth)
Priority 4: Git commits (supporting evidence)
IF 3+ commits AND task completion >= 90%
→ Upgrade confidence to VERY HIGH
IF 1-2 commits but task completion < 50%
→ Status = "in-progress" (work started but not done)
Quality Gates:
IF file size < 10KB OR total tasks < 5
→ DOWNGRADE status (can't be "done" if file is incomplete)
→ Mark as "ready-for-dev" (story needs proper creation)
→ Flag for regeneration with /create-story
Missing Files:
IF story file doesn't exist
→ Status = "backlog" (story not created yet)
</logic>
<action>Build map of story_id → inferred_status with evidence and confidence</action>
<output>📊 **Status Inference Complete**
Stories to update:
{{#each_story_needing_update}}
{{story_id}}:
Current: {{current_status_in_yaml}}
Inferred: {{inferred_status}}
Confidence: {{confidence}}
Evidence: {{evidence_summary}}
Quality: {{file_size_kb}}KB, {{total_tasks}} tasks, {{completion_rate}}% done
{{/each}}
</output>
</step>
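
The priority ladder above reduces to a small decision function. A minimal sketch, assuming the quality map from step 2 and the commit counts from step 3 are already available; the 90%/50% thresholds come from workflow.yaml's validation_rules.

```python
def infer_status(story: dict, in_autonomous_report: bool, commit_count: int) -> tuple[str, str]:
    """Return (status, confidence) for one story, mirroring the priority order in step 5."""
    if story["quality"] == "MISSING":
        return "backlog", "high"
    if story["quality"] == "INVALID":
        # Quality gate: an incomplete file can never be "done".
        return "ready-for-dev", "high"
    if in_autonomous_report:
        return "done", "very high"

    rate = story["completion_rate"]
    if rate >= 90:
        # Git commits only upgrade confidence; they never change the status.
        return "done", "very high" if commit_count >= 3 else "high"
    if rate >= 50:
        return "in-progress", "medium"
    if commit_count >= 1:
        return "in-progress", "medium"  # work started but far from done
    return "ready-for-dev", "medium"
```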
<step n="6" goal="Apply updates or report findings">
<check if="{{validation_mode}} == report-only">
<output>📝 **REPORT-ONLY MODE** - No changes will be made
Recommendations saved to: {{default_output_file}}
</output>
<action>Write detailed report to {{default_output_file}}</action>
<action>EXIT workflow</action>
</check>
<check if="{{validation_mode}} == fix OR {{validation_mode}} == strict">
<output>🔧 **FIX MODE** - Updating sprint-status.yaml...
Backing up to: .sprint-status-backups/
</output>
<action>Create backup of {{sprint_status_file}}</action>
<action>For each story needing update:
1. Find story entry in development_status section
2. Update status to inferred_status
3. Add comment: "✅ Validated {{date}} - {{evidence_summary}}"
4. Preserve all other content and structure
</action>
<action>Update epic-{{epic_num}} status based on story completion:
IF all stories have status "done" AND all are valid files
→ epic status = "done"
IF any stories "in-progress" OR "review"
→ epic status = "in-progress"
IF all stories "backlog" OR "ready-for-dev"
→ epic status = "backlog"
</action>
<action>Update last_verified timestamp in header</action>
<action>Save {{sprint_status_file}}</action>
<output>✅ **sprint-status.yaml Updated**
Applied {{updates_count}} story status corrections
Epic {{epic_num}}: {{old_epic_status}} → {{new_epic_status}}
Backup: {{backup_path}}
</output>
</check>
</step>
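
In fix mode, the epic rollup itself is simple bookkeeping. The sketch below covers only that rollup; the actual YAML edit is left to the workflow (or sprint-status-updater.py) because a naive yaml.dump round-trip would drop the comments and ordering this step promises to preserve. The fallback for a mix of done and not-started stories is an assumption, since step 6 does not spell that case out.

```python
def roll_up_epic_status(story_statuses: list[str]) -> str:
    """Derive the epic status from its stories, as described in step 6."""
    if story_statuses and all(s == "done" for s in story_statuses):
        return "done"
    if any(s in ("in-progress", "review") for s in story_statuses):
        return "in-progress"
    if all(s in ("backlog", "ready-for-dev") for s in story_statuses):
        return "backlog"
    return "in-progress"  # mixed done / not-started: work has effectively begun
```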
<step n="7" goal="Identify problem stories requiring action">
<action>Flag stories with issues:
- Missing story files (in sprint-status.yaml but no .md file)
- Invalid files (< 10KB or < 5 tasks)
- Conflicting evidence (Status: says done, tasks unchecked)
- Poor quality (no BMAD sections)
</action>
<output>⚠️ **Problem Stories Requiring Attention:**
{{#if_missing_files}}
**Missing Files ({{missing_count}}):**
{{#each_missing}}
- {{story_id}}: Referenced in sprint-status.yaml but file not found
Action: Run /create-story OR remove from sprint-status.yaml
{{/each}}
{{/if}}
{{#if_invalid_quality}}
**Invalid Quality ({{invalid_count}}):**
{{#each_invalid}}
- {{story_id}}: {{file_size_kb}}KB, {{total_tasks}} tasks
Action: Regenerate with /create-story-with-gap-analysis
{{/each}}
{{/if}}
{{#if_conflicting_evidence}}
**Conflicting Evidence ({{conflict_count}}):**
{{#each_conflict}}
- {{story_id}}: Status: says "{{status_field}}" but {{completion_rate}}% tasks checked
Action: Manual review recommended
{{/each}}
{{/if}}
</output>
</step>
<step n="8" goal="Report results and recommendations">
<output>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Epic {{epic_num}} Validation Complete
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
**Epic Status:** {{epic_status}}
**Stories:**
Done: {{done_count}}
In-Progress: {{in_progress_count}}
Review: {{review_count}}
Ready-for-Dev: {{ready_count}}
Backlog: {{backlog_count}}
**Quality:**
Valid: {{valid_count}} (>=10KB, >=5 tasks)
Invalid: {{invalid_count}} (poor quality)
Missing: {{missing_count}} (file not found)
**Updates Applied:** {{updates_count}}
**Next Steps:**
{{#if_invalid_count_gt_0}}
1. Regenerate {{invalid_count}} invalid stories with /create-story
{{/if}}
{{#if_missing_count_gt_0}}
2. Create {{missing_count}} missing story files OR remove from sprint-status.yaml
{{/if}}
{{#if_done_count_eq_story_count}}
3. Epic complete! Consider running /retrospective
{{/if}}
{{#if_in_progress_count_gt_0}}
3. Continue with in-progress stories: /dev-story {{first_in_progress}}
{{/if}}
</output>
<output>💾 Detailed report saved to: {{default_output_file}}</output>
</step>
</workflow>

View File

@ -0,0 +1,34 @@
name: validate-epic-status
description: "Validate and fix sprint-status.yaml for a single epic. Scans story files for task completion, validates quality (>10KB, proper tasks), checks git commits, updates sprint-status.yaml to match REALITY."
author: "BMad"
version: "1.0.0"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
implementation_artifacts: "{config_source}:implementation_artifacts"
story_dir: "{implementation_artifacts}"
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/4-implementation/validate-epic-status"
instructions: "{installed_path}/instructions.xml"
# Inputs
variables:
epic_num: "" # User provides (e.g., "19", "16d", "16e")
sprint_status_file: "{implementation_artifacts}/sprint-status.yaml"
validation_mode: "fix" # Options: "report-only", "fix", "strict"
# Validation criteria
validation_rules:
min_story_size_kb: 10 # Stories should be >= 10KB
min_tasks_required: 5 # Stories should have >= 5 tasks
completion_threshold: 90 # 90%+ tasks checked = "done"
git_commit_lookback_days: 30 # Search last 30 days for commits
# Output
default_output_file: "{story_dir}/.epic-{epic_num}-validation-report.md"
standalone: true
web_bundle: false

View File

@ -0,0 +1,370 @@
<workflow>
<critical>The workflow execution engine is governed by: {project-root}/_bmad/core/tasks/workflow.xml</critical>
<critical>You MUST have already loaded and processed: {installed_path}/workflow.yaml</critical>
<critical>This uses HAIKU AGENTS to read actual code and verify task completion - NOT regex patterns</critical>
<step n="1" goal="Load and parse story">
<action>Load story file from {{story_file}}</action>
<check if="file not found">
<output>❌ Story file not found: {{story_file}}</output>
<action>HALT</action>
</check>
<action>Extract story metadata:
- Story ID from filename
- Epic number from "Epic:" field
- Current status from "Status:" or "**Status:**" field
- Files created/modified from Dev Agent Record section
</action>
<action>Extract ALL tasks (pattern: "- [ ]" or "- [x]"):
- Parse checkbox state (checked/unchecked)
- Extract task text
- Count total, checked, unchecked
</action>
<output>📋 **Deep Story Validation: {{story_id}}**
**Epic:** {{epic_num}}
**Current Status:** {{current_status}}
**Tasks:** {{checked_count}}/{{total_count}} checked
**Files Referenced:** {{file_count}}
**Validation Method:** Haiku agents read actual code
**Cost Estimate:** ~$0.13 for this story
Starting task-by-task verification...
</output>
</step>
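
Step 1's parsing amounts to pulling the checkbox list, the Status field, and the Dev Agent Record file list out of the markdown. A rough sketch under the assumption that file references appear as backticked paths inside a "## Dev Agent Record" section; real story layouts may vary, so treat the regexes as starting points.

```python
import re

TASK_RE = re.compile(r"^\s*- \[( |x|X)\]\s+(.*)$", re.MULTILINE)

def parse_story(markdown: str) -> dict:
    """Extract tasks, status, and referenced files from a story file for deep validation."""
    tasks = [
        {"checked": mark.lower() == "x", "text": text.strip()}
        for mark, text in TASK_RE.findall(markdown)
    ]

    # Assumed layout: a "## Dev Agent Record" section listing backticked file paths.
    files: list[str] = []
    section = re.search(r"^## Dev Agent Record\n(.*?)(?=^## |\Z)",
                        markdown, re.MULTILINE | re.DOTALL)
    if section:
        files = re.findall(r"`([^`]+\.[A-Za-z0-9]+)`", section.group(1))

    status = re.search(r"^\**Status:?\**:?\s*(.+)$", markdown, re.MULTILINE)
    return {
        "tasks": tasks,
        "checked_count": sum(t["checked"] for t in tasks),
        "total_count": len(tasks),
        "files": files,
        "status": status.group(1).strip() if status else None,
    }
```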
<step n="2" goal="Verify ALL tasks with single Haiku agent">
<critical>Spawn ONE Haiku agent to verify ALL tasks (avoids 50x agent startup overhead!)</critical>
<output>Spawning Haiku verification agent for {{total_count}} tasks...</output>
<!-- Spawn SINGLE Haiku agent to verify ALL tasks in this story -->
<invoke-task type="Task" model="haiku">
<description>Verify all {{total_count}} story tasks</description>
<prompt>
You are verifying ALL tasks for this user story by reading actual code.
**Story:** {{story_id}}
**Epic:** {{epic_num}}
**Total Tasks:** {{total_count}}
**Files from Story (Dev Agent Record):**
{{#each file_list}}
- {{this}}
{{/each}}
**Tasks to Verify:**
{{#each task_list}}
{{@index}}. [{{#if this.checked}}x{{else}} {{/if}}] {{this.text}}
{{/each}}
---
**Your Job:**
For EACH task above:
1. **Find relevant files** - Use Glob to find files mentioned in task
2. **Read the files** - Use Read tool to examine actual code
3. **Verify implementation:**
- Is the code real, or stubs/TODOs?
- Is there error handling?
- Is multi-tenant isolation enforced (dealerId filters)?
- Are there tests?
- Does it match the task description?
4. **Make judgment for each task**
**Output Format - JSON object with one entry per task in the "tasks" array:**
```json
{
"story_id": "{{story_id}}",
"total_tasks": {{total_count}},
"tasks": [
{
"task_number": 0,
"task_text": "Implement UserService",
"is_checked": true,
"actually_complete": false,
"confidence": "high",
"evidence": "File exists but has 'TODO: Implement findById' on line 45, tests not found",
"issues_found": ["Stub implementation", "Missing tests", "No dealerId filter"],
"recommendation": "Implement real logic, add tests, add multi-tenant isolation"
},
{
"task_number": 1,
"task_text": "Add error handling",
"is_checked": true,
"actually_complete": true,
"confidence": "very_high",
"evidence": "Try-catch blocks in UserService.ts:67-89, proper error logging, tests verify error cases",
"issues_found": [],
"recommendation": "None - task complete"
}
]
}
```
**Be efficient:** Read files once, verify all tasks, return comprehensive JSON.
</prompt>
<subagent_type>general-purpose</subagent_type>
</invoke-task>
<action>Parse agent response (extract JSON)</action>
<action>For each task result:
- Determine verification_status (correct/false_positive/false_negative)
- Categorize into verified_complete, false_positives, false_negatives lists
- Count totals
</action>
<output>
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Task Verification Complete
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
**✅ Verified Complete:** {{verified_complete_count}}
**❌ False Positives:** {{false_positive_count}} (checked but code missing/poor)
**⚠️ False Negatives:** {{false_negative_count}} (unchecked but code exists)
**❓ Uncertain:** {{uncertain_count}}
**Verification Score:** {{verification_score}}/100
</output>
</step>
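
Sorting the agent's verdicts into the four buckets is mechanical once the JSON is parsed. A sketch assuming the agent returns the object format shown in the prompt above, possibly wrapped in extra prose or a fenced block.

```python
import json
import re

def classify_results(agent_response: str) -> dict:
    """Split per-task verdicts into correct / false_positive / false_negative / uncertain."""
    # The agent may wrap its JSON in prose or a fenced block; pull out the object.
    match = re.search(r"\{.*\}", agent_response, re.DOTALL)
    if not match:
        raise ValueError("No JSON object found in agent response")
    report = json.loads(match.group(0))

    buckets = {"correct": [], "false_positive": [], "false_negative": [], "uncertain": []}
    for task in report["tasks"]:
        if task.get("confidence") == "low":
            buckets["uncertain"].append(task)
        elif task["is_checked"] and not task["actually_complete"]:
            buckets["false_positive"].append(task)   # checked, but code missing/stubbed
        elif not task["is_checked"] and task["actually_complete"]:
            buckets["false_negative"].append(task)   # unchecked, but code exists
        else:
            buckets["correct"].append(task)
    return buckets
```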
<step n="3" goal="Calculate overall story health">
<action>Calculate scores:
- Task accuracy: (correct / total) × 100
- False positive penalty: false_positive_count × -5
- Overall score: max(0, task_accuracy + penalty)
</action>
<action>Determine story category:
IF score >= 95 AND false_positives == 0
→ VERIFIED_COMPLETE
ELSE IF score >= 80 AND false_positives <= 2
→ COMPLETE_WITH_MINOR_ISSUES
ELSE IF false_positives > 5 OR score < 50
→ FALSE_POSITIVE (story claimed done but significant code is missing)
ELSE IF false_positives > 0
→ NEEDS_REWORK
ELSE
→ IN_PROGRESS
</action>
<action>Determine recommended status:
VERIFIED_COMPLETE → "done"
COMPLETE_WITH_MINOR_ISSUES → "review"
FALSE_POSITIVE → "in-progress" or "ready-for-dev"
NEEDS_REWORK → "in-progress"
IN_PROGRESS → "in-progress"
</action>
<output>
📊 **STORY HEALTH ASSESSMENT**
**Current Status:** {{current_status}}
**Recommended Status:** {{recommended_status}}
**Overall Score:** {{overall_score}}/100
**Category:** {{category}}
{{#if category == "VERIFIED_COMPLETE"}}
✅ **Story is production-ready**
- All tasks verified complete
- Code quality confirmed
- No significant issues found
{{/if}}
{{#if category == "FALSE_POSITIVE"}}
❌ **Story claimed done but significant code is missing**
- {{false_positive_count}} tasks checked but not implemented
- Verification score: {{overall_score}}/100 (< 50% = false positive)
- Action: Update status to in-progress, implement missing tasks
{{/if}}
{{#if category == "NEEDS_REWORK"}}
⚠️ **Story needs rework before marking complete**
- {{false_positive_count}} tasks with missing/poor code
- Issues found in verification
- Action: Fix issues, re-verify
{{/if}}
</output>
</step>
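
The scoring and category mapping in step 3 can be written down directly. The numbers are the ones stated above (a -5 penalty per false positive and the 95/80/50 thresholds); mapping FALSE_POSITIVE to "in-progress" rather than "ready-for-dev" is a simplification of the either/or wording.

```python
def assess_story_health(correct: int, false_positives: int, total: int) -> dict:
    """Score the story and map it to a category, following step 3's rules."""
    task_accuracy = (correct / total * 100) if total else 0.0
    score = max(0.0, task_accuracy - 5 * false_positives)

    if score >= 95 and false_positives == 0:
        category = "VERIFIED_COMPLETE"
    elif score >= 80 and false_positives <= 2:
        category = "COMPLETE_WITH_MINOR_ISSUES"
    elif false_positives > 5 or score < 50:
        category = "FALSE_POSITIVE"
    elif false_positives > 0:
        category = "NEEDS_REWORK"
    else:
        category = "IN_PROGRESS"

    recommended = {
        "VERIFIED_COMPLETE": "done",
        "COMPLETE_WITH_MINOR_ISSUES": "review",
        "FALSE_POSITIVE": "in-progress",   # or "ready-for-dev" when little real code exists
        "NEEDS_REWORK": "in-progress",
        "IN_PROGRESS": "in-progress",
    }[category]
    return {"score": round(score, 1), "category": category, "recommended_status": recommended}
```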
<step n="4" goal="Generate detailed validation report">
<template-output>
# Story Validation Report: {{story_id}}
**Generated:** {{date}}
**Validation Method:** LLM-powered deep verification (Haiku 4.5)
**Overall Score:** {{overall_score}}/100
**Category:** {{category}}
---
## Summary
**Story:** {{story_id}}
**Epic:** {{epic_num}}
**Current Status:** {{current_status}}
**Recommended Status:** {{recommended_status}}
**Task Verification:**
- Total: {{total_count}}
- Checked: {{checked_count}}
- Verified Complete: {{verified_complete_count}}
- False Positives: {{false_positive_count}}
- False Negatives: {{false_negative_count}}
---
## Verification Details
{{#if false_positive_count > 0}}
### ❌ False Positives (CRITICAL - Code Claims vs Reality)
{{#each false_positives}}
**Task {{@index + 1}}:** {{this.task}}
**Claimed:** [x] Complete
**Reality:** Code missing or stub implementation
**Evidence:**
{{this.evidence}}
**Issues Found:**
{{#each this.issues_found}}
- {{this}}
{{/each}}
**Recommendation:** {{this.recommendation}}
---
{{/each}}
{{/if}}
{{#if false_negative_count > 0}}
### ⚠️ False Negatives (Unchecked But Working)
{{#each false_negatives}}
**Task {{@index + 1}}:** {{this.task}}
**Status:** [ ] Unchecked
**Reality:** Code exists and working
**Evidence:**
{{this.evidence}}
**Recommendation:** Mark task as complete [x]
---
{{/each}}
{{/if}}
{{#if verified_complete_count > 0}}
### ✅ Verified Complete Tasks
{{verified_complete_count}} tasks verified with actual code review.
{{#if show_all_verified}}
{{#each verified_complete}}
- {{this.task}} ({{this.confidence}} confidence)
{{/each}}
{{/if}}
{{/if}}
---
## Final Verdict
**Overall Score:** {{overall_score}}/100
{{#if category == "VERIFIED_COMPLETE"}}
✅ **VERIFIED COMPLETE**
This story is production-ready:
- All {{total_count}} tasks verified complete
- Code quality confirmed through review
- No significant issues found
- Status "done" is accurate
**Action:** None needed - story is solid
{{/if}}
{{#if category == "FALSE_POSITIVE"}}
❌ **FALSE POSITIVE - Story NOT Actually Complete**
**Problems:**
- {{false_positive_count}} tasks checked but code missing/stubbed
- Verification score: {{overall_score}}/100 (< 50%)
- Story marked "{{current_status}}" but significant work remains
**Required Actions:**
1. Update sprint-status.yaml: {{story_id}} → in-progress
2. Uncheck {{false_positive_count}} false positive tasks
3. Implement missing code
4. Re-run validation after implementation
**Estimated Rework:** {{estimated_rework_hours}} hours
{{/if}}
{{#if category == "NEEDS_REWORK"}}
⚠️ **NEEDS REWORK**
**Problems:**
- {{false_positive_count}} tasks with quality issues
- Some code exists but has problems (TODOs, missing features, poor quality)
**Required Actions:**
{{#each action_items}}
- [ ] {{this}}
{{/each}}
**Estimated Fix Time:** {{estimated_fix_hours}} hours
{{/if}}
{{#if category == "IN_PROGRESS"}}
🔄 **IN PROGRESS** (accurate status)
- {{checked_count}}/{{total_count}} tasks complete
- {{unchecked_count}} tasks remaining
- Current status reflects reality
**No action needed** - continue implementation
{{/if}}
---
**Validation Cost:** ~${{validation_cost}}
**Agent Model:** {{agent_model}}
**Tasks Verified:** {{total_count}}
</template-output>
</step>
<step n="5" goal="Update sprint-status if needed">
<check if="{{recommended_status}} != {{current_status}}">
<ask>Story status should be updated from "{{current_status}}" to "{{recommended_status}}". Update sprint-status.yaml? (y/n)</ask>
<check if="user says yes">
<action>Update sprint-status.yaml:
python3 scripts/lib/sprint-status-updater.py --epic {{epic_num}} --mode fix
</action>
<action>Add validation note to story file Dev Agent Record</action>
<output>✅ Updated {{story_id}}: {{current_status}} → {{recommended_status}}</output>
</check>
</check>
<check if="{{recommended_status}} == {{current_status}}">
<output>✅ Story status is accurate - no changes needed</output>
</check>
</step>
</workflow>

View File

@ -0,0 +1,29 @@
name: validate-story-deep
description: "Deep story validation using Haiku agents to read and verify actual code. Each task gets micro code review to verify implementation quality."
author: "BMad"
version: "1.0.0"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
implementation_artifacts: "{config_source}:implementation_artifacts"
story_dir: "{implementation_artifacts}"
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/4-implementation/validate-story-deep"
instructions: "{installed_path}/instructions.xml"
# Input variables
variables:
story_file: "" # Path to story file to validate
# Agent configuration
agent_model: "haiku" # Use Haiku 4.5 for cost efficiency ($0.13/story vs $1.50)
parallel_tasks: true # Validate tasks in parallel (faster)
# Output
default_output_file: "{story_dir}/.validation-{story_id}-{date}.md"
standalone: true
web_bundle: false

View File

@ -0,0 +1,395 @@
<workflow>
<critical>The workflow execution engine is governed by: {project-root}/_bmad/core/tasks/workflow.xml</critical>
<critical>You MUST have already loaded and processed: {installed_path}/workflow.yaml</critical>
<critical>This performs DEEP validation - not just checkbox counting, but verifying code actually exists and works</critical>
<step n="1" goal="Load and parse story file">
<action>Load story file from {{story_file}}</action>
<check if="file not found">
<output>❌ Story file not found: {{story_file}}
Please provide a valid story file path.
</output>
<action>HALT</action>
</check>
<action>Extract story metadata:
- Story ID (from filename)
- Epic number
- Current status from Status: field
- Priority
- Estimated effort
</action>
<action>Extract all tasks:
- Pattern: "- [ ]" or "- [x]"
- Count total tasks
- Count checked tasks
- Count unchecked tasks
- Calculate completion percentage
</action>
<action>Extract file references from Dev Agent Record:
- Files created
- Files modified
- Files deleted
</action>
<output>📋 **Story Validation: {{story_id}}**
**Epic:** {{epic_num}}
**Current Status:** {{current_status}}
**Tasks:** {{checked_count}}/{{total_count}} complete ({{completion_pct}}%)
**Files Referenced:** {{file_count}}
Starting deep validation...
</output>
</step>
<step n="2" goal="Task-based verification (Deep)">
<critical>Use task-verification-engine.py for DEEP verification (not just file existence)</critical>
<action>For each task in story:
1. Extract task text
2. Note if checked [x] or unchecked [ ]
3. Pass to task-verification-engine.py
4. Receive verification result with:
- should_be_checked: true/false
- confidence: very high/high/medium/low
- evidence: list of findings
- verification_status: correct/false_positive/false_negative/uncertain
</action>
<action>Categorize tasks by verification status:
- ✅ CORRECT: Checkbox matches reality
- ❌ FALSE POSITIVE: Checked but code missing/stubbed
- ⚠️ FALSE NEGATIVE: Unchecked but code exists
- ❓ UNCERTAIN: Cannot verify (low confidence)
</action>
<action>Calculate verification score:
- (correct_tasks / total_tasks) × 100
- Penalize false positives heavily (-5 points each)
- Penalize false negatives lightly (-2 points each)
</action>
<output>
🔍 **Task Verification Results**
**Total Tasks:** {{total_count}}
**✅ CORRECT:** {{correct_count}} tasks (checkbox matches reality)
**❌ FALSE POSITIVES:** {{false_positive_count}} tasks (checked but code missing/stubbed)
**⚠️ FALSE NEGATIVES:** {{false_negative_count}} tasks (unchecked but code exists)
**❓ UNCERTAIN:** {{uncertain_count}} tasks (cannot verify)
**Verification Score:** {{verification_score}}/100
{{#if false_positive_count > 0}}
### ❌ False Positives (CRITICAL - Code Claims vs Reality)
{{#each false_positives}}
**Task:** {{this.task}}
**Claimed:** [x] Complete
**Reality:** {{this.evidence}}
**Action Required:** {{this.recommended_action}}
{{/each}}
{{/if}}
{{#if false_negative_count > 0}}
### ⚠️ False Negatives (Unchecked but Working)
{{#each false_negatives}}
**Task:** {{this.task}}
**Status:** [ ] Unchecked
**Reality:** {{this.evidence}}
**Recommendation:** Mark as complete [x]
{{/each}}
{{/if}}
</output>
</step>
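
The verification score here differs from validate-story-deep only in the light -2 penalty for false negatives. Only the arithmetic is sketched; how task-verification-engine.py is invoked and what it returns beyond these counts is not specified in this step.

```python
def verification_score(correct: int, false_positives: int, false_negatives: int, total: int) -> float:
    """Step 2's score: task accuracy minus 5 per false positive and 2 per false negative."""
    if total == 0:
        return 0.0
    accuracy = correct / total * 100
    return max(0.0, accuracy - 5 * false_positives - 2 * false_negatives)

# Example: 18 correct of 22 tasks, 2 false positives, 1 false negative -> ~69.8/100
```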
<step n="3" goal="Code quality review" if="{{validation_depth}} == deep OR comprehensive">
<action>Extract all files from Dev Agent Record file list</action>
<check if="no files listed">
<output>⚠️ No files listed in Dev Agent Record - cannot perform code review</output>
<action>Skip to step 4</action>
</check>
<action>For each file:
1. Check if file exists
2. Read file content
3. Check for quality issues:
- TODO/FIXME comments without GitHub issues
- `any` types in TypeScript
- Hardcoded values (siteId, dealerId, API keys)
- Missing error handling
- Missing multi-tenant isolation (dealerId filters)
- Missing audit logging on mutations
- Security vulnerabilities (SQL injection, XSS)
</action>
<action>Run multi-agent review if files exist:
- Security audit
- Silent failure detection
- Architecture compliance
- Performance analysis
</action>
<action>Categorize issues by severity:
- CRITICAL: Security, data loss, breaking changes
- HIGH: Missing features, poor quality, technical debt
- MEDIUM: Code smells, minor violations
- LOW: Style issues, nice-to-haves
</action>
<output>
🛡️ **Code Quality Review**
**Files Reviewed:** {{files_reviewed}}
**Files Missing:** {{files_missing}}
**Issues Found:** {{total_issues}}
CRITICAL: {{critical_count}}
HIGH: {{high_count}}
MEDIUM: {{medium_count}}
LOW: {{low_count}}
{{#if critical_count > 0}}
### 🚨 CRITICAL Issues (Must Fix)
{{#each critical_issues}}
**File:** {{this.file}}
**Issue:** {{this.description}}
**Impact:** {{this.impact}}
**Fix:** {{this.recommended_fix}}
{{/each}}
{{/if}}
{{#if high_count > 0}}
### ⚠️ HIGH Priority Issues
{{#each high_issues}}
**File:** {{this.file}}
**Issue:** {{this.description}}
{{/each}}
{{/if}}
**Code Quality Score:** {{quality_score}}/100
</output>
</step>
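
A rough, pattern-based sketch of the per-file checks in step 3. The real review is delegated to the multi-agent pass; these regexes only catch the obvious cases (TODO/FIXME without an issue link, `any` types, hardcoded IDs or secrets), and the exact patterns are assumptions about this codebase's conventions rather than anything the workflow defines.

```python
import re
from pathlib import Path

QUALITY_CHECKS = {
    # severity -> list of (description, illustrative regex)
    "HIGH": [
        ("TODO/FIXME without issue link", re.compile(r"\b(TODO|FIXME)\b(?!.*#\d+)")),
        ("'any' type in TypeScript", re.compile(r":\s*any\b")),
    ],
    "CRITICAL": [
        ("Hardcoded dealerId/siteId", re.compile(r"\b(dealerId|siteId)\s*[:=]\s*['\"]?\d+")),
        ("Possible hardcoded secret", re.compile(r"(api[_-]?key|secret)\s*[:=]\s*['\"][A-Za-z0-9]{16,}", re.I)),
    ],
}

def scan_file(path: Path) -> list[dict]:
    """Return {file, severity, issue, line} findings for one source file."""
    if not path.exists():
        return [{"file": str(path), "severity": "CRITICAL",
                 "issue": "File listed in story but missing", "line": 0}]
    findings = []
    text = path.read_text(encoding="utf-8", errors="ignore")
    for lineno, line in enumerate(text.splitlines(), 1):
        for severity, checks in QUALITY_CHECKS.items():
            for description, pattern in checks:
                if pattern.search(line):
                    findings.append({"file": str(path), "severity": severity,
                                     "issue": description, "line": lineno})
    return findings
```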
<step n="4" goal="Integration verification" if="{{validation_depth}} == comprehensive">
<action>Extract dependencies from story:
- Services called
- APIs consumed
- Database tables used
- Cache keys accessed
</action>
<action>For each dependency:
1. Check if dependency still exists
2. Check if API contract is still valid
3. Run integration tests if they exist
4. Check for breaking changes in dependent stories
</action>
<output>
🔗 **Integration Verification**
**Dependencies Checked:** {{dependency_count}}
{{#if broken_integrations}}
### ❌ Broken Integrations
{{#each broken_integrations}}
**Dependency:** {{this.name}}
**Issue:** {{this.problem}}
**Likely Cause:** {{this.cause}}
**Fix:** {{this.fix}}
{{/each}}
{{/if}}
{{#if all_integrations_ok}}
✅ All integrations verified working
{{/if}}
</output>
</step>
<step n="5" goal="Determine final story status">
<action>Calculate overall story health:
- Task verification score (0-100)
- Code quality score (0-100)
- Integration score (0-100)
- Overall score = weighted average
</action>
<action>Determine recommended status:
IF verification_score >= 95 AND quality_score >= 90 AND no CRITICAL issues
→ VERIFIED_COMPLETE
ELSE IF verification_score >= 80 AND quality_score >= 70
→ COMPLETE_WITH_ISSUES (document issues)
ELSE IF verification_score < 50
→ FALSE_POSITIVE (claimed done but not implemented)
ELSE IF false_positives > 0 OR critical_issues > 0
→ NEEDS_REWORK (code missing or broken)
ELSE
→ IN_PROGRESS (partially complete)
</action>
<output>
📊 **FINAL VERDICT**
**Story:** {{story_id}}
**Current Status:** {{current_status}}
**Recommended Status:** {{recommended_status}}
**Scores:**
Task Verification: {{verification_score}}/100
Code Quality: {{quality_score}}/100
Integration: {{integration_score}}/100
**Overall: {{overall_score}}/100**
**Confidence:** {{confidence_level}}
{{#if recommended_status != current_status}}
### ⚠️ Status Change Recommended
**Current:** {{current_status}}
**Should Be:** {{recommended_status}}
**Reason:**
{{status_change_reason}}
{{/if}}
</output>
</step>
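
Step 5 describes the overall score only as a weighted average; the 50/30/20 weights below are an assumption for illustration, not values defined by the workflow.

```python
def overall_score(verification: float, quality: float, integration: float) -> float:
    """Weighted average of the three sub-scores (assumed weights: 0.5 / 0.3 / 0.2)."""
    return round(0.5 * verification + 0.3 * quality + 0.2 * integration, 1)
```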
<step n="6" goal="Generate actionable report">
<template-output>
# Story Validation Report: {{story_id}}
**Validation Date:** {{date}}
**Validation Depth:** {{validation_depth}}
**Overall Score:** {{overall_score}}/100
---
## Summary
**Story:** {{story_id}} - {{story_title}}
**Epic:** {{epic_num}}
**Current Status:** {{current_status}}
**Recommended Status:** {{recommended_status}}
**Task Completion:** {{checked_count}}/{{total_count}} ({{completion_pct}}%)
**Verification Score:** {{verification_score}}/100
**Code Quality Score:** {{quality_score}}/100
---
## Task Verification Details
{{task_verification_output}}
---
## Code Quality Review
{{code_quality_output}}
---
## Integration Verification
{{integration_output}}
---
## Recommended Actions
{{#if critical_issues}}
### Priority 1: Fix Critical Issues (BLOCKING)
{{#each critical_issues}}
- [ ] {{this.file}}: {{this.description}}
{{/each}}
{{/if}}
{{#if false_positives}}
### Priority 2: Fix False Positives (Code Claims vs Reality)
{{#each false_positives}}
- [ ] {{this.task}} - {{this.evidence}}
{{/each}}
{{/if}}
{{#if high_issues}}
### Priority 3: Address High Priority Issues
{{#each high_issues}}
- [ ] {{this.file}}: {{this.description}}
{{/each}}
{{/if}}
{{#if false_negatives}}
### Priority 4: Update Task Checkboxes (Low Impact)
{{#each false_negatives}}
- [ ] Mark complete: {{this.task}}
{{/each}}
{{/if}}
---
## Next Steps
{{#if recommended_status == "VERIFIED_COMPLETE"}}
✅ **Story is verified complete and production-ready**
- Update sprint-status.yaml: {{story_id}} = done
- No further action required
{{/if}}
{{#if recommended_status == "NEEDS_REWORK"}}
⚠️ **Story requires rework before marking complete**
- Fix {{critical_count}} CRITICAL issues
- Address {{false_positive_count}} false positive tasks
- Re-run validation after fixes
{{/if}}
{{#if recommended_status == "FALSE_POSITIVE"}}
❌ **Story is marked done but not actually implemented**
- Verification score: {{verification_score}}/100 (< 50%)
- Update sprint-status.yaml: {{story_id}} = in-progress or ready-for-dev
- Implement missing tasks before claiming done
{{/if}}
---
**Generated by:** /validate-story workflow
**Validation Engine:** task-verification-engine.py v2.0
</template-output>
</step>
<step n="7" goal="Update story file and sprint-status">
<ask>Apply recommended status change to sprint-status.yaml? (y/n)</ask>
<check if="user says yes">
<action>Update sprint-status.yaml:
- Use sprint-status-updater.py
- Update {{story_id}} to {{recommended_status}}
- Add comment: "Validated {{date}}, score {{overall_score}}/100"
</action>
<action>Update story file:
- Add validation report link to Dev Agent Record
- Add validation score to completion notes
- Update Status: field if changed
</action>
<output>✅ Updated {{story_id}} status: {{current_status}} → {{recommended_status}}</output>
</check>
<check if="user says no">
<output>ℹ️ Status not updated. Validation report saved for reference.</output>
</check>
</step>
</workflow>

View File

@ -0,0 +1,29 @@
name: validate-story
description: "Deep validation of a single story: verify tasks against codebase, run code quality review, check for regressions. Produces verification report with actionable findings."
author: "BMad"
version: "1.0.0"
# Critical variables from config
config_source: "{project-root}/_bmad/bmm/config.yaml"
user_name: "{config_source}:user_name"
communication_language: "{config_source}:communication_language"
implementation_artifacts: "{config_source}:implementation_artifacts"
story_dir: "{implementation_artifacts}"
# Workflow components
installed_path: "{project-root}/_bmad/bmm/workflows/4-implementation/validate-story"
instructions: "{installed_path}/instructions.xml"
# Input variables
variables:
story_file: "" # Path to story file (e.g., docs/sprint-artifacts/16e-6-ecs-task-definitions-tier3.md)
validation_depth: "deep" # Options: "quick" (tasks only), "deep" (tasks + code review), "comprehensive" (tasks + review + integration tests)
# Tools
task_verification_script: "{project-root}/scripts/lib/task-verification-engine.py"
# Output
default_output_file: "{story_dir}/.validation-{story_id}-{date}.md"
standalone: true
web_bundle: false