From 10dc25f43dd74d135258c8a3e10f6920004096dc Mon Sep 17 00:00:00 2001
From: Mario Semper
Date: Sun, 23 Nov 2025 02:22:21 +0100
Subject: [PATCH 1/2] feat: Ring of Fire (ROF) Sessions - Multi-agent parallel collaboration

Introduces Ring of Fire Sessions feature for BMad Method, enabling
multi-agent collaborative sessions that run in parallel to user workflow.

Key features:
- User-controlled scope (2 agents/5min to 10 agents/2hrs)
- Approval-gated tool access for safety
- Flexible reporting (brief/detailed/live)
- Parallel workflow support

Origin: tellingCube project (masemIT e.U.)
Real-world validated with successful multi-agent planning sessions.

Command: *rof "<topic>" --agents <agents> [--report mode]
---
 docs/ring-of-fire-sessions.md | 256 ++++++++++++++++++++++++++++++++++
 1 file changed, 256 insertions(+)
 create mode 100644 docs/ring-of-fire-sessions.md

diff --git a/docs/ring-of-fire-sessions.md b/docs/ring-of-fire-sessions.md
new file mode 100644
index 00000000..7807a539
--- /dev/null
+++ b/docs/ring-of-fire-sessions.md
@@ -0,0 +1,256 @@
+# BMad Method PR #1: Ring of Fire (ROF) Sessions
+
+**Feature Type**: Core workflow enhancement
+**Status**: Draft for community review
+**Origin**: tellingCube project (masemIT e.U.)
+**Author**: Mario Semper (@sempre)
+**Date**: 2025-11-23
+
+---
+
+## Summary
+
+**Ring of Fire (ROF) Sessions** enable multi-agent collaborative sessions that run in parallel to the user's main workflow, allowing users to delegate complex multi-perspective analysis while continuing other work.
+
+---
+
+## Problem Statement
+
+Current BMad Method requires **sequential agent interaction**. When users need multiple agents to collaborate on a complex topic, they must:
+- Manually orchestrate each agent conversation
+- Stay in the loop for every exchange
+- Wait for sequential responses before proceeding
+- Context-switch constantly between tasks
+
+This creates **bottlenecks** and prevents **parallel work streams**.
+
+---
+
+## Proposed Solution: Ring of Fire Sessions
+
+A new command pattern that enables **scoped multi-agent collaboration sessions** that run while the user continues other work.
+
+### Command Syntax
+
+```bash
+*rof "<topic>" --agents <agents> [--report brief|detailed|live]
+```
+
+### Example Usage
+
+```bash
+*rof "API Refactoring Strategy" --agents dev,architect,qa --report brief
+```
+
+**What happens**:
+1. Dev, Architect, and QA agents enter a collaborative session
+2. They analyze the topic together (code review, design discussion, testing concerns)
+3. When agents need tool access (read files, run commands), they request user approval
+4. User continues working on other tasks in parallel
+5. Session ends with consolidated report (brief: just recommendations, detailed: full transcript)
+
+---
+
+## Key Features
+
+### 1. User-Controlled Scope
+- **Small**: 2 agents, 5-minute quick discussion
+- **Large**: 10 agents, 2-hour deep analysis
+- User decides granularity based on complexity
+
+### 2. Approval-Gated Tool Access
+- Agents can **discuss** freely within the session
+- When agents need **tools** (read files, execute commands, make changes), they:
+  - Pause the session
+  - Request user approval
+  - Resume after user decision
+
+**Why**: Maintains user control, prevents runaway agent actions
+
+### 3. Flexible Reporting
+
+| Mode | Description | Use Case |
+|------|-------------|----------|
+| `brief` | Final recommendations only | "Just tell me what to do" |
+| `detailed` | Full transcript + recommendations | "Show me the reasoning" |
+| `live` | Real-time updates as agents discuss | "I want to observe" |
+
+**Default**: `brief` with Q&A available
+
+### 4. Parallel Workflows
+- User works on **Task A** while ROF session tackles **Task B**
+- No context-switching overhead
+- Efficient use of time
+
+---
+
+## Use Cases
+
+### 1. Architecture Reviews
+```bash
+*rof "Evaluate microservices vs monolith for new feature" --agents architect,dev,qa
+```
+**Agents collaborate on**: Design trade-offs, implementation complexity, testing implications
+
+### 2. Code Refactoring
+```bash
+*rof "Refactor authentication module" --agents dev,architect --report detailed
+```
+**Agents collaborate on**: Current code analysis, refactoring approach, migration strategy
+
+### 3. Feature Planning
+```bash
+*rof "Plan user notifications feature" --agents pm,ux,dev --report brief
+```
+**Agents collaborate on**: Requirements, UX flow, technical feasibility, timeline
+
+### 4. Quality Gates
+```bash
+*rof "Investigate test failures in CI/CD" --agents qa,dev --report live
+```
+**Agents collaborate on**: Root cause analysis, fix recommendations, regression prevention
+
+### 5. Documentation Sprints
+```bash
+*rof "Document API endpoints" --agents dev,pm,ux
+```
+**Agents collaborate on**: Technical accuracy, user-friendly examples, completeness
+
+---
+
+## User Experience Flow
+
+```mermaid
+sequenceDiagram
+    User->>River: *rof "Topic" --agents dev,architect
+    River->>Dev: Join ROF session
+    River->>Architect: Join ROF session
+    River->>User: Session started, continue your work
+
+    Dev->>Architect: Discuss approach
+    Architect->>Dev: Suggest alternatives
+
+    Dev->>User: Need to read auth.ts - approve?
+    User->>Dev: Approved
+    Dev->>Architect: After reading file...
+
+    Architect->>Dev: Recommendation
+    Dev->>River: Session complete
+    River->>User: Brief report: [Recommendations]
+```
+
+---
+
+## Implementation Considerations
+
+### Technical Requirements
+- **Session state management**: Track active ROF sessions, participating agents
+- **Agent context sharing**: Agents share knowledge within session scope
+- **User approval workflow**: Clear prompt for tool requests
+- **Report generation**: Brief/detailed/live output formatting
+- **Workflow integration**: Link ROF findings to existing workflow plans/todos
+
+### Open Questions for Community
+
+1. **Integration**: Core BMad feature or plugin/extension?
+2. **Concurrency**: How to handle file conflicts if multiple agents want to edit?
+3. **Cost Model**: Guidance for LLM call budgeting with multiple agents?
+4. **Session Limits**: Recommended max agents/duration?
+5. **Agent Communication**: Free-form discussion or structured turn-taking?
+
+---
+
+## Real-World Validation
+
+**Origin Project**: tellingCube (BI dashboard, masemIT e.U.)
+
+**Validation Scenario**:
+- **Topic**: "Next steps for tellingCube after validation test"
+- **Agents**: River (orchestrator), Mary (analyst), Winston (architect)
+- **Report Mode**: Brief
+- **Outcome**: Successfully analyzed post-validation roadmap with 3 scenarios (GO/CHANGE/NO-GO), delivered consolidated recommendations in 5 minutes
+
+**User Feedback (Mario Semper)**:
+> "This is exactly what I needed - I wanted multiple perspectives without having to orchestrate every conversation. The brief report gave me actionable next steps immediately."
+
+**Documentation**: `docs/_masemIT/readme.md` in tellingCube repository
+
+---
+
+## Proposed Documentation Structure
+
+```
+.bmad-core/
+  features/
+    ring-of-fire.md           # Feature specification
+
+docs/
+  guides/
+    using-rof-sessions.md     # User guide with examples
+
+  architecture/
+    agent-collaboration.md    # Technical design
+    rof-session-management.md # State handling approach
+```
+
+---
+
+## Benefits
+
+✅ **Unlocks parallel workflows** - User productivity gains
+✅ **Reduces context-switching** - Cognitive load reduction
+✅ **Enables complex analysis** - Multi-perspective insights
+✅ **Maintains user control** - Approval gates for tools
+✅ **Scales flexibly** - From quick checks to deep dives
+
+---
+
+## Comparison to Existing Patterns
+
+| Feature | Standard Agent Use | ROF Session |
+|---------|-------------------|-------------|
+| Agent collaboration | Sequential (one at a time) | Parallel (multiple simultaneously) |
+| User involvement | Required for every exchange | Only for approvals |
+| Parallel work | No (user waits) | Yes (user continues tasks) |
+| Output | Chat transcript | Consolidated report |
+| Use case | Single-perspective tasks | Multi-perspective analysis |
+
+---
+
+## Next Steps
+
+1. **Community feedback** on approach and open questions
+2. **Technical design** refinement (state management, agent communication)
+3. **Prototype implementation** in BMad core or as extension
+4. **Beta testing** with real projects (beyond tellingCube)
+5. **Documentation** completion with examples
+
+---
+
+## Alternatives Considered
+
+### Alt 1: "Breakout Session"
+- **Pros**: Clear meeting metaphor
+- **Cons**: Less evocative, doesn't convey "continuous collaborative space"
+
+### Alt 2: "Agent Huddle"
+- **Pros**: Short, casual
+- **Cons**: Implies quick/informal only
+
+### Alt 3: "Lagerfeuer" (original German name)
+- **Pros**: Warm, campfire metaphor
+- **Cons**: Poor i18n, hard to pronounce/remember for non-German speakers
+
+**Chosen**: **Ring of Fire** - evokes continuous collaboration circle, internationally understood, memorable, shortcut "ROF" works well
+
+---
+
+## References
+
+- **Source Project**: tellingCube (https://github.com/masemIT/telling-cube) [if public]
+- **Documentation**: `docs/_masemIT/readme.md`
+- **Discussion**: [Link to BMad community discussion if applicable]
+
+---
+
+**Contribution ready for review.** Feedback welcome! 🔥

From 12e0840c62a6cd0be35859bc47c145d745b7a329 Mon Sep 17 00:00:00 2001
From: Mario Semper
Date: Sun, 23 Nov 2025 02:24:24 +0100
Subject: [PATCH 2/2] feat: Agent Task Pre-Flight Protocol - Safety framework for high-risk tasks

Introduces mandatory safety checks for high-risk agent tasks (marketing,
legal, deployment) to prevent factual errors, trademark violations, and
privacy breaches.

Key features:
- Mandatory verification before high-risk outputs
- Cross-agent fact-checking
- Critical guidelines framework
- User approval gates

Origin: Lessons from tellingCube project
Prevents: Wrong pricing, trademark violations, privacy leaks, feature hallucinations
Includes: CRITICAL-GUIDELINES.md template and implementation examples
---
 docs/agent-preflight-protocol.md | 383 +++++++++++++++++++++++++++++++
 1 file changed, 383 insertions(+)
 create mode 100644 docs/agent-preflight-protocol.md

diff --git a/docs/agent-preflight-protocol.md b/docs/agent-preflight-protocol.md
new file mode 100644
index 00000000..28869de6
--- /dev/null
+++ b/docs/agent-preflight-protocol.md
@@ -0,0 +1,383 @@
+# BMad Method PR #2: Agent Task Pre-Flight Protocol
+
+**Feature Type**: Safety & quality framework
+**Status**: Draft for community review
+**Origin**: tellingCube project learnings (masemIT e.U.)
+**Author**: Mario Semper (@sempre)
+**Date**: 2025-11-23
+
+---
+
+## Summary
+
+**Agent Task Pre-Flight Protocol** establishes mandatory safety checks for high-risk agent tasks (marketing, legal, deployment) to prevent factual errors, trademark violations, privacy breaches, and assumption-based mistakes.
+
+---
+
+## Problem Statement
+
+### Real-World Failure Case
+
+**Scenario**: Marketing agent (Sophie) created LinkedIn launch posts for tellingCube without:
+- Reading existing project documentation
+- Verifying pricing against actual implementation
+- Checking trademark compliance rules
+- Reviewing privacy guidelines
+
+**Result**: Multiple critical errors:
+- ❌ Mentioned user's day job title (privacy/legal risk)
+- ❌ Used family member's name (privacy violation)
+- ❌ Claimed "60 seconds" generation time (factually wrong)
+- ❌ Advertised "€9/month" pricing (doesn't exist - actual: €29-€999 ONE-TIME)
+- ❌ Used "IBCS-compliant" (trademark violation - should be "inspired by IBCS©")
+
+**Root Cause**: Agent operated independently without pre-task verification protocol.
+
+---
+
+## Current BMad Behavior (Risky)
+
+```yaml
+User: "Sophie, create LinkedIn launch posts"
+
+Sophie:
+  1. Generates content based on general knowledge
+  2. Makes assumptions about features/pricing
+  3. Uses marketing best practices
+  4. Presents to user
+
+❌ Problem: No verification step before creation
+```
+
+---
+
+## Proposed Solution: Pre-Flight Protocol
+
+### Mandatory Checks Before High-Risk Tasks
+
+```yaml
+Agent Task Pre-Flight Protocol:
+
+BEFORE executing tasks with external impact:
+  1. Discover Critical Context
+     - Search for CRITICAL-GUIDELINES.md or similar
+     - Read recent related work in project
+     - Check actual implementation (code, configs, not assumptions)
+
+  2. Verify Assumptions
+     - Pricing: Read Stripe config / pricing components
+     - Features: Grep codebase for actual capabilities
+     - Legal/Trademark: Check documented compliance rules
+     - Privacy: Verify no personal info in public content
+
+  3. Cross-Agent Review (for high-risk outputs)
+     - Orchestrator reviews before user sees
+     - Fact-checker agent validates claims
+     - Minimum 2 agents verify before publishing
+
+  4. User Approval Gate
+     - Present content as DRAFT
+     - Highlight assumptions made
+     - Get explicit approval before finalizing
+```
+
+---
+
+## High-Risk Task Categories
+
+### 1. Marketing & Public Content
+**Examples**: LinkedIn posts, press releases, demo videos, website copy
+
+**Pre-Flight Required**:
+- [ ] Read `CRITICAL-GUIDELINES.md` (legal, trademark, privacy rules)
+- [ ] Verify pricing from actual Stripe/payment config
+- [ ] Verify features from actual codebase (not roadmap ideas)
+- [ ] Check trademark compliance (e.g., "IBCS©" usage rules)
+- [ ] Privacy review (no personal identifiers without consent)
+- [ ] Cross-agent fact-check before presenting to user
+
+### 2. Legal & Compliance
+**Examples**: Terms of service, privacy policy, license agreements
+
+**Pre-Flight Required**:
+- [ ] Read existing legal docs (don't start from scratch)
+- [ ] Check jurisdiction-specific requirements
+- [ ] Verify against actual product behavior (data handling, cookies, etc.)
+- [ ] Legal expert review (human or specialized agent)
+- [ ] User final approval required
+
+### 3. Deployment & Infrastructure
+**Examples**: Database migrations, production deployments, DNS changes
+
+**Pre-Flight Required**:
+- [ ] Read deployment runbooks/checklists
+- [ ] Verify current production state
+- [ ] Check for breaking changes
+- [ ] Backup strategy confirmed
+- [ ] Rollback plan documented
+- [ ] User explicit approval with understanding of risks
+
+### 4. Financial & Billing
+**Examples**: Stripe configuration, pricing changes, refund policies
+
+**Pre-Flight Required**:
+- [ ] Read current Stripe dashboard state
+- [ ] Verify tax/legal implications
+- [ ] Check grandfather clause impacts
+- [ ] Financial impact assessment
+- [ ] User approval with revenue projections
+
+---
+
+## Implementation Guidelines
+
+### For Agent Developers
+
+**In agent YAML definition**:
+
+```yaml
+agent:
+  name: Sophie
+  id: marketing
+  high_risk_tasks: true  # Triggers pre-flight protocol
+
+pre_flight:
+  required_reads:
+    - docs/marketing/CRITICAL-GUIDELINES.md
+    - components/landing/PricingSection.tsx
+    - docs/_masemIT/readme.md
+
+  verification_steps:
+    - Grep for actual pricing tiers in codebase
+    - Check trademark compliance rules
+    - Privacy scan (no personal names/details)
+
+  cross_check:
+    agents: [river, mary]
+    approval_required: true
+
+tasks:
+  create-linkedin-post:
+    pre_flight_mandatory: true
+    approval_gate: user
+```
+
+### For Orchestrators (River-like agents)
+
+**Orchestrator responsibilities**:
+
+```python
+def execute_high_risk_task(agent, task, user_request, codebase):
+    # Step 1: Pre-flight checks (helper functions in this sketch are placeholders)
+    critical_docs = discover_critical_guidelines()
+    agent.read(critical_docs)
+
+    # Step 2: Agent executes with verification
+    draft_output = agent.execute_task(task)
+
+    # Step 3: Cross-agent review against the actual codebase
+    fact_check_agent = get_agent("mary")
+    verification = fact_check_agent.verify(draft_output, codebase)
+
+    # Step 4: Present as DRAFT to user
+    if verification.has_issues:
+        present_issues_to_user(verification.issues)
+
+    present_as_draft(draft_output)
+
+    # Step 5: User approval gate
+    approval = get_user_approval()
+
+    if approval:
+        finalize(draft_output)
+```
+
+---
+
+## Example: Correct Marketing Flow
+
+### Before (Risky)
+
+```
+User: "Create LinkedIn launch posts"
+Sophie: [Generates 3 posts with assumptions]
+Sophie: "Here are your posts!"
+
+❌ Contains errors user must catch
+```
+
+### After (Safe)
+
+```
+User: "Create LinkedIn launch posts"
+
+River: "Sophie, this is a high-risk task. Running pre-flight..."
+
+Sophie:
+  ✅ Read CRITICAL-GUIDELINES.md
+  ✅ Read PricingSection.tsx (actual pricing: €29-€999)
+  ✅ Checked IBCS© compliance rules (must say "inspired by")
+  ✅ Privacy check (no "Product Owner", no "brother")
+
+Sophie: [Generates 3 posts with verified facts]
+
+River: "Mary, fact-check Sophie's output..."
+
+Mary:
+  ✅ Pricing correct (€29-€999 lifetime)
+  ✅ No trademark violations ("inspired by IBCS©")
+  ✅ No privacy issues
+  ✅ Generation time accurate ("minutes")
+
+River: "Sempre, here's the DRAFT (pre-flight verified). Approve?"
+
+User: [Reviews, approves]
+
+✅ No errors, factually accurate
+```
+
+---
+
+## Critical Guidelines Template
+
+**Every project should have**: `docs/PROJECT-NAME/CRITICAL-GUIDELINES.md`
+
+```markdown
+# CRITICAL Guidelines for [Project Name]
+
+## ❌ NEVER MENTION
+- Confidential info (list specific items)
+- Personal details (family, private life)
+- Competitor names (if under NDA)
+
+## ✅ ALWAYS VERIFY
+- Pricing: Check [file path]
+- Features: Grep [codebase location]
+- Legal: Comply with [trademark/license rules]
+
+## Trademark Compliance
+- "IBCS©" → Always say "inspired by IBCS©" (not "compliant")
+- [Other trademarks...]
+
+## Privacy Rules
+- No personal job titles in public content
+- No family member names
+- [Other privacy rules...]
+
+## Approval Requirements
+- Marketing content: River + Mary review
+- Legal docs: Legal expert review
+- Deployment: User explicit approval
+```
+
+---
+
+## Benefits
+
+✅ **Prevents costly mistakes** - Catches errors before they're public
+✅ **Protects legal compliance** - Trademark, privacy, licensing
+✅ **Ensures factual accuracy** - Features/pricing match reality
+✅ **Builds user trust** - Agents don't hallucinate facts
+✅ **Scalable safety** - Works across all BMad projects
+
+---
+
+## Tradeoffs & Considerations
+
+### Slower Task Execution
+- **Before**: Agent outputs in 30 seconds
+- **After**: Pre-flight adds 1-2 minutes
+- **Worth it?**: YES for high-risk tasks (marketing, legal, deployment)
+
+### More Agent Coordination
+- Requires orchestrator (River) to manage pre-flight
+- Cross-agent reviews add complexity
+- **Mitigation**: Only for high-risk tasks, not every task
+
+### User Approval Friction
+- Adds approval gate before finalization
+- **Mitigation**: Present as DRAFT with verification status
+- User can fast-track if comfortable
+
+---
+
+## Rollout Strategy
+
+### Phase 1: Opt-In (Recommended)
+- Projects mark agents as `high_risk_tasks: true`
+- Orchestrators enforce pre-flight for marked agents
+- Community feedback on friction/benefits
+
+### Phase 2: Default for Risky Categories
+- Marketing, legal, deployment agents default to pre-flight
+- Other agents opt-in if needed
+
+### Phase 3: Configurable Per-Task
+- Users set risk level per task
+- `*create-post --risk high` triggers pre-flight
+- `*create-post --risk low` skips for drafts
+
+---
+
+## Real-World Validation
+
+**Origin Project**: tellingCube (masemIT e.U.)
+
+**Failure Scenario**:
+- Marketing agent created launch posts without verification
+- 5 critical errors caught by user (should have been caught earlier)
+- 30 minutes of rework to fix
+
+**After Implementing Protocol**:
+- CRITICAL-GUIDELINES.md created
+- Pre-flight checklist enforced
+- Cross-agent review (River → Sophie → Mary → User)
+- **Result**: Zero errors in final content
+
+**User Feedback (Mario Semper)**:
+> "I love BMad, but I don't want to repeat the ChatGPT hallucination nightmare. This protocol gives me confidence that agents verify facts before presenting them."
+
+---
+
+## Open Questions for Community
+
+1. **Scope**: Which task types should default to pre-flight?
+2. **Performance**: Is 1-2 minute overhead acceptable for high-risk tasks?
+3. **Configurability**: Per-project, per-agent, or per-task risk settings?
+4. **Tooling**: Should pre-flight be a separate tool or built into agent execution?
+5. **Enforcement**: Optional best practice or mandatory for certain agents?
+
+---
+
+## Next Steps
+
+1. **Community feedback** on protocol design
+2. **Reference implementation** in BMad core
+3. **Agent template updates** to include pre-flight hooks
+4. **Documentation** with examples for common scenarios
+5. **Testing** across different project types
+
+---
+
+## Comparison to Similar Patterns
+
+| Pattern | Focus | When to Use |
+|---------|-------|-------------|
+| **Pre-Flight Protocol** | Safety & accuracy | High-risk external outputs |
+| **Code Review** | Code quality | Before merging code |
+| **QA Gates** | Testing | Before production deployment |
+| **Approval Workflows** | Governance | Multi-stakeholder decisions |
+
+**Pre-Flight Protocol** = "Code review + QA gate" for **agent outputs**.
+
+---
+
+## References
+
+- **Source Project**: tellingCube (https://github.com/masemIT/telling-cube) [if public]
+- **Failure Case**: `docs/bmad-contributions/` (this document)
+- **Implementation**: `docs/marketing/CRITICAL-GUIDELINES.md` (tellingCube)
+
+---
+
+**Contribution ready for review.** This came from painful real-world experience - let's make BMad safer for everyone! 🛡️