Merge 4fd8b9018f into 23f650ff4d

2025-12-18 09:53:12 +07:00 · 2025-12-18 09:53:12 +07:00 · 367937769d
parent 23f650ff4d 4fd8b9018f
commit 367937769d
2 changed files with 639 additions and 0 deletions
--- a/docs/agent-preflight-protocol.md
+++ b/docs/agent-preflight-protocol.md
@@ -0,0 +1,383 @@
+# BMad Method PR #2: Agent Task Pre-Flight Protocol
+
+**Feature Type**: Safety & quality framework
+**Status**: Draft for community review
+**Origin**: tellingCube project learnings (masemIT e.U.)
+**Author**: Mario Semper (@sempre)
+**Date**: 2025-11-23
+
+---
+
+## Summary
+
+**Agent Task Pre-Flight Protocol** establishes mandatory safety checks for high-risk agent tasks (marketing, legal, deployment) to prevent factual errors, trademark violations, privacy breaches, and assumption-based mistakes.
+
+---
+
+## Problem Statement
+
+### Real-World Failure Case
+
+**Scenario**: Marketing agent (Sophie) created LinkedIn launch posts for tellingCube without:
+- Reading existing project documentation
+- Verifying pricing against actual implementation
+- Checking trademark compliance rules
+- Reviewing privacy guidelines
+
+**Result**: Multiple critical errors:
+- ❌ Mentioned user's day job title (privacy/legal risk)
+- ❌ Used family member's name (privacy violation)
+- ❌ Claimed "60 seconds" generation time (factually wrong)
+- ❌ Advertised "€9/month" pricing (doesn't exist - actual: €29-€999 ONE-TIME)
+- ❌ Used "IBCS-compliant" (trademark violation - should be "inspired by IBCS©")
+
+**Root Cause**: The agent worked from its own assumptions, with no pre-task verification protocol to force a check against project facts.
+
+---
+
+## Current BMad Behavior (Risky)
+
+```yaml
+User: "Sophie, create LinkedIn launch posts"
+
+Sophie:
+  1. Generates content based on general knowledge
+  2. Makes assumptions about features/pricing
+  3. Uses marketing best practices
+  4. Presents to user
+
+❌ Problem: No verification step before creation
+```
+
+---
+
+## Proposed Solution: Pre-Flight Protocol
+
+### Mandatory Checks Before High-Risk Tasks
+
+```yaml
+Agent Task Pre-Flight Protocol:
+
+BEFORE executing tasks with external impact:
+  1. Discover Critical Context
+     - Search for CRITICAL-GUIDELINES.md or similar
+     - Read recent related work in project
+     - Check actual implementation (code, configs, not assumptions)
+
+  2. Verify Assumptions
+     - Pricing: Read Stripe config / pricing components
+     - Features: Grep codebase for actual capabilities
+     - Legal/Trademark: Check documented compliance rules
+     - Privacy: Verify no personal info in public content
+
+  3. Cross-Agent Review (for high-risk outputs)
+     - Orchestrator reviews before user sees
+     - Fact-checker agent validates claims
+     - Minimum 2 agents verify before publishing
+
+  4. User Approval Gate
+     - Present content as DRAFT
+     - Highlight assumptions made
+     - Get explicit approval before finalizing
+```
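+
+The four stages above can be sketched as a small gate. This is a minimal illustration under stated assumptions, not BMad code: `PreFlightResult`, `run_pre_flight`, and the check names are hypothetical, and each check is simply a callable that returns `None` on success or a message describing the problem.
+
+```python
+from dataclasses import dataclass, field
+from typing import Callable, Optional
+
+# Hypothetical contract: a check returns None on success, or a problem message.
+Check = Callable[[], Optional[str]]
+
+
+@dataclass
+class PreFlightResult:
+    passed: list[str] = field(default_factory=list)
+    failed: dict[str, str] = field(default_factory=dict)
+
+    @property
+    def ok(self) -> bool:
+        return not self.failed
+
+
+def run_pre_flight(checks: dict[str, Check]) -> PreFlightResult:
+    """Run every check and collect failures instead of stopping at the first."""
+    result = PreFlightResult()
+    for name, check in checks.items():
+        problem = check()
+        if problem is None:
+            result.passed.append(name)
+        else:
+            result.failed[name] = problem
+    return result
+
+
+# Illustrative checks only; real ones would read project files.
+checks = {
+    "guidelines_read": lambda: None,
+    "pricing_verified": lambda: "draft says €9/month, config says one-time",
+}
+result = run_pre_flight(checks)
+```
+
+Collecting every failure, rather than stopping at the first one, lets the orchestrator show the user all open issues alongside the DRAFT.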
+
+---
+
+## High-Risk Task Categories
+
+### 1. Marketing & Public Content
+**Examples**: LinkedIn posts, press releases, demo videos, website copy
+
+**Pre-Flight Required**:
+- [ ] Read `CRITICAL-GUIDELINES.md` (legal, trademark, privacy rules)
+- [ ] Verify pricing from actual Stripe/payment config
+- [ ] Verify features from actual codebase (not roadmap ideas)
+- [ ] Check trademark compliance (e.g., "IBCS©" usage rules)
+- [ ] Privacy review (no personal identifiers without consent)
+- [ ] Cross-agent fact-check before presenting to user
+
+### 2. Legal & Compliance
+**Examples**: Terms of service, privacy policy, license agreements
+
+**Pre-Flight Required**:
+- [ ] Read existing legal docs (don't start from scratch)
+- [ ] Check jurisdiction-specific requirements
+- [ ] Verify against actual product behavior (data handling, cookies, etc.)
+- [ ] Legal expert review (human or specialized agent)
+- [ ] User final approval required
+
+### 3. Deployment & Infrastructure
+**Examples**: Database migrations, production deployments, DNS changes
+
+**Pre-Flight Required**:
+- [ ] Read deployment runbooks/checklists
+- [ ] Verify current production state
+- [ ] Check for breaking changes
+- [ ] Backup strategy confirmed
+- [ ] Rollback plan documented
+- [ ] User explicit approval with understanding of risks
+
+### 4. Financial & Billing
+**Examples**: Stripe configuration, pricing changes, refund policies
+
+**Pre-Flight Required**:
+- [ ] Read current Stripe dashboard state
+- [ ] Verify tax/legal implications
+- [ ] Check the impact on grandfathered customers (existing plans and prices)
+- [ ] Financial impact assessment
+- [ ] User approval with revenue projections
+
+---
+
+## Implementation Guidelines
+
+### For Agent Developers
+
+**In agent YAML definition**:
+
+```yaml
+agent:
+  name: Sophie
+  id: marketing
+  high_risk_tasks: true  # Triggers pre-flight protocol
+
+pre_flight:
+  required_reads:
+    - docs/marketing/CRITICAL-GUIDELINES.md
+    - components/landing/PricingSection.tsx
+    - docs/_masemIT/readme.md
+
+  verification_steps:
+    - Grep for actual pricing tiers in codebase
+    - Check trademark compliance rules
+    - Privacy scan (no personal names/details)
+
+  cross_check:
+    agents: [river, mary]
+    approval_required: true
+
+tasks:
+  create-linkedin-post:
+    pre_flight_mandatory: true
+    approval_gate: user
+```
+
+### For Orchestrators (River-like agents)
+
+**Orchestrator responsibilities**:
+
+```python
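+def execute_high_risk_task(agent, task, user_request, codebase):
+    """Run a high-risk task through the pre-flight protocol.
+
+    The helpers called here (discover_critical_guidelines, get_agent,
+    present_issues_to_user, present_as_draft, get_user_approval, finalize)
+    are orchestrator hooks that each project supplies.
+    """
+    # Step 1: Pre-flight checks
+    critical_docs = discover_critical_guidelines()
+    agent.read(critical_docs)
+
+    # Step 2: Agent executes with verified context
+    draft_output = agent.execute_task(task)
+
+    # Step 3: Cross-agent review against the actual codebase
+    fact_check_agent = get_agent("mary")
+    verification = fact_check_agent.verify(draft_output, codebase)
+
+    # Step 4: Present as DRAFT, listing any open issues first
+    if verification.has_issues:
+        present_issues_to_user(verification.issues)
+    present_as_draft(draft_output)
+
+    # Step 5: User approval gate
+    if get_user_approval():
+        return finalize(draft_output)
+    return None  # not approved: nothing is finalized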
+```
+
+---
+
+## Example: Correct Marketing Flow
+
+### Before (Risky)
+
+```
+User: "Create LinkedIn launch posts"
+Sophie: [Generates 3 posts with assumptions]
+Sophie: "Here are your posts!"
+
+❌ Contains errors user must catch
+```
+
+### After (Safe)
+
+```
+User: "Create LinkedIn launch posts"
+
+River: "Sophie, this is a high-risk task. Running pre-flight..."
+
+Sophie:
+  ✅ Read CRITICAL-GUIDELINES.md
+  ✅ Read PricingSection.tsx (actual pricing: €29-€999)
+  ✅ Checked IBCS© compliance rules (must say "inspired by")
+  ✅ Privacy check (no "Product Owner", no "brother")
+
+Sophie: [Generates 3 posts with verified facts]
+
+River: "Mary, fact-check Sophie's output..."
+
+Mary:
+  ✅ Pricing correct (€29-€999 lifetime)
+  ✅ No trademark violations ("inspired by IBCS©")
+  ✅ No privacy issues
+  ✅ Generation time accurate ("minutes")
+
+River: "Sempre, here's the DRAFT (pre-flight verified). Approve?"
+
+User: [Reviews, approves]
+
+✅ No errors, factually accurate
+```
+
+---
+
+## Critical Guidelines Template
+
+**Every project should have**: `docs/PROJECT-NAME/CRITICAL-GUIDELINES.md`
+
+```markdown
+# CRITICAL Guidelines for [Project Name]
+
+## ❌ NEVER MENTION
+- Confidential info (list specific items)
+- Personal details (family, private life)
+- Competitor names (if under NDA)
+
+## ✅ ALWAYS VERIFY
+- Pricing: Check [file path]
+- Features: Grep [codebase location]
+- Legal: Comply with [trademark/license rules]
+
+## Trademark Compliance
+- "IBCS©" → Always say "inspired by IBCS©" (not "compliant")
+- [Other trademarks...]
+
+## Privacy Rules
+- No personal job titles in public content
+- No family member names
+- [Other privacy rules...]
+
+## Approval Requirements
+- Marketing content: River + Mary review
+- Legal docs: Legal expert review
+- Deployment: User explicit approval
+```
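+
+Rules written in this template can also be checked mechanically. The sketch below assumes a hand-written rule table (`FORBIDDEN`) and a helper (`scan_draft`); both names are hypothetical, and the patterns are taken from the failure case described earlier rather than from any existing BMad tooling.
+
+```python
+import re
+
+# Hypothetical rule set; a real project would derive it from CRITICAL-GUIDELINES.md.
+FORBIDDEN = {
+    r"IBCS[- ]compliant": 'trademark: say "inspired by IBCS©"',
+    r"€\s?9\s?/\s?month": "pricing: no monthly plan exists",
+    r"\bProduct Owner\b": "privacy: no personal job titles",
+}
+
+
+def scan_draft(text: str) -> list[str]:
+    """Return one finding per forbidden pattern found in the draft."""
+    findings = []
+    for pattern, reason in FORBIDDEN.items():
+        match = re.search(pattern, text, flags=re.IGNORECASE)
+        if match:
+            findings.append(f"{match.group(0)!r} -> {reason}")
+    return findings
+
+
+draft = "tellingCube is IBCS-compliant and costs €9/month."
+findings = scan_draft(draft)
+```
+
+A pattern scan like this only catches known phrasings, so it complements the cross-agent review and the user approval gate rather than replacing them.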
+
+---
+
+## Benefits
+
+✅ **Prevents costly mistakes** - Catches errors before they're public
+✅ **Protects legal compliance** - Trademark, privacy, licensing
+✅ **Ensures factual accuracy** - Features/pricing match reality
+✅ **Builds user trust** - Agents verify facts instead of presenting assumptions as facts
+✅ **Scalable safety** - Works across all BMad projects
+
+---
+
+## Tradeoffs & Considerations
+
+### Slower Task Execution
+- **Before**: Agent outputs in 30 seconds
+- **After**: Pre-flight adds 1-2 minutes
+- **Worth it?**: YES for high-risk tasks (marketing, legal, deployment)
+
+### More Agent Coordination
+- Requires orchestrator (River) to manage pre-flight
+- Cross-agent reviews add complexity
+- **Mitigation**: Only for high-risk tasks, not every task
+
+### User Approval Friction
+- Adds approval gate before finalization
+- **Mitigation**: Present as DRAFT with verification status
+- User can fast-track if comfortable
+
+---
+
+## Rollout Strategy
+
+### Phase 1: Opt-In (Recommended)
+- Projects mark agents as `high_risk_tasks: true`
+- Orchestrators enforce pre-flight for marked agents
+- Community feedback on friction/benefits
+
+### Phase 2: Default for Risky Categories
+- Marketing, legal, deployment agents default to pre-flight
+- Other agents opt-in if needed
+
+### Phase 3: Configurable Per-Task
+- Users set risk level per task
+- `*create-post --risk high` triggers pre-flight
+- `*create-post --risk low` skips for drafts
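+
+One way the `--risk` flag could be interpreted is sketched here. The function `needs_pre_flight` and the `HIGH_RISK_DEFAULT` set are hypothetical, and falling back to a per-task default when no flag is given is an assumption that combines Phase 2 with Phase 3.
+
+```python
+import shlex
+
+# Hypothetical: tasks that run pre-flight when no --risk flag is given.
+HIGH_RISK_DEFAULT = {"create-post"}
+
+
+def needs_pre_flight(command: str) -> bool:
+    """Decide whether a `*task [--risk high|low]` command triggers pre-flight."""
+    tokens = shlex.split(command)
+    task = tokens[0].lstrip("*")
+    if "--risk" in tokens:
+        idx = tokens.index("--risk")
+        if idx + 1 >= len(tokens):
+            raise ValueError("--risk needs a value: high or low")
+        level = tokens[idx + 1]
+        if level not in ("high", "low"):
+            raise ValueError(f"unknown risk level: {level}")
+        return level == "high"
+    # No explicit flag: fall back to the task's default.
+    return task in HIGH_RISK_DEFAULT
+```
+
+Rejecting unknown levels, instead of silently treating them as low, keeps a typo from skipping the checks.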
+
+---
+
+## Real-World Validation
+
+**Origin Project**: tellingCube (masemIT e.U.)
+
+**Failure Scenario**:
+- Marketing agent created launch posts without verification
+- 5 critical errors caught by user (should have been caught earlier)
+- 30 minutes of rework to fix
+
+**After Implementing Protocol**:
+- CRITICAL-GUIDELINES.md created
+- Pre-flight checklist enforced
+- Cross-agent review (River → Sophie → Mary → User)
+- **Result**: Zero errors in final content
+
+**User Feedback (Mario Semper)**:
+> "I love BMad, but I don't want to repeat the ChatGPT hallucination nightmare. This protocol gives me confidence that agents verify facts before presenting them."
+
+---
+
+## Open Questions for Community
+
+1. **Scope**: Which task types should default to pre-flight?
+2. **Performance**: Is 1-2 minute overhead acceptable for high-risk tasks?
+3. **Configurability**: Per-project, per-agent, or per-task risk settings?
+4. **Tooling**: Should pre-flight be a separate tool or built into agent execution?
+5. **Enforcement**: Optional best practice or mandatory for certain agents?
+
+---
+
+## Next Steps
+
+1. **Community feedback** on protocol design
+2. **Reference implementation** in BMad core
+3. **Agent template updates** to include pre-flight hooks
+4. **Documentation** with examples for common scenarios
+5. **Testing** across different project types
+
+---
+
+## Comparison to Similar Patterns
+
+| Pattern | Focus | When to Use |
+|---------|-------|-------------|
+| **Pre-Flight Protocol** | Safety & accuracy | High-risk external outputs |
+| **Code Review** | Code quality | Before merging code |
+| **QA Gates** | Testing | Before production deployment |
+| **Approval Workflows** | Governance | Multi-stakeholder decisions |
+
+**Pre-Flight Protocol** = "Code review + QA gate" for **agent outputs**.
+
+---
+
+## References
+
+- **Source Project**: tellingCube (https://github.com/masemIT/telling-cube) [if public]
+- **Failure Case**: `docs/bmad-contributions/` (this document)
+- **Implementation**: `docs/marketing/CRITICAL-GUIDELINES.md` (tellingCube)
+
+---
+
+**Contribution ready for review.** This came from painful real-world experience - let's make BMad safer for everyone! 🛡️
--- a/docs/ring-of-fire-sessions.md
+++ b/docs/ring-of-fire-sessions.md
@@ -0,0 +1,256 @@
+# BMad Method PR #1: Ring of Fire (ROF) Sessions
+
+**Feature Type**: Core workflow enhancement
+**Status**: Draft for community review
+**Origin**: tellingCube project (masemIT e.U.)
+**Author**: Mario Semper (@sempre)
+**Date**: 2025-11-23
+
+---
+
+## Summary
+
+**Ring of Fire (ROF) Sessions** enable multi-agent collaborative sessions that run in parallel to the user's main workflow, allowing users to delegate complex multi-perspective analysis while continuing other work.
+
+---
+
+## Problem Statement
+
+The current BMad Method requires **sequential agent interaction**. When users need multiple agents to collaborate on a complex topic, they must:
+- Manually orchestrate each agent conversation
+- Stay in the loop for every exchange
+- Wait for sequential responses before proceeding
+- Context-switch constantly between tasks
+
+This creates **bottlenecks** and prevents **parallel work streams**.
+
+---
+
+## Proposed Solution: Ring of Fire Sessions
+
+A new command pattern that enables **scoped multi-agent collaboration sessions** that run while the user continues other work.
+
+### Command Syntax
+
+```bash
+*rof "<topic>" --agents <agent-list> [--report brief|detailed|live]
+```
+
+### Example Usage
+
+```bash
+*rof "API Refactoring Strategy" --agents dev,architect,qa --report brief
+```
+
+**What happens**:
+1. Dev, Architect, and QA agents enter a collaborative session
+2. They analyze the topic together (code review, design discussion, testing concerns)
+3. When agents need tool access (read files, run commands), they request user approval
+4. User continues working on other tasks in parallel
+5. The session ends with a consolidated report (`brief`: recommendations only; `detailed`: full transcript plus recommendations)
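+
+To make the command syntax concrete, here is a minimal parsing sketch built on Python's standard `shlex` and `argparse` modules. The function name `parse_rof` is hypothetical; the flags and the `brief` default follow this proposal, and no such parser exists in BMad today.
+
+```python
+import argparse
+import shlex
+
+
+def parse_rof(command: str) -> argparse.Namespace:
+    """Parse a `*rof "<topic>" --agents a,b [--report MODE]` command line."""
+    tokens = shlex.split(command)
+    if not tokens or tokens[0] != "*rof":
+        raise ValueError("not a *rof command")
+
+    parser = argparse.ArgumentParser(prog="*rof")
+    parser.add_argument("topic")
+    parser.add_argument(
+        "--agents",
+        required=True,
+        # Split the comma-separated list and drop empty entries.
+        type=lambda s: [a.strip() for a in s.split(",") if a.strip()],
+    )
+    parser.add_argument(
+        "--report",
+        choices=["brief", "detailed", "live"],
+        default="brief",
+    )
+    return parser.parse_args(tokens[1:])
+
+
+args = parse_rof('*rof "API Refactoring Strategy" --agents dev,architect,qa')
+```
+
+Using `shlex.split` keeps the quoted topic together as a single argument, so topics containing spaces work without extra escaping.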
+
+---
+
+## Key Features
+
+### 1. User-Controlled Scope
+- **Small**: 2 agents, 5-minute quick discussion
+- **Large**: 10 agents, 2-hour deep analysis
+- User decides granularity based on complexity
+
+### 2. Approval-Gated Tool Access
+- Agents can **discuss** freely within the session
+- When agents need **tools** (read files, execute commands, make changes), they:
+  - Pause the session
+  - Request user approval
+  - Resume after user decision
+
+**Why**: Maintains user control, prevents runaway agent actions
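+
+The pause, approve, resume cycle can be sketched as a thin wrapper around every tool call. `ToolRequest`, `run_with_approval`, and the two callbacks are hypothetical names chosen for illustration; how BMad would actually prompt the user is left open.
+
+```python
+from dataclasses import dataclass
+from typing import Callable, Optional
+
+
+@dataclass
+class ToolRequest:
+    agent: str
+    tool: str    # e.g. "read_file"
+    target: str  # e.g. "auth.ts"
+
+
+def run_with_approval(
+    request: ToolRequest,
+    ask_user: Callable[[ToolRequest], bool],
+    run_tool: Callable[[ToolRequest], str],
+) -> Optional[str]:
+    """Pause for the user's decision; run the tool only if approved."""
+    if not ask_user(request):
+        return None  # denied: the session resumes without the tool result
+    return run_tool(request)
+
+
+request = ToolRequest(agent="dev", tool="read_file", target="auth.ts")
+approved = run_with_approval(request, lambda r: True, lambda r: f"contents of {r.target}")
+denied = run_with_approval(request, lambda r: False, lambda r: "never runs")
+```
+
+Because the tool runs only inside the wrapper, an agent cannot reach it without passing through the user's decision.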
+
+### 3. Flexible Reporting
+
+| Mode | Description | Use Case |
+|------|-------------|----------|
+| `brief` | Final recommendations only | "Just tell me what to do" |
+| `detailed` | Full transcript + recommendations | "Show me the reasoning" |
+| `live` | Real-time updates as agents discuss | "I want to observe" |
+
+**Default**: `brief` with Q&A available
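+
+The difference between the `brief` and `detailed` modes can be shown with a small formatter. This is a sketch with a hypothetical function name (`build_report`); it assumes that `live` streams updates during the session and therefore has no final formatting step.
+
+```python
+def build_report(mode: str, transcript: list[str], recommendations: list[str]) -> str:
+    """Format session output for the `brief` or `detailed` report mode."""
+    if mode not in ("brief", "detailed"):
+        # Assumption: `live` output is streamed during the session instead.
+        raise ValueError(f"unsupported mode for a final report: {mode}")
+    lines = []
+    if mode == "detailed":
+        lines += ["Transcript:"] + [f"  {entry}" for entry in transcript]
+    lines += ["Recommendations:"] + [f"  - {rec}" for rec in recommendations]
+    return "\n".join(lines)
+
+
+transcript = ["Dev: auth.ts mixes session and token logic"]
+recommendations = ["Split token handling into its own module"]
+brief = build_report("brief", transcript, recommendations)
+detailed = build_report("detailed", transcript, recommendations)
+```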
+
+### 4. Parallel Workflows
+- User works on **Task A** while ROF session tackles **Task B**
+- No context-switching overhead
+- Efficient use of time
+
+---
+
+## Use Cases
+
+### 1. Architecture Reviews
+```bash
+*rof "Evaluate microservices vs monolith for new feature" --agents architect,dev,qa
+```
+**Agents collaborate on**: Design trade-offs, implementation complexity, testing implications
+
+### 2. Code Refactoring
+```bash
+*rof "Refactor authentication module" --agents dev,architect --report detailed
+```
+**Agents collaborate on**: Current code analysis, refactoring approach, migration strategy
+
+### 3. Feature Planning
+```bash
+*rof "Plan user notifications feature" --agents pm,ux,dev --report brief
+```
+**Agents collaborate on**: Requirements, UX flow, technical feasibility, timeline
+
+### 4. Quality Gates
+```bash
+*rof "Investigate test failures in CI/CD" --agents qa,dev --report live
+```
+**Agents collaborate on**: Root cause analysis, fix recommendations, regression prevention
+
+### 5. Documentation Sprints
+```bash
+*rof "Document API endpoints" --agents dev,pm,ux
+```
+**Agents collaborate on**: Technical accuracy, user-friendly examples, completeness
+
+---
+
+## User Experience Flow
+
+```mermaid
+sequenceDiagram
+    User->>River: *rof "Topic" --agents dev,architect
+    River->>Dev: Join ROF session
+    River->>Architect: Join ROF session
+    River->>User: Session started, continue your work
+
+    Dev->>Architect: Discuss approach
+    Architect->>Dev: Suggest alternatives
+
+    Dev->>User: Need to read auth.ts - approve?
+    User->>Dev: Approved
+    Dev->>Architect: After reading file...
+
+    Architect->>Dev: Recommendation
+    Dev->>River: Session complete
+    River->>User: Brief report: [Recommendations]
+```
+
+---
+
+## Implementation Considerations
+
+### Technical Requirements
+- **Session state management**: Track active ROF sessions, participating agents
+- **Agent context sharing**: Agents share knowledge within session scope
+- **User approval workflow**: Clear prompt for tool requests
+- **Report generation**: Brief/detailed/live output formatting
+- **Workflow integration**: Link ROF findings to existing workflow plans/todos
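+
+The first two requirements suggest a small session object. The following sketch is one possible shape, not a proposed API: `RofSession`, `SessionStatus`, and the method names are hypothetical, and the shared context is modeled as a plain in-memory transcript.
+
+```python
+from dataclasses import dataclass, field
+from enum import Enum
+
+
+class SessionStatus(Enum):
+    ACTIVE = "active"
+    AWAITING_APPROVAL = "awaiting_approval"
+    COMPLETE = "complete"
+
+
+@dataclass
+class RofSession:
+    topic: str
+    agents: list[str]
+    report_mode: str = "brief"
+    status: SessionStatus = SessionStatus.ACTIVE
+    transcript: list[str] = field(default_factory=list)  # shared session context
+
+    def say(self, agent: str, message: str) -> None:
+        """Record a message from a participating agent."""
+        if agent not in self.agents:
+            raise ValueError(f"{agent} is not part of this session")
+        self.transcript.append(f"{agent}: {message}")
+
+    def request_tool(self) -> None:
+        """Pause the session until the user decides on a tool request."""
+        self.status = SessionStatus.AWAITING_APPROVAL
+
+    def resume(self) -> None:
+        """Continue the discussion after the user's decision."""
+        self.status = SessionStatus.ACTIVE
+
+
+session = RofSession(topic="API Refactoring Strategy", agents=["dev", "architect"])
+session.say("dev", "Need to read auth.ts")
+session.request_tool()
+```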
+
+### Open Questions for Community
+
+1. **Integration**: Core BMad feature or plugin/extension?
+2. **Concurrency**: How to handle file conflicts if multiple agents want to edit?
+3. **Cost Model**: Guidance for LLM call budgeting with multiple agents?
+4. **Session Limits**: Recommended max agents/duration?
+5. **Agent Communication**: Free-form discussion or structured turn-taking?
+
+---
+
+## Real-World Validation
+
+**Origin Project**: tellingCube (BI dashboard, masemIT e.U.)
+
+**Validation Scenario**:
+- **Topic**: "Next steps for tellingCube after validation test"
+- **Agents**: River (orchestrator), Mary (analyst), Winston (architect)
+- **Report Mode**: Brief
+- **Outcome**: Successfully analyzed post-validation roadmap with 3 scenarios (GO/CHANGE/NO-GO), delivered consolidated recommendations in 5 minutes
+
+**User Feedback (Mario Semper)**:
+> "This is exactly what I needed - I wanted multiple perspectives without having to orchestrate every conversation. The brief report gave me actionable next steps immediately."
+
+**Documentation**: `docs/_masemIT/readme.md` in tellingCube repository
+
+---
+
+## Proposed Documentation Structure
+
+```
+.bmad-core/
+  features/
+    ring-of-fire.md              # Feature specification
+
+docs/
+  guides/
+    using-rof-sessions.md        # User guide with examples
+
+  architecture/
+    agent-collaboration.md       # Technical design
+    rof-session-management.md    # State handling approach
+```
+
+---
+
+## Benefits
+
+✅ **Unlocks parallel workflows** - User productivity gains
+✅ **Reduces context-switching** - Cognitive load reduction
+✅ **Enables complex analysis** - Multi-perspective insights
+✅ **Maintains user control** - Approval gates for tools
+✅ **Scales flexibly** - From quick checks to deep dives
+
+---
+
+## Comparison to Existing Patterns
+
+| Feature | Standard Agent Use | ROF Session |
+|---------|-------------------|-------------|
+| Agent collaboration | Sequential (one at a time) | Parallel (multiple simultaneously) |
+| User involvement | Required for every exchange | Only for approvals |
+| Parallel work | No (user waits) | Yes (user continues tasks) |
+| Output | Chat transcript | Consolidated report |
+| Use case | Single-perspective tasks | Multi-perspective analysis |
+
+---
+
+## Next Steps
+
+1. **Community feedback** on approach and open questions
+2. **Technical design** refinement (state management, agent communication)
+3. **Prototype implementation** in BMad core or as extension
+4. **Beta testing** with real projects (beyond tellingCube)
+5. **Documentation** completion with examples
+
+---
+
+## Alternatives Considered
+
+### Alt 1: "Breakout Session"
+- **Pros**: Clear meeting metaphor
+- **Cons**: Less evocative, doesn't convey "continuous collaborative space"
+
+### Alt 2: "Agent Huddle"
+- **Pros**: Short, casual
+- **Cons**: Implies quick/informal only
+
+### Alt 3: "Lagerfeuer" (original German name)
+- **Pros**: Warm, campfire metaphor
+- **Cons**: Poor i18n, hard to pronounce/remember for non-German speakers
+
+**Chosen**: **Ring of Fire** - evokes a continuous circle of collaboration, is internationally understood and memorable, and abbreviates well to "ROF"
+
+---
+
+## References
+
+- **Source Project**: tellingCube (https://github.com/masemIT/telling-cube) [if public]
+- **Documentation**: `docs/_masemIT/readme.md`
+- **Discussion**: [Link to BMad community discussion if applicable]
+
+---
+
+**Contribution ready for review.** Feedback welcome! 🔥