Mario Semper 2025-12-18 09:53:12 +07:00 committed by GitHub
commit 367937769d
2 changed files with 639 additions and 0 deletions


@ -0,0 +1,383 @@
# BMad Method PR #2: Agent Task Pre-Flight Protocol
**Feature Type**: Safety & quality framework
**Status**: Draft for community review
**Origin**: tellingCube project learnings (masemIT e.U.)
**Author**: Mario Semper (@sempre)
**Date**: 2025-11-23
---
## Summary
**Agent Task Pre-Flight Protocol** establishes mandatory safety checks for high-risk agent tasks (marketing, legal, deployment) to prevent factual errors, trademark violations, privacy breaches, and assumption-based mistakes.
---
## Problem Statement
### Real-World Failure Case
**Scenario**: Marketing agent (Sophie) created LinkedIn launch posts for tellingCube without:
- Reading existing project documentation
- Verifying pricing against actual implementation
- Checking trademark compliance rules
- Reviewing privacy guidelines
**Result**: Multiple critical errors:
- ❌ Mentioned user's day job title (privacy/legal risk)
- ❌ Used family member's name (privacy violation)
- ❌ Claimed "60 seconds" generation time (factually wrong)
- ❌ Advertised "€9/month" pricing (doesn't exist - actual: €29-€999 ONE-TIME)
- ❌ Used "IBCS-compliant" (trademark violation - should be "inspired by IBCS©")
**Root Cause**: The agent worked from general knowledge and assumptions, with no pre-task verification protocol to force a check against project sources.
---
## Current BMad Behavior (Risky)
```yaml
User: "Sophie, create LinkedIn launch posts"

Sophie:
  1. Generates content based on general knowledge
  2. Makes assumptions about features/pricing
  3. Uses marketing best practices
  4. Presents to user

❌ Problem: No verification step before creation
```
---
## Proposed Solution: Pre-Flight Protocol
### Mandatory Checks Before High-Risk Tasks
```yaml
Agent Task Pre-Flight Protocol:

BEFORE executing tasks with external impact:

1. Discover Critical Context
   - Search for CRITICAL-GUIDELINES.md or similar
   - Read recent related work in project
   - Check actual implementation (code, configs, not assumptions)

2. Verify Assumptions
   - Pricing: Read Stripe config / pricing components
   - Features: Grep codebase for actual capabilities
   - Legal/Trademark: Check documented compliance rules
   - Privacy: Verify no personal info in public content

3. Cross-Agent Review (for high-risk outputs)
   - Orchestrator reviews before user sees
   - Fact-checker agent validates claims
   - Minimum 2 agents verify before publishing

4. User Approval Gate
   - Present content as DRAFT
   - Highlight assumptions made
   - Get explicit approval before finalizing
```
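Step 1 of the protocol ("Discover Critical Context") could be sketched as a simple file search. This is a minimal illustration, not part of BMad: the function name `discover_critical_guidelines` mirrors the orchestrator pseudocode later in this document, and the second file name in `GUIDELINE_NAMES` is an assumption standing in for "or similar".

```python
from pathlib import Path

# Assumed file names; the protocol only says "CRITICAL-GUIDELINES.md or similar".
GUIDELINE_NAMES = ("CRITICAL-GUIDELINES.md", "GUIDELINES.md")


def discover_critical_guidelines(project_root: str) -> list[Path]:
    """Return every guideline file found under project_root, sorted by path."""
    root = Path(project_root)
    found = [
        path
        for name in GUIDELINE_NAMES
        for path in root.rglob(name)  # recursive search for an exact file name
        if path.is_file()
    ]
    return sorted(found)
```

An empty result is itself useful information: the orchestrator can stop and ask the user where the project's rules live instead of letting the agent proceed on assumptions.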
---
## High-Risk Task Categories
### 1. Marketing & Public Content
**Examples**: LinkedIn posts, press releases, demo videos, website copy
**Pre-Flight Required**:
- [ ] Read `CRITICAL-GUIDELINES.md` (legal, trademark, privacy rules)
- [ ] Verify pricing from actual Stripe/payment config
- [ ] Verify features from actual codebase (not roadmap ideas)
- [ ] Check trademark compliance (e.g., "IBCS©" usage rules)
- [ ] Privacy review (no personal identifiers without consent)
- [ ] Cross-agent fact-check before presenting to user
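Part of this checklist can be automated with a plain text scan of the draft before any agent or human reviews it. The sketch below is illustrative only: the `Finding` type, the `scan_draft` function, and the two regex rules are assumptions modeled on the failure case above (the "IBCS-compliant" wording and the non-existent monthly price), not an existing BMad feature.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    rule: str
    excerpt: str


# Illustrative rules only; a real project would derive these from its guidelines file.
BANNED_PATTERNS = {
    "trademark: say 'inspired by IBCS©', not 'compliant'": r"IBCS\S*[- ]compliant",
    "pricing: no monthly price exists": r"€\s?\d+\s?/\s?month",
}


def scan_draft(draft: str, personal_terms: list[str]) -> list[Finding]:
    """Return one Finding per rule violation found in the draft text."""
    findings = []
    for rule, pattern in BANNED_PATTERNS.items():
        for match in re.finditer(pattern, draft, flags=re.IGNORECASE):
            findings.append(Finding(rule, match.group(0)))
    for term in personal_terms:
        # Whole-word match so a term does not flag longer words that contain it.
        if re.search(rf"\b{re.escape(term)}\b", draft, flags=re.IGNORECASE):
            findings.append(Finding("privacy: personal identifier", term))
    return findings
```

A scan like this only catches known patterns. It complements the cross-agent fact-check and the user approval gate rather than replacing them.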
### 2. Legal & Compliance
**Examples**: Terms of service, privacy policy, license agreements
**Pre-Flight Required**:
- [ ] Read existing legal docs (don't start from scratch)
- [ ] Check jurisdiction-specific requirements
- [ ] Verify against actual product behavior (data handling, cookies, etc.)
- [ ] Legal expert review (human or specialized agent)
- [ ] User final approval required
### 3. Deployment & Infrastructure
**Examples**: Database migrations, production deployments, DNS changes
**Pre-Flight Required**:
- [ ] Read deployment runbooks/checklists
- [ ] Verify current production state
- [ ] Check for breaking changes
- [ ] Backup strategy confirmed
- [ ] Rollback plan documented
- [ ] User explicit approval with understanding of risks
### 4. Financial & Billing
**Examples**: Stripe configuration, pricing changes, refund policies
**Pre-Flight Required**:
- [ ] Read current Stripe dashboard state
- [ ] Verify tax/legal implications
- [ ] Check grandfathering impacts on existing customers
- [ ] Financial impact assessment
- [ ] User approval with revenue projections
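The category checklists above can also be kept as data so that an orchestrator can refuse to proceed while items are open. This is a hedged sketch: the dictionary keys, the paraphrased item strings, and the function names `outstanding_checks` and `may_proceed` are all hypothetical.

```python
# Items paraphrase the deployment and billing checklists above;
# the keys and wording are illustrative, not a fixed schema.
PRE_FLIGHT_CHECKLISTS = {
    "deployment": [
        "read deployment runbook",
        "verify current production state",
        "check for breaking changes",
        "confirm backup strategy",
        "document rollback plan",
        "get explicit user approval",
    ],
    "billing": [
        "read current Stripe dashboard state",
        "verify tax/legal implications",
        "check impact on existing customers",
        "assess financial impact",
        "get user approval",
    ],
}


def outstanding_checks(category: str, completed: set[str]) -> list[str]:
    """Return checklist items for the category that are not yet completed."""
    if category not in PRE_FLIGHT_CHECKLISTS:
        raise KeyError(f"no pre-flight checklist for category {category!r}")
    return [item for item in PRE_FLIGHT_CHECKLISTS[category] if item not in completed]


def may_proceed(category: str, completed: set[str]) -> bool:
    """A task may run only when every checklist item has been completed."""
    return not outstanding_checks(category, completed)
```

Raising on an unknown category is a deliberate choice in this sketch: a task type with no checklist should be treated as a configuration gap, not as an implicit pass.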
---
## Implementation Guidelines
### For Agent Developers
**In agent YAML definition**:
```yaml
agent:
  name: Sophie
  id: marketing
  high_risk_tasks: true  # Triggers pre-flight protocol

pre_flight:
  required_reads:
    - docs/marketing/CRITICAL-GUIDELINES.md
    - components/landing/PricingSection.tsx
    - docs/_masemIT/readme.md
  verification_steps:
    - Grep for actual pricing tiers in codebase
    - Check trademark compliance rules
    - Privacy scan (no personal names/details)
  cross_check:
    agents: [river, mary]
    approval_required: true

tasks:
  create-linkedin-post:
    pre_flight_mandatory: true
    approval_gate: user
```
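How an orchestrator might consume such a definition can be sketched as follows. The sketch assumes the YAML has already been loaded into a Python dict with `agent`, `pre_flight`, and `tasks` as top-level keys (that nesting is an assumption about the proposed schema), and both function names are hypothetical.

```python
from pathlib import Path


def needs_pre_flight(config: dict, task_name: str) -> bool:
    """True if the agent is flagged high-risk or the task mandates pre-flight."""
    agent_flag = config.get("agent", {}).get("high_risk_tasks", False)
    task = config.get("tasks", {}).get(task_name, {})
    return bool(agent_flag or task.get("pre_flight_mandatory", False))


def missing_required_reads(config: dict, project_root: str) -> list[str]:
    """Return required_reads entries that do not exist under project_root."""
    reads = config.get("pre_flight", {}).get("required_reads", [])
    root = Path(project_root)
    return [rel for rel in reads if not (root / rel).is_file()]
```

Checking that every `required_reads` path exists before the task starts turns a silent gap (a renamed pricing component, for example) into an explicit pre-flight failure.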
### For Orchestrators (River-like agents)
**Orchestrator responsibilities**:
```python
def execute_high_risk_task(agent, task, user_request, codebase):
    """Run a high-risk task through the pre-flight protocol.

    Pseudocode: the helper functions called below are placeholders for
    orchestrator internals. `codebase` is passed in explicitly so the
    fact-checker has something concrete to verify against.
    """
    # Step 1: Pre-flight checks
    critical_docs = discover_critical_guidelines()
    agent.read(critical_docs)

    # Step 2: Agent executes with verification
    draft_output = agent.execute_task(task)

    # Step 3: Cross-agent review
    fact_check_agent = get_agent("mary")
    verification = fact_check_agent.verify(draft_output, codebase)

    # Step 4: Present as DRAFT to user (issues first, if any)
    if verification.has_issues:
        present_issues_to_user(verification.issues)
    present_as_draft(draft_output)

    # Step 5: User approval gate
    approval = get_user_approval()
    if approval:
        finalize(draft_output)
```
---
## Example: Correct Marketing Flow
### Before (Risky)
```
User: "Create LinkedIn launch posts"
Sophie: [Generates 3 posts with assumptions]
Sophie: "Here are your posts!"
❌ Contains errors user must catch
```
### After (Safe)
```
User: "Create LinkedIn launch posts"

River: "Sophie, this is a high-risk task. Running pre-flight..."

Sophie:
  ✅ Read CRITICAL-GUIDELINES.md
  ✅ Read PricingSection.tsx (actual pricing: €29-€999)
  ✅ Checked IBCS© compliance rules (must say "inspired by")
  ✅ Privacy check (no "Product Owner", no "brother")

Sophie: [Generates 3 posts with verified facts]

River: "Mary, fact-check Sophie's output..."

Mary:
  ✅ Pricing correct (€29-€999 lifetime)
  ✅ No trademark violations ("inspired by IBCS©")
  ✅ No privacy issues
  ✅ Generation time accurate ("minutes")

River: "Sempre, here's the DRAFT (pre-flight verified). Approve?"

User: [Reviews, approves]

✅ No errors, factually accurate
```
---
## Critical Guidelines Template
**Every project should have**: `docs/PROJECT-NAME/CRITICAL-GUIDELINES.md`
```markdown
# CRITICAL Guidelines for [Project Name]
## ❌ NEVER MENTION
- Confidential info (list specific items)
- Personal details (family, private life)
- Competitor names (if under NDA)
## ✅ ALWAYS VERIFY
- Pricing: Check [file path]
- Features: Grep [codebase location]
- Legal: Comply with [trademark/license rules]
## Trademark Compliance
- "IBCS©" → Always say "inspired by IBCS©" (not "compliant")
- [Other trademarks...]
## Privacy Rules
- No personal job titles in public content
- No family member names
- [Other privacy rules...]
## Approval Requirements
- Marketing content: River + Mary review
- Legal docs: Legal expert review
- Deployment: User explicit approval
```
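Because the template uses plain `##` headings and `-` bullets, agents could load it into a structured form instead of re-reading free text. The parser below is a minimal sketch under that assumption; `parse_guidelines` is a hypothetical name, and it deliberately ignores anything that is not a level-2 heading or a bullet.

```python
def parse_guidelines(markdown: str) -> dict[str, list[str]]:
    """Map each '## ' heading to the list of '- ' bullets that follow it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            current = stripped[3:].strip()
            sections[current] = []
        elif stripped.startswith("- ") and current is not None:
            sections[current].append(stripped[2:].strip())
    return sections
```

With the template above, the result would contain keys such as `"Privacy Rules"` and `"Approval Requirements"`, each holding its bullet text verbatim.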
---
## Benefits
- **Prevents costly mistakes** - Catches errors before they're public
- **Protects legal compliance** - Trademark, privacy, licensing
- **Ensures factual accuracy** - Features/pricing match reality
- **Builds user trust** - Agents don't hallucinate facts
- **Scalable safety** - Works across all BMad projects
---
## Tradeoffs & Considerations
### Slower Task Execution
- **Before**: Agent outputs in 30 seconds
- **After**: Pre-flight adds 1-2 minutes
- **Worth it?**: YES for high-risk tasks (marketing, legal, deployment)
### More Agent Coordination
- Requires orchestrator (River) to manage pre-flight
- Cross-agent reviews add complexity
- **Mitigation**: Only for high-risk tasks, not every task
### User Approval Friction
- Adds approval gate before finalization
- **Mitigation**: Present as DRAFT with verification status
- User can fast-track if comfortable
---
## Rollout Strategy
### Phase 1: Opt-In (Recommended)
- Projects mark agents as `high_risk_tasks: true`
- Orchestrators enforce pre-flight for marked agents
- Community feedback on friction/benefits
### Phase 2: Default for Risky Categories
- Marketing, legal, deployment agents default to pre-flight
- Other agents opt-in if needed
### Phase 3: Configurable Per-Task
- Users set risk level per task
- `*create-post --risk high` triggers pre-flight
- `*create-post --risk low` skips for drafts
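Taken together, the three phases imply a precedence order between the per-task flag, the agent flag, and the category default. The sketch below encodes one possible order. The precedence itself (task flag first, then agent flag, then category) is an assumption this proposal does not spell out, and the function and constant names are hypothetical.

```python
# Phase 2 defaults, taken from the categories named in this proposal.
DEFAULT_HIGH_RISK_CATEGORIES = {"marketing", "legal", "deployment"}


def pre_flight_required(category, agent_high_risk=False, task_risk=None):
    """Decide whether a task must go through pre-flight.

    Assumed precedence: explicit per-task risk (Phase 3) overrides the
    agent flag (Phase 1), which overrides the category default (Phase 2).
    """
    if task_risk is not None:
        if task_risk not in ("low", "high"):
            raise ValueError(f"unknown risk level: {task_risk!r}")
        return task_risk == "high"
    if agent_high_risk:
        return True
    return category in DEFAULT_HIGH_RISK_CATEGORIES
```

Letting `--risk low` override a high-risk category is convenient for drafts, but it is also the easiest way to bypass the protocol. Whether that override should be allowed at all is a fair question for community review.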
---
## Real-World Validation
**Origin Project**: tellingCube (masemIT e.U.)
**Failure Scenario**:
- Marketing agent created launch posts without verification
- 5 critical errors caught by user (should have been caught earlier)
- 30 minutes of rework to fix
**After Implementing Protocol**:
- CRITICAL-GUIDELINES.md created
- Pre-flight checklist enforced
- Cross-agent review (River → Sophie → Mary → User)
- **Result**: Zero errors in final content
**User Feedback (Mario Semper)**:
> "I love BMad, but I don't want to repeat the ChatGPT hallucination nightmare. This protocol gives me confidence that agents verify facts before presenting them."
---
## Open Questions for Community
1. **Scope**: Which task types should default to pre-flight?
2. **Performance**: Is 1-2 minute overhead acceptable for high-risk tasks?
3. **Configurability**: Per-project, per-agent, or per-task risk settings?
4. **Tooling**: Should pre-flight be a separate tool or built into agent execution?
5. **Enforcement**: Optional best practice or mandatory for certain agents?
---
## Next Steps
1. **Community feedback** on protocol design
2. **Reference implementation** in BMad core
3. **Agent template updates** to include pre-flight hooks
4. **Documentation** with examples for common scenarios
5. **Testing** across different project types
---
## Comparison to Similar Patterns
| Pattern | Focus | When to Use |
|---------|-------|-------------|
| **Pre-Flight Protocol** | Safety & accuracy | High-risk external outputs |
| **Code Review** | Code quality | Before merging code |
| **QA Gates** | Testing | Before production deployment |
| **Approval Workflows** | Governance | Multi-stakeholder decisions |
**Pre-Flight Protocol** = "Code review + QA gate" for **agent outputs**.
---
## References
- **Source Project**: tellingCube (https://github.com/masemIT/telling-cube) [if public]
- **Failure Case**: `docs/bmad-contributions/` (this document)
- **Implementation**: `docs/marketing/CRITICAL-GUIDELINES.md` (tellingCube)
---
**Contribution ready for review.** This came from painful real-world experience - let's make BMad safer for everyone! 🛡️


@ -0,0 +1,256 @@
# BMad Method PR #1: Ring of Fire (ROF) Sessions
**Feature Type**: Core workflow enhancement
**Status**: Draft for community review
**Origin**: tellingCube project (masemIT e.U.)
**Author**: Mario Semper (@sempre)
**Date**: 2025-11-23
---
## Summary
**Ring of Fire (ROF) Sessions** enable multi-agent collaborative sessions that run in parallel to the user's main workflow, allowing users to delegate complex multi-perspective analysis while continuing other work.
---
## Problem Statement
The current BMad Method requires **sequential agent interaction**. When users need multiple agents to collaborate on a complex topic, they must:
- Manually orchestrate each agent conversation
- Stay in the loop for every exchange
- Wait for sequential responses before proceeding
- Context-switch constantly between tasks
This creates **bottlenecks** and prevents **parallel work streams**.
---
## Proposed Solution: Ring of Fire Sessions
A new command pattern that enables **scoped multi-agent collaboration sessions** that run while the user continues other work.
### Command Syntax
```bash
*rof "<topic>" --agents <agent-list> [--report brief|detailed|live]
```
### Example Usage
```bash
*rof "API Refactoring Strategy" --agents dev,architect,qa --report brief
```
**What happens**:
1. Dev, Architect, and QA agents enter a collaborative session
2. They analyze the topic together (code review, design discussion, testing concerns)
3. When agents need tool access (read files, run commands), they request user approval
4. User continues working on other tasks in parallel
5. Session ends with consolidated report (brief: just recommendations, detailed: full transcript)
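To make the command syntax concrete, here is a minimal parsing sketch using Python's standard `shlex` and `argparse` modules. It is not BMad's actual command handling: `RofRequest` and `parse_rof` are hypothetical names, and the `brief` default follows the reporting defaults proposed in this document.

```python
import argparse
import shlex
from dataclasses import dataclass


@dataclass
class RofRequest:
    topic: str
    agents: list[str]
    report: str


def parse_rof(command: str) -> RofRequest:
    """Parse a '*rof "<topic>" --agents a,b [--report mode]' command string."""
    tokens = shlex.split(command)  # keeps the quoted topic as one token
    if not tokens or tokens[0] != "*rof":
        raise ValueError("not a *rof command")
    parser = argparse.ArgumentParser(prog="*rof")
    parser.add_argument("topic")
    parser.add_argument("--agents", required=True)
    parser.add_argument("--report", choices=["brief", "detailed", "live"], default="brief")
    args = parser.parse_args(tokens[1:])
    agents = [name.strip() for name in args.agents.split(",") if name.strip()]
    return RofRequest(args.topic, agents, args.report)
```

For example, `parse_rof('*rof "API Refactoring Strategy" --agents dev,architect,qa')` yields a request with three agents and the `brief` report mode.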
---
## Key Features
### 1. User-Controlled Scope
- **Small**: 2 agents, 5-minute quick discussion
- **Large**: 10 agents, 2-hour deep analysis
- User decides granularity based on complexity
### 2. Approval-Gated Tool Access
- Agents can **discuss** freely within the session
- When agents need **tools** (read files, execute commands, make changes), they:
- Pause the session
- Request user approval
- Resume after user decision
**Why**: Maintains user control, prevents runaway agent actions
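The pause, approve, resume cycle can be sketched with a callback that stands in for the user prompt. Everything here is illustrative: `ToolRequest`, `RofSession`, and the `approve` callback are hypothetical names, and a real implementation would prompt the user asynchronously.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToolRequest:
    agent: str
    tool: str
    target: str


@dataclass
class RofSession:
    topic: str
    approve: Callable[[ToolRequest], bool]  # stands in for the user prompt
    log: list[str] = field(default_factory=list)

    def discuss(self, agent: str, message: str) -> None:
        """Discussion needs no approval; it is only recorded."""
        self.log.append(f"{agent}: {message}")

    def use_tool(self, request: ToolRequest, action: Callable[[], str]) -> Optional[str]:
        """Pause for approval, then run the action only if the user agrees."""
        self.log.append(f"PAUSED: {request.agent} requests {request.tool} on {request.target}")
        if not self.approve(request):
            self.log.append(f"DENIED: {request.tool} on {request.target}")
            return None
        result = action()
        self.log.append(f"APPROVED: {request.tool} on {request.target}")
        return result
```

The tool action is passed as a callable so that nothing runs before the approval decision; a denied request never touches the file system or shell.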
### 3. Flexible Reporting
| Mode | Description | Use Case |
|------|-------------|----------|
| `brief` | Final recommendations only | "Just tell me what to do" |
| `detailed` | Full transcript + recommendations | "Show me the reasoning" |
| `live` | Real-time updates as agents discuss | "I want to observe" |
**Default**: `brief` with Q&A available
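The difference between the `brief` and `detailed` modes can be shown with a small formatter. This is a sketch under assumptions: `build_report` is a hypothetical name, the section order is a choice made here, and `live` is left out because it would stream messages during the session, not produce a final report.

```python
def build_report(mode: str, transcript: list[str], recommendations: list[str]) -> str:
    """Format the end-of-session report for the 'brief' or 'detailed' mode."""
    if mode not in ("brief", "detailed"):
        raise ValueError(f"unsupported mode for a final report: {mode!r}")
    lines = ["Recommendations:"]
    lines += [f"- {item}" for item in recommendations]
    if mode == "detailed":
        # Detailed mode appends the full transcript after the recommendations.
        lines.append("")
        lines.append("Transcript:")
        lines += transcript
    return "\n".join(lines)
```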
### 4. Parallel Workflows
- User works on **Task A** while ROF session tackles **Task B**
- No context-switching overhead
- Efficient use of time
---
## Use Cases
### 1. Architecture Reviews
```bash
*rof "Evaluate microservices vs monolith for new feature" --agents architect,dev,qa
```
**Agents collaborate on**: Design trade-offs, implementation complexity, testing implications
### 2. Code Refactoring
```bash
*rof "Refactor authentication module" --agents dev,architect --report detailed
```
**Agents collaborate on**: Current code analysis, refactoring approach, migration strategy
### 3. Feature Planning
```bash
*rof "Plan user notifications feature" --agents pm,ux,dev --report brief
```
**Agents collaborate on**: Requirements, UX flow, technical feasibility, timeline
### 4. Quality Gates
```bash
*rof "Investigate test failures in CI/CD" --agents qa,dev --report live
```
**Agents collaborate on**: Root cause analysis, fix recommendations, regression prevention
### 5. Documentation Sprints
```bash
*rof "Document API endpoints" --agents dev,pm,ux
```
**Agents collaborate on**: Technical accuracy, user-friendly examples, completeness
---
## User Experience Flow
```mermaid
sequenceDiagram
User->>River: *rof "Topic" --agents dev,architect
River->>Dev: Join ROF session
River->>Architect: Join ROF session
River->>User: Session started, continue your work
Dev->>Architect: Discuss approach
Architect->>Dev: Suggest alternatives
Dev->>User: Need to read auth.ts - approve?
User->>Dev: Approved
Dev->>Architect: After reading file...
Architect->>Dev: Recommendation
Dev->>River: Session complete
River->>User: Brief report: [Recommendations]
```
---
## Implementation Considerations
### Technical Requirements
- **Session state management**: Track active ROF sessions, participating agents
- **Agent context sharing**: Agents share knowledge within session scope
- **User approval workflow**: Clear prompt for tool requests
- **Report generation**: Brief/detailed/live output formatting
- **Workflow integration**: Link ROF findings to existing workflow plans/todos
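The first two requirements (session state and session-scoped context sharing) could start from something as small as an in-memory registry. The sketch below is one possible shape, not a proposed design: `SessionState`, `SessionRegistry`, and their methods are hypothetical, and persistence and concurrency are ignored.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class SessionState:
    session_id: int
    topic: str
    agents: list[str]
    shared_notes: list[str] = field(default_factory=list)
    active: bool = True


class SessionRegistry:
    """Tracks ROF sessions in memory and scopes shared notes to participants."""

    def __init__(self) -> None:
        self._ids = count(1)
        self._sessions: dict[int, SessionState] = {}

    def start(self, topic: str, agents: list[str]) -> SessionState:
        state = SessionState(next(self._ids), topic, list(agents))
        self._sessions[state.session_id] = state
        return state

    def share(self, session_id: int, agent: str, note: str) -> None:
        state = self._sessions[session_id]
        if agent not in state.agents:
            # Only participating agents may add to the session's shared context.
            raise PermissionError(f"{agent} is not part of session {session_id}")
        state.shared_notes.append(f"{agent}: {note}")

    def close(self, session_id: int) -> None:
        self._sessions[session_id].active = False

    def active_sessions(self) -> list[SessionState]:
        return [s for s in self._sessions.values() if s.active]
```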
### Open Questions for Community
1. **Integration**: Core BMad feature or plugin/extension?
2. **Concurrency**: How to handle file conflicts if multiple agents want to edit?
3. **Cost Model**: Guidance for LLM call budgeting with multiple agents?
4. **Session Limits**: Recommended max agents/duration?
5. **Agent Communication**: Free-form discussion or structured turn-taking?
---
## Real-World Validation
**Origin Project**: tellingCube (BI dashboard, masemIT e.U.)
**Validation Scenario**:
- **Topic**: "Next steps for tellingCube after validation test"
- **Agents**: River (orchestrator), Mary (analyst), Winston (architect)
- **Report Mode**: Brief
- **Outcome**: Successfully analyzed post-validation roadmap with 3 scenarios (GO/CHANGE/NO-GO), delivered consolidated recommendations in 5 minutes
**User Feedback (Mario Semper)**:
> "This is exactly what I needed - I wanted multiple perspectives without having to orchestrate every conversation. The brief report gave me actionable next steps immediately."
**Documentation**: `docs/_masemIT/readme.md` in tellingCube repository
---
## Proposed Documentation Structure
```
.bmad-core/
  features/
    ring-of-fire.md             # Feature specification
docs/
  guides/
    using-rof-sessions.md       # User guide with examples
  architecture/
    agent-collaboration.md      # Technical design
    rof-session-management.md   # State handling approach
```
---
## Benefits
- **Unlocks parallel workflows** - User productivity gains
- **Reduces context-switching** - Cognitive load reduction
- **Enables complex analysis** - Multi-perspective insights
- **Maintains user control** - Approval gates for tools
- **Scales flexibly** - From quick checks to deep dives
---
## Comparison to Existing Patterns
| Feature | Standard Agent Use | ROF Session |
|---------|-------------------|-------------|
| Agent collaboration | Sequential (one at a time) | Parallel (multiple simultaneously) |
| User involvement | Required for every exchange | Only for approvals |
| Parallel work | No (user waits) | Yes (user continues tasks) |
| Output | Chat transcript | Consolidated report |
| Use case | Single-perspective tasks | Multi-perspective analysis |
---
## Next Steps
1. **Community feedback** on approach and open questions
2. **Technical design** refinement (state management, agent communication)
3. **Prototype implementation** in BMad core or as extension
4. **Beta testing** with real projects (beyond tellingCube)
5. **Documentation** completion with examples
---
## Alternatives Considered
### Alt 1: "Breakout Session"
- **Pros**: Clear meeting metaphor
- **Cons**: Less evocative, doesn't convey "continuous collaborative space"
### Alt 2: "Agent Huddle"
- **Pros**: Short, casual
- **Cons**: Implies quick/informal only
### Alt 3: "Lagerfeuer" (original German name)
- **Pros**: Warm, campfire metaphor
- **Cons**: Poor i18n, hard to pronounce/remember for non-German speakers
**Chosen**: **Ring of Fire** - evokes a continuous circle of collaboration, is internationally understood and memorable, and abbreviates well to "ROF"
---
## References
- **Source Project**: tellingCube (https://github.com/masemIT/telling-cube) [if public]
- **Documentation**: `docs/_masemIT/readme.md`
- **Discussion**: [Link to BMad community discussion if applicable]
---
**Contribution ready for review.** Feedback welcome! 🔥