# Batch Processing Architecture for Rate Limit Efficiency
> **📣 Request for Feedback**
>
> This is an alpha-stage contribution and we welcome community input. We've tested this pattern with our team but would love feedback on:
> - Does this model tiering approach resonate with your experience?
> - Are there edge cases where Sonnet struggles with implementation tasks?
> - How can we improve the UX for model switching prompts?
> - Would a formal BBP module be valuable to the community?
>
> Please share thoughts in Discord `#suggestions-feedback` or comment on the PR. We're here to collaborate, not prescribe.
---
> **Scope**: This architecture is specifically designed for **Anthropic Claude models** on **Claude Max subscriptions** (or similar tiered access plans). The rate limit characteristics, model capabilities, and tiering strategy are based on Anthropic's current offering structure.
>
> **Origin**: These constraints were discovered when attempting to scale the BMAD method across a development team. Individual developers hitting Opus rate limits mid-sprint created bottlenecks that rippled through the entire team's velocity. The model tiering pattern emerged as a practical solution to sustain team-wide BMAD adoption.
## Platform Context: Anthropic Claude Max
This pattern assumes access to **Claude Code** with a **Claude Max subscription**, which provides:
| Model | Capability | Rate Limit Behavior |
|-------|------------|---------------------|
| **Claude Opus 4** | Highest reasoning, best for complex decisions | Most restrictive limits |
| **Claude Sonnet 4** | Strong coding, fast execution | More generous limits |
| **Claude Haiku** | Quick tasks, simple operations | Most generous limits |

The rate limit disparity between Opus and Sonnet is significant enough that strategic model selection dramatically impacts sustainable throughput. This architecture exploits that disparity.

**Note**: If using API access with pay-per-token pricing instead of subscription limits, the economics differ but the capability-matching principle still applies (use the right model for the task complexity).
## The Problem: Opus Rate Limits in Extended Development Sessions
Working entirely on Claude Opus for software development quickly becomes impractical:
- **Rate limits hit fast**: Complex stories with multiple files, tests, and iterations consume tokens rapidly
- **Context reloading**: Each new conversation requires re-establishing project context
- **Expensive operations**: Every file read, search, and edit counts against limits
- **Development stalls**: Hitting rate limits mid-story forces context-switching or waiting
## The Solution: Strategic Model Tiering
The Batch-Based Pipeline (BBP) approach distributes work across model tiers based on task complexity:
```
┌─────────────────────────────────────────────────────────────────┐
│                     OPUS (Strategic Layer)                      │
│  • Epic-level story planning (high-stakes decisions)            │
│  • Cross-story pattern detection during review                  │
│  • Architecture-impacting decisions                             │
│  • Final quality gates                                          │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                    SONNET (Execution Layer)                     │
│  • Story implementation (bulk of token usage)                   │
│  • Test writing and execution                                   │
│  • Targeted rework from review feedback                         │
│  • File operations and searches                                 │
└─────────────────────────────────────────────────────────────────┘
```
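The two layers above can be read as a lookup from task type to model. A minimal Python sketch follows; the task keys, the `recommend_model` helper, and the choice to fall back to Opus for unlisted tasks are illustrative assumptions, not part of BMAD or Claude Code.

```python
from enum import Enum


class Model(Enum):
    OPUS = "opus"
    SONNET = "sonnet"


# Task categories copied from the two layers in the diagram; the keys are hypothetical names.
TIER_BY_TASK = {
    "epic_planning": Model.OPUS,
    "cross_story_review": Model.OPUS,
    "architecture_decision": Model.OPUS,
    "final_quality_gate": Model.OPUS,
    "story_implementation": Model.SONNET,
    "test_writing": Model.SONNET,
    "targeted_rework": Model.SONNET,
    "file_operations": Model.SONNET,
}


def recommend_model(task: str) -> Model:
    """Return the recommended tier; unlisted tasks escalate to Opus (an assumption)."""
    return TIER_BY_TASK.get(task, Model.OPUS)


def switch_command(task: str) -> str:
    """Build the Claude Code `/model` command for the recommended tier."""
    return f"/model {recommend_model(task).value}"


print(switch_command("story_implementation"))  # /model sonnet
```

Defaulting unlisted tasks to Opus reflects the idea that work without an established pattern belongs in the strategic layer; a team that would rather protect Opus capacity could default to Sonnet instead.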
## Why This Works: Quality Preservation Through Staging
### The Key Insight: Quality Comes From Context, Not Just Model Size
The common assumption is "bigger model = better code." In practice, **code quality is primarily determined by**:
1. **Clarity of specifications** - What exactly needs to be built
2. **Available context** - Project patterns, architecture decisions, existing code
3. **Scope constraints** - Focused task vs. open-ended exploration

When all three factors are in place, Sonnet's implementation output has, in our team's experience, been on par with what Opus produces for the same story.
### How Staging Preserves Quality
```
┌─────────────────────────────────────────────────────────────────────────┐
│                           OPUS PLANNING PHASE                           │
│                                                                         │
│  Input: PRD, Architecture, Epic requirements (ambiguous, high-level)    │
│                                                                         │
│  Opus does the HARD work:                                               │
│  • Interprets ambiguous requirements                                    │
│  • Makes architectural judgment calls                                   │
│  • Resolves conflicting constraints                                     │
│  • Defines precise acceptance criteria                                  │
│  • Sequences tasks in optimal order                                     │
│                                                                         │
│  Output: Crystal-clear story files with:                                │
│  • Unambiguous task definitions                                         │
│  • Specific file paths to modify                                        │
│  • Exact acceptance criteria                                            │
│  • Test scenarios spelled out                                           │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│                         SONNET EXECUTION PHASE                          │
│                                                                         │
│  Input: Crystal-clear story files (all ambiguity resolved)              │
│                                                                         │
│  Sonnet does MECHANICAL work:                                           │
│  • Translates specs to code (no interpretation needed)                  │
│  • Follows established patterns (already defined)                       │
│  • Writes tests against clear criteria                                  │
│  • Makes localized decisions only                                       │
│                                                                         │
│  Why Sonnet excels here:                                                │
│  • Strong coding capabilities (not inferior to Opus for implementation) │
│  • Faster execution                                                     │
│  • More generous rate limits                                            │
│  • Sufficient reasoning for scoped tasks                                │
└─────────────────────────────────────────────────────────────────────────┘
```
### Quality Is Front-Loaded, Not Distributed
Traditional approach (quality spread thin):
```
Story 1: [interpret─────implement─────test─────] → Quality depends on each step
Story 2: [interpret─────implement─────test─────] → Repeated interpretation risk
Story 3: [interpret─────implement─────test─────] → Inconsistency accumulates
```
Batched approach (quality concentrated):
```
PLAN:   [interpret ALL stories with full context] → One-time, high-quality
DEV:    [implement─implement─implement──────────] → Mechanical, consistent
TEST:   [test──────test──────test───────────────] → Pattern-based
REVIEW: [review ALL with cross-story visibility ] → Catches what earlier phases missed
```
**The quality gates are at phase boundaries, not scattered throughout.**
### Why Sonnet Doesn't Degrade Quality
| Task Type | Why Sonnet Matches Opus | Evidence |
|-----------|------------------------|----------|
| Code generation | Both models are trained extensively on code; Sonnet is tuned for faster responses | Vendor-reported coding benchmark scores for the two models are close |
| Following specs | No interpretation needed when specs are clear | Deterministic translation |
| Pattern matching | Existing code provides template | Copy-modify-adapt pattern |
| Test writing | Acceptance criteria explicit | Direct mapping to assertions |
| Syntax/formatting | Both models equivalent | Mechanical task |
### Where Opus Is Still Required
| Task Type | Why Opus Needed | Stage |
|-----------|-----------------|-------|
| Requirement interpretation | Ambiguity resolution requires deeper reasoning | Planning |
| Architecture decisions | Trade-off analysis, long-term implications | Planning |
| Cross-story patterns | Seeing connections across multiple changes | Review |
| Novel problem solving | No existing pattern to follow | Planning |
| Quality judgment | "Is this good enough?" requires taste | Review |
### The Batch Review Multiplier
Reviewing stories in batch actually **improves** quality over individual review:
```
Individual Review:
  Story 1: [review] → Misses that auth pattern will conflict with Story 3
  Story 2: [review] → Doesn't know Story 1 already solved similar problem
  Story 3: [review] → Unaware of auth pattern from Story 1

Batch Review:
  Stories 1,2,3: [review together]
    → "Story 1 and 3 both touch auth - ensure consistent approach"
    → "Story 2 duplicates utility from Story 1 - extract shared helper"
    → "All three stories need the same error handling pattern"
```
**Cross-story visibility enables pattern detection impossible in isolated review.**
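One mechanical signal a batch reviewer can start from is file overlap between stories. The sketch below is an illustration only: `shared_files`, the story names, and the file paths are hypothetical, and the cross-story judgment described above covers far more than shared paths.

```python
from collections import defaultdict


def shared_files(changes: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each file touched by two or more stories to the sorted list of those stories."""
    touched_by = defaultdict(list)
    for story, files in changes.items():
        for path in files:
            touched_by[path].append(story)
    return {path: sorted(stories) for path, stories in touched_by.items() if len(stories) > 1}


# Hypothetical batch: which files each story changed.
batch = {
    "story-1": {"src/auth.py", "src/utils.py"},
    "story-2": {"src/utils.py"},
    "story-3": {"src/auth.py"},
}

overlaps = shared_files(batch)
for path in sorted(overlaps):
    print(f"{path}: touched by {', '.join(overlaps[path])}")
```

An individual review of `story-3` never sees that `story-1` also changed `src/auth.py`; the batch view surfaces it before the reviewer reads a single diff.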
## Token Efficiency: Why Batching Saves Tokens
### Dramatic Rate Limit Savings
Typical story implementation breakdown:
```
File reads/searches:  40% of tokens → Sonnet
Code generation:      35% of tokens → Sonnet
Test writing:         15% of tokens → Sonnet
Planning/decisions:   10% of tokens → Opus
```
Routing roughly 90% of a story's tokens (file reads, code generation, and test writing) to Sonnet preserves Opus capacity for high-value strategic work.
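The split can be checked with simple arithmetic on the percentages listed in the breakdown. In this sketch the category keys and the `share_by_model` helper are assumptions, and the result is arithmetic on the illustrative shares, not a measurement.

```python
# Percent of a story's tokens per activity, and the model it is routed to.
TOKEN_SHARE = {
    "file_reads_searches": (40, "sonnet"),
    "code_generation": (35, "sonnet"),
    "test_writing": (15, "sonnet"),
    "planning_decisions": (10, "opus"),
}


def share_by_model(breakdown: dict[str, tuple[int, str]]) -> dict[str, int]:
    """Sum the percentage of tokens routed to each model."""
    totals: dict[str, int] = {}
    for percent, model in breakdown.values():
        totals[model] = totals.get(model, 0) + percent
    return totals


totals = share_by_model(TOKEN_SHARE)
# Factor by which Opus usage per story shrinks compared with running everything on Opus.
opus_reduction = 100 / totals["opus"]

print(totals)          # {'sonnet': 90, 'opus': 10}
print(opus_reduction)  # 10.0
```

Under these shares, a story that previously ran entirely on Opus would draw about one tenth of the Opus tokens. Batch review on Opus is not part of this breakdown, so the real saving is smaller.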
### Context Amortization: Batching Amplifies Efficiency
Instead of:
```
Story 1: Plan(Opus) → Implement(Opus) → Test(Opus) → Review(Opus)
Story 2: Plan(Opus) → Implement(Opus) → Test(Opus) → Review(Opus)
Story 3: Plan(Opus) → Implement(Opus) → Test(Opus) → Review(Opus)
```
BBP does:
```
Planning Phase: Plan stories 1,2,3 together (Opus, warm context)
Dev Phase: Implement stories 1,2,3 (Sonnet, parallel-ready)
Test Phase: Test stories 1,2,3 (Sonnet, shared fixtures)
Review Phase: Review stories 1,2,3 together (Opus, pattern detection)
```
Benefits:
- **Context loading amortized** across multiple stories
- **Pattern recognition** improved by seeing related changes together
- **Parallelization potential** for independent stories
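As a rough model of the amortization, assume that in the per-story flow every phase of every story starts with one context load, while the batched flow loads context once per phase. Both that assumption and the `context_loads` and `tokens_saved` helpers are hypothetical simplifications, and the token figure in the usage line is made up for illustration.

```python
PHASES = ("plan", "dev", "test", "review")


def context_loads(num_stories: int, batched: bool) -> int:
    """Count context loads, assuming one load at the start of each phase."""
    if num_stories <= 0:
        return 0
    if batched:
        return len(PHASES)            # one load per phase, shared by all stories
    return len(PHASES) * num_stories  # one load per phase, per story


def tokens_saved(num_stories: int, tokens_per_load: int) -> int:
    """Tokens not spent on reloading context when the stories are batched."""
    per_story = context_loads(num_stories, batched=False)
    batched = context_loads(num_stories, batched=True)
    return (per_story - batched) * tokens_per_load


print(context_loads(3, batched=False))  # 12
print(context_loads(3, batched=True))   # 4
print(tokens_saved(3, 20_000))          # 160000
```

Under this model the number of loads stays constant as the batch grows, so the saving scales with the number of stories in the batch.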
## Pipeline Flow
```
┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  OPUS PLAN   │───▶│  SONNET DEV  │───▶│ SONNET TEST  │───▶│ OPUS REVIEW  │
│              │    │              │    │              │    │              │
│ • All stories│    │ • Implement  │    │ • Write tests│    │ • Quality    │
│   planned    │    │   each story │    │ • Run tests  │    │   gate       │
│ • Context    │    │ • Warm       │    │ • Fix fails  │    │ • Pattern    │
│   established│    │   context    │    │              │    │   detection  │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
                                                            ┌──────────────┐
                                                            │SONNET REWORK │
                                                            │ (if needed)  │
                                                            │              │
                                                            │ • Targeted   │
                                                            │   fixes from │
                                                            │   review     │
                                                            └──────────────┘
```
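The flow above can be sketched as a small transition function. The phase names follow the diagram; the `next_phase` helper is hypothetical, and sending reworked stories back to review is an assumption that the diagram does not state explicitly.

```python
from typing import Optional

# Linear part of the pipeline, as drawn in the diagram.
LINEAR_NEXT = {"plan": "dev", "dev": "test", "test": "review"}


def next_phase(current: str, review_passed: bool = True) -> Optional[str]:
    """Return the phase that follows `current`, or None when the batch is finished."""
    if current == "review":
        # Rework is only entered when the review asks for fixes.
        return None if review_passed else "rework"
    if current == "rework":
        # Assumption: targeted fixes go back through the Opus review gate.
        return "review"
    if current not in LINEAR_NEXT:
        raise ValueError(f"unknown phase: {current!r}")
    return LINEAR_NEXT[current]


phase: Optional[str] = "plan"
visited = []
while phase is not None:
    visited.append(phase)
    phase = next_phase(phase, review_passed=True)

print(" -> ".join(visited))  # plan -> dev -> test -> review
```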
## Implementation Notes
### Warm Context Strategy
Each phase loads relevant context before execution:
- **Planning**: PRD, Architecture, Epic definitions
- **Development**: Story file, project-context.md, related code
- **Testing**: Story AC, existing test patterns, fixtures
- **Review**: All changed files, architecture constraints, story requirements
### Checkpoint System
Pipeline state tracked in `_bmad-output/bbp-status.yaml`:
```yaml
epic: 3
phase: dev
stories:
  story-3.1:
    status: completed
    dev: done
    test: done
  story-3.2:
    status: in_progress
    dev: done
    test: pending
  story-3.3:
    status: pending
Enables:
- Resume after interruption
- Skip completed work
- Parallel execution tracking
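A resume step could read this status and pick the next unfinished step for each story. The sketch mirrors the example file as a Python dict (in practice it would come from a YAML parser such as PyYAML's `yaml.safe_load`). The `pending_work` helper and the rule that a missing step counts as pending are assumptions.

```python
# Same structure as the example bbp-status.yaml, written as a dict to stay dependency-free.
STATUS = {
    "epic": 3,
    "phase": "dev",
    "stories": {
        "story-3.1": {"status": "completed", "dev": "done", "test": "done"},
        "story-3.2": {"status": "in_progress", "dev": "done", "test": "pending"},
        "story-3.3": {"status": "pending"},
    },
}

# Per-story steps tracked in the file, in execution order.
STEPS = ("dev", "test")


def pending_work(status: dict) -> list[tuple[str, str]]:
    """Return (story, step) for the first unfinished step of every incomplete story."""
    work = []
    for name, story in status["stories"].items():
        if story.get("status") == "completed":
            continue  # skip completed work
        for step in STEPS:
            # Assumption: a step that is absent from the file has not been started.
            if story.get(step, "pending") != "done":
                work.append((name, step))
                break
    return work


print(pending_work(STATUS))  # [('story-3.2', 'test'), ('story-3.3', 'dev')]
```

Because completed stories are skipped and each remaining story reports only its first unfinished step, rerunning the pipeline after an interruption picks up where it stopped.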
### When to Use Opus vs Sonnet
| Task | Model | Reason |
|------|-------|--------|
| Story planning | Opus | Requirement interpretation, scope decisions |
| Code implementation | Sonnet | Mechanical translation of specs to code |
| Test writing | Sonnet | Pattern-based, well-defined expectations |
| Code review | Opus | Cross-cutting concerns, architectural judgment |
| Targeted fixes | Sonnet | Specific, well-scoped corrections |
| Architecture decisions | Opus | Long-term impact, trade-off analysis |
| File operations | Sonnet | Token-heavy, mechanical operations |
## Results
In practice, this approach enables:
- **3-5x more stories** completed per rate limit window
- **Equivalent quality** on implementation tasks
- **Better review quality** through batch pattern detection
- **Sustainable development pace** without rate limit interruptions
## Current Limitations: Manual Model Switching
**Important**: Model switching in Claude Code is currently a manual operation via the `/model` command.
```bash
/model sonnet # Switch to Sonnet for implementation
/model opus # Switch to Opus for planning/review
```
### UX Challenge (Work in Progress)
The pipeline workflows need to clearly signal to users when to switch models:
```
┌─────────────────────────────────────────────────────────────┐
│  ⚠️ PHASE COMPLETE: Planning finished                       │
│                                                             │
│  Next phase: Development (Sonnet recommended)               │
│                                                             │
│  To switch models: /model sonnet                            │
│  To continue:      /bmad:bbp:workflows:sonnet-dev-batch     │
└─────────────────────────────────────────────────────────────┘
```
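A prompt like this could be generated from a few parameters so that every phase transition uses the same layout. This is a sketch only: `render_switch_prompt` and its parameters are hypothetical, not an existing BMAD or Claude Code API, and the emoji is left out so that plain `len()` padding keeps the borders aligned.

```python
def render_switch_prompt(
    finished: str,
    next_phase: str,
    model: str,
    workflow: str,
    min_width: int = 60,
) -> str:
    """Render a boxed phase-transition prompt with a model-switch hint."""
    lines = [
        f"PHASE COMPLETE: {finished} finished",
        "",
        f"Next phase: {next_phase} ({model.capitalize()} recommended)",
        "",
        f"To switch models: /model {model}",
        f"To continue:      {workflow}",
    ]
    # Widen the box if any line would not fit, leaving one space on each side.
    inner = max(min_width, max(len(line) for line in lines) + 2)
    top = "┌" + "─" * inner + "┐"
    bottom = "└" + "─" * inner + "┘"
    body = ["│ " + line.ljust(inner - 1) + "│" for line in lines]
    return "\n".join([top, *body, bottom])


print(
    render_switch_prompt(
        finished="Planning",
        next_phase="Development",
        model="sonnet",
        workflow="/bmad:bbp:workflows:sonnet-dev-batch",
    )
)
```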
Current approach:
- Each workflow is named with its intended model (`opus-plan-epic`, `sonnet-dev-batch`)
- Phase completion messages prompt the user to switch
- Status workflow shows current phase and recommended model

**Known improvement needed**: Workflow output messages need optimization to make model switching more intuitive. Current messages may not be prominent enough for users unfamiliar with the tiered approach.

Future improvements being explored:
- Clearer visual cues at phase boundaries (boxed prompts, color hints)
- Model recommendation prominently displayed in status output
- Standardized "SWITCH MODEL" callout format across all phase transitions
- Potential Claude Code hooks for model suggestions
### Why Manual Switching is Acceptable
Despite the manual step, this approach works because:
1. **Phase boundaries are natural pause points** - good time to review progress anyway
2. **Explicit control** - users know exactly which model is doing what
3. **No surprise usage** - it is always clear when Opus capacity is being spent and when Sonnet is doing the work
4. **Workflow names are self-documenting** - `sonnet-*` vs `opus-*` prefixes
## Future Considerations
- **Haiku tier**: For even lighter operations (file searches, simple edits)
- **Adaptive routing**: Dynamic model selection based on task complexity signals
- **Parallel batches**: Multiple Sonnet sessions for independent stories
- **Automated model hints**: Claude Code integration for model recommendations
---
*This architecture emerged from practical necessity during extended development sessions where Opus rate limits became the primary bottleneck to productivity.*