docs: add batch processing architecture rationale
Document the core insight that working entirely on Opus is impractical due to rate limits, and that strategic model tiering (Opus for planning/review, Sonnet for implementation/testing) produces equivalent results with dramatically lower rate limit consumption.

Key points:

- 90% of tokens go to mechanical tasks suitable for Sonnet
- Opus reserved for strategic decisions and pattern detection
- Batching amplifies efficiency through context amortization
- 3-5x more stories completed per rate limit window

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
# Batch Processing Architecture for Rate Limit Efficiency

> **📣 Request for Feedback**
>
> This is an alpha-stage contribution and we welcome community input. We've tested this pattern with our team but would love feedback on:
>
> - Does this model tiering approach resonate with your experience?
> - Are there edge cases where Sonnet struggles with implementation tasks?
> - How can we improve the UX for model switching prompts?
> - Would a formal BBP module be valuable to the community?
>
> Please share thoughts in Discord `#suggestions-feedback` or comment on the PR. We're here to collaborate, not prescribe.

---
> **Scope**: This architecture is specifically designed for **Anthropic Claude models** on **Claude Max subscriptions** (or similar tiered access plans). The rate limit characteristics, model capabilities, and tiering strategy are based on Anthropic's current offering structure.

> **Origin**: These constraints were discovered when attempting to scale the BMAD method across a development team. Individual developers hitting Opus rate limits mid-sprint created bottlenecks that rippled through the entire team's velocity. The model tiering pattern emerged as a practical solution to sustain team-wide BMAD adoption.

## Platform Context: Anthropic Claude Max

This pattern assumes access to **Claude Code** with a **Claude Max subscription**, which provides:

| Model | Capability | Rate Limit Behavior |
|-------|------------|---------------------|
| **Claude Opus 4** | Highest reasoning, best for complex decisions | Most restrictive limits |
| **Claude Sonnet 4** | Strong coding, fast execution | More generous limits |
| **Claude Haiku** | Quick tasks, simple operations | Most generous limits |
The rate limit disparity between Opus and Sonnet is significant enough that strategic model selection dramatically impacts sustainable throughput. This architecture exploits that disparity.

**Note**: If using API access with pay-per-token pricing instead of subscription limits, the economics differ but the capability-matching principle still applies (use the right model for the task complexity).

## The Problem: Opus Rate Limits in Extended Development Sessions

Working entirely on Claude Opus for software development quickly becomes impractical:

- **Rate limits hit fast**: Complex stories with multiple files, tests, and iterations consume tokens rapidly
- **Context reloading**: Each new conversation requires re-establishing project context
- **Expensive operations**: Every file read, search, and edit counts against limits
- **Development stalls**: Hitting rate limits mid-story forces context-switching or waiting
## The Solution: Strategic Model Tiering

The Batch-Based Pipeline (BBP) approach distributes work across model tiers based on task complexity:

```
┌─────────────────────────────────────────────────────────────────┐
│                     OPUS (Strategic Layer)                      │
│  • Epic-level story planning (high-stakes decisions)            │
│  • Cross-story pattern detection during review                  │
│  • Architecture-impacting decisions                             │
│  • Final quality gates                                          │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                    SONNET (Execution Layer)                     │
│  • Story implementation (bulk of token usage)                   │
│  • Test writing and execution                                   │
│  • Targeted rework from review feedback                         │
│  • File operations and searches                                 │
└─────────────────────────────────────────────────────────────────┘
```
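In code form, the tiering reduces to a small routing table. Here is a minimal sketch; the phase names and model identifiers are illustrative assumptions, not a shipped BBP API:

```python
# Illustrative phase-to-model routing table (names are assumptions,
# not part of any shipped BBP module).
PHASE_MODEL = {
    "plan":   "opus",    # strategic: requirement interpretation, sequencing
    "dev":    "sonnet",  # mechanical: spec-to-code translation
    "test":   "sonnet",  # pattern-based: criteria map directly to assertions
    "review": "opus",    # strategic: cross-story pattern detection
    "rework": "sonnet",  # scoped: targeted fixes from review feedback
}

def model_for(phase: str) -> str:
    """Return the recommended model tier for a pipeline phase."""
    return PHASE_MODEL[phase]
```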
## Why This Works: Quality Preservation Through Staging

### The Key Insight: Quality Comes From Context, Not Just Model Size

The common assumption is "bigger model = better code." In practice, **code quality is primarily determined by**:

1. **Clarity of specifications** - What exactly needs to be built
2. **Available context** - Project patterns, architecture decisions, existing code
3. **Scope constraints** - Focused task vs. open-ended exploration

When these three factors are optimized, Sonnet produces implementation-equivalent results to Opus.

### How Staging Preserves Quality

```
┌─────────────────────────────────────────────────────────────────────────┐
│                           OPUS PLANNING PHASE                           │
│                                                                         │
│  Input: PRD, Architecture, Epic requirements (ambiguous, high-level)    │
│                                                                         │
│  Opus does the HARD work:                                               │
│  • Interprets ambiguous requirements                                    │
│  • Makes architectural judgment calls                                   │
│  • Resolves conflicting constraints                                     │
│  • Defines precise acceptance criteria                                  │
│  • Sequences tasks in optimal order                                     │
│                                                                         │
│  Output: Crystal-clear story files with:                                │
│  • Unambiguous task definitions                                         │
│  • Specific file paths to modify                                        │
│  • Exact acceptance criteria                                            │
│  • Test scenarios spelled out                                           │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         SONNET EXECUTION PHASE                          │
│                                                                         │
│  Input: Crystal-clear story files (all ambiguity resolved)              │
│                                                                         │
│  Sonnet does MECHANICAL work:                                           │
│  • Translates specs to code (no interpretation needed)                  │
│  • Follows established patterns (already defined)                       │
│  • Writes tests against clear criteria                                  │
│  • Makes localized decisions only                                       │
│                                                                         │
│  Why Sonnet excels here:                                                │
│  • Strong coding capabilities (not inferior to Opus for implementation) │
│  • Faster execution                                                     │
│  • More generous rate limits                                            │
│  • Sufficient reasoning for scoped tasks                                │
└─────────────────────────────────────────────────────────────────────────┘
```
### Quality Is Front-Loaded, Not Distributed

Traditional approach (quality spread thin):

```
Story 1: [interpret─────implement─────test─────]  → Quality depends on each step
Story 2: [interpret─────implement─────test─────]  → Repeated interpretation risk
Story 3: [interpret─────implement─────test─────]  → Inconsistency accumulates
```

Batched approach (quality concentrated):

```
PLAN:   [interpret ALL stories with full context]  → One-time, high-quality
DEV:    [implement─implement─implement──────────]  → Mechanical, consistent
TEST:   [test──────test──────test───────────────]  → Pattern-based
REVIEW: [review ALL with cross-story visibility ]  → Catches what batch missed
```

**The quality gates are at phase boundaries, not scattered throughout.**
### Why Sonnet Doesn't Degrade Quality

| Task Type | Why Sonnet Matches Opus | Evidence |
|-----------|------------------------|----------|
| Code generation | Both models trained extensively on code; Sonnet optimized for speed | Benchmarks show near-parity on coding tasks |
| Following specs | No interpretation needed when specs are clear | Deterministic translation |
| Pattern matching | Existing code provides template | Copy-modify-adapt pattern |
| Test writing | Acceptance criteria explicit | Direct mapping to assertions |
| Syntax/formatting | Both models equivalent | Mechanical task |

### Where Opus Is Still Required

| Task Type | Why Opus Needed | Stage |
|-----------|-----------------|-------|
| Requirement interpretation | Ambiguity resolution requires deeper reasoning | Planning |
| Architecture decisions | Trade-off analysis, long-term implications | Planning |
| Cross-story patterns | Seeing connections across multiple changes | Review |
| Novel problem solving | No existing pattern to follow | Planning |
| Quality judgment | "Is this good enough?" requires taste | Review |
### The Batch Review Multiplier

Reviewing stories in batch actually **improves** quality over individual review:

```
Individual Review:
  Story 1: [review] → Misses that auth pattern will conflict with Story 3
  Story 2: [review] → Doesn't know Story 1 already solved similar problem
  Story 3: [review] → Unaware of auth pattern from Story 1

Batch Review:
  Stories 1,2,3: [review together]
    → "Story 1 and 3 both touch auth - ensure consistent approach"
    → "Story 2 duplicates utility from Story 1 - extract shared helper"
    → "All three stories need the same error handling pattern"
```

**Cross-story visibility enables pattern detection impossible in isolated review.**
## Token Efficiency: Why Batching Saves Tokens

### Dramatic Rate Limit Savings

Typical story implementation breakdown:

```
File reads/searches:  40% of tokens  → Sonnet
Code generation:      35% of tokens  → Sonnet
Test writing:         15% of tokens  → Sonnet
Planning/decisions:   10% of tokens  → Opus
```

By routing 90% of token-heavy operations to Sonnet, Opus capacity is preserved for high-value strategic work.
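
To make the arithmetic concrete, here is a back-of-the-envelope sketch. The per-story token count and window budget are illustrative assumptions, not measurements; only the 90/10 split comes from the breakdown above:

```python
# Back-of-the-envelope sketch: how the 90/10 split stretches an Opus window.
# TOKENS_PER_STORY and OPUS_WINDOW are illustrative assumptions.
TOKENS_PER_STORY = 100_000  # total tokens consumed by a typical story
OPUS_WINDOW = 500_000       # hypothetical Opus budget per rate limit window
OPUS_SHARE = 0.10           # share of story tokens needing Opus (from above)

all_opus = OPUS_WINDOW // TOKENS_PER_STORY                  # every token on Opus
tiered = OPUS_WINDOW // int(TOKENS_PER_STORY * OPUS_SHARE)  # only 10% hits Opus

print(f"All-Opus: {all_opus} stories per Opus window")  # 5
print(f"Tiered:   {tiered} stories per Opus window")    # 50 (Opus-side ceiling)
```

The Opus-side ceiling rises 10x in this sketch; observed end-to-end gains are lower (the 3-5x reported under Results below) because Sonnet limits, rework, and phase overhead also bind.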
### Context Amortization: Batching Amplifies Efficiency

Instead of:

```
Story 1: Plan(Opus) → Implement(Opus) → Test(Opus) → Review(Opus)
Story 2: Plan(Opus) → Implement(Opus) → Test(Opus) → Review(Opus)
Story 3: Plan(Opus) → Implement(Opus) → Test(Opus) → Review(Opus)
```

BBP does:

```
Planning Phase:  Plan stories 1,2,3 together      (Opus, warm context)
Dev Phase:       Implement stories 1,2,3          (Sonnet, parallel-ready)
Test Phase:      Test stories 1,2,3               (Sonnet, shared fixtures)
Review Phase:    Review stories 1,2,3 together    (Opus, pattern detection)
```

Benefits:

- **Context loading amortized** across multiple stories
- **Pattern recognition** improved by seeing related changes together
- **Parallelization potential** for independent stories
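
The amortization effect is easy to quantify. A minimal sketch, where the context-load and per-story work costs are illustrative assumptions:

```python
# Context-amortization sketch (all token counts are assumptions).
CONTEXT = 20_000  # tokens to establish project context in a conversation
WORK = 30_000     # tokens of actual work per story per phase
STORIES, PHASES = 3, 4

per_story = STORIES * PHASES * (CONTEXT + WORK)  # context reloaded every time
batched = PHASES * (CONTEXT + STORIES * WORK)    # context loaded once per phase

print(per_story)  # 600000
print(batched)    # 440000 → ~27% fewer tokens; the gap widens per extra story
```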
## Pipeline Flow

```
┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  OPUS PLAN   │───▶│  SONNET DEV  │───▶│ SONNET TEST  │───▶│ OPUS REVIEW  │
│              │    │              │    │              │    │              │
│ • All stories│    │ • Implement  │    │ • Write tests│    │ • Quality    │
│   planned    │    │   each story │    │ • Run tests  │    │   gate       │
│ • Context    │    │ • Warm       │    │ • Fix fails  │    │ • Pattern    │
│   established│    │   context    │    │              │    │   detection  │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
                                                                   │
                                                                   ▼
                                                            ┌──────────────┐
                                                            │SONNET REWORK │
                                                            │ (if needed)  │
                                                            │              │
                                                            │ • Targeted   │
                                                            │   fixes from │
                                                            │   review     │
                                                            └──────────────┘
```
## Implementation Notes

### Warm Context Strategy

Each phase loads relevant context before execution:

- **Planning**: PRD, Architecture, Epic definitions
- **Development**: Story file, project-context.md, related code
- **Testing**: Story AC, existing test patterns, fixtures
- **Review**: All changed files, architecture constraints, story requirements
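
Expressed as data, the manifest might look like the sketch below. The document names come from the list above; the file paths and structure are assumptions for illustration:

```python
# Hypothetical warm-context manifest: which artifacts each phase loads
# before execution. Paths are illustrative assumptions.
PHASE_CONTEXT = {
    "planning": ["docs/prd.md", "docs/architecture.md", "docs/epics/"],
    "dev":      ["stories/", "project-context.md", "src/"],
    "test":     ["stories/", "tests/", "tests/fixtures/"],
    "review":   ["<all changed files>", "docs/architecture.md", "stories/"],
}
```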
### Checkpoint System

Pipeline state tracked in `_bmad-output/bbp-status.yaml`:

```yaml
epic: 3
phase: dev
stories:
  story-3.1:
    status: completed
    dev: done
    test: done
  story-3.2:
    status: in_progress
    dev: done
    test: pending
  story-3.3:
    status: pending
```

Enables:

- Resume after interruption
- Skip completed work
- Parallel execution tracking
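
For example, resuming can be as simple as scanning the checkpoint for the first story with unfinished work. A minimal sketch, assuming PyYAML and the field names shown above:

```python
# Minimal resume sketch: find the next story with pending work in the
# checkpoint file. Assumes PyYAML; field names mirror the example above.
import yaml

def next_pending(path="_bmad-output/bbp-status.yaml"):
    with open(path) as f:
        state = yaml.safe_load(f)
    for story_id, story in state["stories"].items():
        if story.get("status") != "completed":
            return story_id, state["phase"]
    return None, state["phase"]

story, phase = next_pending()
print(f"Resume at {story} in phase {phase}" if story else "Epic complete")
```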
### When to Use Opus vs Sonnet

| Task | Model | Reason |
|------|-------|--------|
| Story planning | Opus | Requirement interpretation, scope decisions |
| Code implementation | Sonnet | Mechanical translation of specs to code |
| Test writing | Sonnet | Pattern-based, well-defined expectations |
| Code review | Opus | Cross-cutting concerns, architectural judgment |
| Targeted fixes | Sonnet | Specific, well-scoped corrections |
| Architecture decisions | Opus | Long-term impact, trade-off analysis |
| File operations | Sonnet | Token-heavy, mechanical operations |
## Results

In practice, this approach enables:

- **3-5x more stories** completed per rate limit window
- **Equivalent quality** on implementation tasks
- **Better review quality** through batch pattern detection
- **Sustainable development pace** without rate limit interruptions
## Current Limitations: Manual Model Switching

**Important**: Model switching in Claude Code is currently a manual operation via the `/model` command.

```bash
/model sonnet   # Switch to Sonnet for implementation
/model opus     # Switch to Opus for planning/review
```
### UX Challenge (Work in Progress)

The pipeline workflows need to clearly signal to users when to switch models:

```
┌─────────────────────────────────────────────────────────────┐
│  ⚠️ PHASE COMPLETE: Planning finished                       │
│                                                             │
│  Next phase: Development (Sonnet recommended)               │
│                                                             │
│  To switch models: /model sonnet                            │
│  To continue: /bmad:bbp:workflows:sonnet-dev-batch          │
└─────────────────────────────────────────────────────────────┘
```
Current approach:

- Each workflow is named with its intended model (`opus-plan-epic`, `sonnet-dev-batch`)
- Phase completion messages prompt the user to switch
- Status workflow shows current phase and recommended model

**Known improvement needed**: Workflow output messages need optimization to make model switching more intuitive. Current messages may not be prominent enough for users unfamiliar with the tiered approach.

Future improvements being explored:

- Clearer visual cues at phase boundaries (boxed prompts, color hints)
- Model recommendation prominently displayed in status output
- Standardized "SWITCH MODEL" callout format across all phase transitions (see the sketch below)
- Potential Claude Code hooks for model suggestions
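
One shape the standardized callout could take is a small formatter that every workflow calls at its phase boundary. This is a hypothetical sketch, not a shipped BBP feature; the workflow name is taken from the example above:

```python
# Hypothetical "SWITCH MODEL" callout formatter for phase boundaries.
# The format and parameters are illustrative, not a shipped BBP feature.
def phase_boundary_callout(done: str, next_phase: str,
                           model: str, workflow: str) -> str:
    lines = [
        f"PHASE COMPLETE: {done} finished",
        "",
        f"Next phase: {next_phase} ({model.title()} recommended)",
        "",
        f"To switch models: /model {model}",
        f"To continue:      {workflow}",
    ]
    width = max(len(line) for line in lines) + 2
    body = "\n".join(f"│ {line.ljust(width)} │" for line in lines)
    return f"┌{'─' * (width + 2)}┐\n{body}\n└{'─' * (width + 2)}┘"

print(phase_boundary_callout("Planning", "Development", "sonnet",
                             "/bmad:bbp:workflows:sonnet-dev-batch"))
```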
### Why Manual Switching is Acceptable

Despite the manual step, this approach works because:

1. **Phase boundaries are natural pause points** - a good time to review progress anyway
2. **Explicit control** - users know exactly which model is doing what
3. **No surprise costs** - transparent about when Opus vs Sonnet is used
4. **Workflow names are self-documenting** - `sonnet-*` vs `opus-*` prefixes
## Future Considerations

- **Haiku tier**: For even lighter operations (file searches, simple edits)
- **Adaptive routing**: Dynamic model selection based on task complexity signals
- **Parallel batches**: Multiple Sonnet sessions for independent stories
- **Automated model hints**: Claude Code integration for model recommendations

---
*This architecture emerged from practical necessity during extended development sessions where Opus rate limits became the primary bottleneck to productivity.*