From ab579b966b3bc29aa63ae259d24b2d95d343469b Mon Sep 17 00:00:00 2001
From: VS
Date: Fri, 16 Jan 2026 00:52:56 +0530
Subject: [PATCH] docs: add batch processing architecture rationale

Document the core insight that working entirely on Opus is impractical
due to rate limits, and that strategic model tiering (Opus for
planning/review, Sonnet for implementation/testing) produces equivalent
results with dramatically lower rate limit consumption.

Key points:
- 90% of tokens go to mechanical tasks suitable for Sonnet
- Opus reserved for strategic decisions and pattern detection
- Batching amplifies efficiency through context amortization
- 3-5x more stories completed per rate limit window

Co-Authored-By: Claude Sonnet 4.5
---
 docs/batch-processing-architecture.md | 344 ++++++++++++++++++++++++++
 1 file changed, 344 insertions(+)
 create mode 100644 docs/batch-processing-architecture.md

diff --git a/docs/batch-processing-architecture.md b/docs/batch-processing-architecture.md
new file mode 100644
index 00000000..12a09257
--- /dev/null
+++ b/docs/batch-processing-architecture.md
@@ -0,0 +1,344 @@
# Batch Processing Architecture for Rate Limit Efficiency

> **📣 Request for Feedback**
>
> This is an alpha-stage contribution and we welcome community input. We've tested this pattern with our team but would love feedback on:
>
> - Does this model tiering approach resonate with your experience?
> - Are there edge cases where Sonnet struggles with implementation tasks?
> - How can we improve the UX for model switching prompts?
> - Would a formal BBP module be valuable to the community?
>
> Please share thoughts in Discord `#suggestions-feedback` or comment on the PR. We're here to collaborate, not prescribe.

---

> **Scope**: This architecture is specifically designed for **Anthropic Claude models** on **Claude Max subscriptions** (or similar tiered access plans).
The rate limit characteristics, model capabilities, and tiering strategy are based on Anthropic's current offering structure. + +> **Origin**: These constraints were discovered when attempting to scale the BMAD method across a development team. Individual developers hitting Opus rate limits mid-sprint created bottlenecks that rippled through the entire team's velocity. The model tiering pattern emerged as a practical solution to sustain team-wide BMAD adoption. + +## Platform Context: Anthropic Claude Max + +This pattern assumes access to **Claude Code** with a **Claude Max subscription**, which provides: + +| Model | Capability | Rate Limit Behavior | +|-------|------------|---------------------| +| **Claude Opus 4** | Highest reasoning, best for complex decisions | Most restrictive limits | +| **Claude Sonnet 4** | Strong coding, fast execution | More generous limits | +| **Claude Haiku** | Quick tasks, simple operations | Most generous limits | + +The rate limit disparity between Opus and Sonnet is significant enough that strategic model selection dramatically impacts sustainable throughput. This architecture exploits that disparity. + +**Note**: If using API access with pay-per-token pricing instead of subscription limits, the economics differ but the capability-matching principle still applies (use the right model for the task complexity). 
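The tiering table above implies a routing rule simple enough to write down. The sketch below is purely illustrative: `pick_model`, `MODEL_FOR_TASK`, and the task names are hypothetical, not a real Claude Code API; they just encode the capability-matching principle in code.

```python
# Illustrative task -> model routing (hypothetical names, not a real
# Claude Code API). Strategic work goes to Opus, mechanical work to
# Sonnet, and the lightest lookups could drop to Haiku.
MODEL_FOR_TASK = {
    "story_planning":        "opus",
    "architecture_decision": "opus",
    "code_review":           "opus",
    "implementation":        "sonnet",
    "test_writing":          "sonnet",
    "targeted_fix":          "sonnet",
    "file_search":           "haiku",
}

def pick_model(task: str) -> str:
    # Default to Sonnet: spending Opus quota on mechanical work is the
    # failure mode this architecture is designed to avoid.
    return MODEL_FOR_TASK.get(task, "sonnet")

print(pick_model("story_planning"))  # opus
print(pick_model("refactor"))        # sonnet (unknown task, safe default)
```

The same mapping reappears later in the "When to Use Opus vs Sonnet" table; the point here is only that the routing decision is static and cheap to make up front.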
+ +## The Problem: Opus Rate Limits in Extended Development Sessions + +Working entirely on Claude Opus for software development quickly becomes impractical: + +- **Rate limits hit fast**: Complex stories with multiple files, tests, and iterations consume tokens rapidly +- **Context reloading**: Each new conversation requires re-establishing project context +- **Expensive operations**: Every file read, search, and edit counts against limits +- **Development stalls**: Hitting rate limits mid-story forces context-switching or waiting + +## The Solution: Strategic Model Tiering + +The Batch-Based Pipeline (BBP) approach distributes work across model tiers based on task complexity: + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ OPUS (Strategic Layer) β”‚ +β”‚ β€’ Epic-level story planning (high-stakes decisions) β”‚ +β”‚ β€’ Cross-story pattern detection during review β”‚ +β”‚ β€’ Architecture-impacting decisions β”‚ +β”‚ β€’ Final quality gates β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β–Ό +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ SONNET (Execution Layer) β”‚ +β”‚ β€’ Story implementation (bulk of token usage) β”‚ +β”‚ β€’ Test writing and execution β”‚ +β”‚ β€’ Targeted rework from review feedback β”‚ +β”‚ β€’ File operations and searches β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +## Why This Works: 
Quality Preservation Through Staging + +### The Key Insight: Quality Comes From Context, Not Just Model Size + +The common assumption is "bigger model = better code." In practice, **code quality is primarily determined by**: + +1. **Clarity of specifications** - What exactly needs to be built +2. **Available context** - Project patterns, architecture decisions, existing code +3. **Scope constraints** - Focused task vs. open-ended exploration + +When these three factors are optimized, Sonnet produces implementation-equivalent results to Opus. + +### How Staging Preserves Quality + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ OPUS PLANNING PHASE β”‚ +β”‚ β”‚ +β”‚ Input: PRD, Architecture, Epic requirements (ambiguous, high-level) β”‚ +β”‚ β”‚ +β”‚ Opus does the HARD work: β”‚ +β”‚ β€’ Interprets ambiguous requirements β”‚ +β”‚ β€’ Makes architectural judgment calls β”‚ +β”‚ β€’ Resolves conflicting constraints β”‚ +β”‚ β€’ Defines precise acceptance criteria β”‚ +β”‚ β€’ Sequences tasks in optimal order β”‚ +β”‚ β”‚ +β”‚ Output: Crystal-clear story files with: β”‚ +β”‚ β€’ Unambiguous task definitions β”‚ +β”‚ β€’ Specific file paths to modify β”‚ +β”‚ β€’ Exact acceptance criteria β”‚ +β”‚ β€’ Test scenarios spelled out β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β–Ό +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ SONNET EXECUTION PHASE β”‚ +β”‚ β”‚ +β”‚ Input: Crystal-clear story files (all ambiguity 
resolved) β”‚ +β”‚ β”‚ +β”‚ Sonnet does MECHANICAL work: β”‚ +β”‚ β€’ Translates specs to code (no interpretation needed) β”‚ +β”‚ β€’ Follows established patterns (already defined) β”‚ +β”‚ β€’ Writes tests against clear criteria β”‚ +β”‚ β€’ Makes localized decisions only β”‚ +β”‚ β”‚ +β”‚ Why Sonnet excels here: β”‚ +β”‚ β€’ Strong coding capabilities (not inferior to Opus for implementation) β”‚ +β”‚ β€’ Faster execution β”‚ +β”‚ β€’ More generous rate limits β”‚ +β”‚ β€’ Sufficient reasoning for scoped tasks β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +### Quality Is Front-Loaded, Not Distributed + +Traditional approach (quality spread thin): +``` +Story 1: [interpret─────implement─────test─────] β†’ Quality depends on each step +Story 2: [interpret─────implement─────test─────] β†’ Repeated interpretation risk +Story 3: [interpret─────implement─────test─────] β†’ Inconsistency accumulates +``` + +Batched approach (quality concentrated): +``` +PLAN: [interpret ALL stories with full context] β†’ One-time, high-quality +DEV: [implement─implement─implement──────────] β†’ Mechanical, consistent +TEST: [test──────test──────test───────────────] β†’ Pattern-based +REVIEW: [review ALL with cross-story visibility ] β†’ Catches what batch missed +``` + +**The quality gates are at phase boundaries, not scattered throughout.** + +### Why Sonnet Doesn't Degrade Quality + +| Task Type | Why Sonnet Matches Opus | Evidence | +|-----------|------------------------|----------| +| Code generation | Both models trained extensively on code; Sonnet optimized for speed | Benchmarks show near-parity on coding tasks | +| Following specs | No interpretation needed when specs are clear | Deterministic translation | +| Pattern matching | Existing code provides template | Copy-modify-adapt pattern | 
+| Test writing | Acceptance criteria explicit | Direct mapping to assertions | +| Syntax/formatting | Both models equivalent | Mechanical task | + +### Where Opus Is Still Required + +| Task Type | Why Opus Needed | Stage | +|-----------|-----------------|-------| +| Requirement interpretation | Ambiguity resolution requires deeper reasoning | Planning | +| Architecture decisions | Trade-off analysis, long-term implications | Planning | +| Cross-story patterns | Seeing connections across multiple changes | Review | +| Novel problem solving | No existing pattern to follow | Planning | +| Quality judgment | "Is this good enough?" requires taste | Review | + +### The Batch Review Multiplier + +Reviewing stories in batch actually **improves** quality over individual review: + +``` +Individual Review: +Story 1: [review] β†’ Misses that auth pattern will conflict with Story 3 +Story 2: [review] β†’ Doesn't know Story 1 already solved similar problem +Story 3: [review] β†’ Unaware of auth pattern from Story 1 + +Batch Review: +Stories 1,2,3: [review together] + β†’ "Story 1 and 3 both touch auth - ensure consistent approach" + β†’ "Story 2 duplicates utility from Story 1 - extract shared helper" + β†’ "All three stories need the same error handling pattern" +``` + +**Cross-story visibility enables pattern detection impossible in isolated review.** + +## Token Efficiency: Why Batching Saves Tokens + +### Dramatic Rate Limit Savings + +Typical story implementation breakdown: +``` +File reads/searches: 40% of tokens β†’ Sonnet +Code generation: 35% of tokens β†’ Sonnet +Test writing: 15% of tokens β†’ Sonnet +Planning/decisions: 10% of tokens β†’ Opus +``` + +By routing 90% of token-heavy operations to Sonnet, Opus capacity is preserved for high-value strategic work. 
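Taking the breakdown above at face value, the savings can be sanity-checked with back-of-the-envelope arithmetic. The shares below come from that illustrative breakdown, not from a new measurement:

```python
# Token share per activity and the model it is routed to under BBP
# (shares taken from the illustrative breakdown above).
breakdown = {
    "file_reads_searches": (0.40, "sonnet"),
    "code_generation":     (0.35, "sonnet"),
    "test_writing":        (0.15, "sonnet"),
    "planning_decisions":  (0.10, "opus"),
}

opus_share = sum(share for share, model in breakdown.values() if model == "opus")
sonnet_share = 1.0 - opus_share

# If every token previously went to Opus, the same Opus window now covers
# roughly 1 / opus_share as much work in the ideal case. Context reloads
# and review overhead pull real sessions down toward the observed 3-5x.
ideal_stretch = 1 / opus_share

print(f"Opus share: {opus_share:.0%}, ideal window stretch: {ideal_stretch:.0f}x")
```

The gap between the ideal 10x and the observed 3-5x is exactly the overhead the batching strategy in the next section tries to amortize.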
+ +### Context Amortization: Batching Amplifies Efficiency + +Instead of: +``` +Story 1: Plan(Opus) β†’ Implement(Opus) β†’ Test(Opus) β†’ Review(Opus) +Story 2: Plan(Opus) β†’ Implement(Opus) β†’ Test(Opus) β†’ Review(Opus) +Story 3: Plan(Opus) β†’ Implement(Opus) β†’ Test(Opus) β†’ Review(Opus) +``` + +BBP does: +``` +Planning Phase: Plan stories 1,2,3 together (Opus, warm context) +Dev Phase: Implement stories 1,2,3 (Sonnet, parallel-ready) +Test Phase: Test stories 1,2,3 (Sonnet, shared fixtures) +Review Phase: Review stories 1,2,3 together (Opus, pattern detection) +``` + +Benefits: +- **Context loading amortized** across multiple stories +- **Pattern recognition** improved by seeing related changes together +- **Parallelization potential** for independent stories + +## Pipeline Flow + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ OPUS PLAN │───▢│ SONNET DEV │───▢│ SONNET TEST │───▢│ OPUS REVIEW β”‚ +β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ +β”‚ β€’ All storiesβ”‚ β”‚ β€’ Implement β”‚ β”‚ β€’ Write testsβ”‚ β”‚ β€’ Quality β”‚ +β”‚ planned β”‚ β”‚ each story β”‚ β”‚ β€’ Run tests β”‚ β”‚ gate β”‚ +β”‚ β€’ Context β”‚ β”‚ β€’ Warm β”‚ β”‚ β€’ Fix fails β”‚ β”‚ β€’ Pattern β”‚ +β”‚ establishedβ”‚ β”‚ context β”‚ β”‚ β”‚ β”‚ detection β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β–Ό + β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚SONNET REWORK β”‚ + β”‚ (if needed) β”‚ + β”‚ β”‚ + β”‚ β€’ Targeted β”‚ + β”‚ fixes from β”‚ + β”‚ review β”‚ + β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +## Implementation Notes + +### Warm Context Strategy + +Each phase loads relevant context before execution: +- **Planning**: PRD, Architecture, Epic 
definitions +- **Development**: Story file, project-context.md, related code +- **Testing**: Story AC, existing test patterns, fixtures +- **Review**: All changed files, architecture constraints, story requirements + +### Checkpoint System + +Pipeline state tracked in `_bmad-output/bbp-status.yaml`: +```yaml +epic: 3 +phase: dev +stories: + story-3.1: + status: completed + dev: done + test: done + story-3.2: + status: in_progress + dev: done + test: pending + story-3.3: + status: pending +``` + +Enables: +- Resume after interruption +- Skip completed work +- Parallel execution tracking + +### When to Use Opus vs Sonnet + +| Task | Model | Reason | +|------|-------|--------| +| Story planning | Opus | Requirement interpretation, scope decisions | +| Code implementation | Sonnet | Mechanical translation of specs to code | +| Test writing | Sonnet | Pattern-based, well-defined expectations | +| Code review | Opus | Cross-cutting concerns, architectural judgment | +| Targeted fixes | Sonnet | Specific, well-scoped corrections | +| Architecture decisions | Opus | Long-term impact, trade-off analysis | +| File operations | Sonnet | Token-heavy, mechanical operations | + +## Results + +In practice, this approach enables: +- **3-5x more stories** completed per rate limit window +- **Equivalent quality** on implementation tasks +- **Better review quality** through batch pattern detection +- **Sustainable development pace** without rate limit interruptions + +## Current Limitations: Manual Model Switching + +**Important**: Model switching in Claude Code is currently a manual operation via the `/model` command. 
+ +```bash +/model sonnet # Switch to Sonnet for implementation +/model opus # Switch to Opus for planning/review +``` + +### UX Challenge (Work in Progress) + +The pipeline workflows need to clearly signal to users when to switch models: + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ ⚠️ PHASE COMPLETE: Planning finished β”‚ +β”‚ β”‚ +β”‚ Next phase: Development (Sonnet recommended) β”‚ +β”‚ β”‚ +β”‚ To switch models: /model sonnet β”‚ +β”‚ To continue: /bmad:bbp:workflows:sonnet-dev-batch β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +Current approach: +- Each workflow is named with its intended model (`opus-plan-epic`, `sonnet-dev-batch`) +- Phase completion messages prompt the user to switch +- Status workflow shows current phase and recommended model + +**Known improvement needed**: Workflow output messages need optimization to make model switching more intuitive. Current messages may not be prominent enough for users unfamiliar with the tiered approach. + +Future improvements being explored: +- Clearer visual cues at phase boundaries (boxed prompts, color hints) +- Model recommendation prominently displayed in status output +- Standardized "SWITCH MODEL" callout format across all phase transitions +- Potential Claude Code hooks for model suggestions + +### Why Manual Switching is Acceptable + +Despite the manual step, this approach works because: +1. **Phase boundaries are natural pause points** - good time to review progress anyway +2. **Explicit control** - users know exactly which model is doing what +3. **No surprise costs** - transparent about when Opus vs Sonnet is used +4. 
**Workflow names are self-documenting** - `sonnet-*` vs `opus-*` prefixes + +## Future Considerations + +- **Haiku tier**: For even lighter operations (file searches, simple edits) +- **Adaptive routing**: Dynamic model selection based on task complexity signals +- **Parallel batches**: Multiple Sonnet sessions for independent stories +- **Automated model hints**: Claude Code integration for model recommendations + +--- + +*This architecture emerged from practical necessity during extended development sessions where Opus rate limits became the primary bottleneck to productivity.*
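As a closing illustration of the checkpoint system described earlier, here is a minimal resume sketch. The nested dict mirrors the `_bmad-output/bbp-status.yaml` schema from the Checkpoint System section; in a real workflow the state would come from a YAML parser rather than being hard-coded, and `next_actions` is a hypothetical helper name, not part of any BMAD tooling.

```python
# In-memory mirror of _bmad-output/bbp-status.yaml (schema from the
# Checkpoint System section); a real tool would yaml.safe_load() the file.
status = {
    "epic": 3,
    "phase": "dev",
    "stories": {
        "story-3.1": {"status": "completed", "dev": "done", "test": "done"},
        "story-3.2": {"status": "in_progress", "dev": "done", "test": "pending"},
        "story-3.3": {"status": "pending"},
    },
}

def next_actions(state):
    """Yield (story, phase) pairs still needing work, in declared order."""
    for story_id, story in state["stories"].items():
        if story.get("status") == "completed":
            continue  # resume skips completed work
        for phase in ("dev", "test"):
            if story.get(phase, "pending") != "done":
                yield story_id, phase

print(list(next_actions(status)))
# [('story-3.2', 'test'), ('story-3.3', 'dev'), ('story-3.3', 'test')]
```

This is what "resume after interruption" and "skip completed work" amount to in practice: a pure function over the checkpoint file, so any session (Opus or Sonnet) can pick up where the last one stopped.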