# Batch Processing Architecture for Rate Limit Efficiency

> **πŸ“£ Request for Feedback**
>
> This is an alpha-stage contribution and we welcome community input. We've tested this pattern with our team but would love feedback on:
> - Does this model tiering approach resonate with your experience?
> - Are there edge cases where Sonnet struggles with implementation tasks?
> - How can we improve the UX for model switching prompts?
> - Would a formal BBP module be valuable to the community?
>
> Please share thoughts in Discord `#suggestions-feedback` or comment on the PR. We're here to collaborate, not prescribe.

---

> **Scope**: This architecture is specifically designed for **Anthropic Claude models** on **Claude Max subscriptions** (or similar tiered access plans). The rate limit characteristics, model capabilities, and tiering strategy are based on Anthropic's current offering structure.

> **Origin**: These constraints were discovered when attempting to scale the BMAD method across a development team. Individual developers hitting Opus rate limits mid-sprint created bottlenecks that rippled through the entire team and dragged down overall velocity. The model tiering pattern emerged as a practical solution to sustain team-wide BMAD adoption.

## Platform Context: Anthropic Claude Max

This pattern assumes access to **Claude Code** with a **Claude Max subscription**, which provides:

| Model | Capability | Rate Limit Behavior |
|-------|------------|---------------------|
| **Claude Opus 4** | Highest reasoning, best for complex decisions | Most restrictive limits |
| **Claude Sonnet 4** | Strong coding, fast execution | More generous limits |
| **Claude Haiku** | Quick tasks, simple operations | Most generous limits |

The rate limit disparity between Opus and Sonnet is significant enough that strategic model selection dramatically impacts sustainable throughput. This architecture exploits that disparity.
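The capability matching above can be captured as a simple routing table. A minimal sketch in Python, assuming the task categories used here; the `TASK_MODEL` table and `recommend_model` helper are illustrative only, since Claude Code exposes no such API and model choice is made manually:

```python
# Hypothetical task-to-model routing table mirroring the tiering above.
# The categories and helper are illustrative -- not part of any Claude Code API.
TASK_MODEL = {
    "story_planning": "opus",          # requirement interpretation, scoping
    "architecture_decision": "opus",   # trade-off analysis
    "code_review": "opus",             # cross-story pattern detection
    "implementation": "sonnet",        # mechanical translation of clear specs
    "test_writing": "sonnet",          # pattern-based, explicit criteria
    "targeted_fix": "sonnet",          # well-scoped corrections
    "file_operations": "sonnet",       # token-heavy, mechanical
}

def recommend_model(task_type: str) -> str:
    """Recommend a model tier for a task; default to Sonnet for unknowns."""
    return TASK_MODEL.get(task_type, "sonnet")

print(recommend_model("story_planning"))   # opus
print(recommend_model("implementation"))   # sonnet
```

Defaulting unknown tasks to Sonnet reflects the overall bias of the pattern: reserve Opus capacity for tasks that demonstrably need it.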
**Note**: If using API access with pay-per-token pricing instead of subscription limits, the economics differ but the capability-matching principle still applies (use the right model for the task complexity). ## The Problem: Opus Rate Limits in Extended Development Sessions Working entirely on Claude Opus for software development quickly becomes impractical: - **Rate limits hit fast**: Complex stories with multiple files, tests, and iterations consume tokens rapidly - **Context reloading**: Each new conversation requires re-establishing project context - **Expensive operations**: Every file read, search, and edit counts against limits - **Development stalls**: Hitting rate limits mid-story forces context-switching or waiting ## The Solution: Strategic Model Tiering The Batch-Based Pipeline (BBP) approach distributes work across model tiers based on task complexity: ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ OPUS (Strategic Layer) β”‚ β”‚ β€’ Epic-level story planning (high-stakes decisions) β”‚ β”‚ β€’ Cross-story pattern detection during review β”‚ β”‚ β€’ Architecture-impacting decisions β”‚ β”‚ β€’ Final quality gates β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ SONNET (Execution Layer) β”‚ β”‚ β€’ Story implementation (bulk of token usage) β”‚ β”‚ β€’ Test writing and execution β”‚ β”‚ β€’ Targeted rework from review feedback β”‚ β”‚ β€’ File operations and searches β”‚ 
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` ## Why This Works: Quality Preservation Through Staging ### The Key Insight: Quality Comes From Context, Not Just Model Size The common assumption is "bigger model = better code." In practice, **code quality is primarily determined by**: 1. **Clarity of specifications** - What exactly needs to be built 2. **Available context** - Project patterns, architecture decisions, existing code 3. **Scope constraints** - Focused task vs. open-ended exploration When these three factors are optimized, Sonnet produces implementation-equivalent results to Opus. ### How Staging Preserves Quality ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ OPUS PLANNING PHASE β”‚ β”‚ β”‚ β”‚ Input: PRD, Architecture, Epic requirements (ambiguous, high-level) β”‚ β”‚ β”‚ β”‚ Opus does the HARD work: β”‚ β”‚ β€’ Interprets ambiguous requirements β”‚ β”‚ β€’ Makes architectural judgment calls β”‚ β”‚ β€’ Resolves conflicting constraints β”‚ β”‚ β€’ Defines precise acceptance criteria β”‚ β”‚ β€’ Sequences tasks in optimal order β”‚ β”‚ β”‚ β”‚ Output: Crystal-clear story files with: β”‚ β”‚ β€’ Unambiguous task definitions β”‚ β”‚ β€’ Specific file paths to modify β”‚ β”‚ β€’ Exact acceptance criteria β”‚ β”‚ β€’ Test scenarios spelled out β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό 
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ SONNET EXECUTION PHASE β”‚ β”‚ β”‚ β”‚ Input: Crystal-clear story files (all ambiguity resolved) β”‚ β”‚ β”‚ β”‚ Sonnet does MECHANICAL work: β”‚ β”‚ β€’ Translates specs to code (no interpretation needed) β”‚ β”‚ β€’ Follows established patterns (already defined) β”‚ β”‚ β€’ Writes tests against clear criteria β”‚ β”‚ β€’ Makes localized decisions only β”‚ β”‚ β”‚ β”‚ Why Sonnet excels here: β”‚ β”‚ β€’ Strong coding capabilities (not inferior to Opus for implementation) β”‚ β”‚ β€’ Faster execution β”‚ β”‚ β€’ More generous rate limits β”‚ β”‚ β€’ Sufficient reasoning for scoped tasks β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` ### Quality Is Front-Loaded, Not Distributed Traditional approach (quality spread thin): ``` Story 1: [interpret─────implement─────test─────] β†’ Quality depends on each step Story 2: [interpret─────implement─────test─────] β†’ Repeated interpretation risk Story 3: [interpret─────implement─────test─────] β†’ Inconsistency accumulates ``` Batched approach (quality concentrated): ``` PLAN: [interpret ALL stories with full context] β†’ One-time, high-quality DEV: [implement─implement─implement──────────] β†’ Mechanical, consistent TEST: [test──────test──────test───────────────] β†’ Pattern-based REVIEW: [review ALL with cross-story visibility ] β†’ Catches what batch missed ``` **The quality gates are at phase boundaries, not scattered throughout.** ### Why Sonnet Doesn't Degrade Quality | Task Type | Why Sonnet Matches Opus | Evidence | |-----------|------------------------|----------| | Code generation | Both models trained 
extensively on code; Sonnet optimized for speed | Benchmarks show near-parity on coding tasks | | Following specs | No interpretation needed when specs are clear | Deterministic translation | | Pattern matching | Existing code provides template | Copy-modify-adapt pattern | | Test writing | Acceptance criteria explicit | Direct mapping to assertions | | Syntax/formatting | Both models equivalent | Mechanical task | ### Where Opus Is Still Required | Task Type | Why Opus Needed | Stage | |-----------|-----------------|-------| | Requirement interpretation | Ambiguity resolution requires deeper reasoning | Planning | | Architecture decisions | Trade-off analysis, long-term implications | Planning | | Cross-story patterns | Seeing connections across multiple changes | Review | | Novel problem solving | No existing pattern to follow | Planning | | Quality judgment | "Is this good enough?" requires taste | Review | ### The Batch Review Multiplier Reviewing stories in batch actually **improves** quality over individual review: ``` Individual Review: Story 1: [review] β†’ Misses that auth pattern will conflict with Story 3 Story 2: [review] β†’ Doesn't know Story 1 already solved similar problem Story 3: [review] β†’ Unaware of auth pattern from Story 1 Batch Review: Stories 1,2,3: [review together] β†’ "Story 1 and 3 both touch auth - ensure consistent approach" β†’ "Story 2 duplicates utility from Story 1 - extract shared helper" β†’ "All three stories need the same error handling pattern" ``` **Cross-story visibility enables pattern detection impossible in isolated review.** ## Token Efficiency: Why Batching Saves Tokens ### Dramatic Rate Limit Savings Typical story implementation breakdown: ``` File reads/searches: 40% of tokens β†’ Sonnet Code generation: 35% of tokens β†’ Sonnet Test writing: 15% of tokens β†’ Sonnet Planning/decisions: 10% of tokens β†’ Opus ``` By routing 90% of token-heavy operations to Sonnet, Opus capacity is preserved for high-value strategic 
work. ### Context Amortization: Batching Amplifies Efficiency Instead of: ``` Story 1: Plan(Opus) β†’ Implement(Opus) β†’ Test(Opus) β†’ Review(Opus) Story 2: Plan(Opus) β†’ Implement(Opus) β†’ Test(Opus) β†’ Review(Opus) Story 3: Plan(Opus) β†’ Implement(Opus) β†’ Test(Opus) β†’ Review(Opus) ``` BBP does: ``` Planning Phase: Plan stories 1,2,3 together (Opus, warm context) Dev Phase: Implement stories 1,2,3 (Sonnet, parallel-ready) Test Phase: Test stories 1,2,3 (Sonnet, shared fixtures) Review Phase: Review stories 1,2,3 together (Opus, pattern detection) ``` Benefits: - **Context loading amortized** across multiple stories - **Pattern recognition** improved by seeing related changes together - **Parallelization potential** for independent stories ## Pipeline Flow ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ OPUS PLAN │───▢│ SONNET DEV │───▢│ SONNET TEST │───▢│ OPUS REVIEW β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β€’ All storiesβ”‚ β”‚ β€’ Implement β”‚ β”‚ β€’ Write testsβ”‚ β”‚ β€’ Quality β”‚ β”‚ planned β”‚ β”‚ each story β”‚ β”‚ β€’ Run tests β”‚ β”‚ gate β”‚ β”‚ β€’ Context β”‚ β”‚ β€’ Warm β”‚ β”‚ β€’ Fix fails β”‚ β”‚ β€’ Pattern β”‚ β”‚ establishedβ”‚ β”‚ context β”‚ β”‚ β”‚ β”‚ detection β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚SONNET REWORK β”‚ β”‚ (if needed) β”‚ β”‚ β”‚ β”‚ β€’ Targeted β”‚ β”‚ fixes from β”‚ β”‚ review β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` ## Implementation Notes ### Warm Context Strategy Each phase loads relevant context before execution: - **Planning**: PRD, Architecture, Epic definitions - **Development**: Story file, project-context.md, 
related code
- **Testing**: Story acceptance criteria, existing test patterns, fixtures
- **Review**: All changed files, architecture constraints, story requirements

### Checkpoint System

Pipeline state tracked in `_bmad-output/bbp-status.yaml`:

```yaml
epic: 3
phase: dev
stories:
  story-3.1:
    status: completed
    dev: done
    test: done
  story-3.2:
    status: in_progress
    dev: done
    test: pending
  story-3.3:
    status: pending
```

Enables:
- Resume after interruption
- Skip completed work
- Parallel execution tracking

### When to Use Opus vs Sonnet

| Task | Model | Reason |
|------|-------|--------|
| Story planning | Opus | Requirement interpretation, scope decisions |
| Code implementation | Sonnet | Mechanical translation of specs to code |
| Test writing | Sonnet | Pattern-based, well-defined expectations |
| Code review | Opus | Cross-cutting concerns, architectural judgment |
| Targeted fixes | Sonnet | Specific, well-scoped corrections |
| Architecture decisions | Opus | Long-term impact, trade-off analysis |
| File operations | Sonnet | Token-heavy, mechanical operations |

## Results

In practice, this approach enables:

- **3-5x more stories** completed per rate limit window
- **Equivalent quality** on implementation tasks
- **Better review quality** through batch pattern detection
- **Sustainable development pace** without rate limit interruptions

## Current Limitations: Manual Model Switching

**Important**: Model switching in Claude Code is currently a manual operation via the `/model` command.
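The checkpoint file lends itself to a small resume helper that reports where to pick up and which model the current phase calls for. A minimal sketch in Python; the `next_pending_story` helper and `PHASE_MODEL` mapping are hypothetical conveniences, not part of the pipeline, and in practice `state` would be loaded from `_bmad-output/bbp-status.yaml` with a YAML parser rather than defined inline:

```python
from typing import Optional

# Sketch of a resume helper for the bbp-status.yaml checkpoint. Hypothetical:
# in practice the state dict would come from a YAML parser, e.g.
#   state = yaml.safe_load(open("_bmad-output/bbp-status.yaml"))
PHASE_MODEL = {"plan": "opus", "dev": "sonnet", "test": "sonnet", "review": "opus"}

def next_pending_story(state: dict) -> Optional[str]:
    """Return the first story not yet completed, or None if all are done."""
    for name, story in state["stories"].items():
        if story.get("status") != "completed":
            return name
    return None

# Parsed form of the checkpoint example above:
state = {
    "epic": 3,
    "phase": "dev",
    "stories": {
        "story-3.1": {"status": "completed", "dev": "done", "test": "done"},
        "story-3.2": {"status": "in_progress", "dev": "done", "test": "pending"},
        "story-3.3": {"status": "pending"},
    },
}

phase = state["phase"]
print(f"Resume at {next_pending_story(state)} using {PHASE_MODEL[phase]}")
# Resume at story-3.2 using sonnet
```

Because stories are listed in pipeline order, the first non-completed entry is always the right place to resume after an interruption.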
```bash /model sonnet # Switch to Sonnet for implementation /model opus # Switch to Opus for planning/review ``` ### UX Challenge (Work in Progress) The pipeline workflows need to clearly signal to users when to switch models: ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ ⚠️ PHASE COMPLETE: Planning finished β”‚ β”‚ β”‚ β”‚ Next phase: Development (Sonnet recommended) β”‚ β”‚ β”‚ β”‚ To switch models: /model sonnet β”‚ β”‚ To continue: /bmad:bbp:workflows:sonnet-dev-batch β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ ``` Current approach: - Each workflow is named with its intended model (`opus-plan-epic`, `sonnet-dev-batch`) - Phase completion messages prompt the user to switch - Status workflow shows current phase and recommended model **Known improvement needed**: Workflow output messages need optimization to make model switching more intuitive. Current messages may not be prominent enough for users unfamiliar with the tiered approach. Future improvements being explored: - Clearer visual cues at phase boundaries (boxed prompts, color hints) - Model recommendation prominently displayed in status output - Standardized "SWITCH MODEL" callout format across all phase transitions - Potential Claude Code hooks for model suggestions ### Why Manual Switching is Acceptable Despite the manual step, this approach works because: 1. **Phase boundaries are natural pause points** - good time to review progress anyway 2. **Explicit control** - users know exactly which model is doing what 3. **No surprise costs** - transparent about when Opus vs Sonnet is used 4. 
**Workflow names are self-documenting** - `sonnet-*` vs `opus-*` prefixes ## Future Considerations - **Haiku tier**: For even lighter operations (file searches, simple edits) - **Adaptive routing**: Dynamic model selection based on task complexity signals - **Parallel batches**: Multiple Sonnet sessions for independent stories - **Automated model hints**: Claude Code integration for model recommendations --- *This architecture emerged from practical necessity during extended development sessions where Opus rate limits became the primary bottleneck to productivity.*