Batch Processing Architecture for Rate Limit Efficiency

📣 Request for Feedback

This is an alpha-stage contribution and we welcome community input. We've tested this pattern with our team but would love feedback on:

  • Does this model tiering approach resonate with your experience?
  • Are there edge cases where Sonnet struggles with implementation tasks?
  • How can we improve the UX for model switching prompts?
  • Would a formal BBP module be valuable to the community?

Please share thoughts in Discord #suggestions-feedback or comment on the PR. We're here to collaborate, not prescribe.


Scope: This architecture is specifically designed for Anthropic Claude models on Claude Max subscriptions (or similar tiered access plans). The rate limit characteristics, model capabilities, and tiering strategy are based on Anthropic's current offering structure.

Origin: These constraints surfaced while scaling the BMAD method across a development team. Individual developers hitting Opus rate limits mid-sprint created bottlenecks that dragged down the entire team's velocity. The model tiering pattern emerged as a practical way to sustain team-wide BMAD adoption.

Platform Context: Anthropic Claude Max

This pattern assumes access to Claude Code with a Claude Max subscription, which provides:

| Model | Capability | Rate Limit Behavior |
|---|---|---|
| Claude Opus 4 | Highest reasoning, best for complex decisions | Most restrictive limits |
| Claude Sonnet 4 | Strong coding, fast execution | More generous limits |
| Claude Haiku | Quick tasks, simple operations | Most generous limits |

The rate limit disparity between Opus and Sonnet is significant enough that strategic model selection dramatically impacts sustainable throughput. This architecture exploits that disparity.

Note: If using API access with pay-per-token pricing instead of subscription limits, the economics differ but the capability-matching principle still applies (use the right model for the task complexity).

The Problem: Opus Rate Limits in Extended Development Sessions

Relying entirely on Claude Opus for software development quickly becomes impractical:

  • Rate limits hit fast: Complex stories with multiple files, tests, and iterations consume tokens rapidly
  • Context reloading: Each new conversation requires re-establishing project context
  • Expensive operations: Every file read, search, and edit counts against limits
  • Development stalls: Hitting rate limits mid-story forces context-switching or waiting

The Solution: Strategic Model Tiering

The Batch-Based Pipeline (BBP) approach distributes work across model tiers based on task complexity:

┌─────────────────────────────────────────────────────────────────┐
│                    OPUS (Strategic Layer)                        │
│  • Epic-level story planning (high-stakes decisions)            │
│  • Cross-story pattern detection during review                  │
│  • Architecture-impacting decisions                             │
│  • Final quality gates                                          │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   SONNET (Execution Layer)                       │
│  • Story implementation (bulk of token usage)                   │
│  • Test writing and execution                                   │
│  • Targeted rework from review feedback                         │
│  • File operations and searches                                 │
└─────────────────────────────────────────────────────────────────┘
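
The same split can be stated as a one-line routing rule. Here is a minimal Python sketch; the task-type names are illustrative, since BBP itself encodes the split in workflow names rather than code:

# Minimal sketch of the tiering rule; task-type names are hypothetical.
OPUS_TASKS = {"story_planning", "cross_story_review",
              "architecture_decision", "quality_gate"}

def route_model(task_type: str) -> str:
    """Route strategic work to Opus; default mechanical work to Sonnet."""
    return "opus" if task_type in OPUS_TASKS else "sonnet"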

Why This Works: Quality Preservation Through Staging

The Key Insight: Quality Comes From Context, Not Just Model Size

The common assumption is "bigger model = better code." In practice, code quality is primarily determined by:

  1. Clarity of specifications - What exactly needs to be built
  2. Available context - Project patterns, architecture decisions, existing code
  3. Scope constraints - Focused task vs. open-ended exploration

When these three factors are optimized, Sonnet produces implementation results equivalent to Opus's.

How Staging Preserves Quality

┌─────────────────────────────────────────────────────────────────────────┐
│                     OPUS PLANNING PHASE                                  │
│                                                                          │
│  Input:  PRD, Architecture, Epic requirements (ambiguous, high-level)   │
│                                                                          │
│  Opus does the HARD work:                                               │
│  • Interprets ambiguous requirements                                    │
│  • Makes architectural judgment calls                                   │
│  • Resolves conflicting constraints                                     │
│  • Defines precise acceptance criteria                                  │
│  • Sequences tasks in optimal order                                     │
│                                                                          │
│  Output: Crystal-clear story files with:                                │
│  • Unambiguous task definitions                                         │
│  • Specific file paths to modify                                        │
│  • Exact acceptance criteria                                            │
│  • Test scenarios spelled out                                           │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                    SONNET EXECUTION PHASE                                │
│                                                                          │
│  Input: Crystal-clear story files (all ambiguity resolved)              │
│                                                                          │
│  Sonnet does MECHANICAL work:                                           │
│  • Translates specs to code (no interpretation needed)                  │
│  • Follows established patterns (already defined)                       │
│  • Writes tests against clear criteria                                  │
│  • Makes localized decisions only                                       │
│                                                                          │
│  Why Sonnet excels here:                                                │
│  • Strong coding capabilities (not inferior to Opus for implementation) │
│  • Faster execution                                                     │
│  • More generous rate limits                                            │
│  • Sufficient reasoning for scoped tasks                                │
└─────────────────────────────────────────────────────────────────────────┘
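
One way to picture the hand-off artifact is as a record whose fields leave nothing open to interpretation. The Python shape below is hypothetical; actual BMAD story files are markdown documents:

from dataclasses import dataclass

@dataclass
class StorySpec:
    """Hypothetical story hand-off shape; real story files are markdown."""
    story_id: str
    tasks: list[str]                # unambiguous task definitions, in order
    files_to_modify: list[str]      # exact paths, so no exploratory searching
    acceptance_criteria: list[str]  # specific and testable
    test_scenarios: list[str]       # spelled out, not implied

def execution_ready(spec: StorySpec) -> bool:
    # A story is Sonnet-ready only when every ambiguity-bearing field is filled.
    return all([spec.tasks, spec.files_to_modify,
                spec.acceptance_criteria, spec.test_scenarios])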

Quality Is Front-Loaded, Not Distributed

Traditional approach (quality spread thin):

Story 1: [interpret─────implement─────test─────] → Quality depends on each step
Story 2: [interpret─────implement─────test─────] → Repeated interpretation risk
Story 3: [interpret─────implement─────test─────] → Inconsistency accumulates

Batched approach (quality concentrated):

PLAN:    [interpret ALL stories with full context] → One-time, high-quality
DEV:     [implement─implement─implement──────────] → Mechanical, consistent
TEST:    [test──────test──────test───────────────] → Pattern-based
REVIEW:  [review ALL with cross-story visibility ] → Catches what batch missed

The quality gates are at phase boundaries, not scattered throughout.
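
Expressed as pseudocode-style Python, the phase-gated structure looks like the sketch below; the four helpers are placeholders, since in practice each phase is a separate workflow run:

def run_batch(stories, plan, implement, test, review):
    # Opus: all interpretation happens once, with full context.
    specs = [plan(story) for story in stories]
    # Sonnet: mechanical phases, each completed for the whole batch.
    changes = [implement(spec) for spec in specs]
    results = [test(change) for change in changes]
    # Opus: a single quality gate with cross-story visibility.
    return review(results)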

Why Sonnet Doesn't Degrade Quality

| Task Type | Why Sonnet Matches Opus | Evidence |
|---|---|---|
| Code generation | Both models trained extensively on code; Sonnet optimized for speed | Benchmarks show near-parity on coding tasks |
| Following specs | No interpretation needed when specs are clear | Deterministic translation |
| Pattern matching | Existing code provides template | Copy-modify-adapt pattern |
| Test writing | Acceptance criteria explicit | Direct mapping to assertions |
| Syntax/formatting | Both models equivalent | Mechanical task |

Where Opus Is Still Required

| Task Type | Why Opus Is Needed | Stage |
|---|---|---|
| Requirement interpretation | Ambiguity resolution requires deeper reasoning | Planning |
| Architecture decisions | Trade-off analysis, long-term implications | Planning |
| Cross-story patterns | Seeing connections across multiple changes | Review |
| Novel problem solving | No existing pattern to follow | Planning |
| Quality judgment | "Is this good enough?" requires taste | Review |

The Batch Review Multiplier

Reviewing stories in batch actually improves quality over individual review:

Individual Review:
Story 1: [review] → Misses that auth pattern will conflict with Story 3
Story 2: [review] → Doesn't know Story 1 already solved similar problem
Story 3: [review] → Unaware of auth pattern from Story 1

Batch Review:
Stories 1,2,3: [review together]
  → "Story 1 and 3 both touch auth - ensure consistent approach"
  → "Story 2 duplicates utility from Story 1 - extract shared helper"
  → "All three stories need the same error handling pattern"

Cross-story visibility enables pattern detection that is impossible in isolated review.
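
Mechanically, batch review just means every diff lands in one reviewing context. The sketch below assembles such a prompt; the dictionary keys are illustrative:

def build_batch_review_prompt(stories: list[dict]) -> str:
    header = ("Review these stories together. Flag duplicated utilities, "
              "inconsistent patterns (auth, error handling), and conflicts.")
    sections = [f"## {s['id']}\nFiles: {', '.join(s['files'])}\n{s['diff']}"
                for s in stories]
    return header + "\n\n" + "\n\n".join(sections)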

Token Efficiency: Why Batching Saves Tokens

Dramatic Rate Limit Savings

Typical story implementation breakdown:

File reads/searches:     40% of tokens  → Sonnet
Code generation:         35% of tokens  → Sonnet
Test writing:            15% of tokens  → Sonnet
Planning/decisions:      10% of tokens  → Opus

By routing 90% of token-heavy operations to Sonnet, Opus capacity is preserved for high-value strategic work.
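
As a back-of-envelope check of that claim, using the percentages above and a hypothetical 100k-token story:

breakdown = {"file_ops": 0.40, "codegen": 0.35, "tests": 0.15, "planning": 0.10}
sonnet_share = sum(v for k, v in breakdown.items() if k != "planning")
assert round(sonnet_share, 2) == 0.90   # 90% of tokens routed to Sonnet

story_tokens = 100_000                  # hypothetical end-to-end story cost
opus_tokens = story_tokens * breakdown["planning"]
print(opus_tokens)   # 10000.0 -> Opus spends a tenth of what all-Opus would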

Context Amortization: Batching Amplifies Efficiency

Instead of:

Story 1: Plan(Opus) → Implement(Opus) → Test(Opus) → Review(Opus)
Story 2: Plan(Opus) → Implement(Opus) → Test(Opus) → Review(Opus)
Story 3: Plan(Opus) → Implement(Opus) → Test(Opus) → Review(Opus)

BBP does:

Planning Phase:    Plan stories 1,2,3 together     (Opus, warm context)
Dev Phase:         Implement stories 1,2,3         (Sonnet, parallel-ready)
Test Phase:        Test stories 1,2,3              (Sonnet, shared fixtures)
Review Phase:      Review stories 1,2,3 together   (Opus, pattern detection)

Benefits:

  • Context loading amortized across multiple stories (see the arithmetic sketch after this list)
  • Pattern recognition improved by seeing related changes together
  • Parallelization potential for independent stories
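
The amortization is easy to see with illustrative numbers (the context-load cost here is hypothetical):

C = 20_000                         # hypothetical tokens to (re)load project context
stories, phases = 3, 4
unbatched = stories * phases * C   # every story reloads context in every phase
batched = phases * C               # one warm context per phase, shared by the batch
print(unbatched, batched)          # 240000 80000 -> 3x fewer context tokens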

Pipeline Flow

┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  OPUS PLAN   │───▶│  SONNET DEV  │───▶│ SONNET TEST  │───▶│ OPUS REVIEW  │
│              │    │              │    │              │    │              │
│ • All stories│    │ • Implement  │    │ • Write tests│    │ • Quality    │
│   planned    │    │   each story │    │ • Run tests  │    │   gate       │
│ • Context    │    │ • Warm       │    │ • Fix fails  │    │ • Pattern    │
│   established│    │   context    │    │              │    │   detection  │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
                                                                   │
                                                                   ▼
                                                           ┌──────────────┐
                                                           │SONNET REWORK │
                                                           │ (if needed)  │
                                                           │              │
                                                           │ • Targeted   │
                                                           │   fixes from │
                                                           │   review     │
                                                           └──────────────┘

Implementation Notes

Warm Context Strategy

Each phase loads relevant context before execution:

  • Planning: PRD, Architecture, Epic definitions
  • Development: Story file, project-context.md, related code
  • Testing: Story AC, existing test patterns, fixtures
  • Review: All changed files, architecture constraints, story requirements

Checkpoint System

Pipeline state is tracked in _bmad-output/bbp-status.yaml:

epic: 3
phase: dev
stories:
  story-3.1:
    status: completed
    dev: done
    test: done
  story-3.2:
    status: in_progress
    dev: done
    test: pending
  story-3.3:
    status: pending

Enables:

  • Resume after interruption (see the sketch after this list)
  • Skip completed work
  • Parallel execution tracking
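
A minimal resume sketch against the status file above, assuming PyYAML is available; the key names follow the example document:

import yaml  # PyYAML, assumed available

def next_pending(path="_bmad-output/bbp-status.yaml"):
    with open(path) as f:
        state = yaml.safe_load(f)
    for story_id, story in state["stories"].items():
        if story["status"] != "completed":
            return state["phase"], story_id   # e.g. ("dev", "story-3.2")
    return None  # every story completed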

When to Use Opus vs Sonnet

| Task | Model | Reason |
|---|---|---|
| Story planning | Opus | Requirement interpretation, scope decisions |
| Code implementation | Sonnet | Mechanical translation of specs to code |
| Test writing | Sonnet | Pattern-based, well-defined expectations |
| Code review | Opus | Cross-cutting concerns, architectural judgment |
| Targeted fixes | Sonnet | Specific, well-scoped corrections |
| Architecture decisions | Opus | Long-term impact, trade-off analysis |
| File operations | Sonnet | Token-heavy, mechanical operations |

Results

In practice, this approach enables:

  • 3-5x more stories completed per rate limit window
  • Equivalent quality on implementation tasks
  • Better review quality through batch pattern detection
  • Sustainable development pace without rate limit interruptions

Current Limitations: Manual Model Switching

Important: Model switching in Claude Code is currently a manual operation via the /model command.

/model sonnet    # Switch to Sonnet for implementation
/model opus      # Switch to Opus for planning/review

UX Challenge (Work in Progress)

The pipeline workflows need to clearly signal to users when to switch models:

┌─────────────────────────────────────────────────────────────┐
│  ⚠️  PHASE COMPLETE: Planning finished                      │
│                                                             │
│  Next phase: Development (Sonnet recommended)               │
│                                                             │
│  To switch models:  /model sonnet                           │
│  To continue:       /bmad:bbp:workflows:sonnet-dev-batch    │
└─────────────────────────────────────────────────────────────┘

Current approach:

  • Each workflow is named with its intended model (opus-plan-epic, sonnet-dev-batch)
  • Phase completion messages prompt the user to switch
  • Status workflow shows current phase and recommended model

Known improvement needed: workflow output messages need tuning to make model switching more intuitive. The current messages may not be prominent enough for users unfamiliar with the tiered approach.

Future improvements being explored:

  • Clearer visual cues at phase boundaries (boxed prompts, color hints)
  • Model recommendation prominently displayed in status output
  • Standardized "SWITCH MODEL" callout format across all phase transitions (a possible shape is sketched after this list)
  • Potential Claude Code hooks for model suggestions
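
One possible shape for that standardized callout, sketched in Python (not the current implementation):

def switch_model_callout(next_phase: str, model: str, workflow: str) -> str:
    bar = "=" * 60
    return (f"{bar}\n"
            f"  SWITCH MODEL: /model {model}\n"
            f"  Next phase:   {next_phase}\n"
            f"  Then run:     {workflow}\n"
            f"{bar}")

# e.g. switch_model_callout("Development", "sonnet",
#                           "/bmad:bbp:workflows:sonnet-dev-batch")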

Why Manual Switching Is Acceptable

Despite the manual step, this approach works because:

  1. Phase boundaries are natural pause points - good time to review progress anyway
  2. Explicit control - users know exactly which model is doing what
  3. No surprise costs - transparent about when Opus vs Sonnet is used
  4. Workflow names are self-documenting - sonnet-* vs opus-* prefixes

Future Considerations

  • Haiku tier: For even lighter operations (file searches, simple edits)
  • Adaptive routing: Dynamic model selection based on task complexity signals
  • Parallel batches: Multiple Sonnet sessions for independent stories
  • Automated model hints: Claude Code integration for model recommendations

This architecture emerged from practical necessity during extended development sessions where Opus rate limits became the primary bottleneck to productivity.