Batch Processing Architecture for Rate Limit Efficiency

📣 Request for Feedback

This is an alpha-stage contribution and we welcome community input. We've tested this pattern with our team but would love feedback on:

  • Does this model tiering approach resonate with your experience?
  • Are there edge cases where Sonnet struggles with implementation tasks?
  • How can we improve the UX for model switching prompts?
  • Would a formal BBP module be valuable to the community?

Please share thoughts in Discord #suggestions-feedback or comment on the PR. We're here to collaborate, not prescribe.


Scope: This architecture is specifically designed for Anthropic Claude models on Claude Max subscriptions (or similar tiered access plans). The rate limit characteristics, model capabilities, and tiering strategy are based on Anthropic's current offering structure.

Origin: These constraints surfaced while scaling the BMAD method across a development team. Individual developers hitting Opus rate limits mid-sprint created bottlenecks that dragged down the entire team's velocity. The model tiering pattern emerged as a practical way to sustain team-wide BMAD adoption.

Platform Context: Anthropic Claude Max

This pattern assumes access to Claude Code with a Claude Max subscription, which provides:

| Model | Capability | Rate Limit Behavior |
|---|---|---|
| Claude Opus 4 | Highest reasoning, best for complex decisions | Most restrictive limits |
| Claude Sonnet 4 | Strong coding, fast execution | More generous limits |
| Claude Haiku | Quick tasks, simple operations | Most generous limits |

The rate limit disparity between Opus and Sonnet is significant enough that strategic model selection dramatically impacts sustainable throughput. This architecture exploits that disparity.

Note: If using API access with pay-per-token pricing instead of subscription limits, the economics differ but the capability-matching principle still applies (use the right model for the task complexity).

The Problem: Opus Rate Limits in Extended Development Sessions

Relying entirely on Claude Opus for software development quickly becomes impractical:

  • Rate limits hit fast: Complex stories with multiple files, tests, and iterations consume tokens rapidly
  • Context reloading: Each new conversation requires re-establishing project context
  • Expensive operations: Every file read, search, and edit counts against limits
  • Development stalls: Hitting rate limits mid-story forces context-switching or waiting

The Solution: Strategic Model Tiering

The Batch-Based Pipeline (BBP) approach distributes work across model tiers based on task complexity:

┌─────────────────────────────────────────────────────────────────┐
│                    OPUS (Strategic Layer)                        │
│  • Epic-level story planning (high-stakes decisions)            │
│  • Cross-story pattern detection during review                  │
│  • Architecture-impacting decisions                             │
│  • Final quality gates                                          │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   SONNET (Execution Layer)                       │
│  • Story implementation (bulk of token usage)                   │
│  • Test writing and execution                                   │
│  • Targeted rework from review feedback                         │
│  • File operations and searches                                 │
└─────────────────────────────────────────────────────────────────┘
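
The same split can be stated as a one-line routing rule. Here is a minimal Python sketch; the task-type names are illustrative, since BBP itself encodes the split in workflow names rather than code:

# Minimal sketch of the tiering rule; task-type names are hypothetical.
OPUS_TASKS = {"story_planning", "cross_story_review",
              "architecture_decision", "quality_gate"}

def route_model(task_type: str) -> str:
    """Route strategic work to Opus; default mechanical work to Sonnet."""
    return "opus" if task_type in OPUS_TASKS else "sonnet"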

Why This Works: Quality Preservation Through Staging

The Key Insight: Quality Comes From Context, Not Just Model Size

The common assumption is "bigger model = better code." In practice, code quality is primarily determined by:

  1. Clarity of specifications - What exactly needs to be built
  2. Available context - Project patterns, architecture decisions, existing code
  3. Scope constraints - Focused task vs. open-ended exploration

When these three factors are optimized, Sonnet produces implementation results equivalent to Opus's.

How Staging Preserves Quality

┌─────────────────────────────────────────────────────────────────────────┐
│                     OPUS PLANNING PHASE                                  │
│                                                                          │
│  Input:  PRD, Architecture, Epic requirements (ambiguous, high-level)   │
│                                                                          │
│  Opus does the HARD work:                                               │
│  • Interprets ambiguous requirements                                    │
│  • Makes architectural judgment calls                                   │
│  • Resolves conflicting constraints                                     │
│  • Defines precise acceptance criteria                                  │
│  • Sequences tasks in optimal order                                     │
│                                                                          │
│  Output: Crystal-clear story files with:                                │
│  • Unambiguous task definitions                                         │
│  • Specific file paths to modify                                        │
│  • Exact acceptance criteria                                            │
│  • Test scenarios spelled out                                           │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                    SONNET EXECUTION PHASE                                │
│                                                                          │
│  Input: Crystal-clear story files (all ambiguity resolved)              │
│                                                                          │
│  Sonnet does MECHANICAL work:                                           │
│  • Translates specs to code (no interpretation needed)                  │
│  • Follows established patterns (already defined)                       │
│  • Writes tests against clear criteria                                  │
│  • Makes localized decisions only                                       │
│                                                                          │
│  Why Sonnet excels here:                                                │
│  • Strong coding capabilities (not inferior to Opus for implementation) │
│  • Faster execution                                                     │
│  • More generous rate limits                                            │
│  • Sufficient reasoning for scoped tasks                                │
└─────────────────────────────────────────────────────────────────────────┘
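
One way to picture the hand-off artifact is as a record whose fields leave nothing open to interpretation. The Python shape below is hypothetical; actual BMAD story files are markdown documents:

from dataclasses import dataclass

@dataclass
class StorySpec:
    """Hypothetical story hand-off shape; real story files are markdown."""
    story_id: str
    tasks: list[str]                # unambiguous task definitions, in order
    files_to_modify: list[str]      # exact paths, so no exploratory searching
    acceptance_criteria: list[str]  # specific and testable
    test_scenarios: list[str]       # spelled out, not implied

def execution_ready(spec: StorySpec) -> bool:
    # A story is Sonnet-ready only when every ambiguity-bearing field is filled.
    return all([spec.tasks, spec.files_to_modify,
                spec.acceptance_criteria, spec.test_scenarios])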

Quality Is Front-Loaded, Not Distributed

Traditional approach (quality spread thin):

Story 1: [interpret─────implement─────test─────] → Quality depends on each step
Story 2: [interpret─────implement─────test─────] → Repeated interpretation risk
Story 3: [interpret─────implement─────test─────] → Inconsistency accumulates

Batched approach (quality concentrated):

PLAN:    [interpret ALL stories with full context] → One-time, high-quality
DEV:     [implement─implement─implement──────────] → Mechanical, consistent
TEST:    [test──────test──────test───────────────] → Pattern-based
REVIEW:  [review ALL with cross-story visibility ] → Catches what batch missed

The quality gates are at phase boundaries, not scattered throughout.
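
Expressed as pseudocode-style Python, the phase-gated structure looks like the sketch below; the four helpers are placeholders, since in practice each phase is a separate workflow run:

def run_batch(stories, plan, implement, test, review):
    # Opus: all interpretation happens once, with full context.
    specs = [plan(story) for story in stories]
    # Sonnet: mechanical phases, each completed for the whole batch.
    changes = [implement(spec) for spec in specs]
    results = [test(change) for change in changes]
    # Opus: a single quality gate with cross-story visibility.
    return review(results)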

Why Sonnet Doesn't Degrade Quality

| Task Type | Why Sonnet Matches Opus | Evidence |
|---|---|---|
| Code generation | Both models trained extensively on code; Sonnet optimized for speed | Benchmarks show near-parity on coding tasks |
| Following specs | No interpretation needed when specs are clear | Deterministic translation |
| Pattern matching | Existing code provides template | Copy-modify-adapt pattern |
| Test writing | Acceptance criteria explicit | Direct mapping to assertions |
| Syntax/formatting | Both models equivalent | Mechanical task |

Where Opus Is Still Required

| Task Type | Why Opus Is Needed | Stage |
|---|---|---|
| Requirement interpretation | Ambiguity resolution requires deeper reasoning | Planning |
| Architecture decisions | Trade-off analysis, long-term implications | Planning |
| Cross-story patterns | Seeing connections across multiple changes | Review |
| Novel problem solving | No existing pattern to follow | Planning |
| Quality judgment | "Is this good enough?" requires taste | Review |

The Batch Review Multiplier

Reviewing stories in batch actually improves quality over individual review:

Individual Review:
Story 1: [review] → Misses that auth pattern will conflict with Story 3
Story 2: [review] → Doesn't know Story 1 already solved similar problem
Story 3: [review] → Unaware of auth pattern from Story 1

Batch Review:
Stories 1,2,3: [review together]
  → "Story 1 and 3 both touch auth - ensure consistent approach"
  → "Story 2 duplicates utility from Story 1 - extract shared helper"
  → "All three stories need the same error handling pattern"

Cross-story visibility enables pattern detection that is impossible in isolated review.
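
Mechanically, batch review just means every diff lands in one reviewing context. The sketch below assembles such a prompt; the dictionary keys are illustrative:

def build_batch_review_prompt(stories: list[dict]) -> str:
    header = ("Review these stories together. Flag duplicated utilities, "
              "inconsistent patterns (auth, error handling), and conflicts.")
    sections = [f"## {s['id']}\nFiles: {', '.join(s['files'])}\n{s['diff']}"
                for s in stories]
    return header + "\n\n" + "\n\n".join(sections)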

Token Efficiency: Why Batching Saves Tokens

Dramatic Rate Limit Savings

Typical story implementation breakdown:

File reads/searches:     40% of tokens  → Sonnet
Code generation:         35% of tokens  → Sonnet
Test writing:            15% of tokens  → Sonnet
Planning/decisions:      10% of tokens  → Opus

By routing 90% of token-heavy operations to Sonnet, Opus capacity is preserved for high-value strategic work.
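
As a back-of-envelope check of that claim, using the percentages above and a hypothetical 100k-token story:

breakdown = {"file_ops": 0.40, "codegen": 0.35, "tests": 0.15, "planning": 0.10}
sonnet_share = sum(v for k, v in breakdown.items() if k != "planning")
assert round(sonnet_share, 2) == 0.90   # 90% of tokens routed to Sonnet

story_tokens = 100_000                  # hypothetical end-to-end story cost
opus_tokens = story_tokens * breakdown["planning"]
print(opus_tokens)   # 10000.0 -> Opus spends a tenth of what all-Opus would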

Context Amortization: Batching Amplifies Efficiency

Instead of:

Story 1: Plan(Opus) → Implement(Opus) → Test(Opus) → Review(Opus)
Story 2: Plan(Opus) → Implement(Opus) → Test(Opus) → Review(Opus)
Story 3: Plan(Opus) → Implement(Opus) → Test(Opus) → Review(Opus)

BBP does:

Planning Phase:    Plan stories 1,2,3 together     (Opus, warm context)
Dev Phase:         Implement stories 1,2,3         (Sonnet, parallel-ready)
Test Phase:        Test stories 1,2,3              (Sonnet, shared fixtures)
Review Phase:      Review stories 1,2,3 together   (Opus, pattern detection)

Benefits:

  • Context loading amortized across multiple stories (see the arithmetic sketch after this list)
  • Pattern recognition improved by seeing related changes together
  • Parallelization potential for independent stories
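
The amortization is easy to see with illustrative numbers (the context-load cost here is hypothetical):

C = 20_000                         # hypothetical tokens to (re)load project context
stories, phases = 3, 4
unbatched = stories * phases * C   # every story reloads context in every phase
batched = phases * C               # one warm context per phase, shared by the batch
print(unbatched, batched)          # 240000 80000 -> 3x fewer context tokens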

Pipeline Flow

┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  OPUS PLAN   │───▶│  SONNET DEV  │───▶│ SONNET TEST  │───▶│ OPUS REVIEW  │
│              │    │              │    │              │    │              │
│ • All stories│    │ • Implement  │    │ • Write tests│    │ • Quality    │
│   planned    │    │   each story │    │ • Run tests  │    │   gate       │
│ • Context    │    │ • Warm       │    │ • Fix fails  │    │ • Pattern    │
│   established│    │   context    │    │              │    │   detection  │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
                                                                   │
                                                                   ▼
                                                           ┌──────────────┐
                                                           │SONNET REWORK │
                                                           │ (if needed)  │
                                                           │              │
                                                           │ • Targeted   │
                                                           │   fixes from │
                                                           │   review     │
                                                           └──────────────┘

Implementation Notes

Warm Context Strategy

Each phase loads relevant context before execution:

  • Planning: PRD, Architecture, Epic definitions
  • Development: Story file, project-context.md, related code
  • Testing: Story AC, existing test patterns, fixtures
  • Review: All changed files, architecture constraints, story requirements

Checkpoint System

Pipeline state is tracked in _bmad-output/bbp-status.yaml:

epic: 3
phase: dev
stories:
  story-3.1:
    status: completed
    dev: done
    test: done
  story-3.2:
    status: in_progress
    dev: done
    test: pending
  story-3.3:
    status: pending

Enables:

  • Resume after interruption (see the sketch after this list)
  • Skip completed work
  • Parallel execution tracking
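
A minimal resume sketch against the status file above, assuming PyYAML is available; the key names follow the example document:

import yaml  # PyYAML, assumed available

def next_pending(path="_bmad-output/bbp-status.yaml"):
    with open(path) as f:
        state = yaml.safe_load(f)
    for story_id, story in state["stories"].items():
        if story["status"] != "completed":
            return state["phase"], story_id   # e.g. ("dev", "story-3.2")
    return None  # every story completed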

When to Use Opus vs Sonnet

| Task | Model | Reason |
|---|---|---|
| Story planning | Opus | Requirement interpretation, scope decisions |
| Code implementation | Sonnet | Mechanical translation of specs to code |
| Test writing | Sonnet | Pattern-based, well-defined expectations |
| Code review | Opus | Cross-cutting concerns, architectural judgment |
| Targeted fixes | Sonnet | Specific, well-scoped corrections |
| Architecture decisions | Opus | Long-term impact, trade-off analysis |
| File operations | Sonnet | Token-heavy, mechanical operations |

Results

In practice, this approach enables:

  • 3-5x more stories completed per rate limit window
  • Equivalent quality on implementation tasks
  • Better review quality through batch pattern detection
  • Sustainable development pace without rate limit interruptions

Current Limitations: Manual Model Switching

Important: Model switching in Claude Code is currently a manual operation via the /model command.

/model sonnet    # Switch to Sonnet for implementation
/model opus      # Switch to Opus for planning/review

UX Challenge (Work in Progress)

The pipeline workflows need to clearly signal to users when to switch models:

┌─────────────────────────────────────────────────────────────┐
│  ⚠️  PHASE COMPLETE: Planning finished                      │
│                                                             │
│  Next phase: Development (Sonnet recommended)               │
│                                                             │
│  To switch models:  /model sonnet                           │
│  To continue:       /bmad:bbp:workflows:sonnet-dev-batch    │
└─────────────────────────────────────────────────────────────┘

Current approach:

  • Each workflow is named with its intended model (opus-plan-epic, sonnet-dev-batch)
  • Phase completion messages prompt the user to switch
  • Status workflow shows current phase and recommended model

Known improvement needed: workflow output messages need tuning to make model switching more intuitive. The current messages may not be prominent enough for users unfamiliar with the tiered approach.

Future improvements being explored:

  • Clearer visual cues at phase boundaries (boxed prompts, color hints)
  • Model recommendation prominently displayed in status output
  • Standardized "SWITCH MODEL" callout format across all phase transitions (a possible shape is sketched after this list)
  • Potential Claude Code hooks for model suggestions
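
One possible shape for that standardized callout, sketched in Python (not the current implementation):

def switch_model_callout(next_phase: str, model: str, workflow: str) -> str:
    bar = "=" * 60
    return (f"{bar}\n"
            f"  SWITCH MODEL: /model {model}\n"
            f"  Next phase:   {next_phase}\n"
            f"  Then run:     {workflow}\n"
            f"{bar}")

# e.g. switch_model_callout("Development", "sonnet",
#                           "/bmad:bbp:workflows:sonnet-dev-batch")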

Why Manual Switching Is Acceptable

Despite the manual step, this approach works because:

  1. Phase boundaries are natural pause points - good time to review progress anyway
  2. Explicit control - users know exactly which model is doing what
  3. No surprise costs - transparent about when Opus vs Sonnet is used
  4. Workflow names are self-documenting - sonnet-* vs opus-* prefixes

Future Considerations

  • Haiku tier: For even lighter operations (file searches, simple edits)
  • Adaptive routing: Dynamic model selection based on task complexity signals
  • Parallel batches: Multiple Sonnet sessions for independent stories
  • Automated model hints: Claude Code integration for model recommendations

This architecture emerged from practical necessity during extended development sessions where Opus rate limits became the primary bottleneck to productivity.