docs: add batch processing architecture rationale
Document the core insight that working entirely on Opus is impractical due to rate limits, and that strategic model tiering (Opus for planning/review, Sonnet for implementation/testing) produces equivalent results with dramatically lower rate limit consumption.

Key points:

- 90% of tokens go to mechanical tasks suitable for Sonnet
- Opus reserved for strategic decisions and pattern detection
- Batching amplifies efficiency through context amortization
- 3-5x more stories completed per rate limit window

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
# Batch Processing Architecture for Rate Limit Efficiency

> **📣 Request for Feedback**
>
> This is an alpha-stage contribution and we welcome community input. We've tested this pattern with our team but would love feedback on:
>
> - Does this model tiering approach resonate with your experience?
> - Are there edge cases where Sonnet struggles with implementation tasks?
> - How can we improve the UX for model switching prompts?
> - Would a formal BBP module be valuable to the community?
>
> Please share thoughts in Discord `#suggestions-feedback` or comment on the PR. We're here to collaborate, not prescribe.

---
> **Scope**: This architecture is specifically designed for **Anthropic Claude models** on **Claude Max subscriptions** (or similar tiered access plans). The rate limit characteristics, model capabilities, and tiering strategy are based on Anthropic's current offering structure.

> **Origin**: These constraints were discovered when attempting to scale the BMAD method across a development team. Individual developers hitting Opus rate limits mid-sprint created bottlenecks that rippled through the entire team's velocity. The model tiering pattern emerged as a practical solution to sustain team-wide BMAD adoption.

## Platform Context: Anthropic Claude Max

This pattern assumes access to **Claude Code** with a **Claude Max subscription**, which provides:

| Model | Capability | Rate Limit Behavior |
|-------|------------|---------------------|
| **Claude Opus 4** | Highest reasoning, best for complex decisions | Most restrictive limits |
| **Claude Sonnet 4** | Strong coding, fast execution | More generous limits |
| **Claude Haiku** | Quick tasks, simple operations | Most generous limits |
The rate limit disparity between Opus and Sonnet is significant enough that strategic model selection dramatically impacts sustainable throughput. This architecture exploits that disparity.

**Note**: If using API access with pay-per-token pricing instead of subscription limits, the economics differ but the capability-matching principle still applies (use the right model for the task complexity).

## The Problem: Opus Rate Limits in Extended Development Sessions

Working entirely on Claude Opus for software development quickly becomes impractical:

- **Rate limits hit fast**: Complex stories with multiple files, tests, and iterations consume tokens rapidly
- **Context reloading**: Each new conversation requires re-establishing project context
- **Expensive operations**: Every file read, search, and edit counts against limits
- **Development stalls**: Hitting rate limits mid-story forces context-switching or waiting
## The Solution: Strategic Model Tiering

The Batch-Based Pipeline (BBP) approach distributes work across model tiers based on task complexity:

```
┌─────────────────────────────────────────────────────────────────┐
│                     OPUS (Strategic Layer)                      │
│  • Epic-level story planning (high-stakes decisions)            │
│  • Cross-story pattern detection during review                  │
│  • Architecture-impacting decisions                             │
│  • Final quality gates                                          │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                    SONNET (Execution Layer)                     │
│  • Story implementation (bulk of token usage)                   │
│  • Test writing and execution                                   │
│  • Targeted rework from review feedback                         │
│  • File operations and searches                                 │
└─────────────────────────────────────────────────────────────────┘
```
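In code form, the tiering reduces to a small routing table. Here is a minimal sketch; the phase names and model identifiers are illustrative assumptions, not a shipped BBP API:

```python
# Illustrative phase-to-model routing table (names are assumptions,
# not part of any shipped BBP module).
PHASE_MODEL = {
    "plan":   "opus",    # strategic: requirement interpretation, sequencing
    "dev":    "sonnet",  # mechanical: spec-to-code translation
    "test":   "sonnet",  # pattern-based: criteria map directly to assertions
    "review": "opus",    # strategic: cross-story pattern detection
    "rework": "sonnet",  # scoped: targeted fixes from review feedback
}

def model_for(phase: str) -> str:
    """Return the recommended model tier for a pipeline phase."""
    return PHASE_MODEL[phase]
```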
## Why This Works: Quality Preservation Through Staging

### The Key Insight: Quality Comes From Context, Not Just Model Size

The common assumption is "bigger model = better code." In practice, **code quality is primarily determined by**:

1. **Clarity of specifications** - What exactly needs to be built
2. **Available context** - Project patterns, architecture decisions, existing code
3. **Scope constraints** - Focused task vs. open-ended exploration

When these three factors are optimized, Sonnet produces implementation-equivalent results to Opus.

### How Staging Preserves Quality

```
┌─────────────────────────────────────────────────────────────────────────┐
│                           OPUS PLANNING PHASE                           │
│                                                                         │
│  Input: PRD, Architecture, Epic requirements (ambiguous, high-level)    │
│                                                                         │
│  Opus does the HARD work:                                               │
│  • Interprets ambiguous requirements                                    │
│  • Makes architectural judgment calls                                   │
│  • Resolves conflicting constraints                                     │
│  • Defines precise acceptance criteria                                  │
│  • Sequences tasks in optimal order                                     │
│                                                                         │
│  Output: Crystal-clear story files with:                                │
│  • Unambiguous task definitions                                         │
│  • Specific file paths to modify                                        │
│  • Exact acceptance criteria                                            │
│  • Test scenarios spelled out                                           │
└─────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         SONNET EXECUTION PHASE                          │
│                                                                         │
│  Input: Crystal-clear story files (all ambiguity resolved)              │
│                                                                         │
│  Sonnet does MECHANICAL work:                                           │
│  • Translates specs to code (no interpretation needed)                  │
│  • Follows established patterns (already defined)                       │
│  • Writes tests against clear criteria                                  │
│  • Makes localized decisions only                                       │
│                                                                         │
│  Why Sonnet excels here:                                                │
│  • Strong coding capabilities (not inferior to Opus for implementation) │
│  • Faster execution                                                     │
│  • More generous rate limits                                            │
│  • Sufficient reasoning for scoped tasks                                │
└─────────────────────────────────────────────────────────────────────────┘
```
### Quality Is Front-Loaded, Not Distributed

Traditional approach (quality spread thin):

```
Story 1: [interpret─────implement─────test─────]  → Quality depends on each step
Story 2: [interpret─────implement─────test─────]  → Repeated interpretation risk
Story 3: [interpret─────implement─────test─────]  → Inconsistency accumulates
```

Batched approach (quality concentrated):

```
PLAN:   [interpret ALL stories with full context]  → One-time, high-quality
DEV:    [implement─implement─implement──────────]  → Mechanical, consistent
TEST:   [test──────test──────test───────────────]  → Pattern-based
REVIEW: [review ALL with cross-story visibility ]  → Catches what batch missed
```

**The quality gates are at phase boundaries, not scattered throughout.**
### Why Sonnet Doesn't Degrade Quality

| Task Type | Why Sonnet Matches Opus | Evidence |
|-----------|------------------------|----------|
| Code generation | Both models trained extensively on code; Sonnet optimized for speed | Benchmarks show near-parity on coding tasks |
| Following specs | No interpretation needed when specs are clear | Deterministic translation |
| Pattern matching | Existing code provides template | Copy-modify-adapt pattern |
| Test writing | Acceptance criteria explicit | Direct mapping to assertions |
| Syntax/formatting | Both models equivalent | Mechanical task |

### Where Opus Is Still Required

| Task Type | Why Opus Needed | Stage |
|-----------|-----------------|-------|
| Requirement interpretation | Ambiguity resolution requires deeper reasoning | Planning |
| Architecture decisions | Trade-off analysis, long-term implications | Planning |
| Cross-story patterns | Seeing connections across multiple changes | Review |
| Novel problem solving | No existing pattern to follow | Planning |
| Quality judgment | "Is this good enough?" requires taste | Review |
### The Batch Review Multiplier

Reviewing stories in batch actually **improves** quality over individual review:

```
Individual Review:
  Story 1: [review] → Misses that auth pattern will conflict with Story 3
  Story 2: [review] → Doesn't know Story 1 already solved similar problem
  Story 3: [review] → Unaware of auth pattern from Story 1

Batch Review:
  Stories 1,2,3: [review together]
    → "Story 1 and 3 both touch auth - ensure consistent approach"
    → "Story 2 duplicates utility from Story 1 - extract shared helper"
    → "All three stories need the same error handling pattern"
```

**Cross-story visibility enables pattern detection impossible in isolated review.**
## Token Efficiency: Why Batching Saves Tokens

### Dramatic Rate Limit Savings

Typical story implementation breakdown:

```
File reads/searches:  40% of tokens  → Sonnet
Code generation:      35% of tokens  → Sonnet
Test writing:         15% of tokens  → Sonnet
Planning/decisions:   10% of tokens  → Opus
```

By routing 90% of token-heavy operations to Sonnet, Opus capacity is preserved for high-value strategic work.
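
To make the arithmetic concrete, here is a back-of-the-envelope sketch. The per-story token count and window budget are illustrative assumptions, not measurements; only the 90/10 split comes from the breakdown above:

```python
# Back-of-the-envelope sketch: how the 90/10 split stretches an Opus window.
# TOKENS_PER_STORY and OPUS_WINDOW are illustrative assumptions.
TOKENS_PER_STORY = 100_000  # total tokens consumed by a typical story
OPUS_WINDOW = 500_000       # hypothetical Opus budget per rate limit window
OPUS_SHARE = 0.10           # share of story tokens needing Opus (from above)

all_opus = OPUS_WINDOW // TOKENS_PER_STORY                  # every token on Opus
tiered = OPUS_WINDOW // int(TOKENS_PER_STORY * OPUS_SHARE)  # only 10% hits Opus

print(f"All-Opus: {all_opus} stories per Opus window")  # 5
print(f"Tiered:   {tiered} stories per Opus window")    # 50 (Opus-side ceiling)
```

The Opus-side ceiling rises 10x in this sketch; observed end-to-end gains are lower (the 3-5x reported under Results below) because Sonnet limits, rework, and phase overhead also bind.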
### Context Amortization: Batching Amplifies Efficiency

Instead of:

```
Story 1: Plan(Opus) → Implement(Opus) → Test(Opus) → Review(Opus)
Story 2: Plan(Opus) → Implement(Opus) → Test(Opus) → Review(Opus)
Story 3: Plan(Opus) → Implement(Opus) → Test(Opus) → Review(Opus)
```

BBP does:

```
Planning Phase:  Plan stories 1,2,3 together      (Opus, warm context)
Dev Phase:       Implement stories 1,2,3          (Sonnet, parallel-ready)
Test Phase:      Test stories 1,2,3               (Sonnet, shared fixtures)
Review Phase:    Review stories 1,2,3 together    (Opus, pattern detection)
```

Benefits:

- **Context loading amortized** across multiple stories
- **Pattern recognition** improved by seeing related changes together
- **Parallelization potential** for independent stories
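
The amortization effect is easy to quantify. A minimal sketch, where the context-load and per-story work costs are illustrative assumptions:

```python
# Context-amortization sketch (all token counts are assumptions).
CONTEXT = 20_000  # tokens to establish project context in a conversation
WORK = 30_000     # tokens of actual work per story per phase
STORIES, PHASES = 3, 4

per_story = STORIES * PHASES * (CONTEXT + WORK)  # context reloaded every time
batched = PHASES * (CONTEXT + STORIES * WORK)    # context loaded once per phase

print(per_story)  # 600000
print(batched)    # 440000 → ~27% fewer tokens; the gap widens per extra story
```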
## Pipeline Flow

```
┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  OPUS PLAN   │───▶│  SONNET DEV  │───▶│ SONNET TEST  │───▶│ OPUS REVIEW  │
│              │    │              │    │              │    │              │
│ • All stories│    │ • Implement  │    │ • Write tests│    │ • Quality    │
│   planned    │    │   each story │    │ • Run tests  │    │   gate       │
│ • Context    │    │ • Warm       │    │ • Fix fails  │    │ • Pattern    │
│   established│    │   context    │    │              │    │   detection  │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
                                                                   │
                                                                   ▼
                                                            ┌──────────────┐
                                                            │SONNET REWORK │
                                                            │ (if needed)  │
                                                            │              │
                                                            │ • Targeted   │
                                                            │   fixes from │
                                                            │   review     │
                                                            └──────────────┘
```
## Implementation Notes

### Warm Context Strategy

Each phase loads relevant context before execution:

- **Planning**: PRD, Architecture, Epic definitions
- **Development**: Story file, project-context.md, related code
- **Testing**: Story AC, existing test patterns, fixtures
- **Review**: All changed files, architecture constraints, story requirements
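
Expressed as data, the manifest might look like the sketch below. The document names come from the list above; the file paths and structure are assumptions for illustration:

```python
# Hypothetical warm-context manifest: which artifacts each phase loads
# before execution. Paths are illustrative assumptions.
PHASE_CONTEXT = {
    "planning": ["docs/prd.md", "docs/architecture.md", "docs/epics/"],
    "dev":      ["stories/", "project-context.md", "src/"],
    "test":     ["stories/", "tests/", "tests/fixtures/"],
    "review":   ["<all changed files>", "docs/architecture.md", "stories/"],
}
```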
### Checkpoint System

Pipeline state tracked in `_bmad-output/bbp-status.yaml`:

```yaml
epic: 3
phase: dev
stories:
  story-3.1:
    status: completed
    dev: done
    test: done
  story-3.2:
    status: in_progress
    dev: done
    test: pending
  story-3.3:
    status: pending
```

Enables:

- Resume after interruption
- Skip completed work
- Parallel execution tracking
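
For example, resuming can be as simple as scanning the checkpoint for the first story with unfinished work. A minimal sketch, assuming PyYAML and the field names shown above:

```python
# Minimal resume sketch: find the next story with pending work in the
# checkpoint file. Assumes PyYAML; field names mirror the example above.
import yaml

def next_pending(path="_bmad-output/bbp-status.yaml"):
    with open(path) as f:
        state = yaml.safe_load(f)
    for story_id, story in state["stories"].items():
        if story.get("status") != "completed":
            return story_id, state["phase"]
    return None, state["phase"]

story, phase = next_pending()
print(f"Resume at {story} in phase {phase}" if story else "Epic complete")
```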
### When to Use Opus vs Sonnet

| Task | Model | Reason |
|------|-------|--------|
| Story planning | Opus | Requirement interpretation, scope decisions |
| Code implementation | Sonnet | Mechanical translation of specs to code |
| Test writing | Sonnet | Pattern-based, well-defined expectations |
| Code review | Opus | Cross-cutting concerns, architectural judgment |
| Targeted fixes | Sonnet | Specific, well-scoped corrections |
| Architecture decisions | Opus | Long-term impact, trade-off analysis |
| File operations | Sonnet | Token-heavy, mechanical operations |
## Results

In practice, this approach enables:

- **3-5x more stories** completed per rate limit window
- **Equivalent quality** on implementation tasks
- **Better review quality** through batch pattern detection
- **Sustainable development pace** without rate limit interruptions
## Current Limitations: Manual Model Switching

**Important**: Model switching in Claude Code is currently a manual operation via the `/model` command.

```bash
/model sonnet   # Switch to Sonnet for implementation
/model opus     # Switch to Opus for planning/review
```
### UX Challenge (Work in Progress)

The pipeline workflows need to clearly signal to users when to switch models:

```
┌─────────────────────────────────────────────────────────────┐
│  ⚠️ PHASE COMPLETE: Planning finished                       │
│                                                             │
│  Next phase: Development (Sonnet recommended)               │
│                                                             │
│  To switch models: /model sonnet                            │
│  To continue: /bmad:bbp:workflows:sonnet-dev-batch          │
└─────────────────────────────────────────────────────────────┘
```
Current approach:

- Each workflow is named with its intended model (`opus-plan-epic`, `sonnet-dev-batch`)
- Phase completion messages prompt the user to switch
- Status workflow shows current phase and recommended model

**Known improvement needed**: Workflow output messages need optimization to make model switching more intuitive. Current messages may not be prominent enough for users unfamiliar with the tiered approach.

Future improvements being explored:

- Clearer visual cues at phase boundaries (boxed prompts, color hints)
- Model recommendation prominently displayed in status output
- Standardized "SWITCH MODEL" callout format across all phase transitions (see the sketch below)
- Potential Claude Code hooks for model suggestions
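
One shape the standardized callout could take is a small formatter that every workflow calls at its phase boundary. This is a hypothetical sketch, not a shipped BBP feature; the workflow name is taken from the example above:

```python
# Hypothetical "SWITCH MODEL" callout formatter for phase boundaries.
# The format and parameters are illustrative, not a shipped BBP feature.
def phase_boundary_callout(done: str, next_phase: str,
                           model: str, workflow: str) -> str:
    lines = [
        f"PHASE COMPLETE: {done} finished",
        "",
        f"Next phase: {next_phase} ({model.title()} recommended)",
        "",
        f"To switch models: /model {model}",
        f"To continue:      {workflow}",
    ]
    width = max(len(line) for line in lines) + 2
    body = "\n".join(f"│ {line.ljust(width)} │" for line in lines)
    return f"┌{'─' * (width + 2)}┐\n{body}\n└{'─' * (width + 2)}┘"

print(phase_boundary_callout("Planning", "Development", "sonnet",
                             "/bmad:bbp:workflows:sonnet-dev-batch"))
```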
### Why Manual Switching is Acceptable

Despite the manual step, this approach works because:

1. **Phase boundaries are natural pause points** - a good time to review progress anyway
2. **Explicit control** - users know exactly which model is doing what
3. **No surprise costs** - transparent about when Opus vs Sonnet is used
4. **Workflow names are self-documenting** - `sonnet-*` vs `opus-*` prefixes
## Future Considerations

- **Haiku tier**: For even lighter operations (file searches, simple edits)
- **Adaptive routing**: Dynamic model selection based on task complexity signals
- **Parallel batches**: Multiple Sonnet sessions for independent stories
- **Automated model hints**: Claude Code integration for model recommendations

---
*This architecture emerged from practical necessity during extended development sessions where Opus rate limits became the primary bottleneck to productivity.*