From ab579b966b3bc29aa63ae259d24b2d95d343469b Mon Sep 17 00:00:00 2001
From: VS
Date: Fri, 16 Jan 2026 00:52:56 +0530
Subject: [PATCH] docs: add batch processing architecture rationale

Document the core insight that working entirely on Opus is impractical
due to rate limits, and that strategic model tiering (Opus for
planning/review, Sonnet for implementation/testing) produces equivalent
results with dramatically lower rate limit consumption.

Key points:
- 90% of tokens go to mechanical tasks suitable for Sonnet
- Opus reserved for strategic decisions and pattern detection
- Batching amplifies efficiency through context amortization
- 3-5x more stories completed per rate limit window

Co-Authored-By: Claude Sonnet 4.5
---
 docs/batch-processing-architecture.md | 344 ++++++++++++++++++++++++++
 1 file changed, 344 insertions(+)
 create mode 100644 docs/batch-processing-architecture.md

diff --git a/docs/batch-processing-architecture.md b/docs/batch-processing-architecture.md
new file mode 100644
index 00000000..12a09257
--- /dev/null
+++ b/docs/batch-processing-architecture.md
@@ -0,0 +1,344 @@
# Batch Processing Architecture for Rate Limit Efficiency

> **📣 Request for Feedback**
>
> This is an alpha-stage contribution and we welcome community input. We've tested this pattern with our team but would love feedback on:
>
> - Does this model tiering approach resonate with your experience?
> - Are there edge cases where Sonnet struggles with implementation tasks?
> - How can we improve the UX for model switching prompts?
> - Would a formal BBP module be valuable to the community?
>
> Please share thoughts in Discord `#suggestions-feedback` or comment on the PR. We're here to collaborate, not prescribe.

---

> **Scope**: This architecture is specifically designed for **Anthropic Claude models** on **Claude Max subscriptions** (or similar tiered access plans).
The rate limit characteristics, model capabilities, and tiering strategy are based on Anthropic's current offering structure. + +> **Origin**: These constraints were discovered when attempting to scale the BMAD method across a development team. Individual developers hitting Opus rate limits mid-sprint created bottlenecks that rippled through the entire team's velocity. The model tiering pattern emerged as a practical solution to sustain team-wide BMAD adoption. + +## Platform Context: Anthropic Claude Max + +This pattern assumes access to **Claude Code** with a **Claude Max subscription**, which provides: + +| Model | Capability | Rate Limit Behavior | +|-------|------------|---------------------| +| **Claude Opus 4** | Highest reasoning, best for complex decisions | Most restrictive limits | +| **Claude Sonnet 4** | Strong coding, fast execution | More generous limits | +| **Claude Haiku** | Quick tasks, simple operations | Most generous limits | + +The rate limit disparity between Opus and Sonnet is significant enough that strategic model selection dramatically impacts sustainable throughput. This architecture exploits that disparity. + +**Note**: If using API access with pay-per-token pricing instead of subscription limits, the economics differ but the capability-matching principle still applies (use the right model for the task complexity). 
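The tiering table above implies a routing rule simple enough to write down. The sketch below is purely illustrative: `pick_model`, `MODEL_FOR_TASK`, and the task names are hypothetical, not a real Claude Code API; they just encode the capability-matching principle in code.

```python
# Illustrative task -> model routing (hypothetical names, not a real
# Claude Code API). Strategic work goes to Opus, mechanical work to
# Sonnet, and the lightest lookups could drop to Haiku.
MODEL_FOR_TASK = {
    "story_planning":        "opus",
    "architecture_decision": "opus",
    "code_review":           "opus",
    "implementation":        "sonnet",
    "test_writing":          "sonnet",
    "targeted_fix":          "sonnet",
    "file_search":           "haiku",
}

def pick_model(task: str) -> str:
    # Default to Sonnet: spending Opus quota on mechanical work is the
    # failure mode this architecture is designed to avoid.
    return MODEL_FOR_TASK.get(task, "sonnet")

print(pick_model("story_planning"))  # opus
print(pick_model("refactor"))        # sonnet (unknown task, safe default)
```

The same mapping reappears later in the "When to Use Opus vs Sonnet" table; the point here is only that the routing decision is static and cheap to make up front.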
+ +## The Problem: Opus Rate Limits in Extended Development Sessions + +Working entirely on Claude Opus for software development quickly becomes impractical: + +- **Rate limits hit fast**: Complex stories with multiple files, tests, and iterations consume tokens rapidly +- **Context reloading**: Each new conversation requires re-establishing project context +- **Expensive operations**: Every file read, search, and edit counts against limits +- **Development stalls**: Hitting rate limits mid-story forces context-switching or waiting + +## The Solution: Strategic Model Tiering + +The Batch-Based Pipeline (BBP) approach distributes work across model tiers based on task complexity: + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ OPUS (Strategic Layer) β”‚ +β”‚ β€’ Epic-level story planning (high-stakes decisions) β”‚ +β”‚ β€’ Cross-story pattern detection during review β”‚ +β”‚ β€’ Architecture-impacting decisions β”‚ +β”‚ β€’ Final quality gates β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β–Ό +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ SONNET (Execution Layer) β”‚ +β”‚ β€’ Story implementation (bulk of token usage) β”‚ +β”‚ β€’ Test writing and execution β”‚ +β”‚ β€’ Targeted rework from review feedback β”‚ +β”‚ β€’ File operations and searches β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +## Why This Works: 
Quality Preservation Through Staging + +### The Key Insight: Quality Comes From Context, Not Just Model Size + +The common assumption is "bigger model = better code." In practice, **code quality is primarily determined by**: + +1. **Clarity of specifications** - What exactly needs to be built +2. **Available context** - Project patterns, architecture decisions, existing code +3. **Scope constraints** - Focused task vs. open-ended exploration + +When these three factors are optimized, Sonnet produces implementation-equivalent results to Opus. + +### How Staging Preserves Quality + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ OPUS PLANNING PHASE β”‚ +β”‚ β”‚ +β”‚ Input: PRD, Architecture, Epic requirements (ambiguous, high-level) β”‚ +β”‚ β”‚ +β”‚ Opus does the HARD work: β”‚ +β”‚ β€’ Interprets ambiguous requirements β”‚ +β”‚ β€’ Makes architectural judgment calls β”‚ +β”‚ β€’ Resolves conflicting constraints β”‚ +β”‚ β€’ Defines precise acceptance criteria β”‚ +β”‚ β€’ Sequences tasks in optimal order β”‚ +β”‚ β”‚ +β”‚ Output: Crystal-clear story files with: β”‚ +β”‚ β€’ Unambiguous task definitions β”‚ +β”‚ β€’ Specific file paths to modify β”‚ +β”‚ β€’ Exact acceptance criteria β”‚ +β”‚ β€’ Test scenarios spelled out β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β–Ό +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ SONNET EXECUTION PHASE β”‚ +β”‚ β”‚ +β”‚ Input: Crystal-clear story files (all ambiguity 
resolved) β”‚ +β”‚ β”‚ +β”‚ Sonnet does MECHANICAL work: β”‚ +β”‚ β€’ Translates specs to code (no interpretation needed) β”‚ +β”‚ β€’ Follows established patterns (already defined) β”‚ +β”‚ β€’ Writes tests against clear criteria β”‚ +β”‚ β€’ Makes localized decisions only β”‚ +β”‚ β”‚ +β”‚ Why Sonnet excels here: β”‚ +β”‚ β€’ Strong coding capabilities (not inferior to Opus for implementation) β”‚ +β”‚ β€’ Faster execution β”‚ +β”‚ β€’ More generous rate limits β”‚ +β”‚ β€’ Sufficient reasoning for scoped tasks β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +### Quality Is Front-Loaded, Not Distributed + +Traditional approach (quality spread thin): +``` +Story 1: [interpret─────implement─────test─────] β†’ Quality depends on each step +Story 2: [interpret─────implement─────test─────] β†’ Repeated interpretation risk +Story 3: [interpret─────implement─────test─────] β†’ Inconsistency accumulates +``` + +Batched approach (quality concentrated): +``` +PLAN: [interpret ALL stories with full context] β†’ One-time, high-quality +DEV: [implement─implement─implement──────────] β†’ Mechanical, consistent +TEST: [test──────test──────test───────────────] β†’ Pattern-based +REVIEW: [review ALL with cross-story visibility ] β†’ Catches what batch missed +``` + +**The quality gates are at phase boundaries, not scattered throughout.** + +### Why Sonnet Doesn't Degrade Quality + +| Task Type | Why Sonnet Matches Opus | Evidence | +|-----------|------------------------|----------| +| Code generation | Both models trained extensively on code; Sonnet optimized for speed | Benchmarks show near-parity on coding tasks | +| Following specs | No interpretation needed when specs are clear | Deterministic translation | +| Pattern matching | Existing code provides template | Copy-modify-adapt pattern | 
+| Test writing | Acceptance criteria explicit | Direct mapping to assertions | +| Syntax/formatting | Both models equivalent | Mechanical task | + +### Where Opus Is Still Required + +| Task Type | Why Opus Needed | Stage | +|-----------|-----------------|-------| +| Requirement interpretation | Ambiguity resolution requires deeper reasoning | Planning | +| Architecture decisions | Trade-off analysis, long-term implications | Planning | +| Cross-story patterns | Seeing connections across multiple changes | Review | +| Novel problem solving | No existing pattern to follow | Planning | +| Quality judgment | "Is this good enough?" requires taste | Review | + +### The Batch Review Multiplier + +Reviewing stories in batch actually **improves** quality over individual review: + +``` +Individual Review: +Story 1: [review] β†’ Misses that auth pattern will conflict with Story 3 +Story 2: [review] β†’ Doesn't know Story 1 already solved similar problem +Story 3: [review] β†’ Unaware of auth pattern from Story 1 + +Batch Review: +Stories 1,2,3: [review together] + β†’ "Story 1 and 3 both touch auth - ensure consistent approach" + β†’ "Story 2 duplicates utility from Story 1 - extract shared helper" + β†’ "All three stories need the same error handling pattern" +``` + +**Cross-story visibility enables pattern detection impossible in isolated review.** + +## Token Efficiency: Why Batching Saves Tokens + +### Dramatic Rate Limit Savings + +Typical story implementation breakdown: +``` +File reads/searches: 40% of tokens β†’ Sonnet +Code generation: 35% of tokens β†’ Sonnet +Test writing: 15% of tokens β†’ Sonnet +Planning/decisions: 10% of tokens β†’ Opus +``` + +By routing 90% of token-heavy operations to Sonnet, Opus capacity is preserved for high-value strategic work. 
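Taking the breakdown above at face value, the savings can be sanity-checked with back-of-the-envelope arithmetic. The shares below come from that illustrative breakdown, not from a new measurement:

```python
# Token share per activity and the model it is routed to under BBP
# (shares taken from the illustrative breakdown above).
breakdown = {
    "file_reads_searches": (0.40, "sonnet"),
    "code_generation":     (0.35, "sonnet"),
    "test_writing":        (0.15, "sonnet"),
    "planning_decisions":  (0.10, "opus"),
}

opus_share = sum(share for share, model in breakdown.values() if model == "opus")
sonnet_share = 1.0 - opus_share

# If every token previously went to Opus, the same Opus window now covers
# roughly 1 / opus_share as much work in the ideal case. Context reloads
# and review overhead pull real sessions down toward the observed 3-5x.
ideal_stretch = 1 / opus_share

print(f"Opus share: {opus_share:.0%}, ideal window stretch: {ideal_stretch:.0f}x")
```

The gap between the ideal 10x and the observed 3-5x is exactly the overhead the batching strategy in the next section tries to amortize.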
+ +### Context Amortization: Batching Amplifies Efficiency + +Instead of: +``` +Story 1: Plan(Opus) β†’ Implement(Opus) β†’ Test(Opus) β†’ Review(Opus) +Story 2: Plan(Opus) β†’ Implement(Opus) β†’ Test(Opus) β†’ Review(Opus) +Story 3: Plan(Opus) β†’ Implement(Opus) β†’ Test(Opus) β†’ Review(Opus) +``` + +BBP does: +``` +Planning Phase: Plan stories 1,2,3 together (Opus, warm context) +Dev Phase: Implement stories 1,2,3 (Sonnet, parallel-ready) +Test Phase: Test stories 1,2,3 (Sonnet, shared fixtures) +Review Phase: Review stories 1,2,3 together (Opus, pattern detection) +``` + +Benefits: +- **Context loading amortized** across multiple stories +- **Pattern recognition** improved by seeing related changes together +- **Parallelization potential** for independent stories + +## Pipeline Flow + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ OPUS PLAN │───▢│ SONNET DEV │───▢│ SONNET TEST │───▢│ OPUS REVIEW β”‚ +β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ β”‚ +β”‚ β€’ All storiesβ”‚ β”‚ β€’ Implement β”‚ β”‚ β€’ Write testsβ”‚ β”‚ β€’ Quality β”‚ +β”‚ planned β”‚ β”‚ each story β”‚ β”‚ β€’ Run tests β”‚ β”‚ gate β”‚ +β”‚ β€’ Context β”‚ β”‚ β€’ Warm β”‚ β”‚ β€’ Fix fails β”‚ β”‚ β€’ Pattern β”‚ +β”‚ establishedβ”‚ β”‚ context β”‚ β”‚ β”‚ β”‚ detection β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ + β–Ό + β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” + β”‚SONNET REWORK β”‚ + β”‚ (if needed) β”‚ + β”‚ β”‚ + β”‚ β€’ Targeted β”‚ + β”‚ fixes from β”‚ + β”‚ review β”‚ + β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +## Implementation Notes + +### Warm Context Strategy + +Each phase loads relevant context before execution: +- **Planning**: PRD, Architecture, Epic 
definitions +- **Development**: Story file, project-context.md, related code +- **Testing**: Story AC, existing test patterns, fixtures +- **Review**: All changed files, architecture constraints, story requirements + +### Checkpoint System + +Pipeline state tracked in `_bmad-output/bbp-status.yaml`: +```yaml +epic: 3 +phase: dev +stories: + story-3.1: + status: completed + dev: done + test: done + story-3.2: + status: in_progress + dev: done + test: pending + story-3.3: + status: pending +``` + +Enables: +- Resume after interruption +- Skip completed work +- Parallel execution tracking + +### When to Use Opus vs Sonnet + +| Task | Model | Reason | +|------|-------|--------| +| Story planning | Opus | Requirement interpretation, scope decisions | +| Code implementation | Sonnet | Mechanical translation of specs to code | +| Test writing | Sonnet | Pattern-based, well-defined expectations | +| Code review | Opus | Cross-cutting concerns, architectural judgment | +| Targeted fixes | Sonnet | Specific, well-scoped corrections | +| Architecture decisions | Opus | Long-term impact, trade-off analysis | +| File operations | Sonnet | Token-heavy, mechanical operations | + +## Results + +In practice, this approach enables: +- **3-5x more stories** completed per rate limit window +- **Equivalent quality** on implementation tasks +- **Better review quality** through batch pattern detection +- **Sustainable development pace** without rate limit interruptions + +## Current Limitations: Manual Model Switching + +**Important**: Model switching in Claude Code is currently a manual operation via the `/model` command. 
+ +```bash +/model sonnet # Switch to Sonnet for implementation +/model opus # Switch to Opus for planning/review +``` + +### UX Challenge (Work in Progress) + +The pipeline workflows need to clearly signal to users when to switch models: + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ ⚠️ PHASE COMPLETE: Planning finished β”‚ +β”‚ β”‚ +β”‚ Next phase: Development (Sonnet recommended) β”‚ +β”‚ β”‚ +β”‚ To switch models: /model sonnet β”‚ +β”‚ To continue: /bmad:bbp:workflows:sonnet-dev-batch β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + +Current approach: +- Each workflow is named with its intended model (`opus-plan-epic`, `sonnet-dev-batch`) +- Phase completion messages prompt the user to switch +- Status workflow shows current phase and recommended model + +**Known improvement needed**: Workflow output messages need optimization to make model switching more intuitive. Current messages may not be prominent enough for users unfamiliar with the tiered approach. + +Future improvements being explored: +- Clearer visual cues at phase boundaries (boxed prompts, color hints) +- Model recommendation prominently displayed in status output +- Standardized "SWITCH MODEL" callout format across all phase transitions +- Potential Claude Code hooks for model suggestions + +### Why Manual Switching is Acceptable + +Despite the manual step, this approach works because: +1. **Phase boundaries are natural pause points** - good time to review progress anyway +2. **Explicit control** - users know exactly which model is doing what +3. **No surprise costs** - transparent about when Opus vs Sonnet is used +4. 
**Workflow names are self-documenting** - `sonnet-*` vs `opus-*` prefixes + +## Future Considerations + +- **Haiku tier**: For even lighter operations (file searches, simple edits) +- **Adaptive routing**: Dynamic model selection based on task complexity signals +- **Parallel batches**: Multiple Sonnet sessions for independent stories +- **Automated model hints**: Claude Code integration for model recommendations + +--- + +*This architecture emerged from practical necessity during extended development sessions where Opus rate limits became the primary bottleneck to productivity.*
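As a closing illustration of the checkpoint system described earlier, here is a minimal resume sketch. The nested dict mirrors the `_bmad-output/bbp-status.yaml` schema from the Checkpoint System section; in a real workflow the state would come from a YAML parser rather than being hard-coded, and `next_actions` is a hypothetical helper name, not part of any BMAD tooling.

```python
# In-memory mirror of _bmad-output/bbp-status.yaml (schema from the
# Checkpoint System section); a real tool would yaml.safe_load() the file.
status = {
    "epic": 3,
    "phase": "dev",
    "stories": {
        "story-3.1": {"status": "completed", "dev": "done", "test": "done"},
        "story-3.2": {"status": "in_progress", "dev": "done", "test": "pending"},
        "story-3.3": {"status": "pending"},
    },
}

def next_actions(state):
    """Yield (story, phase) pairs still needing work, in declared order."""
    for story_id, story in state["stories"].items():
        if story.get("status") == "completed":
            continue  # resume skips completed work
        for phase in ("dev", "test"):
            if story.get(phase, "pending") != "done":
                yield story_id, phase

print(list(next_actions(status)))
# [('story-3.2', 'test'), ('story-3.3', 'dev'), ('story-3.3', 'test')]
```

This is what "resume after interruption" and "skip completed work" amount to in practice: a pure function over the checkpoint file, so any session (Opus or Sonnet) can pick up where the last one stopped.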