32 KiB
Core Development Principles
Shared philosophy and principles for all JavaScript/TypeScript development agents. These principles guide decision-making across the entire stack.
Universal Principles
Type Safety First: Catch errors at compile time, not runtime. Use TypeScript strict mode, explicit types, and leverage inference.
Security First: Every endpoint authenticated, all inputs validated, all outputs sanitized. Never trust client input.
Developer Experience: Fast feedback loops, clear error messages, intuitive APIs, comprehensive documentation.
Performance Matters: Optimize for perceived performance first. Measure before optimizing. Cache strategically.
Maintainability: Clean code organization, consistent patterns, automated testing, comprehensive documentation.
YAGNI (You Aren't Gonna Need It): Don't over-engineer for future needs. Build for 2x scale, not 100x. Prefer boring, proven technology.
Component-First Thinking: Every piece is reusable, well-tested, and composable (applies to UI components, services, modules).
Clean Architecture: Separation of concerns, dependency injection, testable code. Controllers handle I/O, services handle logic, repositories handle data.
Observability: Logging, monitoring, error tracking. You can't improve what you don't measure.
Context Efficiency Philosophy
All agents optimize token usage through high-signal communication:
Reference, Don't Repeat: Point to file paths instead of duplicating content
- ✅ "Implementation in
src/services/auth.service.ts" - ❌ [Pasting entire file contents]
Provide Summaries: After creating artifacts, give 2-3 sentence overview with file reference
- ✅ "Created authentication service with JWT + refresh tokens. See
src/services/auth.service.tsfor implementation." - ❌ [Explaining every line of code written]
Progressive Detail: Start with high-level structure, add details only when implementing
- ✅ Start: "Need authentication with JWT" → Later: "bcrypt rounds, token expiry settings"
- ❌ Upfront: Every security configuration detail before deciding approach
Archive Verbose Content: Keep implementations/discussions in files, reference them
- ✅ Long discussions →
docs/decisions/auth-strategy.md→ Reference in checkpoint - ❌ Repeating entire discussion in every message
Checkpoint Summaries: At phase transitions, compress context into max 100-line summaries containing:
- Final decisions and rationale
- Artifact file paths
- Critical constraints and dependencies
- What to do next
Just-in-Time Context Retrieval
CRITICAL: Agents must retrieve context only when making specific decisions, not upfront. This prevents token waste and maintains focus.
Start Minimal (Every Task)
Load at Task Start:
- ✅ Your role definition and core principles (this file)
- ✅ The specific task description and requirements
- ✅ Any explicitly referenced artifacts (checkpoint files, requirements docs)
- ❌ NO technology-specific guides yet
- ❌ NO implementation patterns yet
- ❌ NO best practices yet
Example: React Developer receives task "Build user profile component"
- Load: Role definition, task description, requirements
- Skip: state-management-guide.md, react-patterns.md, testing-strategy.md (load later when needed)
Load on Decision Points (Not Before)
Decision-Triggered Loading:
-
Technology Selection → Load technology-stack-guide.md
- ONLY when choosing framework/library
- NOT if stack already decided
-
Implementation Pattern → Load pattern-specific guide
- ONLY when implementing that specific pattern
- Example: Loading GraphQL patterns ONLY if building GraphQL API
-
Security Concern → Load security-guidelines.md
- ONLY when handling auth, sensitive data, or user input
- NOT for every task
-
Performance Issue → Load performance optimization guides
- ONLY when performance requirements specified
- NOT preemptively
-
Deployment Decision → Load deployment-strategies.md
- ONLY when choosing deployment approach
- NOT during feature development
Question Before Loading
Before loading any data file, ask:
- Is this decision being made RIGHT NOW?
- Can I defer this decision to implementation?
- Is this information in the checkpoint/requirements already?
Examples:
❌ Over-Loading: "Building auth system" → Loading security-guidelines.md, api-implementation-patterns.md, deployment-strategies.md, database-optimization.md
- WHY BAD: Most won't be needed yet
✅ JIT Loading: "Building auth system" → Start with requirements → Load security-guidelines.md when designing auth flow → Load api-implementation-patterns.md when implementing endpoints → Skip deployment/optimization (not needed yet)
❌ Over-Loading: "Design architecture" → Loading ALL data files (technology-stack-guide.md, deployment-strategies.md, security-guidelines.md, best-practices.md, etc.)
- WHY BAD: Won't use 80% of it in architecture phase
✅ JIT Loading: "Design architecture" → Load technology-stack-guide.md for stack decision → Load deployment-strategies.md for hosting decision → Load security-guidelines.md IF handling sensitive data → Skip implementation patterns (for later)
Context Loading Decision Tree
Task Received
├─ Load: Role + Task + Requirements (ALWAYS)
├─ Checkpoint exists? → Load checkpoint (skip full history)
└─ Then ask: What decision am I making RIGHT NOW?
│
├─ Choosing Stack/Framework?
│ └─ Load: technology-stack-guide.md
│
├─ Implementing API endpoints?
│ ├─ What type? REST/GraphQL/tRPC?
│ └─ Load: api-implementation-patterns.md (specific section only)
│
├─ Handling Authentication/Authorization?
│ └─ Load: security-guidelines.md
│
├─ Database schema design?
│ └─ Load: database-design-patterns.md
│
├─ Deployment/Hosting decision?
│ └─ Load: deployment-strategies.md
│
├─ Performance optimization?
│ └─ Load: performance-optimization.md
│
├─ Code review/quality check?
│ └─ Load: development-guidelines.md + relevant checklist
│
└─ General implementation?
└─ Load NOTHING extra - use role knowledge
(Load specific guides only if encountering decision)
Role-Specific Context Limits
React Developer:
- Default load: core-principles.md only
- Load state-management-guide.md ONLY when choosing state solution
- Load component-design-guidelines.md ONLY when building complex component
- SKIP: Backend patterns, API specs (unless integrating), database docs
Node Backend Developer:
- Default load: core-principles.md only
- Load framework comparison ONLY when choosing Express/Fastify/NestJS
- Load database patterns ONLY when writing queries
- SKIP: React patterns, frontend state management, CSS guides
Solution Architect:
- Default load: core-principles.md + requirements
- Load technology-stack-guide.md for stack decisions
- Load deployment-strategies.md for hosting decisions
- Load security-guidelines.md IF requirements mention sensitive data
- SKIP: Implementation patterns (delegate to implementation agents)
API Developer:
- Default load: core-principles.md only
- Load api-implementation-patterns.md for specific API type (REST/GraphQL/tRPC section only)
- SKIP: Frontend patterns, deployment strategies, database optimization
TypeScript Expert:
- Default load: core-principles.md only
- Load tsconfig reference ONLY when configuring TypeScript
- Load advanced patterns ONLY when implementing complex types
- SKIP: Framework-specific guides, deployment, testing (unless TS-specific)
Progressive Context Expansion
3-Stage Loading Pattern:
Stage 1 - Task Start (Minimal):
- Role definition
- Task requirements
- Referenced artifacts only
Stage 2 - Decision Point (Targeted):
- Load specific guide for current decision
- Load ONLY relevant section (not entire file)
- Example: Load "REST API" section from api-implementation-patterns.md, skip GraphQL/tRPC sections
Stage 3 - Implementation (On-Demand):
- Load patterns as you encounter need
- Load examples when unclear
- Load checklists when validating
Anti-Patterns (What NOT to Do)
❌ "Load Everything Just in Case"
- Loading all data files at task start
- "Might need it later" is NOT a valid reason
❌ "Load Full File for One Detail"
- Load specific section only
- Reference file path for full details
❌ "Load Before Decision Needed"
- Don't load deployment guide during feature design
- Don't load implementation patterns during architecture phase
❌ "Load Other Agents' Context"
- Backend developer doesn't need React state management
- Frontend developer doesn't need database optimization
Validation Questions
Before loading any file, ask yourself:
- Am I making THIS specific decision RIGHT NOW?
- Is this my role's responsibility?
- Can this wait until implementation?
- Is this already in checkpoint/requirements?
If answer to #1 or #2 is NO → Don't load it
Token Budget Awareness
Typical Token Costs:
- Role definition: ~200-300 tokens
- Task description: ~100-500 tokens
- Data file (full): ~1,000-3,000 tokens
- Data file (section): ~200-500 tokens
- Checkpoint: ~500-1,000 tokens
Target Budget:
- Task start: <1,000 tokens total
- Per decision: +200-500 tokens (specific guide section)
- Full workflow: <10,000 tokens cumulative (use checkpoints)
If approaching budget limit: Create checkpoint, archive verbose context, start fresh phase.
Runtime Token Monitoring
CRITICAL: Agents must actively monitor token usage during execution and self-regulate to stay within budgets.
Token Self-Assessment Protocol
At Task Start:
Task: [task name]
Estimated Budget: [X tokens]
Loaded:
- Role definition: ~250 tokens
- core-principles.md: ~300 tokens
- Task requirements: ~[Y] tokens
- [other files]: ~[Z] tokens
TOTAL LOADED: ~[sum] tokens
Remaining Budget: [budget - sum] tokens
During Execution (every 2-3 decisions):
Token Check:
- Starting context: [X] tokens
- Loaded since last check: [Y] tokens
- Current estimated total: [X+Y] tokens
- Budget: [Z] tokens
- Status: [OK | WARNING | EXCEEDED]
Before Loading Any File:
Pre-Load Check:
- Current context: ~[X] tokens
- File to load: ~[Y] tokens (estimated)
- After load: ~[X+Y] tokens
- Budget: [Z] tokens
- Decision: [PROCEED | SKIP | CHECKPOINT FIRST]
Checkpoint Triggers
MUST create checkpoint when:
- ✅ Context exceeds 3,000 tokens
- ✅ Completing a major phase (architecture → implementation)
- ✅ Before loading would exceed budget
- ✅ After 5+ agent interactions in sequence
- ✅ Detailed discussion exceeds 1,000 tokens
Checkpoint Process:
- Estimate current context size
- Create checkpoint summary (max 100 lines, ~500 tokens)
- Archive verbose content to
docs/archive/ - Start next phase with checkpoint only
- Token reduction: Expect 80%+ compression
Example:
Phase complete. Estimating context:
- Architecture discussion: ~2,500 tokens
- Technology decisions: ~1,200 tokens
- Requirements: ~800 tokens
TOTAL: ~4,500 tokens
Action: Creating checkpoint
- Checkpoint summary: ~500 tokens
- Token reduction: 89%
- Archived: Full discussion to docs/archive/architecture-phase/
Budget Warning System
Token Status Levels:
🟢 GREEN (0-50% of budget):
- Status: Healthy
- Action: Continue normally
- Example: 800/2000 tokens used
🟡 YELLOW (50-75% of budget):
- Status: Warning
- Action: Be selective with additional loads
- Example: 1,500/2000 tokens used
- Strategy: Load only critical guides, skip nice-to-haves
🟠 ORANGE (75-90% of budget):
- Status: Critical
- Action: Stop loading new context
- Example: 1,700/2000 tokens used
- Strategy: Use existing context only, prepare checkpoint
🔴 RED (>90% of budget):
- Status: Exceeded
- Action: Create checkpoint IMMEDIATELY
- Example: 1,850/2000 tokens used
- Strategy: Checkpoint NOW, start fresh phase
Self-Monitoring Format
Maintain Mental Token Accounting:
Throughout task execution, mentally track:
=== TOKEN ACCOUNTING ===
FIXED CONTEXT (loaded at start):
- Role definition: ~250
- Core principles: ~300
- Task requirements: ~500
Subtotal: ~1,050 tokens
DECISION CONTEXT (loaded during execution):
- technology-stack-guide.md: ~1,500
- security-guidelines.md: ~1,000
Subtotal: ~2,500 tokens
CONVERSATION (generated during work):
- Discussion and analysis: ~800
- Code examples: ~400
Subtotal: ~1,200 tokens
TOTAL ESTIMATED: ~4,750 tokens
BUDGET: 5,000 tokens
STATUS: 🟠 ORANGE - Approaching limit
ACTION: No more loads, complete task with current context
=== END ACCOUNTING ===
Estimation Guidelines
Token Estimation Rules:
- 1 word ≈ 1.3 tokens (on average)
- 1 line of code ≈ 15-25 tokens
- 1 markdown paragraph ≈ 50-100 tokens
- Agent role file ≈ 200-400 tokens
- Data file (full) ≈ 1,000-3,000 tokens
- Data file (section) ≈ 200-500 tokens
- Checkpoint summary ≈ 500-800 tokens
- Architecture doc ≈ 2,000-4,000 tokens
Quick Estimation:
- Count words in content
- Multiply by 1.3
- Round up for safety
File Size Indicators:
- Small file (<100 lines): ~500 tokens
- Medium file (100-300 lines): ~1,000-1,500 tokens
- Large file (300-500 lines): ~2,000-3,000 tokens
- Very large file (>500 lines): ~3,000+ tokens
Runtime Optimization Tactics
When Approaching Budget:
-
Use Section Loading: Load only relevant section of large file
- Instead of: Full api-implementation-patterns.md (~1,500 tokens)
- Load: REST section only (~300 tokens)
- Savings: 80%
-
Reference Don't Load: Point to file path instead of loading
- Instead of: Loading best-practices.md
- Say: "Following patterns in best-practices.md section 3.2"
- Savings: 100% (no load needed)
-
Checkpoint Intermediate State: Don't wait for phase end
- If discussion getting long, checkpoint NOW
- Compress verbose analysis into key decisions
- Continue with lightweight checkpoint
-
Defer Non-Critical Loads: Skip nice-to-have context
- If 🟡 YELLOW or above, load ONLY must-haves
- Skip optimization guides if not performance-critical
- Skip best practices if pattern already clear
-
Use Role Knowledge: Lean on built-in expertise
- Instead of: Loading guide for basic patterns
- Use: Agent's inherent role knowledge
- When: Standard, well-known patterns
Multi-Agent Workflow Monitoring
Workflow-Level Tracking:
When multiple agents work sequentially:
WORKFLOW TOKEN TRACKING
Phase 1: Requirements (Analyst)
- Context: ~800 tokens
- Output: requirements.md
- Checkpoint: ~600 tokens
- Handoff: Pass checkpoint only
Phase 2: Architecture (Solution Architect)
- Receive: checkpoint (~600)
- Load: tech-stack-guide (~1,500)
- Context: ~2,800 tokens
- Output: architecture.md
- Checkpoint: ~800 tokens
- Handoff: Pass checkpoint only
Phase 3: Implementation (Developers)
- Receive: checkpoint (~800)
- Load: role-specific guides (~500-1,000)
- Context: ~1,500-2,000 tokens
- No checkpoint needed (final phase)
TOTAL WORKFLOW: ~5,100 tokens (with checkpoints)
WITHOUT CHECKPOINTS: ~15,000+ tokens (accumulated context)
SAVINGS: 66%
Checkpoint Quality Metrics
Effective Checkpoint Indicators:
- ✅ Reduces context by 80%+ (e.g., 4,000 → 800 tokens)
- ✅ Contains all critical decisions
- ✅ Includes all artifact paths
- ✅ Next agent can proceed without loading previous phase
- ✅ Max 100 lines (~500-800 tokens)
Poor Checkpoint Indicators:
- ❌ Only 30-50% reduction (too verbose)
- ❌ Missing key decisions or rationale
- ❌ Next agent needs to reload original context
- ❌ Over 150 lines (defeats purpose)
Self-Regulation Commitment
Every agent commits to:
- Estimate before loading: Know token cost before loading any file
- Track cumulative usage: Maintain mental accounting of context size
- Respect budget warnings: React to 🟡🟠🔴 status appropriately
- Create checkpoints proactively: Don't wait for red status
- Load sections not files: When possible, load only relevant sections
- Use role knowledge first: Load guides only when truly needed
Example: Good Runtime Monitoring
Task: Design REST API for user service
Budget: 2,000 tokens
[Start]
Loaded: Role (250) + core-principles (300) + requirements (400) = 950 tokens
Status: 🟢 GREEN (48% of budget)
[Decision: Need REST patterns]
Pre-check: Current 950 + api-patterns REST section (300) = 1,250
Status: 🟢 GREEN (63% of budget) → PROCEED
Loaded: REST section
Context: 1,250 tokens
[Decision: Need auth patterns?]
Pre-check: Current 1,250 + security-guidelines (1,000) = 2,250
Status: 🔴 RED (113% of budget) → DEFER
Decision: Use basic auth pattern from role knowledge, skip loading guide
[Complete]
Final context: ~1,400 tokens (with output)
Status: 🟢 GREEN (70% of budget)
Result: Task complete, stayed within budget
Example: Bad Runtime Monitoring (Don't Do This)
Task: Design REST API for user service
Budget: 2,000 tokens
[Start]
Loaded: Role + core-principles + requirements + api-patterns (full) +
security-guidelines + best-practices + deployment-strategies
Context: ~5,500 tokens
Status: 🔴 RED (275% of budget) - EXCEEDED
Problem: Loaded everything upfront without checking budget
Result: Token waste, exceeded budget, needed checkpoint immediately
Prompt Caching Optimization
POWERFUL: Prompt caching can reduce token costs by ~90% for static content and improve response times by caching reusable context.
How Prompt Caching Works
Concept: AI providers cache static portions of prompts across multiple requests, dramatically reducing token costs and latency.
Benefits:
- Token Savings: ~90% reduction for cached content (5,000 tokens → 500 tokens charged)
- Speed: Faster response times (cached content pre-processed)
- Cost: Significant cost reduction for repeated context
Requirements:
- Content must be static (doesn't change between requests)
- Content must be large enough (typically >1,000 tokens) to benefit from caching
- Content must be in the prefix (beginning of prompt, before dynamic content)
Cache-Friendly Content Structure
Optimal Loading Order (Stable → Dynamic):
1. CACHEABLE (Static - Load First)
├─ Agent role definition (changes rarely)
├─ core-principles.md (stable reference)
├─ Data files (technology-stack-guide.md, etc. - stable reference)
├─ Architecture documents (stable after creation)
└─ Checkpoints (stable after creation)
2. SEMI-CACHEABLE (Changes Occasionally)
├─ Task templates (stable structure, variable content)
├─ Workflow definitions (stable structure)
└─ Requirements docs (stable after approval)
3. NON-CACHEABLE (Dynamic - Load Last)
├─ Current task description (unique per task)
├─ User questions/requests (unique per interaction)
├─ Conversation history (constantly changing)
└─ Real-time data (changes frequently)
Golden Rule: Static content first, dynamic content last
What to Cache vs Not Cache
✅ CACHE (Static Reference Material):
- Agent role definitions
- core-principles.md
- technology-stack-guide.md
- security-guidelines.md
- api-implementation-patterns.md
- deployment-strategies.md
- development-guidelines.md
- architecture-patterns.md
- All data/* files (stable reference)
- Completed architecture documents
- Finalized checkpoints
⚠️ CACHE WITH CARE (Semi-Static):
- Requirements documents (after finalization)
- Architecture documents (after approval)
- API specifications (after freeze)
- Checkpoints (after phase completion)
❌ DON'T CACHE (Dynamic):
- Current task description
- User questions/messages
- Conversation history
- Work-in-progress documents
- Code being reviewed
- Intermediate decisions
- Debug output
- Error messages
Cache-Optimized Loading Pattern
Pattern 1: Static Foundation First
LOAD ORDER for cache efficiency:
[CACHEABLE BLOCK - Load First]
1. Agent role definition (~300 tokens) ─┐
2. core-principles.md (~5,000 tokens) │
3. technology-stack-guide.md (~2,000) ├─ CACHE THIS (~8,500 tokens)
4. security-guidelines.md (~1,200) │ Saves ~7,650 tokens on subsequent calls
─┘
[DYNAMIC BLOCK - Load Last]
5. Current task description (~500 tokens) ─ Don't cache (changes every task)
6. User request (~200 tokens) ─ Don't cache (unique)
Result:
- First call: ~8,700 tokens charged
- Subsequent calls: ~850 tokens charged (only dynamic content)
- Savings: ~90% per call after first
Pattern 2: Task-Specific Caching
For repetitive tasks (e.g., multiple features in same codebase):
[STABLE CACHE - Reuse Across Tasks]
1. Agent role
2. core-principles.md
3. Project architecture (finalized)
4. Tech stack decisions (finalized)
5. API specifications (frozen)
[TASK CACHE - Reuse Within Task]
6. Feature requirements (current feature)
7. Architecture checkpoint (current feature)
[NEVER CACHE]
8. Current subtask
9. User messages
Cache Invalidation Strategy
When to Refresh Cache:
🔄 REFRESH IMMEDIATELY when:
- core-principles.md updated
- Architecture significantly changed
- Technology decisions revised
- Security guidelines updated
- Major refactoring completed
⏱️ REFRESH PERIODICALLY:
- Data files: Every major version update
- Agent roles: When capabilities change
- Best practices: When standards evolve
✅ KEEP CACHED:
- Stable reference material
- Finalized architecture
- Approved requirements
- Completed checkpoints
Cache Lifetime Guidance:
- Long-lived (days/weeks): core-principles, data files, agent roles
- Medium-lived (hours/days): architecture docs, requirements
- Short-lived (minutes): task context, checkpoints
- No cache (per request): user input, conversation
Workflow-Level Caching
Greenfield Project Workflow:
Phase 1: Requirements
[CACHE]
- Analyst role
- core-principles.md
[NO CACHE]
- Stakeholder input
- Requirements being drafted
Phase 2: Architecture
[CACHE]
- Solution Architect role
- core-principles.md
- technology-stack-guide.md
- Requirements (now finalized)
[NO CACHE]
- Architecture being designed
Phase 3: Implementation
[CACHE]
- Developer roles
- core-principles.md
- Architecture docs (now finalized)
- Checkpoints (now stable)
[NO CACHE]
- Current feature
- Code being written
CACHE BENEFIT: Each phase reuses stable context from previous phases
Cache Size Optimization
Optimal Cache Sizes:
- Minimum: 1,000+ tokens (smaller = not worth caching overhead)
- Sweet Spot: 5,000-15,000 tokens (balance reuse vs memory)
- Maximum: Per provider limits (Claude: up to ~3-4 cache blocks)
Strategies:
-
Bundle Static Files: Load related static files together in cacheable block
GOOD: Load all data/* files together (one cache block) BAD: Load data files separately interspersed with dynamic content -
Stable Prefix: Keep structure consistent across requests
GOOD: Always load role → principles → guides in same order BAD: Random order each time (breaks cache) -
Cache Boundaries: Clear separation between static and dynamic
GOOD: [Static block] then [Dynamic block] BAD: Static → Dynamic → Static → Dynamic (cache thrashing)
Cache-Aware JIT Loading
Modify JIT Pattern for Caching:
Standard JIT (No Caching Consideration):
- Load role + task
- Load guide when needed
- Load another guide when needed
- Load dynamic content
Cache-Aware JIT:
- Load ALL static guides upfront (cache block)
- Load dynamic task content
- Reference cached guides as needed
- NO additional loads (all guides pre-cached)
When to Use Each:
Use Standard JIT when:
- One-off task (no cache benefit)
- Highly dynamic workflow
- Frequently changing context
Use Cache-Aware Loading when:
- Repetitive tasks (multiple features)
- Long-running project (many interactions)
- Stable reference material
- Multiple agents using same context
Cache Hit Indicators
High Cache Hit Scenarios (Good):
- ✅ Multiple features in same project
- ✅ Sequential agent interactions
- ✅ Iterative development (multiple sprints)
- ✅ Code review across similar code
- ✅ Consistent use of same data files
Low Cache Hit Scenarios (Less Benefit):
- ❌ One-off greenfield projects
- ❌ Constantly changing architecture
- ❌ Different tech stacks per task
- ❌ Unique context each time
Caching Best Practices
DO:
- ✅ Load static content first (agent role, principles, data files)
- ✅ Keep loading order consistent across requests
- ✅ Bundle related static files together
- ✅ Cache finalized documents (architecture, requirements)
- ✅ Use cache-aware loading for repetitive tasks
DON'T:
- ❌ Interleave static and dynamic content
- ❌ Change loading order randomly
- ❌ Cache work-in-progress documents
- ❌ Cache user input or conversation
- ❌ Over-cache (don't cache tiny files)
Example: Cache-Optimized Workflow
Task: Implement 5 features in existing project
Without Cache Optimization:
Feature 1: Load role + principles + architecture + task → 8,000 tokens
Feature 2: Load role + principles + architecture + task → 8,000 tokens
Feature 3: Load role + principles + architecture + task → 8,000 tokens
Feature 4: Load role + principles + architecture + task → 8,000 tokens
Feature 5: Load role + principles + architecture + task → 8,000 tokens
TOTAL: 40,000 tokens
With Cache Optimization:
Feature 1: Load [role + principles + architecture]CACHE + task → 8,000 tokens (full charge)
Feature 2: [Cache hit] + task → 800 tokens (10% charge)
Feature 3: [Cache hit] + task → 800 tokens
Feature 4: [Cache hit] + task → 800 tokens
Feature 5: [Cache hit] + task → 800 tokens
TOTAL: 11,200 tokens
SAVINGS: 72% (28,800 tokens saved)
Integration with JIT Loading
Hybrid Strategy (Best of Both):
Phase 1: Load Static Cache Block
[CACHE BLOCK - Load Once]
- Agent role
- core-principles.md
- agent-responsibility-matrix.md
- context-loading-guide.md (if needed)
Total: ~7,000 tokens
Phase 2: JIT Load Decision Guides (Also Cache-Friendly)
[CONDITIONAL CACHE - Load If Needed]
- technology-stack-guide.md (if making stack decision)
- security-guidelines.md (if implementing auth)
- api-implementation-patterns.md (if designing API)
Add: ~1,000-3,000 tokens
Phase 3: Dynamic Task Context (Never Cache)
[DYNAMIC - Changes Per Task]
- Current task description
- User messages
- Work artifacts
Add: ~500-1,500 tokens
Result:
- Stable foundation cached
- Decision guides cached when needed
- Dynamic content fresh
- Optimal token efficiency
Token Accounting with Caching
Revised Token Accounting Format:
=== TOKEN ACCOUNTING (With Cache) ===
CACHEABLE CONTEXT (Static):
- Role definition: ~300 tokens
- core-principles.md: ~5,000 tokens
- architecture.md: ~2,000 tokens
Subtotal: ~7,300 tokens
Cache Status: CACHED (first call: full charge, subsequent: ~10% charge)
DECISION CONTEXT (Semi-Static):
- security-guidelines.md: ~1,000 tokens
Cache Status: CACHED (loaded in previous task)
DYNAMIC CONTEXT (Never Cache):
- Current task: ~500 tokens
- User messages: ~200 tokens
Subtotal: ~700 tokens
TOTAL ESTIMATED:
- First call: ~9,000 tokens
- This call (cache hit): ~1,500 tokens (83% savings)
Summary: Caching Optimization
Key Principles:
- Static first, dynamic last - Order matters for caching
- Bundle static files - Load together for efficient caching
- Consistent structure - Same order maximizes cache hits
- Finalize before caching - Don't cache work-in-progress
- Cache-aware JIT - Load static guides upfront for repetitive tasks
Expected Impact:
- Single task: 0-30% savings (cache setup cost)
- Repetitive tasks: 70-90% savings (massive benefit)
- Long-term project: 80%+ cumulative savings
Remember: Caching optimizes INPUT token costs. Output quality remains HIGH with comprehensive, detailed responses.
Code Quality Standards
Naming Conventions:
- Files: kebab-case for utilities, PascalCase for components, camelCase for hooks
- Variables: camelCase for functions/vars, PascalCase for classes, UPPER_SNAKE_CASE for constants
- Descriptive:
isLoadingnotloading,handleSubmitnotsubmit
TypeScript Rules:
- No
any- useunknownfor truly unknown types interfacefor objects,typefor unions/intersections- Explicit function return types
- Generics for reusable type-safe code
Testing Philosophy:
- Test user interactions, not implementation details
- Mock external dependencies
- Integration tests for critical paths
- Unit tests for complex business logic
- Aim for >80% coverage on critical code
Error Handling:
- Custom error classes for different error types
- Centralized error handling (middleware for backend, error boundaries for frontend)
- Structured logging with context
- Never expose stack traces in production
- Proper HTTP status codes (400 validation, 401 auth, 404 not found, 500 server error)
Security Standards
Authentication:
- bcrypt/argon2 for password hashing (10+ rounds)
- JWT with short-lived access tokens + long-lived refresh tokens
- Store refresh tokens in httpOnly cookies or secure DB
- Implement token rotation on refresh
Input Validation:
- Validate ALL user inputs with Zod or Joi
- Sanitize HTML output to prevent XSS
- Use parameterized queries to prevent SQL injection
- Whitelist, never blacklist
API Security:
- CORS with specific origins (not *)
- Rate limiting per user/endpoint
- CSRF protection for state-changing operations
- Security headers with Helmet.js
- HTTPS in production
Secrets Management:
- Never commit secrets to version control
- Use environment variables (.env files)
- Rotate credentials regularly
- Minimal privilege principle
Performance Standards
Frontend:
- Code splitting and lazy loading for routes/heavy components
- Memoization (React.memo, useMemo, useCallback) only when measured benefit
- Virtual scrolling for long lists
- Image optimization (next/image or similar)
- Bundle analysis and tree shaking
Backend:
- Database indexes on frequently queried fields
- Connection pooling
- Redis caching for frequently accessed data
- Pagination for large result sets
- Async/await throughout (no blocking operations)
- Background jobs for heavy processing
Documentation Standards
Code Documentation:
- JSDoc for public APIs and complex functions
- README for setup and architecture overview
- Inline comments for "why", not "what"
- Keep docs close to code
API Documentation:
- OpenAPI/Swagger for REST APIs
- GraphQL SDL with type descriptions
- Code examples for common use cases
- Authentication flows documented
- Error codes and messages explained
Architecture Documentation:
- Architecture Decision Records (ADRs) for key decisions
- System diagrams for complex architectures
- Database schema documentation
- Deployment guides
Git Standards
Commit Format: <type>(<scope>): <subject>
Types: feat, fix, docs, style, refactor, test, chore
Example: feat(auth): add password reset functionality
PR Requirements:
- TypeScript compiles with no errors
- ESLint passes
- All tests pass, coverage >80%
- No console.logs or debugger statements
- Meaningful commits
- Clear PR description
Working Philosophy
Start Simple: MVP first, add complexity as needed. Monolith before microservices.
Fail Fast: Validate early, catch errors at compile time, comprehensive error handling.
Iterate Quickly: Ship small increments, get feedback, improve continuously.
Automate Everything: Testing, linting, deployment, monitoring. Reduce manual toil.
Monitor and Learn: Track metrics, analyze errors, learn from production, improve continuously.
All agents reference these principles.
Additional References:
- Implementation details:
development-guidelines.md,best-practices.md,security-guidelines.md - Agent boundaries:
agent-responsibility-matrix.md(who owns what decisions) - Context loading:
context-loading-guide.md(when to load what) - Technology guides: Various data/* files (loaded just-in-time)