CSV Data File Standards for BMAD Workflows

Purpose and Usage

CSV data files in BMAD workflows serve a different purpose depending on what consumes them:

For Agents: Provide structured data that agents need to reference but cannot realistically generate (such as specific configurations, domain-specific data, or structured knowledge bases).

For Expert Agents: Supply specialized knowledge bases, reference data, or persistent information that the expert agent needs to access consistently across sessions.

For Workflows: Include reference data, configuration parameters, or structured inputs that guide workflow execution and decision-making.

Key Principle: CSV files should contain data that is essential, structured, and not easily generated by LLMs during execution.

Intent-Based Design Principle

Core Philosophy: The closer workflows stay to intent rather than prescriptive instructions, the more creative and adaptive the LLM experience becomes.

CSV Enables Intent-Based Design:

  • Instead of: Hardcoded scripts with exact phrases the LLM must say
  • CSV Provides: Clear goals and patterns that the LLM adapts creatively to context
  • Result: Natural, contextual conversations rather than rigid scripts

Example - Advanced Elicitation:

  • Prescriptive Alternative: 50 separate files with exact conversation scripts
  • Intent-Based Reality: One CSV row per method, holding its goal and pattern, which the LLM adapts to the user
  • Benefit: The same method plays out differently for different users while keeping its essence

Intent vs Prescriptive Spectrum:

  • Highly Prescriptive: "Say exactly: 'Based on my analysis, I recommend...'"
  • Balanced Intent: "Help the user understand the implications using your professional judgment"
  • CSV Goal: Provide just enough guidance to enable creative, context-aware execution

Primary Use Cases

1. Knowledge Base Indexing (Document Lookup Optimization)

Problem: Large knowledge bases with hundreds of documents overflow the context window and lead to missed details when LLMs try to process them all.

CSV Solution: Create a knowledge base index with:

  • Column 1: Keywords and topics
  • Column 2: Document file path/location
  • Column 3: Section or line number where relevant content starts
  • Column 4: Content type or summary (optional)

Result: The agent loads only the document section it needs instead of whole documents, so the knowledge base can grow very large while context usage stays small.
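
The four-column index described above can be sketched as follows. This is a minimal illustration, not the BMAD implementation: the column names (keywords, doc_path, start_line, summary), the semicolon-separated keyword list, and the file paths are all assumptions made for the example.

```python
import csv
import io

# Hypothetical index; column names, keywords, and paths are illustrative only.
KB_INDEX_CSV = '''keywords,doc_path,start_line,summary
"auth;login;oauth",docs/security/auth.md,42,OAuth login flow
"deploy;rollback",docs/ops/deploy.md,10,Rollback procedure
"billing;invoice",docs/finance/billing.md,88,Invoice generation rules
'''

def find_documents(index_text, query):
    """Return index rows whose keyword list shares a word with the query."""
    wanted = set(query.lower().split())
    matches = []
    for row in csv.DictReader(io.StringIO(index_text)):
        keywords = set(row["keywords"].lower().split(";"))
        if wanted & keywords:
            matches.append(row)
    return matches

hits = find_documents(KB_INDEX_CSV, "how do I rollback a deploy")
for hit in hits:
    # Only this location needs to be loaded into context, not the whole knowledge base.
    print(f'{hit["doc_path"]}:{hit["start_line"]} ({hit["summary"]})')
```

In practice the keyword match would usually be made by the LLM reading the index rather than by exact word overlap; the point is that only the small index is always loaded.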

2. Workflow Sequence Optimization

Problem: Complex workflows (e.g., game development) with hundreds of potential steps for different scenarios become unwieldy and context-heavy.

CSV Solution: Create a workflow routing table:

  • Column 1: Scenario type (e.g., "2D Platformer", "RPG", "Puzzle Game")
  • Column 2: Required step sequence (e.g., "step-01,step-03,step-07,step-12")
  • Column 3: Document sections to include
  • Column 4: Specialized parameters or configurations

Result: Step 1 determines the user's needs, finds the closest match in the CSV, confirms it with the user, and then follows only the listed sequence, so steps written for other scenarios never enter the context.
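
A routing table like this might be read as shown below. The sketch assumes hypothetical column names (scenario, step_sequence, doc_sections, parameters) and uses the standard library's difflib for the "closest match" step, which is a stand-in for whatever matching the workflow or LLM actually performs. Note that the step sequence is quoted in the CSV because it contains commas.

```python
import csv
import difflib
import io

# Hypothetical routing table; the rows are illustrative only.
ROUTING_CSV = '''scenario,step_sequence,doc_sections,parameters
2D Platformer,"step-01,step-03,step-07,step-12",physics;level-design,gravity=on
RPG,"step-01,step-02,step-05,step-09",narrative;inventory,turn_based=true
Puzzle Game,"step-01,step-04,step-08",mechanics,timer=optional
'''

def load_routes(text):
    """Map each lower-cased scenario name to its route details."""
    routes = {}
    for row in csv.DictReader(io.StringIO(text)):
        routes[row["scenario"].lower()] = {
            "steps": row["step_sequence"].split(","),
            "sections": row["doc_sections"].split(";"),
            "parameters": row["parameters"],
        }
    return routes

def closest_scenario(routes, user_input):
    """Return the best-matching scenario name, or None if nothing is close."""
    matches = difflib.get_close_matches(
        user_input.lower(), list(routes), n=1, cutoff=0.6
    )
    return matches[0] if matches else None

routes = load_routes(ROUTING_CSV)
scenario = closest_scenario(routes, "2D platformr")  # typo on purpose
if scenario is not None:
    # The workflow would confirm this match with the user before proceeding.
    print(scenario, routes[scenario]["steps"])
```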

3. Method Registry (Dynamic Technique Selection)

Problem: Tasks need to select optimal techniques from dozens of options based on context, without hardcoding selection logic.

CSV Solution: Create a method registry with:

  • Column 1: Category (collaboration, advanced, technical, creative, etc.)
  • Column 2: Method name and rich description
  • Column 3: Execution pattern or flow guide (e.g., "analysis → insights → action")
  • Column 4: Complexity level or use case indicators

Example: The Advanced Elicitation task analyzes the content context, selects the 5 best-matched methods from 50 options, then executes them dynamically using the CSV descriptions.

Result: Smart, context-aware technique selection without hardcoded logic - the method library is extended by adding rows, not by changing the task.
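
As a rough sketch of how such a registry could be loaded and presented to the LLM for selection, assuming hypothetical column names (category, method, pattern, complexity) and invented example rows:

```python
import csv
import io
from collections import defaultdict

# Hypothetical registry rows; method names and patterns are illustrative only.
METHODS_CSV = '''category,method,pattern,complexity
collaboration,"Round Table: gather several viewpoints on the draft",viewpoints → synthesis → alignment,medium
advanced,"Path Comparison: explore and compare several lines of reasoning",paths → evaluation → selection,high
creative,"What If: explore alternative scenarios",scenarios → implications → insights,low
creative,"Role Swap: restate the problem from another persona",persona → restatement → insights,low
'''

def load_registry(text):
    """Group registry rows by category."""
    registry = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        registry[row["category"]].append(row)
    return registry

def render_menu(rows):
    """Format candidate methods as compact lines that a prompt could include."""
    return "\n".join(
        f'- {row["method"]} [{row["pattern"]}] ({row["complexity"]})'
        for row in rows
    )

registry = load_registry(METHODS_CSV)
print(render_menu(registry["creative"]))
```

The selection itself is left to the LLM; the code only narrows and formats the candidates, which keeps the selection logic out of the task definition.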

4. Configuration Management

Problem: Complex systems with many configuration options that vary by use case.

CSV Solution: Configuration lookup tables mapping scenarios to specific parameter sets.
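
A configuration lookup table could look like the following sketch. The scenario names, the parameter columns, and the fallback to a "default" row are assumptions for illustration, not part of the standard.

```python
import csv
import io

# Hypothetical configuration table; scenarios and parameters are illustrative.
CONFIG_CSV = '''scenario,max_steps,review_required,output_format
default,10,false,markdown
enterprise,25,true,markdown
prototype,5,false,plain
'''

def load_config(text, scenario):
    """Return typed parameters for a scenario, falling back to the default row."""
    rows = {row["scenario"]: row for row in csv.DictReader(io.StringIO(text))}
    row = rows.get(scenario, rows["default"])
    return {
        # CSV values are always strings, so convert them explicitly.
        "max_steps": int(row["max_steps"]),
        "review_required": row["review_required"] == "true",
        "output_format": row["output_format"],
    }

config = load_config(CONFIG_CSV, "enterprise")
fallback = load_config(CONFIG_CSV, "unknown-scenario")
print(config)
print(fallback)
```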

What NOT to Include in CSV Files

Avoid Web-Searchable Data: Do not include information that LLMs can readily access through web search or that exists in their training data, such as:

  • Common programming syntax or standard library functions
  • General knowledge about widely used technologies
  • Historical facts or commonly available information
  • Basic terminology or standard definitions

Include Specialized Data: Focus on data that is:

  • Specific to your project or domain
  • Not readily available through web search
  • Essential for consistent workflow execution
  • Too voluminous for LLM context windows

CSV Data File Standards

1. Purpose Validation

  • Essential Data Only: CSV must contain data that cannot be reasonably generated by LLMs
  • Domain Specific: Data should be specific to the workflow's domain or purpose
  • Consistent Usage: All columns and data must be referenced and used somewhere in the workflow
  • No Redundancy: Avoid data that duplicates functionality already available to LLMs

2. Structural Standards

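  • Valid CSV Format: Proper comma-separated values; any field that contains a comma, a double quote, or a line break must be enclosed in double quotes, with embedded double quotes doubled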
  • Consistent Columns: All rows must have the same number of columns
  • No Missing Data: Empty values should be explicitly marked (e.g., "", "N/A", or NULL), using the same marker throughout the file
  • Header Row: First row must contain clear, descriptive column headers
  • Proper Encoding: UTF-8 encoding required for special characters
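
The structural rules above can be checked mechanically. The sketch below is one possible checker, not an official BMAD tool. It assumes the project marks empty values with a visible token such as N/A, because Python's default csv reader cannot tell a quoted empty string from a bare empty field.

```python
import csv
import io

def check_structure(raw_bytes):
    """Return a list of problems; an empty list means all checks passed."""
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]

    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return ["file is empty, so there is no header row"]

    problems = []
    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("header row contains a blank column name")
    # Row numbers count CSV records, with the header as row 1.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} columns, found {len(row)}"
            )
        elif any(not cell.strip() for cell in row):
            problems.append(f"row {row_no}: contains an unmarked empty value")
    return problems

sample = "keywords,doc_path\nauth,docs/auth.md\ndeploy\n".encode("utf-8")
for problem in check_structure(sample):
    print(problem)
```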

3. Content Standards

  • No LLM-Generated Content: Avoid data that LLMs can easily generate (e.g., generic phrases, common knowledge)
  • Specific and Concrete: Use specific values rather than vague descriptions
  • Verifiable Data: Data should be factual and verifiable when possible
  • Consistent Formatting: Date formats, numbers, and text should follow consistent patterns

4. Column Standards

  • Clear Headers: Column names must be descriptive and self-explanatory
  • Consistent Data Types: Each column should contain consistent data types
  • No Unused Columns: Every column must be referenced and used in the workflow
  • Appropriate Width: Columns should be reasonably narrow and focused
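
One way to enforce the "No Unused Columns" rule is to compare the CSV header against the column names the workflow actually refers to. In this sketch the set of referenced columns is supplied by hand; how a real workflow would extract it is left open, and the column names are hypothetical.

```python
import csv
import io

def column_usage(csv_text, referenced_columns):
    """Compare the CSV header with the column names a workflow refers to."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    unused = [name for name in header if name not in referenced_columns]
    missing = sorted(set(referenced_columns) - set(header))
    return unused, missing

# Hypothetical data file and hand-maintained list of referenced columns.
sample = 'scenario,step_sequence,notes\nRPG,"step-01,step-02",draft\n'
unused, missing = column_usage(sample, {"scenario", "step_sequence", "parameters"})
print("unused:", unused)
print("missing:", missing)
```

Reporting columns that the workflow expects but the file lacks catches the opposite mistake in the same pass.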

5. File Size Standards

  • Efficient Structure: CSV files should be as small as possible while maintaining functionality
  • No Redundant Rows: Avoid duplicate or nearly identical rows
  • Compressed Data: Use efficient data representation (e.g., codes instead of full descriptions)
  • Maximum Size: Individual CSV files should not exceed 1 MB unless absolutely necessary

6. Documentation Standards

  • Documentation Required: Each CSV file should have documentation explaining its purpose
  • Column Descriptions: Each column must be documented with its usage and format
  • Data Sources: Source of data should be documented when applicable
  • Update Procedures: Process for updating CSV data should be documented

7. Integration Standards

  • File References: CSV files must be properly referenced in workflow configuration
  • Access Patterns: Workflow must clearly define how and when CSV data is accessed
  • Error Handling: Workflow must handle cases where CSV files are missing or corrupted
  • Version Control: CSV files should be versioned when changes occur
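
The error-handling requirement might be met with a loader that turns every failure mode into one well-defined exception the workflow can react to. This is a sketch under assumptions: CsvDataError, load_rows, and the data/methods.csv path are hypothetical names, not part of BMAD.

```python
import csv
from pathlib import Path

class CsvDataError(Exception):
    """Raised when a workflow data file cannot be used."""

def load_rows(path, required_columns):
    """Load all rows, or raise CsvDataError with a clear reason."""
    path = Path(path)
    try:
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            header = reader.fieldnames or []
            missing = [c for c in required_columns if c not in header]
            if missing:
                raise CsvDataError(f"{path}: missing columns {missing}")
            rows = list(reader)
    except FileNotFoundError:
        raise CsvDataError(f"{path}: file not found") from None
    except (UnicodeDecodeError, csv.Error) as exc:
        raise CsvDataError(f"{path}: unreadable CSV ({exc})") from exc
    if not rows:
        raise CsvDataError(f"{path}: no data rows")
    return rows

try:
    methods = load_rows("data/methods.csv", ["category", "method"])
except CsvDataError as exc:
    # The workflow decides what a safe fallback is; here it only reports.
    print(f"Cannot use CSV data: {exc}")
```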

8. Quality Assurance

  • Data Validation: CSV data should be validated for correctness and completeness
  • Format Consistency: Consistent formatting across all rows and columns
  • No Ambiguity: Data entries should be clear and unambiguous
  • Regular Review: CSV content should be reviewed periodically for relevance

9. Security Considerations

  • No Sensitive Data: Avoid including sensitive, personal, or confidential information
  • Data Sanitization: CSV data should be sanitized for security issues
  • Access Control: Access to CSV files should be controlled when necessary
  • Audit Trail: Changes to CSV files should be logged when appropriate

10. Performance Standards

  • Fast Loading: CSV files must load quickly within workflow execution
  • Memory Efficient: Structure should minimize memory usage during processing
  • Optimized Queries: If data lookup is needed, optimize for efficient access
  • Caching Strategy: Consider whether data can be cached for performance

Implementation Guidelines

When creating CSV data files for BMAD workflows:

  1. Start with Purpose: Clearly define why CSV is needed instead of LLM generation
  2. Design Structure: Plan columns and data types before creating the file
  3. Test Integration: Ensure workflow properly accesses and uses CSV data
  4. Document Thoroughly: Provide complete documentation for future maintenance
  5. Validate Quality: Check data quality, format consistency, and integration
  6. Monitor Usage: Track how CSV data is used and optimize as needed

Common Anti-Patterns to Avoid

  • Generic Phrases: CSV files containing common phrases or LLM-generated content
  • Redundant Data: Duplicating information easily available to LLMs
  • Overly Complex: Unnecessarily complex CSV structures when simple data suffices
  • Unused Columns: Columns that are defined but never referenced in workflows
  • Poor Formatting: Inconsistent data formats, missing values, or structural issues
  • No Documentation: CSV files without clear purpose or usage documentation

Validation Checklist

For each CSV file, verify:

  • Purpose is essential and cannot be replaced by LLM generation
  • All columns are used in the workflow
  • Data is properly formatted and consistent
  • File is efficiently sized and structured
  • Documentation is complete and clear
  • Integration with workflow is tested and working
  • Security considerations are addressed
  • Performance requirements are met