CSV Data File Standards for BMAD Workflows
Purpose and Usage
CSV data files in BMAD workflows serve specific purposes for different workflow types:
For Agents: Provide structured data that agents need to reference but cannot realistically generate (such as specific configurations, domain-specific data, or structured knowledge bases).
For Expert Agents: Supply specialized knowledge bases, reference data, or persistent information that the expert agent needs to access consistently across sessions.
For Workflows: Include reference data, configuration parameters, or structured inputs that guide workflow execution and decision-making.
Key Principle: CSV files should contain data that is essential, structured, and not easily generated by LLMs during execution.
Intent-Based Design Principle
Core Philosophy: The closer workflows stay to intent rather than prescriptive instructions, the more creative and adaptive the LLM experience becomes.
CSV Enables Intent-Based Design:
- Instead of: Hardcoded scripts with exact phrases LLM must say
- CSV Provides: Clear goals and patterns that LLM adapts creatively to context
- Result: Natural, contextual conversations rather than rigid scripts
Example - Advanced Elicitation:
- Prescriptive Alternative: 50 separate files with exact conversation scripts
- Intent-Based Reality: One CSV row with method goal + pattern → LLM adapts to user
- Benefit: Same method works differently for different users while maintaining essence
Intent vs Prescriptive Spectrum:
- Highly Prescriptive: "Say exactly: 'Based on my analysis, I recommend...'"
- Balanced Intent: "Help the user understand the implications using your professional judgment"
- CSV Goal: Provide just enough guidance to enable creative, context-aware execution
Primary Use Cases
1. Knowledge Base Indexing (Document Lookup Optimization)
Problem: Large knowledge bases with hundreds of documents cause context blowup and missed details when LLMs try to process them all.
CSV Solution: Create a knowledge base index with:
- Column 1: Keywords and topics
- Column 2: Document file path/location
- Column 3: Section or line number where relevant content starts
- Column 4: Content type or summary (optional)
Result: Instead of loading entire documents and blowing up the context window, the agent performs precise, targeted lookups, enabling near-unlimited knowledge bases while keeping context usage minimal (see the sketch below).
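As a minimal sketch of this pattern, the helper below queries such an index by keyword overlap. The file name kb_index.csv, the column names (keywords, doc_path, section, summary), and the matching strategy are illustrative assumptions, not part of any BMAD specification.

```python
import csv

# Hypothetical index file (kb_index.csv) with rows such as:
# keywords,doc_path,section,summary
# "refunds,returns",docs/refund-policy.md,Eligibility,"Refund eligibility rules"
# "pricing,discounts,tiers",docs/pricing.md,Tier Rules,"Volume discount tiers"

def find_index_entries(index_path: str, query_terms: list[str]) -> list[dict]:
    """Return index rows whose keyword list overlaps the query terms."""
    wanted = {t.strip().lower() for t in query_terms}
    matches = []
    with open(index_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            keywords = {k.strip().lower() for k in row["keywords"].split(",")}
            if keywords & wanted:
                matches.append(row)
    return matches

if __name__ == "__main__":
    # The workflow then loads only the matched sections instead of every document.
    for entry in find_index_entries("kb_index.csv", ["refunds"]):
        print(entry["doc_path"], entry["section"])
```

The agent stays within its context budget because only the matched document sections ever get loaded; the index itself stays small enough to read in full.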
2. Workflow Sequence Optimization
Problem: Complex workflows (e.g., game development) with hundreds of potential steps for different scenarios become unwieldy and context-heavy.
CSV Solution: Create a workflow routing table:
- Column 1: Scenario type (e.g., "2D Platformer", "RPG", "Puzzle Game")
- Column 2: Required step sequence (e.g., "step-01,step-03,step-07,step-12")
- Column 3: Document sections to include
- Column 4: Specialized parameters or configurations
Result: The first step determines the user's needs, finds the closest match in the CSV, confirms it with the user, and then follows the optimized sequence, keeping context usage to a minimum (see the sketch below).
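A minimal sketch of such a routing table and a lookup helper follows. The file name workflow_routes.csv, the column names, and the exact-match lookup are illustrative assumptions; a real workflow would confirm the selected scenario with the user before executing the steps.

```python
import csv

# Hypothetical routing table (workflow_routes.csv) with rows such as:
# scenario,step_sequence,doc_sections,parameters
# 2D Platformer,"step-01,step-03,step-07,step-12","physics,controls",gravity=standard
# RPG,"step-01,step-02,step-05,step-09","dialogue,inventory",save_system=enabled

def route_for_scenario(table_path: str, scenario: str):
    """Return the ordered step IDs for a scenario, or None if no row matches."""
    with open(table_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["scenario"].strip().lower() == scenario.strip().lower():
                return [s.strip() for s in row["step_sequence"].split(",")]
    return None

if __name__ == "__main__":
    print(route_for_scenario("workflow_routes.csv", "RPG"))
    # e.g. ['step-01', 'step-02', 'step-05', 'step-09']
```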
3. Method Registry (Dynamic Technique Selection)
Problem: Tasks need to select optimal techniques from dozens of options based on context, without hardcoding selection logic.
CSV Solution: Create a method registry with:
- Column 1: Category (collaboration, advanced, technical, creative, etc.)
- Column 2: Method name and rich description
- Column 3: Execution pattern or flow guide (e.g., "analysis → insights → action")
- Column 4: Complexity level or use case indicators
Example: The Advanced Elicitation task analyzes the content's context, selects the 5 best-matched methods from 50 options, and then executes them dynamically using the CSV descriptions.
Result: Smart, context-aware technique selection without hardcoded logic - infinitely extensible method libraries.
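The sketch below shows one way a task might load and shortlist such a registry; the file name elicitation_methods.csv, the column names, and the example rows are assumptions made for illustration. The actual matching and execution remain with the LLM, which adapts each pattern to the content at hand.

```python
import csv
from collections import defaultdict

# Hypothetical registry (elicitation_methods.csv) with rows such as:
# category,method,pattern,complexity
# collaboration,"Stakeholder Round Table: voice each stakeholder's concerns","perspectives → tensions → synthesis",medium
# advanced,"Tree of Thoughts: explore alternatives before committing","branch → evaluate → prune → decide",high

def load_registry(path: str) -> dict:
    """Group registry rows by category so a task can shortlist candidates."""
    registry = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            registry[row["category"].strip()].append(row)
    return registry

if __name__ == "__main__":
    # The task hands the shortlisted rows to the LLM, which picks the best
    # matches and adapts each pattern to the content it is working on.
    for row in load_registry("elicitation_methods.csv").get("advanced", []):
        print(row["method"], "|", row["pattern"])
```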
4. Configuration Management
Problem: Complex systems with many configuration options that vary by use case.
CSV Solution: Configuration lookup tables mapping scenarios to specific parameter sets.
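A minimal sketch of such a lookup table is shown below. The file name deploy_configs.csv, the column names, and the "key=value;key=value" encoding of the parameters column are illustrative assumptions, not a prescribed format.

```python
import csv

# Hypothetical configuration table (deploy_configs.csv) with rows such as:
# scenario,parameters
# small-team,"replicas=1;log_level=debug;cache=off"
# production,"replicas=3;log_level=warn;cache=on"

def config_for(path: str, scenario: str) -> dict:
    """Expand one scenario's 'key=value;key=value' parameters into a dict."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["scenario"].strip() == scenario:
                pairs = (p.split("=", 1) for p in row["parameters"].split(";") if "=" in p)
                return {k.strip(): v.strip() for k, v in pairs}
    return {}

if __name__ == "__main__":
    print(config_for("deploy_configs.csv", "production"))
    # e.g. {'replicas': '3', 'log_level': 'warn', 'cache': 'on'}
```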
What NOT to Include in CSV Files
Avoid Web-Searchable Data: Do not include information that LLMs can readily access through web search or that exists in their training data, such as:
- Common programming syntax or standard library functions
- General knowledge about widely used technologies
- Historical facts or commonly available information
- Basic terminology or standard definitions
Include Specialized Data: Focus on data that is:
- Specific to your project or domain
- Not readily available through web search
- Essential for consistent workflow execution
- Too voluminous for LLM context windows
CSV Data File Standards
1. Purpose Validation
- Essential Data Only: CSV must contain data that cannot be reasonably generated by LLMs
- Domain Specific: Data should be specific to the workflow's domain or purpose
- Consistent Usage: All columns and data must be referenced and used somewhere in the workflow
- No Redundancy: Avoid data that duplicates functionality already available to LLMs
2. Structural Standards
- Valid CSV Format: Proper comma-separated values with quoted fields where needed
- Consistent Columns: All rows must have the same number of columns
- No Missing Data: Empty values should be explicitly marked (e.g., "", "N/A", or NULL)
- Header Row: First row must contain clear, descriptive column headers
- Proper Encoding: UTF-8 encoding required for special characters
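The structural rules above are mechanical enough to check with a small script. This is a minimal sketch, not part of BMAD tooling; the function name and the exact messages are assumptions, and the file path is only an example.

```python
import csv

def check_structure(path: str) -> list:
    """Report basic structural problems: blank headers, ragged rows, empty cells."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:  # UTF-8 per the standard
        reader = csv.reader(f)
        header = next(reader, None)
        if not header or any(not h.strip() for h in header):
            return ["missing or blank column headers"]
        for line_no, row in enumerate(reader, start=2):
            if len(row) != len(header):
                problems.append(f"row {line_no}: expected {len(header)} columns, got {len(row)}")
            for col, value in zip(header, row):
                if not value.strip():
                    problems.append(f"row {line_no}: column '{col}' is empty; mark it explicitly if intentional")
    return problems

if __name__ == "__main__":
    for issue in check_structure("kb_index.csv"):
        print(issue)
```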
3. Content Standards
- No LLM-Generated Content: Avoid data that LLMs can easily generate (e.g., generic phrases, common knowledge)
- Specific and Concrete: Use specific values rather than vague descriptions
- Verifiable Data: Data should be factual and verifiable when possible
- Consistent Formatting: Date formats, numbers, and text should follow consistent patterns
4. Column Standards
- Clear Headers: Column names must be descriptive and self-explanatory
- Consistent Data Types: Each column should contain consistent data types
- No Unused Columns: Every column must be referenced and used in the workflow
- Appropriate Width: Each column should hold a single, narrowly focused piece of data rather than long free-form text
5. File Size Standards
- Efficient Structure: CSV files should be as small as possible while maintaining functionality
- No Redundant Rows: Avoid duplicate or nearly identical rows
- Compressed Data: Use efficient data representation (e.g., codes instead of full descriptions)
- Maximum Size: Individual CSV files should not exceed 1MB unless absolutely necessary
6. Documentation Standards
- Documentation Required: Each CSV file should have documentation explaining its purpose
- Column Descriptions: Each column must be documented with its usage and format
- Data Sources: Source of data should be documented when applicable
- Update Procedures: Process for updating CSV data should be documented
7. Integration Standards
- File References: CSV files must be properly referenced in workflow configuration
- Access Patterns: Workflow must clearly define how and when CSV data is accessed
- Error Handling: Workflow must handle cases where CSV files are missing or corrupted
- Version Control: CSV files should be versioned when changes occur
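One way a workflow's supporting code could honor the error-handling point above is sketched below: the loader returns None instead of raising, so the workflow can fall back to a reduced path. The function name, warning messages, and file path are illustrative assumptions.

```python
import csv
from pathlib import Path

def load_rows_or_none(path: str):
    """Load CSV rows as dicts, or return None so the workflow can degrade
    gracefully when the file is missing, unreadable, or empty."""
    file = Path(path)
    if not file.is_file():
        print(f"[warn] CSV not found: {path}; continuing without it")
        return None
    try:
        with file.open(newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
    except (UnicodeDecodeError, csv.Error) as exc:
        print(f"[warn] CSV unreadable: {path} ({exc}); continuing without it")
        return None
    if not rows:
        print(f"[warn] CSV has a header but no data rows: {path}")
        return None
    return rows

if __name__ == "__main__":
    rows = load_rows_or_none("kb_index.csv")
    if rows is None:
        print("falling back to a reduced workflow path")
```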
8. Quality Assurance
- Data Validation: CSV data should be validated for correctness and completeness
- Format Consistency: Consistent formatting across all rows and columns
- No Ambiguity: Data entries should be clear and unambiguous
- Regular Review: CSV content should be reviewed periodically for relevance
9. Security Considerations
- No Sensitive Data: Avoid including sensitive, personal, or confidential information
- Data Sanitization: CSV data should be sanitized for security issues
- Access Control: Access to CSV files should be controlled when necessary
- Audit Trail: Changes to CSV files should be logged when appropriate
10. Performance Standards
- Fast Loading: CSV files must load quickly within workflow execution
- Memory Efficient: Structure should minimize memory usage during processing
- Optimized Queries: If data lookup is needed, optimize for efficient access
- Caching Strategy: Consider whether data can be cached for performance
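As one possible caching strategy, the sketch below keys an in-process cache on the file's path and modification time, so repeated lookups within a run reuse the parsed rows while an edited file is automatically re-read. The function names and cache size are assumptions for illustration.

```python
import csv
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=16)
def _parse(path: str, mtime: float):
    """Parse once per (path, modification time); repeat calls reuse the result."""
    with open(path, newline="", encoding="utf-8") as f:
        return tuple(csv.DictReader(f))

def cached_rows(path: str):
    """Public entry point: a changed file gets a new mtime, which busts the cache."""
    return _parse(path, Path(path).stat().st_mtime)

if __name__ == "__main__":
    rows = cached_rows("kb_index.csv")        # first call parses the file
    rows_again = cached_rows("kb_index.csv")  # second call is served from the cache
    print(len(rows), rows is rows_again)      # same tuple object -> True
```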
Implementation Guidelines
When creating CSV data files for BMAD workflows:
- Start with Purpose: Clearly define why CSV is needed instead of LLM generation
- Design Structure: Plan columns and data types before creating the file
- Test Integration: Ensure workflow properly accesses and uses CSV data
- Document Thoroughly: Provide complete documentation for future maintenance
- Validate Quality: Check data quality, format consistency, and integration
- Monitor Usage: Track how CSV data is used and optimize as needed
Common Anti-Patterns to Avoid
- Generic Phrases: CSV files containing common phrases or LLM-generated content
- Redundant Data: Duplicating information easily available to LLMs
- Overly Complex: Unnecessarily complex CSV structures when simple data suffices
- Unused Columns: Columns that are defined but never referenced in workflows
- Poor Formatting: Inconsistent data formats, missing values, or structural issues
- No Documentation: CSV files without clear purpose or usage documentation
Validation Checklist
For each CSV file, verify:
- Purpose is essential and cannot be replaced by LLM generation
- All columns are used in the workflow
- Data is properly formatted and consistent
- File is efficiently sized and structured
- Documentation is complete and clear
- Integration with workflow is tested and working
- Security considerations are addressed
- Performance requirements are met