# CSV Data File Standards for BMAD Workflows

## Purpose and Usage

CSV data files in BMAD workflows serve specific purposes for different workflow types:

**For Agents:** Provide structured data that agents need to reference but cannot realistically generate (such as specific configurations, domain-specific data, or structured knowledge bases).

**For Expert Agents:** Supply specialized knowledge bases, reference data, or persistent information that the expert agent needs to access consistently across sessions.

**For Workflows:** Include reference data, configuration parameters, or structured inputs that guide workflow execution and decision-making.

**Key Principle:** CSV files should contain data that is essential, structured, and not easily generated by LLMs during execution.

## Intent-Based Design Principle

**Core Philosophy:** The closer workflows stay to **intent** rather than **prescriptive** instructions, the more creative and adaptive the LLM experience becomes.

**CSV Enables Intent-Based Design:**

- **Instead of:** Hardcoded scripts with exact phrases the LLM must say
- **CSV Provides:** Clear goals and patterns that the LLM adapts creatively to context
- **Result:** Natural, contextual conversations rather than rigid scripts

**Example - Advanced Elicitation:**

- **Prescriptive Alternative:** 50 separate files with exact conversation scripts
- **Intent-Based Reality:** One CSV row with a method goal and pattern that the LLM adapts to the user
- **Benefit:** The same method works differently for different users while maintaining its essence

**Intent vs Prescriptive Spectrum:**

- **Highly Prescriptive:** "Say exactly: 'Based on my analysis, I recommend...'"
- **Balanced Intent:** "Help the user understand the implications using your professional judgment"
- **CSV Goal:** Provide just enough guidance to enable creative, context-aware execution

## Primary Use Cases

### 1. Knowledge Base Indexing (Document Lookup Optimization)

**Problem:** Large knowledge bases with hundreds of documents cause context blowup and missed details when LLMs try to process them all.

**CSV Solution:** Create a knowledge base index with:

- **Column 1:** Keywords and topics
- **Column 2:** Document file path/location
- **Column 3:** Section or line number where relevant content starts
- **Column 4:** Content type or summary (optional)

**Result:** Context-blowing document loads become surgical precision lookups, giving agents near-infinite knowledge bases while maintaining optimal context usage.

### 2. Workflow Sequence Optimization

**Problem:** Complex workflows (e.g., game development) with hundreds of potential steps for different scenarios become unwieldy and context-heavy.

**CSV Solution:** Create a workflow routing table:

- **Column 1:** Scenario type (e.g., "2D Platformer", "RPG", "Puzzle Game")
- **Column 2:** Required step sequence (e.g., "step-01,step-03,step-07,step-12")
- **Column 3:** Document sections to include
- **Column 4:** Specialized parameters or configurations

**Result:** Step 1 determines the user's needs, finds the closest match in the CSV, confirms it with the user, then follows the optimized sequence - truly optimal for context usage.

### 3. Method Registry (Dynamic Technique Selection)

**Problem:** Tasks need to select optimal techniques from dozens of options based on context, without hardcoding selection logic.

**CSV Solution:** Create a method registry with:

- **Column 1:** Category (collaboration, advanced, technical, creative, etc.)
- **Column 2:** Method name and rich description
- **Column 3:** Execution pattern or flow guide (e.g., "analysis → insights → action")
- **Column 4:** Complexity level or use case indicators

**Example:** The Advanced Elicitation task analyzes content context, selects the 5 best-matched methods from 50 options, then executes them dynamically using the CSV descriptions.
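As a minimal sketch of the method-registry pattern, the snippet below parses a small registry that follows the four-column layout described above and filters it by category. The column names, method entries, and the `select_methods` helper are all hypothetical, chosen for illustration; a real registry would live in its own CSV file rather than an inline string.

```python
import csv
import io

# Hypothetical excerpt of a method-registry CSV; the column names mirror
# the four columns described above, but the rows are invented examples.
REGISTRY = """\
category,method,pattern,complexity
collaboration,"Round-robin critique across personas","review → critique → synthesize",low
advanced,"Tree of thoughts with branch pruning","branch → evaluate → prune",high
technical,"Failure-mode walkthrough of the design","enumerate → probe → mitigate",medium
"""

def load_registry(text):
    """Parse the registry CSV into a list of row dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))

def select_methods(rows, category, limit=5):
    """Pick up to `limit` methods whose category matches the content context."""
    return [r["method"] for r in rows if r["category"] == category][:limit]

rows = load_registry(REGISTRY)
print(select_methods(rows, "advanced"))  # ['Tree of thoughts with branch pruning']
```

The point of the design is that only the matching rows' descriptions ever need to enter the LLM's context; the full registry can grow without any per-method cost.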
**Result:** Smart, context-aware technique selection without hardcoded logic - infinitely extensible method libraries.

### 4. Configuration Management

**Problem:** Complex systems have many configuration options that vary by use case.

**CSV Solution:** Configuration lookup tables that map scenarios to specific parameter sets.

## What NOT to Include in CSV Files

**Avoid Web-Searchable Data:** Do not include information that LLMs can readily access through web search or that exists in their training data, such as:

- Common programming syntax or standard library functions
- General knowledge about widely used technologies
- Historical facts or commonly available information
- Basic terminology or standard definitions

**Include Specialized Data:** Focus on data that is:

- Specific to your project or domain
- Not readily available through web search
- Essential for consistent workflow execution
- Too voluminous for LLM context windows

## CSV Data File Standards

### 1. Purpose Validation

- **Essential Data Only:** The CSV must contain data that cannot reasonably be generated by LLMs
- **Domain Specific:** Data should be specific to the workflow's domain or purpose
- **Consistent Usage:** All columns and data must be referenced and used somewhere in the workflow
- **No Redundancy:** Avoid data that duplicates functionality already available to LLMs

### 2. Structural Standards

- **Valid CSV Format:** Proper comma-separated values with quoted fields where needed
- **Consistent Columns:** All rows must have the same number of columns
- **No Missing Data:** Empty values should be explicitly marked (e.g., "", "N/A", or NULL)
- **Header Row:** The first row must contain clear, descriptive column headers
- **Proper Encoding:** UTF-8 encoding is required for special characters

### 3. Content Standards

- **No LLM-Generated Content:** Avoid data that LLMs can easily generate (e.g., generic phrases, common knowledge)
- **Specific and Concrete:** Use specific values rather than vague descriptions
- **Verifiable Data:** Data should be factual and verifiable when possible
- **Consistent Formatting:** Dates, numbers, and text should follow consistent patterns

### 4. Column Standards

- **Clear Headers:** Column names must be descriptive and self-explanatory
- **Consistent Data Types:** Each column should contain a consistent data type
- **No Unused Columns:** Every column must be referenced and used in the workflow
- **Appropriate Width:** Columns should be reasonably narrow and focused

### 5. File Size Standards

- **Efficient Structure:** CSV files should be as small as possible while maintaining functionality
- **No Redundant Rows:** Avoid duplicate or nearly identical rows
- **Compressed Data:** Use efficient data representation (e.g., codes instead of full descriptions)
- **Maximum Size:** Individual CSV files should not exceed 1MB unless absolutely necessary

### 6. Documentation Standards

- **Documentation Required:** Each CSV file should have documentation explaining its purpose
- **Column Descriptions:** Each column must be documented with its usage and format
- **Data Sources:** The source of the data should be documented when applicable
- **Update Procedures:** The process for updating CSV data should be documented

### 7. Integration Standards

- **File References:** CSV files must be properly referenced in the workflow configuration
- **Access Patterns:** The workflow must clearly define how and when CSV data is accessed
- **Error Handling:** The workflow must handle cases where CSV files are missing or corrupted
- **Version Control:** CSV files should be versioned when changes occur

### 8. Quality Assurance

- **Data Validation:** CSV data should be validated for correctness and completeness
- **Format Consistency:** Formatting must be consistent across all rows and columns
- **No Ambiguity:** Data entries should be clear and unambiguous
- **Regular Review:** CSV content should be reviewed periodically for relevance

### 9. Security Considerations

- **No Sensitive Data:** Avoid including sensitive, personal, or confidential information
- **Data Sanitization:** CSV data should be sanitized for security issues before use
- **Access Control:** Access to CSV files should be controlled when necessary
- **Audit Trail:** Changes to CSV files should be logged when appropriate

### 10. Performance Standards

- **Fast Loading:** CSV files must load quickly within workflow execution
- **Memory Efficient:** The structure should minimize memory usage during processing
- **Optimized Queries:** If data lookup is needed, optimize for efficient access
- **Caching Strategy:** Consider whether data can be cached for performance

## Implementation Guidelines

When creating CSV data files for BMAD workflows:

1. **Start with Purpose:** Clearly define why a CSV is needed instead of LLM generation
2. **Design Structure:** Plan columns and data types before creating the file
3. **Test Integration:** Ensure the workflow properly accesses and uses the CSV data
4. **Document Thoroughly:** Provide complete documentation for future maintenance
5. **Validate Quality:** Check data quality, format consistency, and integration
6. **Monitor Usage:** Track how the CSV data is used and optimize as needed

## Common Anti-Patterns to Avoid

- **Generic Phrases:** CSV files containing common phrases or LLM-generated content
- **Redundant Data:** Duplicating information easily available to LLMs
- **Overly Complex:** Unnecessarily complex CSV structures when simple data suffices
- **Unused Columns:** Columns that are defined but never referenced in workflows
- **Poor Formatting:** Inconsistent data formats, missing values, or structural issues
- **No Documentation:** CSV files without clear purpose or usage documentation

## Validation Checklist

For each CSV file, verify:

- [ ] Purpose is essential and cannot be replaced by LLM generation
- [ ] All columns are used in the workflow
- [ ] Data is properly formatted and consistent
- [ ] File is efficiently sized and structured
- [ ] Documentation is complete and clear
- [ ] Integration with the workflow is tested and working
- [ ] Security considerations are addressed
- [ ] Performance requirements are met
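Several of the structural and size standards above are mechanically checkable. The sketch below is one possible validator, not a prescribed tool: the function name, the exact checks, and the sample inputs are all illustrative, covering UTF-8 encoding, a header row, consistent column counts, duplicate rows, and the 1MB ceiling.

```python
import csv
import io

MAX_BYTES = 1_000_000  # from the size standard: "should not exceed 1MB"

def validate_csv_bytes(raw: bytes):
    """Return a list of problems found; an empty list means the checks pass."""
    problems = []
    if len(raw) > MAX_BYTES:
        problems.append("file exceeds 1MB")
    try:
        text = raw.decode("utf-8")  # Proper Encoding standard
    except UnicodeDecodeError:
        return problems + ["not valid UTF-8"]
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return problems + ["file is empty"]
    header, data = rows[0], rows[1:]
    if any(not h.strip() for h in header):  # Header Row standard
        problems.append("blank column header")
    widths = {len(r) for r in rows}  # Consistent Columns standard
    if len(widths) > 1:
        problems.append("inconsistent column counts: %s" % sorted(widths))
    seen = set()
    for i, row in enumerate(data, start=2):  # No Redundant Rows standard
        key = tuple(row)
        if key in seen:
            problems.append("duplicate row at line %d" % i)
        seen.add(key)
    return problems

good = b"scenario,steps\nRPG,step-01\nPuzzle,step-02\n"
bad = b"scenario,steps\nRPG,step-01,extra\n"
print(validate_csv_bytes(good))  # []
print(validate_csv_bytes(bad))   # ['inconsistent column counts: [2, 3]']
```

A check like this fits naturally into the "Test Integration" and "Validate Quality" steps of the implementation guidelines, for example as a pre-commit hook over a workflow's data files.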