BMAD-METHOD/src/modules/bmb/docs/workflows/csv-data-file-standards.md

# CSV Data File Standards for BMAD Workflows

## Purpose and Usage

CSV data files in BMAD workflows serve specific purposes for different workflow types:

**For Agents:** Provide structured data that agents need to reference but cannot realistically generate (such as specific configurations, domain-specific data, or structured knowledge bases).

**For Expert Agents:** Supply specialized knowledge bases, reference data, or persistent information that the expert agent needs to access consistently across sessions.

**For Workflows:** Include reference data, configuration parameters, or structured inputs that guide workflow execution and decision-making.

**Key Principle:** CSV files should contain data that is essential, structured, and not easily generated by LLMs during execution.

## Intent-Based Design Principle

**Core Philosophy:** The closer workflows stay to **intent** rather than **prescriptive** instructions, the more creative and adaptive the LLM experience becomes.

**CSV Enables Intent-Based Design:**

- **Instead of:** Hardcoded scripts with exact phrases LLM must say
- **CSV Provides:** Clear goals and patterns that LLM adapts creatively to context
- **Result:** Natural, contextual conversations rather than rigid scripts

**Example - Advanced Elicitation:**

- **Prescriptive Alternative:** 50 separate files with exact conversation scripts
- **Intent-Based Reality:** One CSV row with method goal + pattern → LLM adapts to user
- **Benefit:** Same method works differently for different users while maintaining essence

**Intent vs Prescriptive Spectrum:**

- **Highly Prescriptive:** "Say exactly: 'Based on my analysis, I recommend...'"
- **Balanced Intent:** "Help the user understand the implications using your professional judgment"
- **CSV Goal:** Provide just enough guidance to enable creative, context-aware execution

## Primary Use Cases

### 1. Knowledge Base Indexing (Document Lookup Optimization)

**Problem:** Large knowledge bases with hundreds of documents cause context blowup and missed details when LLMs try to process them all.

**CSV Solution:** Create a knowledge base index with:

- **Column 1:** Keywords and topics
- **Column 2:** Document file path/location
- **Column 3:** Section or line number where relevant content starts
- **Column 4:** Content type or summary (optional)

**Result:** Transform from context-blowing document loads to surgical precision lookups, creating agents with near-infinite knowledge bases while maintaining optimal context usage.

### 2. Workflow Sequence Optimization

**Problem:** Complex workflows (e.g., game development) with hundreds of potential steps for different scenarios become unwieldy and context-heavy.

**CSV Solution:** Create a workflow routing table:

- **Column 1:** Scenario type (e.g., "2D Platformer", "RPG", "Puzzle Game")
- **Column 2:** Required step sequence (e.g., "step-01,step-03,step-07,step-12")
- **Column 3:** Document sections to include
- **Column 4:** Specialized parameters or configurations

**Result:** Step 1 determines user needs, finds closest match in CSV, confirms with user, then follows optimized sequence - truly optimal for context usage.

### 3. Method Registry (Dynamic Technique Selection)

**Problem:** Tasks need to select optimal techniques from dozens of options based on context, without hardcoding selection logic.

**CSV Solution:** Create a method registry with:

- **Column 1:** Category (collaboration, advanced, technical, creative, etc.)
- **Column 2:** Method name and rich description
- **Column 3:** Execution pattern or flow guide (e.g., "analysis → insights → action")
- **Column 4:** Complexity level or use case indicators

**Example:** Advanced Elicitation task analyzes content context, selects 5 best-matched methods from 50 options, then executes dynamically using CSV descriptions.

**Result:** Smart, context-aware technique selection without hardcoded logic - infinitely extensible method libraries.

### 4. Configuration Management

**Problem:** Complex systems with many configuration options that vary by use case.

**CSV Solution:** Configuration lookup tables mapping scenarios to specific parameter sets.

## What NOT to Include in CSV Files

**Avoid Web-Searchable Data:** Do not include information that LLMs can readily access through web search or that exists in their training data, such as:

- Common programming syntax or standard library functions
- General knowledge about widely used technologies
- Historical facts or commonly available information
- Basic terminology or standard definitions

**Include Specialized Data:** Focus on data that is:

- Specific to your project or domain
- Not readily available through web search
- Essential for consistent workflow execution
- Too voluminous for LLM context windows

## CSV Data File Standards

### 1. Purpose Validation

- **Essential Data Only:** CSV must contain data that cannot be reasonably generated by LLMs
- **Domain Specific:** Data should be specific to the workflow's domain or purpose
- **Consistent Usage:** All columns and data must be referenced and used somewhere in the workflow
- **No Redundancy:** Avoid data that duplicates functionality already available to LLMs

### 2. Structural Standards

- **Valid CSV Format:** Proper comma-separated values with quoted fields where needed
- **Consistent Columns:** All rows must have the same number of columns
- **No Missing Data:** Empty values should be explicitly marked (e.g., "", "N/A", or NULL)
- **Header Row:** First row must contain clear, descriptive column headers
- **Proper Encoding:** UTF-8 encoding required for special characters

### 3. Content Standards

- **No LLM-Generated Content:** Avoid data that LLMs can easily generate (e.g., generic phrases, common knowledge)
- **Specific and Concrete:** Use specific values rather than vague descriptions
- **Verifiable Data:** Data should be factual and verifiable when possible
- **Consistent Formatting:** Date formats, numbers, and text should follow consistent patterns

### 4. Column Standards

- **Clear Headers:** Column names must be descriptive and self-explanatory
- **Consistent Data Types:** Each column should contain consistent data types
- **No Unused Columns:** Every column must be referenced and used in the workflow
- **Appropriate Width:** Columns should be reasonably narrow and focused

### 5. File Size Standards

- **Efficient Structure:** CSV files should be as small as possible while maintaining functionality
- **No Redundant Rows:** Avoid duplicate or nearly identical rows
- **Compressed Data:** Use efficient data representation (e.g., codes instead of full descriptions)
- **Maximum Size:** Individual CSV files should not exceed 1MB unless absolutely necessary

### 6. Documentation Standards

- **Documentation Required:** Each CSV file should have documentation explaining its purpose
- **Column Descriptions:** Each column must be documented with its usage and format
- **Data Sources:** Source of data should be documented when applicable
- **Update Procedures:** Process for updating CSV data should be documented

### 7. Integration Standards

- **File References:** CSV files must be properly referenced in workflow configuration
- **Access Patterns:** Workflow must clearly define how and when CSV data is accessed
- **Error Handling:** Workflow must handle cases where CSV files are missing or corrupted
- **Version Control:** CSV files should be versioned when changes occur

### 8. Quality Assurance

- **Data Validation:** CSV data should be validated for correctness and completeness
- **Format Consistency:** Consistent formatting across all rows and columns
- **No Ambiguity:** Data entries should be clear and unambiguous
- **Regular Review:** CSV content should be reviewed periodically for relevance

### 9. Security Considerations

- **No Sensitive Data:** Avoid including sensitive, personal, or confidential information
- **Data Sanitization:** CSV data should be sanitized for security issues
- **Access Control:** Access to CSV files should be controlled when necessary
- **Audit Trail:** Changes to CSV files should be logged when appropriate

### 10. Performance Standards

- **Fast Loading:** CSV files must load quickly within workflow execution
- **Memory Efficient:** Structure should minimize memory usage during processing
- **Optimized Queries:** If data lookup is needed, optimize for efficient access
- **Caching Strategy**: Consider whether data can be cached for performance

## Implementation Guidelines

When creating CSV data files for BMAD workflows:

1. **Start with Purpose:** Clearly define why CSV is needed instead of LLM generation
2. **Design Structure:** Plan columns and data types before creating the file
3. **Test Integration:** Ensure workflow properly accesses and uses CSV data
4. **Document Thoroughly:** Provide complete documentation for future maintenance
5. **Validate Quality:** Check data quality, format consistency, and integration
6. **Monitor Usage:** Track how CSV data is used and optimize as needed

## Common Anti-Patterns to Avoid

- **Generic Phrases:** CSV files containing common phrases or LLM-generated content
- **Redundant Data:** Duplicating information easily available to LLMs
- **Overly Complex:** Unnecessarily complex CSV structures when simple data suffices
- **Unused Columns:** Columns that are defined but never referenced in workflows
- **Poor Formatting:** Inconsistent data formats, missing values, or structural issues
- **No Documentation:** CSV files without clear purpose or usage documentation

## Validation Checklist

For each CSV file, verify:

- [ ] Purpose is essential and cannot be replaced by LLM generation
- [ ] All columns are used in the workflow
- [ ] Data is properly formatted and consistent
- [ ] File is efficiently sized and structured
- [ ] Documentation is complete and clear
- [ ] Integration with workflow is tested and working
- [ ] Security considerations are addressed
- [ ] Performance requirements are met