# CSV Data File Standards for BMAD Workflows

## Purpose and Usage

CSV data files in BMAD workflows serve specific purposes for different workflow types:

**For Agents:** Provide structured data that agents need to reference but cannot realistically generate (such as specific configurations, domain-specific data, or structured knowledge bases).

**For Expert Agents:** Supply specialized knowledge bases, reference data, or persistent information that the expert agent needs to access consistently across sessions.

**For Workflows:** Include reference data, configuration parameters, or structured inputs that guide workflow execution and decision-making.

**Key Principle:** CSV files should contain data that is essential, structured, and not easily generated by LLMs during execution.

## Intent-Based Design Principle

**Core Philosophy:** The closer workflows stay to **intent** rather than **prescriptive** instructions, the more creative and adaptive the LLM experience becomes.

**CSV Enables Intent-Based Design:**

- **Instead of:** Hardcoded scripts with exact phrases the LLM must say
- **CSV Provides:** Clear goals and patterns that the LLM adapts creatively to context
- **Result:** Natural, contextual conversations rather than rigid scripts

**Example - Advanced Elicitation:**

- **Prescriptive Alternative:** 50 separate files with exact conversation scripts
- **Intent-Based Reality:** One CSV row per method, holding the method goal and pattern, which the LLM adapts to the user
- **Benefit:** The same method plays out differently for different users while keeping its essence

**Intent vs Prescriptive Spectrum:**

- **Highly Prescriptive:** "Say exactly: 'Based on my analysis, I recommend...'"
- **Balanced Intent:** "Help the user understand the implications using your professional judgment"
- **CSV Goal:** Provide just enough guidance to enable creative, context-aware execution

## Primary Use Cases

### 1. Knowledge Base Indexing (Document Lookup Optimization)

**Problem:** Large knowledge bases with hundreds of documents overflow the context window and cause missed details when an LLM tries to process them all at once.

**CSV Solution:** Create a knowledge base index with:

- **Column 1:** Keywords and topics
- **Column 2:** Document file path/location
- **Column 3:** Section or line number where relevant content starts
- **Column 4:** Content type or summary (optional)

**Result:** Instead of loading whole documents, the agent looks up the few sections it needs. The knowledge base can grow very large while context usage stays small.

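The index lookup described above can be sketched as follows. This is a minimal illustration, not part of BMAD itself: the column names (`keywords`, `doc_path`, `start_line`, `summary`), the `;` keyword separator, the file paths, and the `find_documents` helper are all assumptions made for the example.

```python
import csv
import io

# Hypothetical index contents; column names and paths are illustrative only.
INDEX_CSV = """keywords,doc_path,start_line,summary
auth;login;tokens,docs/security.md,42,Token-based login flow
deploy;release,docs/operations.md,10,Release checklist
auth;roles,docs/permissions.md,5,Role definitions
"""


def find_documents(index_text, topic):
    """Return the index rows whose keyword list contains the topic."""
    topic = topic.strip().lower()
    matches = []
    for row in csv.DictReader(io.StringIO(index_text)):
        keywords = [word.strip().lower() for word in row["keywords"].split(";")]
        if topic in keywords:
            matches.append(row)
    return matches


hits = find_documents(INDEX_CSV, "auth")
for hit in hits:
    # Only these paths and start lines would need to be loaded into context.
    print(f'{hit["doc_path"]} (from line {hit["start_line"]}): {hit["summary"]}')
```

Only the matched file sections are then read, rather than the whole knowledge base.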
### 2. Workflow Sequence Optimization

**Problem:** Complex workflows (e.g., game development) with hundreds of potential steps for different scenarios become unwieldy and context-heavy.

**CSV Solution:** Create a workflow routing table:

- **Column 1:** Scenario type (e.g., "2D Platformer", "RPG", "Puzzle Game")
- **Column 2:** Required step sequence (e.g., "step-01,step-03,step-07,step-12")
- **Column 3:** Document sections to include
- **Column 4:** Specialized parameters or configurations

**Result:** Step 1 determines the user's needs, finds the closest match in the CSV, confirms it with the user, then follows that row's step sequence, so only the steps relevant to the scenario are loaded into context.

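A routing table of this kind might be consulted as in the sketch below. In a real workflow the LLM itself would judge the closest scenario. Here the standard-library `difflib.get_close_matches` stands in for that judgment. The column names, sample rows, the `cutoff` value, and the `route_for` helper are assumptions for illustration.

```python
import csv
import difflib
import io

# Hypothetical routing table; the step sequence is quoted because it contains commas.
ROUTING_CSV = """scenario,step_sequence,doc_sections,parameters
2D Platformer,"step-01,step-03,step-07,step-12",physics;level-design,gravity=on
RPG,"step-01,step-02,step-05,step-09",narrative;inventory,party_size=4
Puzzle Game,"step-01,step-04,step-08",mechanics,hints=on
"""


def load_routes(text):
    """Index the routing rows by scenario name."""
    return {row["scenario"]: row for row in csv.DictReader(io.StringIO(text))}


def route_for(user_description, routes):
    """Return (scenario, steps) for the closest scenario name, or None."""
    lowered = {name.lower(): name for name in routes}
    match = difflib.get_close_matches(
        user_description.lower(), list(lowered), n=1, cutoff=0.4
    )
    if not match:
        return None
    row = routes[lowered[match[0]]]
    return row["scenario"], row["step_sequence"].split(",")


routes = load_routes(ROUTING_CSV)
print(route_for("puzzle games", routes))
```

The returned scenario would still be confirmed with the user before the step sequence is followed.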
### 3. Method Registry (Dynamic Technique Selection)

**Problem:** Tasks need to select optimal techniques from dozens of options based on context, without hardcoding selection logic.

**CSV Solution:** Create a method registry with:

- **Column 1:** Category (collaboration, advanced, technical, creative, etc.)
- **Column 2:** Method name and rich description
- **Column 3:** Execution pattern or flow guide (e.g., "analysis → insights → action")
- **Column 4:** Complexity level or use case indicators

**Example:** The Advanced Elicitation task analyzes the content context, selects the 5 best-matched methods from 50 options, then executes them dynamically using the CSV descriptions.

**Result:** Context-aware technique selection without hardcoded logic. The method library is extended by adding rows, not by changing the workflow.

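As a rough sketch of registry-driven selection, the example below ranks methods by word overlap between the current context and each method description. In practice the LLM would do this matching. The overlap score, the column names, the sample methods, and the `select_methods` helper are illustrative assumptions only.

```python
import csv
import io

# Hypothetical registry rows; "method" holds "Name: description" in one column.
REGISTRY_CSV = """category,method,pattern,complexity
collaboration,Stakeholder Round Table: gather several viewpoints on a decision,perspectives -> synthesis -> alignment,low
advanced,Tree of Thoughts: explore several reasoning paths before choosing,paths -> evaluation -> selection,high
technical,Architecture Review: examine design trade-offs and risks,analysis -> insights -> action,medium
creative,Reverse Brainstorm: list ways to fail and invert them,failures -> inversion -> ideas,low
"""


def select_methods(registry_text, context, limit=5):
    """Return up to `limit` (name, pattern) pairs, best word overlap first."""
    context_words = set(context.lower().split())
    scored = []
    for row in csv.DictReader(io.StringIO(registry_text)):
        description_words = set(row["method"].lower().replace(":", " ").split())
        score = len(context_words & description_words)
        if score > 0:
            name = row["method"].split(":")[0]
            scored.append((score, name, row["pattern"]))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(name, pattern) for _, name, pattern in scored[:limit]]


print(select_methods(REGISTRY_CSV, "review the design risks", limit=2))
```

Adding a new technique means adding one row to the registry. The selection code does not change.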
### 4. Configuration Management

**Problem:** Complex systems with many configuration options that vary by use case.

**CSV Solution:** Configuration lookup tables mapping scenarios to specific parameter sets.

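A configuration lookup table could be read as in the following sketch. The scenarios, parameter names, and the `load_config` helper are hypothetical. The point is only that one row maps a scenario to a typed parameter set.

```python
import csv
import io

# Hypothetical configuration table: one row per scenario.
CONFIG_CSV = """scenario,max_steps,review_required,output_format
prototype,5,false,markdown
production,20,true,markdown
"""


def load_config(text, scenario):
    """Return the parameter set for a scenario, converting types explicitly."""
    for row in csv.DictReader(io.StringIO(text)):
        if row["scenario"] == scenario:
            return {
                "max_steps": int(row["max_steps"]),
                "review_required": row["review_required"] == "true",
                "output_format": row["output_format"],
            }
    raise KeyError(f"no configuration for scenario {scenario!r}")


print(load_config(CONFIG_CSV, "production"))
```

Raising on an unknown scenario, rather than returning defaults silently, keeps a misspelled scenario name from going unnoticed.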
## What NOT to Include in CSV Files

**Avoid Web-Searchable Data:** Do not include information that LLMs can readily access through web search or that exists in their training data, such as:

- Common programming syntax or standard library functions
- General knowledge about widely used technologies
- Historical facts or commonly available information
- Basic terminology or standard definitions

**Include Specialized Data:** Focus on data that is:

- Specific to your project or domain
- Not readily available through web search
- Essential for consistent workflow execution
- Too voluminous for LLM context windows

## CSV Data File Standards

### 1. Purpose Validation

- **Essential Data Only:** CSV must contain data that cannot be reasonably generated by LLMs
- **Domain Specific:** Data should be specific to the workflow's domain or purpose
- **Consistent Usage:** All columns and data must be referenced and used somewhere in the workflow
- **No Redundancy:** Avoid data that duplicates functionality already available to LLMs

### 2. Structural Standards

- **Valid CSV Format:** Proper comma-separated values with quoted fields where needed
- **Consistent Columns:** All rows must have the same number of columns
- **No Missing Data:** Empty values should be explicitly marked (e.g., "", "N/A", or NULL), using the same marker throughout a file
- **Header Row:** First row must contain clear, descriptive column headers
- **Proper Encoding:** UTF-8 encoding required for special characters

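Several of the structural rules above can be checked mechanically. The sketch below covers UTF-8 decoding, a non-blank header row, and a consistent column count. The `check_structure` helper and its message wording are assumptions for illustration, not an existing BMAD tool.

```python
import csv
import io


def check_structure(raw_bytes):
    """Return a list of structural problems found in a CSV file's bytes."""
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]

    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("header row contains a blank column name")
    # Row numbers count CSV records, with the header as row 1.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} columns, found {len(row)}"
            )
    return problems


print(check_structure(b"scenario,steps\nRPG,step-01\nPuzzle Game\n"))
```

An empty list means none of these checks failed. It does not show that the data itself is correct.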
### 3. Content Standards

- **No LLM-Generated Content:** Avoid data that LLMs can easily generate (e.g., generic phrases, common knowledge)
- **Specific and Concrete:** Use specific values rather than vague descriptions
- **Verifiable Data:** Data should be factual and verifiable when possible
- **Consistent Formatting:** Date formats, numbers, and text should follow consistent patterns

### 4. Column Standards

- **Clear Headers:** Column names must be descriptive and self-explanatory
- **Consistent Data Types:** Each column should contain consistent data types
- **No Unused Columns:** Every column must be referenced and used in the workflow
- **Appropriate Width:** Columns should be reasonably narrow and focused

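Two of the column rules above lend themselves to a quick audit: unused columns and mixed data types. In the sketch below, `referenced_columns` is assumed to be a set of column names collected by hand from the workflow's step files. Both that input and the `audit_columns` helper are hypothetical.

```python
import csv
import io


def _kind(value):
    """Classify a cell as 'number' or 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def audit_columns(csv_text, referenced_columns):
    """Report columns the workflow never references and columns mixing types."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    header = reader.fieldnames or []
    unused = [name for name in header if name not in referenced_columns]
    mixed = [
        name
        for name in header
        # Empty cells are skipped so that blanks do not count as a second type.
        if len({_kind(row[name]) for row in rows if row[name]}) > 1
    ]
    return {"unused": unused, "mixed_types": mixed}


sample = "id,label,notes\n1,alpha,x\ntwo,beta,y\n"
print(audit_columns(sample, {"id", "label"}))
```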
### 5. File Size Standards

- **Efficient Structure:** CSV files should be as small as possible while maintaining functionality
- **No Redundant Rows:** Avoid duplicate or nearly identical rows
- **Compressed Data:** Use efficient data representation (e.g., codes instead of full descriptions)
- **Maximum Size:** Individual CSV files should not exceed 1 MB unless absolutely necessary

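The size and redundancy rules can be approximated with a short report. This sketch reads the 1 MB guideline as 1,000,000 bytes and treats rows as duplicates when they match after trimming whitespace and ignoring case. Both choices, and the `size_report` helper, are assumptions.

```python
import csv
import io

MAX_BYTES = 1_000_000  # assumption: "1 MB" read as 10**6 bytes


def size_report(raw_bytes):
    """Summarize file size and count rows that repeat an earlier row."""
    rows = list(csv.reader(io.StringIO(raw_bytes.decode("utf-8"))))
    seen = set()
    duplicates = 0
    for row in rows[1:]:  # skip the header row
        key = tuple(cell.strip().lower() for cell in row)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {
        "bytes": len(raw_bytes),
        "over_limit": len(raw_bytes) > MAX_BYTES,
        "duplicate_rows": duplicates,
    }


print(size_report(b"a,b\n1,x\n1,X\n2,y\n"))
```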
### 6. Documentation Standards

- **Documentation Required:** Each CSV file should have documentation explaining its purpose
- **Column Descriptions:** Each column must be documented with its usage and format
- **Data Sources:** Source of data should be documented when applicable
- **Update Procedures:** Process for updating CSV data should be documented

### 7. Integration Standards

- **File References:** CSV files must be properly referenced in workflow configuration
- **Access Patterns:** Workflow must clearly define how and when CSV data is accessed
- **Error Handling:** Workflow must handle cases where CSV files are missing or corrupted
- **Version Control:** CSV files should be versioned when changes occur

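One way to meet the error-handling rule is to have the loader return an explicit error message and never raise, so the workflow can decide how to continue. The sketch below assumes a hypothetical `load_csv_or_error` helper and a caller-supplied list of required columns.

```python
import csv
from pathlib import Path


def load_csv_or_error(path, required_columns):
    """Return (rows, error). On failure, rows is [] and error explains why."""
    file_path = Path(path)
    if not file_path.is_file():
        return [], f"{path}: file not found"
    try:
        with file_path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            rows = list(reader)
            header = reader.fieldnames or []
    except (UnicodeDecodeError, csv.Error) as exc:
        return [], f"{path}: unreadable CSV ({exc})"
    missing = [name for name in required_columns if name not in header]
    if missing:
        return [], f"{path}: missing columns {missing}"
    return rows, None


rows, error = load_csv_or_error("does-not-exist.csv", ["scenario"])
if error:
    # A workflow could surface this to the user or fall back to a default path.
    print(error)
```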
### 8. Quality Assurance

- **Data Validation:** CSV data should be validated for correctness and completeness
- **Format Consistency:** Consistent formatting across all rows and columns
- **No Ambiguity:** Data entries should be clear and unambiguous
- **Regular Review:** CSV content should be reviewed periodically for relevance

### 9. Security Considerations

- **No Sensitive Data:** Avoid including sensitive, personal, or confidential information
- **Data Sanitization:** CSV data should be sanitized for security issues
- **Access Control:** Access to CSV files should be controlled when necessary
- **Audit Trail:** Changes to CSV files should be logged when appropriate

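The sanitization rule does not name specific checks. As one possible interpretation, the sketch below flags cells that a spreadsheet application might evaluate as formulas if the file is opened there (cells starting with `=`, `+`, `-`, or `@`). The choice of check and the `risky_cells` helper are assumptions. Legitimate values such as negative numbers will also be flagged and need manual review.

```python
import csv
import io

FORMULA_PREFIXES = ("=", "+", "-", "@")


def risky_cells(csv_text):
    """Return (row, column, value) for cells that start like a formula."""
    findings = []
    for row_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        for col_no, cell in enumerate(row, start=1):
            if cell.lstrip().startswith(FORMULA_PREFIXES):
                findings.append((row_no, col_no, cell))
    return findings


print(risky_cells("name,value\nok,=SUM(A1:A2)\nfine,3\n"))
```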
### 10. Performance Standards

- **Fast Loading:** CSV files must load quickly within workflow execution
- **Memory Efficient:** Structure should minimize memory usage during processing
- **Optimized Queries:** If data lookup is needed, optimize for efficient access
- **Caching Strategy:** Consider whether data can be cached for performance

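For repeated lookups, one simple caching approach is to parse each file once and index it by a key column. The sketch below uses the standard-library `functools.lru_cache`. The `load_table` and `lookup` helpers and the cache size are illustrative assumptions.

```python
import csv
import functools


@functools.lru_cache(maxsize=8)
def load_table(path, key_column):
    """Read the CSV once per (path, key_column) and index rows by key_column."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {row[key_column]: row for row in csv.DictReader(handle)}


def lookup(path, key_column, key):
    """Return the row for `key`, or None; repeat calls reuse the cached table."""
    return load_table(path, key_column).get(key)
```

The cached table does not notice later edits to the file. Calling `load_table.cache_clear()` after an update forces a fresh read.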
## Implementation Guidelines

When creating CSV data files for BMAD workflows:

1. **Start with Purpose:** Clearly define why CSV is needed instead of LLM generation
2. **Design Structure:** Plan columns and data types before creating the file
3. **Test Integration:** Ensure workflow properly accesses and uses CSV data
4. **Document Thoroughly:** Provide complete documentation for future maintenance
5. **Validate Quality:** Check data quality, format consistency, and integration
6. **Monitor Usage:** Track how CSV data is used and optimize as needed

## Common Anti-Patterns to Avoid

- **Generic Phrases:** CSV files containing common phrases or LLM-generated content
- **Redundant Data:** Duplicating information easily available to LLMs
- **Overly Complex:** Unnecessarily complex CSV structures when simple data suffices
- **Unused Columns:** Columns that are defined but never referenced in workflows
- **Poor Formatting:** Inconsistent data formats, missing values, or structural issues
- **No Documentation:** CSV files without clear purpose or usage documentation

## Validation Checklist

For each CSV file, verify:

- [ ] Purpose is essential and cannot be replaced by LLM generation
- [ ] All columns are used in the workflow
- [ ] Data is properly formatted and consistent
- [ ] File is efficiently sized and structured
- [ ] Documentation is complete and clear
- [ ] Integration with workflow is tested and working
- [ ] Security considerations are addressed
- [ ] Performance requirements are met