171 lines
6.4 KiB
Markdown
171 lines
6.4 KiB
Markdown
<!-- Powered by BMAD™ Core -->
|
|
|
|
# RFQ Document Sharding Task
|
|
|
|
## Purpose
|
|
|
|
- Break down large RFQ documents into smaller, more manageable pieces (shards) for effective LLM processing
|
|
- Preserve context and meaning while working within token limitations
|
|
- Enable parallel processing of different RFQ sections
|
|
- Improve efficiency in analyzing extensive government RFQ documents
|
|
- Maintain traceability between sharded components
|
|
|
|
## Usage Scenarios
|
|
|
|
### Scenario 1: Initial RFQ Processing
|
|
|
|
1. **Document Assessment**: Evaluate RFQ document size and complexity
|
|
2. **Sharding Strategy Selection**: Choose appropriate sharding approach based on document structure
|
|
3. **Shard Creation**: Break document into logical, context-preserving shards
|
|
4. **Metadata Enrichment**: Add RFQ-specific metadata to each shard
|
|
5. **Manifest Generation**: Create a shard manifest for navigation and relationship tracking
|
|
|
|
### Scenario 2: Processing Large Section L/M Documents
|
|
|
|
1. **Section Identification**: Locate Section L (instructions) and Section M (evaluation criteria)
|
|
2. **Requirement-Based Sharding**: Break sections into requirement-specific shards
|
|
3. **Cross-Reference Preservation**: Maintain links between related requirements
|
|
4. **Compliance Mapping**: Tag shards with compliance identifiers
|
|
5. **Content Distribution**: Distribute shards to appropriate subject matter experts
|
|
|
|
## Task Instructions
|
|
|
|
### 1. Document Analysis and Strategy Selection
|
|
|
|
**Analysis Process**:
|
|
|
|
1. **Size Assessment**:
|
|
- Calculate total token count of RFQ documents
|
|
- Identify sections exceeding token limits
|
|
- Determine optimal shard size based on document complexity
|
|
|
|
2. **Structure Analysis**:
|
|
- Identify natural document break points (sections, subsections)
|
|
- Map hierarchical relationships between document components
|
|
- Analyze section dependencies and references
|
|
|
|
3. **Sharding Strategy Selection**:
|
|
- **Structural Sharding**: Break at section/subsection boundaries
|
|
- **Semantic Sharding**: Break based on content meaning/topic
|
|
- **Requirement-Based Sharding**: Break at individual requirement boundaries
|
|
- **Hybrid Approach**: Combine strategies based on document characteristics
|
|
|
|
### 2. Shard Creation and Processing
|
|
|
|
**Sharding Process**:
|
|
|
|
1. **Document Preparation**:
|
|
- Normalize formatting for consistent processing
|
|
- Extract section headings and numbering
|
|
- Identify and preserve formatting structures
|
|
|
|
2. **Boundary Identification**:
|
|
- Locate optimal break points based on selected strategy
|
|
- Ensure each shard has sufficient context
|
|
- Preserve paragraph integrity where possible
|
|
- Respect logical content boundaries
|
|
|
|
3. **Shard Generation**:
|
|
- Create individual shard files with unique identifiers
|
|
- Include contextual information from surrounding content
|
|
- Apply consistent naming convention
|
|
- Set appropriate shard size (3000-5000 tokens recommended)
|
|
|
|
### 3. Metadata and Context Preservation
|
|
|
|
**Metadata Components**:
|
|
|
|
1. **RFQ-Specific Metadata**:
|
|
- RFQ number and section information
|
|
- Requirement IDs contained within shard
|
|
- Section numbers and hierarchical location
|
|
- Original page numbers from source document
|
|
|
|
2. **Context Preservation**:
|
|
- Include abbreviated section headings for context
|
|
- Add parent section information
|
|
- Preserve numbered lists and hierarchies
|
|
- Maintain table structures and formatting
|
|
|
|
3. **Cross-Reference Management**:
|
|
- Track references to other sections/attachments
|
|
- Note dependencies on other shards
|
|
- Preserve links to definitions or glossary terms
|
|
- Maintain requirement traceability
|
|
|
|
### 4. Shard Manifest Generation
|
|
|
|
**Manifest Elements**:
|
|
|
|
1. **Shard Catalog**:
|
|
- Comprehensive listing of all generated shards
|
|
- Hierarchical organization reflecting document structure
|
|
- Unique identifiers and file locations
|
|
- Contained requirement IDs
|
|
|
|
2. **Relationship Mapping**:
|
|
- Parent-child relationships between shards
|
|
- Cross-references between related shards
|
|
- Dependency tracking for proper sequencing
|
|
- Content overlap indicators
|
|
|
|
3. **Navigation Support**:
|
|
- Table of contents for all shards
|
|
- Quick reference for locating specific content
|
|
- Search optimized metadata
|
|
- Visualization of shard relationships
|
|
|
|
### 5. Integration with Workflow
|
|
|
|
**Workflow Integration**:
|
|
|
|
1. **Shard Distribution**:
|
|
- Assign shards to appropriate team members
|
|
- Track shard ownership and responsibility
|
|
- Enable parallel processing of different shards
|
|
- Coordinate work across related shards
|
|
|
|
2. **Shard Status Tracking**:
|
|
- Monitor processing status of each shard
|
|
- Track completion percentage
|
|
- Identify bottlenecks or dependencies
|
|
- Prioritize critical path shards
|
|
|
|
3. **Reassembly Planning**:
|
|
- Define shard recombination process
|
|
- Establish quality checks for reassembled content
|
|
- Create Integration tests for shard boundaries
|
|
- Document reassembly sequence
|
|
|
|
## Best Practices
|
|
|
|
- **Appropriate Shard Size**: Target 3000-5000 tokens per shard for optimal processing
|
|
- **Context Preservation**: Include sufficient context at shard boundaries (overlap of 100-200 tokens recommended)
|
|
- **Consistent Metadata**: Apply uniform metadata structure across all shards
|
|
- **Natural Boundaries**: Break at natural section boundaries whenever possible
|
|
- **Requirement Integrity**: Avoid splitting individual requirements across shards
|
|
- **Relationship Tracking**: Maintain clear linkages between related shards
|
|
- **Version Control**: Implement versioning for shards to track changes
|
|
- **Complete Coverage**: Ensure no content is lost during sharding process
|
|
- **Parallel Processing**: Design shards to enable efficient parallel processing
|
|
|
|
## Integration Points
|
|
|
|
- **RFQ Document Import**: Receives imported RFQ documents as input
|
|
- **Compliance Matrix Generation**: Feeds sharded documents to compliance tracking
|
|
- **Requirements Traceability**: Provides metadata for requirement tracking
|
|
- **Proposal Content Development**: Supplies relevant RFQ sections to content creators
|
|
- **Evaluation Simulation**: Enables comprehensive review across sharded content
|
|
|
|
## Related Agents
|
|
|
|
- **RFQ Opportunity Summarizer**: Primary agent for RFQ document processing
|
|
- **Compliance Matrix Builder**: Consumers of sharded RFQ documents
|
|
|
|
## Technical Implementation Notes
|
|
|
|
- Implements core capabilities from BMAD Core's `shard-doc.md` task
|
|
- Extended with government RFQ-specific functionality
|
|
- Compatible with existing flattener architecture
|
|
- Enables 30-50% improvement in processing efficiency for large RFQs
|