BMAD-METHOD/expansion-packs/bmad-rfq-government/tasks/rfq-document-sharding.md

<!-- Powered by BMAD™ Core -->

# RFQ Document Sharding Task

## Purpose

- Break down large RFQ documents into smaller, more manageable pieces (shards) for effective LLM processing
- Preserve context and meaning while working within token limitations
- Enable parallel processing of different RFQ sections
- Improve efficiency in analyzing extensive government RFQ documents
- Maintain traceability between sharded components

## Usage Scenarios

### Scenario 1: Initial RFQ Processing

1. **Document Assessment**: Evaluate RFQ document size and complexity
2. **Sharding Strategy Selection**: Choose appropriate sharding approach based on document structure
3. **Shard Creation**: Break document into logical, context-preserving shards
4. **Metadata Enrichment**: Add RFQ-specific metadata to each shard
5. **Manifest Generation**: Create a shard manifest for navigation and relationship tracking

### Scenario 2: Processing Large Section L/M Documents

1. **Section Identification**: Locate Section L (instructions) and Section M (evaluation criteria)
2. **Requirement-Based Sharding**: Break sections into requirement-specific shards
3. **Cross-Reference Preservation**: Maintain links between related requirements
4. **Compliance Mapping**: Tag shards with compliance identifiers
5. **Content Distribution**: Distribute shards to appropriate subject matter experts

## Task Instructions

### 1. Document Analysis and Strategy Selection

**Analysis Process**:

1. **Size Assessment**:
   - Calculate total token count of RFQ documents
   - Identify sections exceeding token limits
   - Determine optimal shard size based on document complexity

2. **Structure Analysis**:
   - Identify natural document break points (sections, subsections)
   - Map hierarchical relationships between document components
   - Analyze section dependencies and references

3. **Sharding Strategy Selection**:
   - **Structural Sharding**: Break at section/subsection boundaries
   - **Semantic Sharding**: Break based on content meaning/topic
   - **Requirement-Based Sharding**: Break at individual requirement boundaries
   - **Hybrid Approach**: Combine strategies based on document characteristics

### 2. Shard Creation and Processing

**Sharding Process**:

1. **Document Preparation**:
   - Normalize formatting for consistent processing
   - Extract section headings and numbering
   - Identify and preserve formatting structures

2. **Boundary Identification**:
   - Locate optimal break points based on selected strategy
   - Ensure each shard has sufficient context
   - Preserve paragraph integrity where possible
   - Respect logical content boundaries

3. **Shard Generation**:
   - Create individual shard files with unique identifiers
   - Include contextual information from surrounding content
   - Apply consistent naming convention
   - Set appropriate shard size (3000-5000 tokens recommended)

### 3. Metadata and Context Preservation

**Metadata Components**:

1. **RFQ-Specific Metadata**:
   - RFQ number and section information
   - Requirement IDs contained within shard
   - Section numbers and hierarchical location
   - Original page numbers from source document

2. **Context Preservation**:
   - Include abbreviated section headings for context
   - Add parent section information
   - Preserve numbered lists and hierarchies
   - Maintain table structures and formatting

3. **Cross-Reference Management**:
   - Track references to other sections/attachments
   - Note dependencies on other shards
   - Preserve links to definitions or glossary terms
   - Maintain requirement traceability

### 4. Shard Manifest Generation

**Manifest Elements**:

1. **Shard Catalog**:
   - Comprehensive listing of all generated shards
   - Hierarchical organization reflecting document structure
   - Unique identifiers and file locations
   - Contained requirement IDs

2. **Relationship Mapping**:
   - Parent-child relationships between shards
   - Cross-references between related shards
   - Dependency tracking for proper sequencing
   - Content overlap indicators

3. **Navigation Support**:
   - Table of contents for all shards
   - Quick reference for locating specific content
   - Search optimized metadata
   - Visualization of shard relationships

### 5. Integration with Workflow

**Workflow Integration**:

1. **Shard Distribution**:
   - Assign shards to appropriate team members
   - Track shard ownership and responsibility
   - Enable parallel processing of different shards
   - Coordinate work across related shards

2. **Shard Status Tracking**:
   - Monitor processing status of each shard
   - Track completion percentage
   - Identify bottlenecks or dependencies
   - Prioritize critical path shards

3. **Reassembly Planning**:
   - Define shard recombination process
   - Establish quality checks for reassembled content
   - Create Integration tests for shard boundaries
   - Document reassembly sequence

## Best Practices

- **Appropriate Shard Size**: Target 3000-5000 tokens per shard for optimal processing
- **Context Preservation**: Include sufficient context at shard boundaries (overlap of 100-200 tokens recommended)
- **Consistent Metadata**: Apply uniform metadata structure across all shards
- **Natural Boundaries**: Break at natural section boundaries whenever possible
- **Requirement Integrity**: Avoid splitting individual requirements across shards
- **Relationship Tracking**: Maintain clear linkages between related shards
- **Version Control**: Implement versioning for shards to track changes
- **Complete Coverage**: Ensure no content is lost during sharding process
- **Parallel Processing**: Design shards to enable efficient parallel processing

## Integration Points

- **RFQ Document Import**: Receives imported RFQ documents as input
- **Compliance Matrix Generation**: Feeds sharded documents to compliance tracking
- **Requirements Traceability**: Provides metadata for requirement tracking
- **Proposal Content Development**: Supplies relevant RFQ sections to content creators
- **Evaluation Simulation**: Enables comprehensive review across sharded content

## Related Agents

- **RFQ Opportunity Summarizer**: Primary agent for RFQ document processing
- **Compliance Matrix Builder**: Consumers of sharded RFQ documents

## Technical Implementation Notes

- Implements core capabilities from BMAD Core's `shard-doc.md` task
- Extended with government RFQ-specific functionality
- Compatible with existing flattener architecture
- Enables 30-50% improvement in processing efficiency for large RFQs