RFQ Document Sharding Task
Purpose
- Break down large RFQ documents into smaller, more manageable pieces (shards) for effective LLM processing
- Preserve context and meaning while working within token limitations
- Enable parallel processing of different RFQ sections
- Improve efficiency in analyzing extensive government RFQ documents
- Maintain traceability between sharded components
Usage Scenarios
Scenario 1: Initial RFQ Processing
- Document Assessment: Evaluate RFQ document size and complexity
- Sharding Strategy Selection: Choose appropriate sharding approach based on document structure
- Shard Creation: Break document into logical, context-preserving shards
- Metadata Enrichment: Add RFQ-specific metadata to each shard
- Manifest Generation: Create a shard manifest for navigation and relationship tracking
Scenario 2: Processing Large Section L/M Documents
- Section Identification: Locate Section L (instructions) and Section M (evaluation criteria)
- Requirement-Based Sharding: Break sections into requirement-specific shards
- Cross-Reference Preservation: Maintain links between related requirements
- Compliance Mapping: Tag shards with compliance identifiers
- Content Distribution: Distribute shards to appropriate subject matter experts
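Requirement-based sharding of Section L/M text could be sketched as below. This is a minimal illustration, not the task's actual implementation: it assumes each requirement begins on its own line with an ID such as "L.3.2" or "M.1", and the names `REQ_LINE` and `split_requirements` are hypothetical.

```python
import re

# Assumption: a requirement starts at the beginning of a line with an ID like
# "L.3.2" or "M.1". Real solicitations number requirements in many ways.
REQ_LINE = re.compile(r"^([LM]\.\d+(?:\.\d+)*)\s+", re.MULTILINE)


def split_requirements(section_text: str) -> dict:
    """Map each requirement ID to the text from its ID line up to the next ID line."""
    matches = list(REQ_LINE.finditer(section_text))
    shards = {}
    for i, match in enumerate(matches):
        # A requirement runs until the next requirement starts, or to the end.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(section_text)
        shards[match.group(1)] = section_text[match.start():end].strip()
    return shards
```

Text before the first requirement ID (a section preamble, for example) is not assigned to any shard in this sketch and would need separate handling to keep coverage complete.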
Task Instructions
1. Document Analysis and Strategy Selection
Analysis Process:
- Size Assessment:
  - Calculate total token count of RFQ documents
  - Identify sections exceeding token limits
  - Determine optimal shard size based on document complexity
- Structure Analysis:
  - Identify natural document break points (sections, subsections)
  - Map hierarchical relationships between document components
  - Analyze section dependencies and references
- Sharding Strategy Selection:
  - Structural Sharding: Break at section/subsection boundaries
  - Semantic Sharding: Break based on content meaning/topic
  - Requirement-Based Sharding: Break at individual requirement boundaries
  - Hybrid Approach: Combine strategies based on document characteristics
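The size assessment and strategy selection above might look roughly like the following sketch. It rests on stated assumptions: token counts are approximated at about 4 characters per token (a real tokenizer for the target model would be more accurate), and the heading and requirement cues are illustrative heuristics. `estimate_tokens` and `choose_strategy` are hypothetical names.

```python
import re

# Assumption: roughly 4 characters per token. This is only a heuristic.
CHARS_PER_TOKEN = 4
MAX_SHARD_TOKENS = 5000  # upper end of the recommended shard size

# Assumption: headings look like "Section L" or "3.2 Title" at the start of a line.
HEADING = re.compile(r"^(?:Section\s+[A-Z]\b|\d+(?:\.\d+)*\s+\S)", re.MULTILINE)
# Assumption: requirement language is signalled by "shall" or "must".
REQUIREMENT = re.compile(r"\b(?:shall|must)\b", re.IGNORECASE)


def estimate_tokens(text: str) -> int:
    """Approximate the token count from character length (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def choose_strategy(text: str) -> str:
    """Pick a sharding strategy from simple structural cues."""
    if estimate_tokens(text) <= MAX_SHARD_TOKENS:
        return "none"  # the document already fits in a single shard
    has_headings = bool(HEADING.search(text))
    has_requirements = bool(REQUIREMENT.search(text))
    if has_headings and has_requirements:
        return "hybrid"
    if has_headings:
        return "structural"
    if has_requirements:
        return "requirement-based"
    return "semantic"
```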
2. Shard Creation and Processing
Sharding Process:
- Document Preparation:
  - Normalize formatting for consistent processing
  - Extract section headings and numbering
  - Identify and preserve formatting structures
- Boundary Identification:
  - Locate optimal break points based on selected strategy
  - Ensure each shard has sufficient context
  - Preserve paragraph integrity where possible
  - Respect logical content boundaries
- Shard Generation:
  - Create individual shard files with unique identifiers
  - Include contextual information from surrounding content
  - Apply consistent naming convention
  - Set appropriate shard size (3000-5000 tokens recommended)
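One way to respect paragraph integrity while staying within a token budget is to pack whole paragraphs greedily into shards. The sketch below illustrates that idea only; it assumes paragraphs are separated by blank lines, uses the same rough 4-characters-per-token estimate, and the `Shard` class and the `<rfq_id>-shard-NNN` naming convention are hypothetical.

```python
from dataclasses import dataclass
from typing import List

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count from character length (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


@dataclass
class Shard:
    shard_id: str
    text: str


def shard_paragraphs(text: str, rfq_id: str, max_tokens: int = 5000) -> List[Shard]:
    """Pack whole paragraphs into shards without splitting any paragraph.

    A single paragraph larger than max_tokens becomes its own oversized shard.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    groups, current, current_tokens = [], [], 0
    for para in paragraphs:
        para_tokens = estimate_tokens(para)
        # Start a new shard when adding this paragraph would exceed the budget.
        if current and current_tokens + para_tokens > max_tokens:
            groups.append(current)
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += para_tokens
    if current:
        groups.append(current)
    return [
        Shard(f"{rfq_id}-shard-{i:03d}", "\n\n".join(group))
        for i, group in enumerate(groups, start=1)
    ]
```

Greedy packing keeps shards close to the size limit, but it ignores section boundaries; a structural or requirement-based strategy would choose break points first and apply the budget second.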
3. Metadata and Context Preservation
Metadata Components:
- RFQ-Specific Metadata:
  - RFQ number and section information
  - Requirement IDs contained within shard
  - Section numbers and hierarchical location
  - Original page numbers from source document
- Context Preservation:
  - Include abbreviated section headings for context
  - Add parent section information
  - Preserve numbered lists and hierarchies
  - Maintain table structures and formatting
- Cross-Reference Management:
  - Track references to other sections/attachments
  - Note dependencies on other shards
  - Preserve links to definitions or glossary terms
  - Maintain requirement traceability
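The metadata components above could be captured in a small record per shard, as in this sketch. The field names, the `ShardMetadata` class, and both regular expressions are assumptions made for illustration; requirement IDs and cross-reference phrasing differ between solicitations.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumption: requirement IDs look like "L.3.2" (section letter plus dotted numbers).
REQ_ID_PATTERN = re.compile(r"\b[A-M]\.\d+(?:\.\d+)*\b")
# Assumption: cross-references are phrased as "Section X" or "Attachment N".
XREF_PATTERN = re.compile(r"\b(?:Section [A-M]|Attachment \d+)\b")


@dataclass
class ShardMetadata:
    rfq_number: str
    section: str
    parent_section: Optional[str]
    pages: Tuple[int, int]  # first and last source page covered by the shard
    requirement_ids: List[str] = field(default_factory=list)
    cross_references: List[str] = field(default_factory=list)


def build_metadata(
    text: str,
    rfq_number: str,
    section: str,
    parent_section: Optional[str],
    pages: Tuple[int, int],
) -> ShardMetadata:
    """Derive requirement IDs and cross-references from the shard text."""
    return ShardMetadata(
        rfq_number=rfq_number,
        section=section,
        parent_section=parent_section,
        pages=pages,
        requirement_ids=sorted(set(REQ_ID_PATTERN.findall(text))),
        cross_references=sorted(set(XREF_PATTERN.findall(text))),
    )
```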
4. Shard Manifest Generation
Manifest Elements:
- Shard Catalog:
  - Comprehensive listing of all generated shards
  - Hierarchical organization reflecting document structure
  - Unique identifiers and file locations
  - Contained requirement IDs
- Relationship Mapping:
  - Parent-child relationships between shards
  - Cross-references between related shards
  - Dependency tracking for proper sequencing
  - Content overlap indicators
- Navigation Support:
  - Table of contents for all shards
  - Quick reference for locating specific content
  - Search-optimized metadata
  - Visualization of shard relationships
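A manifest with a shard catalog, a parent-child map, and a requirement lookup could be assembled as in the sketch below. The dictionary keys (`id`, `parent`, `requirement_ids`, `children`, `requirement_index`) are hypothetical; the source does not define a manifest schema.

```python
import json
from typing import Dict, List


def build_manifest(rfq_number: str, shards: List[dict]) -> dict:
    """Assemble a manifest: shard catalog, parent-child map, requirement index."""
    children: Dict[str, List[str]] = {}
    requirement_index: Dict[str, List[str]] = {}
    for shard in shards:
        parent = shard.get("parent")
        if parent is not None:
            children.setdefault(parent, []).append(shard["id"])
        # Index every requirement ID so a reader can find the shards holding it.
        for req_id in shard.get("requirement_ids", []):
            requirement_index.setdefault(req_id, []).append(shard["id"])
    return {
        "rfq_number": rfq_number,
        "shards": shards,
        "children": children,
        "requirement_index": requirement_index,
    }


def manifest_to_json(manifest: dict) -> str:
    """Serialize the manifest for storage next to the shard files."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

A requirement that appears under more than one shard in `requirement_index` is a simple content overlap indicator.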
5. Integration with Workflow
Workflow Integration:
- Shard Distribution:
  - Assign shards to appropriate team members
  - Track shard ownership and responsibility
  - Enable parallel processing of different shards
  - Coordinate work across related shards
- Shard Status Tracking:
  - Monitor processing status of each shard
  - Track completion percentage
  - Identify bottlenecks or dependencies
  - Prioritize critical path shards
- Reassembly Planning:
  - Define shard recombination process
  - Establish quality checks for reassembled content
  - Create integration tests for shard boundaries
  - Document reassembly sequence
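Status tracking and dependency-aware scheduling can be illustrated with two small helpers. This is a sketch under assumptions: the status values ("pending", "in_progress", "done") and the function names are hypothetical, and dependencies are given as a mapping from a shard ID to the shard IDs it depends on.

```python
from typing import Dict, List


def completion_percentage(status: Dict[str, str]) -> float:
    """Share of shards marked "done", as a percentage of all tracked shards."""
    if not status:
        return 0.0
    done = sum(1 for state in status.values() if state == "done")
    return 100.0 * done / len(status)


def ready_shards(status: Dict[str, str], depends_on: Dict[str, List[str]]) -> List[str]:
    """Pending shards whose dependencies are all done, so work can start now."""
    return sorted(
        shard_id
        for shard_id, state in status.items()
        if state == "pending"
        and all(status.get(dep) == "done" for dep in depends_on.get(shard_id, []))
    )
```

Shards returned by `ready_shards` have no unfinished dependencies among themselves, so they can be handed out for parallel processing.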
Best Practices
- Appropriate Shard Size: Target 3000-5000 tokens per shard for optimal processing
- Context Preservation: Include sufficient context at shard boundaries (overlap of 100-200 tokens recommended)
- Consistent Metadata: Apply uniform metadata structure across all shards
- Natural Boundaries: Break at natural section boundaries whenever possible
- Requirement Integrity: Avoid splitting individual requirements across shards
- Relationship Tracking: Maintain clear linkages between related shards
- Version Control: Implement versioning for shards to track changes
- Complete Coverage: Ensure no content is lost during sharding process
- Parallel Processing: Design shards to enable efficient parallel processing
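The overlap and complete-coverage practices can be checked mechanically. In the sketch below, tokens are assumed to be a pre-split list of strings (for example, whitespace-separated words standing in for model tokens), and `add_overlap` and `covers_source` are hypothetical helper names.

```python
from typing import List


def add_overlap(shards: List[List[str]], overlap: int = 150) -> List[List[str]]:
    """Prefix each shard after the first with the tail of its predecessor."""
    result = []
    for i, tokens in enumerate(shards):
        # Guard overlap > 0: a slice of [-0:] would copy the whole previous shard.
        prefix = shards[i - 1][-overlap:] if i > 0 and overlap > 0 else []
        result.append(prefix + tokens)
    return result


def covers_source(shards: List[List[str]], source_tokens: List[str]) -> bool:
    """True when the shards (before overlap is added) rebuild the source exactly."""
    return [token for shard in shards for token in shard] == source_tokens
```

Running `covers_source` on the shards before calling `add_overlap` confirms that no content was lost or duplicated by the split itself.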
Integration Points
- RFQ Document Import: Receives imported RFQ documents as input
- Compliance Matrix Generation: Feeds sharded documents to compliance tracking
- Requirements Traceability: Provides metadata for requirement tracking
- Proposal Content Development: Supplies relevant RFQ sections to content creators
- Evaluation Simulation: Enables comprehensive review across sharded content
Related Agents
- RFQ Opportunity Summarizer: Primary agent for RFQ document processing
- Compliance Matrix Builder: Consumes sharded RFQ documents
Technical Implementation Notes
- Implements core capabilities from BMAD Core's shard-doc.md task
- Extended with government RFQ-specific functionality
- Compatible with existing flattener architecture
- Enables 30-50% improvement in processing efficiency for large RFQs