Step 2: Knowledge Indexing & Chunking
MANDATORY EXECUTION RULES (READ FIRST):
- 🛑 NEVER generate content without user input
- ✅ ALWAYS treat this as collaborative indexing between technical peers
- 📋 YOU ARE A FACILITATOR, not a content generator
- 💬 FOCUS on creating self-contained, retrievable knowledge chunks
- 🎯 EACH CHUNK must be independently useful without requiring full document context
- ⚠️ ABSOLUTELY NO TIME ESTIMATES - AI development speed has fundamentally changed
- ✅ YOU MUST ALWAYS SPEAK OUTPUT in your agent communication style, in the configured {communication_language}
EXECUTION PROTOCOLS:
- 🎯 Show your analysis before taking any action
- 📝 Focus on creating atomic, self-contained knowledge chunks
- ⚠️ Present A/P/C menu after each major category
- 💾 ONLY save when user chooses C (Continue)
- 📖 Update frontmatter with completed categories
- 🚫 FORBIDDEN to load the next step until all categories are indexed
COLLABORATION MENUS (A/P/C):
This step drafts chunks and presents these choices after each knowledge category:
- A (Advanced Elicitation): Use discovery protocols to explore nuanced knowledge relationships
- P (Party Mode): Bring multiple perspectives to identify missing knowledge connections
- C (Continue): Save the current chunks and proceed to the next category
PROTOCOL INTEGRATION:
- When 'A' selected: Execute {project-root}/_bmad/core/workflows/advanced-elicitation/workflow.xml
- When 'P' selected: Execute {project-root}/_bmad/core/workflows/party-mode/workflow.md
- After A or P completes, the protocol always returns to this step's A/P/C menu
- User accepts/rejects protocol changes before proceeding
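The menu rules above reduce to a small loop: A and P always come back to the menu, and only C persists anything. A minimal Python sketch, assuming chunks are plain strings and that a hypothetical `completed_categories` list stands in for the frontmatter field (neither name comes from the workflow itself):

```python
from dataclasses import dataclass, field


@dataclass
class StepState:
    """Hypothetical in-memory state for one indexing step."""
    draft_chunks: list[str]  # chunks shown to the user, not yet saved
    saved_chunks: list[str] = field(default_factory=list)
    completed_categories: list[str] = field(default_factory=list)  # mirrors frontmatter


def handle_menu(state: StepState, category: str, choices: list[str]) -> bool:
    """Process the user's menu choices for one category.

    A and P run a protocol and then fall back to the same menu; only C saves.
    Returns True once the category has been saved.
    """
    for choice in choices:
        choice = choice.strip().upper()
        if choice in ("A", "P"):
            # Placeholder: the real step executes the advanced-elicitation or
            # party-mode workflow here and the user accepts or rejects changes.
            continue  # always return to this step's A/P/C menu
        if choice == "C":
            state.saved_chunks.extend(state.draft_chunks)
            state.draft_chunks = []
            state.completed_categories.append(category)
            return True
        # Unrecognised input: ignore it and redisplay the menu.
    return False
```

Nothing is written until C arrives, which is what makes "ONLY save when user chooses C" easy to check.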
CONTEXT BOUNDARIES:
- The discovery catalog from step 1 is available
- All artifact paths and classifications have been identified
- Focus on creating chunks optimized for embedding and retrieval
- Each chunk must carry enough context to be useful in isolation
YOUR TASK:
Index each discovered artifact into self-contained knowledge chunks with metadata tags, source tracing, and retrieval-optimized formatting.
CHUNKING PRINCIPLES:
Chunk Design Rules
- Self-Contained: Each chunk must be understandable without reading the source document
- Tagged: Every chunk has category, priority, source path, and semantic tags
- Atomic: One concept or decision per chunk - no compound knowledge
- Traceable: Every chunk links back to its source artifact and section
- Contextual: Include enough surrounding context for accurate retrieval
- Deduplicated: Avoid redundant chunks across different source artifacts
Chunk Format
Each chunk follows this standard format:
```markdown
### [CHUNK-ID] Chunk Title
- **Source:** `{relative_path_to_source_file}`
- **Category:** architecture | requirements | implementation | domain | operations | quality
- **Priority:** critical | high | standard | reference
- **Tags:** comma-separated semantic tags for retrieval matching

**Context:** One-line description of when this knowledge is relevant.

**Content:**
The actual knowledge content - specific, actionable, self-contained.
```
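One way to enforce the chunk format mechanically is a small record type that validates the allowed category and priority values before rendering. This is a sketch; the `Chunk` class, its field names, and the `ARCH-001` style of ID are illustrative assumptions, not part of the workflow:

```python
from dataclasses import dataclass

CATEGORIES = {"architecture", "requirements", "implementation",
              "domain", "operations", "quality"}
PRIORITIES = {"critical", "high", "standard", "reference"}


@dataclass
class Chunk:
    chunk_id: str   # e.g. "ARCH-001"; the ID scheme is an assumption
    title: str
    source: str     # relative path to the source artifact
    category: str
    priority: str
    tags: list[str]
    context: str    # one line: when this knowledge is relevant
    content: str    # self-contained knowledge text

    def validate(self) -> None:
        """Reject chunks that break the metadata rules."""
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if self.priority not in PRIORITIES:
            raise ValueError(f"unknown priority: {self.priority}")
        if not self.tags:
            raise ValueError("every chunk needs at least one semantic tag")
        if "\n" in self.context:
            raise ValueError("context must be a single line")

    def to_markdown(self) -> str:
        """Render the chunk in the standard chunk format."""
        self.validate()
        return "\n".join([
            f"### [{self.chunk_id}] {self.title}",
            f"- **Source:** `{self.source}`",
            f"- **Category:** {self.category}",
            f"- **Priority:** {self.priority}",
            f"- **Tags:** {', '.join(self.tags)}",
            "",
            f"**Context:** {self.context}",
            "",
            "**Content:**",
            self.content,
        ])
```

Validating at render time means a chunk with a missing tag or an invented category never reaches the index file.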
INDEXING SEQUENCE:
1. Index Critical-Priority Artifacts
Process all artifacts marked as critical priority first:
For each critical artifact:
- Read the complete source file
- Identify distinct knowledge units (decisions, rules, constraints)
- Create one chunk per knowledge unit
- Apply semantic tags for retrieval matching
- Present chunks to user for validation
Present results: "I've created {{chunk_count}} critical-priority chunks from {{source_count}} sources:
{{list_of_chunk_titles_with_tags}}
These chunks will be prioritized in every retrieval query.
[A] Advanced Elicitation - Explore deeper knowledge connections
[P] Party Mode - Review from multiple implementation perspectives
[C] Continue - Save these chunks and proceed"
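Identifying knowledge units is a judgment call, but a heading-based split gives a reasonable first pass that already carries the source path and section for traceability. A Python sketch, assuming markdown artifacts; `split_units` and `Unit` are hypothetical names:

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Unit:
    source: str   # relative path of the artifact
    section: str  # heading trail, e.g. "Architecture > Database"
    text: str


def split_units(source: str, markdown: str) -> list[Unit]:
    """Split a markdown artifact into candidate units, one per heading section.

    Limitation: '#' lines inside fenced code blocks are treated as headings.
    """
    units: list[Unit] = []
    trail: list[str] = []  # current heading path
    body: list[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            units.append(Unit(source, " > ".join(trail) or "(preamble)", text))
        body.clear()

    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            flush()
            level = len(m.group(1))
            del trail[level - 1:]  # drop headings at this depth or deeper
            trail.append(m.group(2))
        else:
            body.append(line)
    flush()
    return units
```

Sections that still contain several decisions must be split further by hand to keep chunks atomic.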
2. Index High-Priority Artifacts
Process all high-priority artifacts:
For each high-priority artifact:
- Read the source file and identify knowledge units
- Create chunks with appropriate tags
- Cross-reference with critical chunks for consistency
- Identify any overlaps and deduplicate
- Present chunks to user for validation, then display the A/P/C menu
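For the overlap check against critical chunks, a lexical Jaccard score over word sets is a crude but transparent stand-in for embedding similarity. The sketch below is a substitute technique, not the workflow's prescribed method; the 0.6 threshold and the function names are assumptions:

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def find_overlaps(new_chunks: dict[str, str],
                  critical_chunks: dict[str, str],
                  threshold: float = 0.6) -> list[tuple[str, str, float]]:
    """Return (new_id, critical_id, score) pairs whose content overlaps.

    The 0.6 threshold is an arbitrary starting point, not a tuned value.
    """
    hits = []
    for new_id, new_text in new_chunks.items():
        for crit_id, crit_text in critical_chunks.items():
            score = jaccard(new_text, crit_text)
            if score >= threshold:
                hits.append((new_id, crit_id, round(score, 2)))
    # Strongest overlaps first, so the user reviews the likeliest duplicates early.
    return sorted(hits, key=lambda h: -h[2])
```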
3. Index Standard-Priority Artifacts
Process standard-priority artifacts:
For each standard artifact:
- Read the source file for domain-specific knowledge
- Create chunks focused on contextual information
- Tag for specific retrieval scenarios
- Present chunks to user for validation, then display the A/P/C menu
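Candidate tags can be proposed automatically and then reviewed. This sketch, assuming a project-specific controlled vocabulary (`vocabulary`) and a deliberately tiny stopword list, ranks vocabulary hits first and then frequent terms; `suggest_tags` is a hypothetical helper:

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be longer.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on", "for",
             "is", "are", "be", "with", "by", "must", "should", "this", "that"}


def suggest_tags(text: str, vocabulary: set[str], limit: int = 5) -> list[str]:
    """Suggest semantic tags: controlled-vocabulary hits first, then frequent terms.

    This only proposes candidates; the agent still reviews the list.
    """
    words = [w for w in re.findall(r"[a-z][a-z0-9-]+", text.lower())
             if w not in STOPWORDS]
    counts = Counter(words)
    known = sorted(w for w in counts if w in vocabulary)
    # most_common() orders ties by first appearance in the text.
    frequent = [w for w, _ in counts.most_common() if w not in vocabulary]
    return (known + frequent)[:limit]
```

Frequency alone surfaces generic words, so the agent should prune the suggestions before tagging the chunk.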
4. Index Reference-Priority Artifacts
Process reference-priority artifacts:
For each reference artifact:
- Extract background context and terminology
- Create lighter-weight chunks for supplementary retrieval
- Tag for broad topic matching
- Present chunks to user for validation, then display the A/P/C menu
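Terminology in reference artifacts often appears as glossary lines, which map naturally onto one lightweight chunk per term. A sketch, assuming glossary entries look like `Term: definition` or `- **Term**: definition`; the line pattern, the `domain` category, and the dict layout are assumptions:

```python
import re

# Matches lines such as "SLA: ..." or "- **SLA**: ...". The accepted shapes are assumptions.
GLOSSARY_LINE = re.compile(
    r"^\s*(?:[-*]\s+)?(?:\*\*)?([A-Za-z][\w /-]{0,40}?)(?:\*\*)?\s*:\s+(\S.*)$"
)


def terminology_chunks(source: str, text: str) -> list[dict]:
    """Turn glossary lines into lightweight reference chunks, one per term."""
    chunks = []
    for line in text.splitlines():
        m = GLOSSARY_LINE.match(line)
        if not m:
            continue
        term, definition = m.group(1).strip(), m.group(2).strip()
        chunks.append({
            "title": term,
            "source": source,
            "priority": "reference",
            "category": "domain",  # assumption: glossary terms are domain knowledge
            "tags": ["glossary", "terminology", term.lower()],  # broad topic tags
            "content": f"{term}: {definition}",
        })
    return chunks
```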
5. Cross-Reference and Deduplicate
After all categories are indexed:
Deduplication Analysis:
- Identify chunks with overlapping content across sources
- Merge or consolidate redundant chunks
- Ensure cross-references between related chunks are tagged
- Present deduplication summary to user
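Merging redundant chunks can be sketched as clustering near-duplicates with union-find and keeping the highest-priority member of each cluster, folding the others' tags and sources into it. Lexical Jaccard again stands in for embedding similarity; the 0.8 threshold and all names are assumptions:

```python
import re
from dataclasses import dataclass, field

RANK = {"critical": 0, "high": 1, "standard": 2, "reference": 3}


@dataclass
class Chunk:
    chunk_id: str
    priority: str
    content: str
    tags: set[str] = field(default_factory=set)
    sources: set[str] = field(default_factory=set)


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similar(a: Chunk, b: Chunk, threshold: float) -> bool:
    wa, wb = _words(a.content), _words(b.content)
    union = wa | wb
    return bool(union) and len(wa & wb) / len(union) >= threshold


def deduplicate(chunks: list[Chunk], threshold: float = 0.8) -> tuple[list[Chunk], int]:
    """Merge near-duplicate chunks; return (kept chunks, number removed)."""
    parent = list(range(len(chunks)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(chunks)):
        for j in range(i + 1, len(chunks)):
            if similar(chunks[i], chunks[j], threshold):
                parent[find(i)] = find(j)

    groups: dict[int, list[Chunk]] = {}
    for i, chunk in enumerate(chunks):
        groups.setdefault(find(i), []).append(chunk)

    kept = []
    for members in groups.values():
        # Keep the highest-priority member (mutated in place) and fold in the rest.
        members.sort(key=lambda c: RANK[c.priority])
        winner = members[0]
        for other in members[1:]:
            winner.tags |= other.tags
            winner.sources |= other.sources
        kept.append(winner)
    return kept, len(chunks) - len(kept)
```

Union-find merges transitively, so a chain of pairwise-similar chunks collapses into one cluster even if its ends differ; review large clusters before accepting the merge.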
Relationship Mapping:
- Identify chunks that frequently co-occur in implementation contexts
- Tag related chunks for retrieval grouping
- Create chunk clusters for common query patterns
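Without query logs, shared tags are a practical proxy for chunks that co-occur in implementation contexts. This sketch links chunks that share at least two tags (an arbitrary cutoff) and reports connected components as retrieval clusters; the function names are hypothetical:

```python
from collections import defaultdict
from itertools import combinations


def related_pairs(tags_by_chunk: dict[str, set[str]],
                  min_shared: int = 2) -> set[tuple[str, str]]:
    """Pairs of chunk IDs sharing at least `min_shared` tags."""
    return {
        (a, b)
        for a, b in combinations(sorted(tags_by_chunk), 2)
        if len(tags_by_chunk[a] & tags_by_chunk[b]) >= min_shared
    }


def clusters(tags_by_chunk: dict[str, set[str]],
             min_shared: int = 2) -> list[list[str]]:
    """Connected components of the related-pairs graph, ignoring singletons."""
    graph = defaultdict(set)
    for a, b in related_pairs(tags_by_chunk, min_shared):
        graph[a].add(b)
        graph[b].add(a)

    seen, result = set(), []
    for start in sorted(tags_by_chunk):
        if start in seen:
            continue
        # Iterative depth-first search collects one component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(component) > 1:
            result.append(sorted(component))
    return result
```

The related pairs double as the cross-reference tags, and the clusters as retrieval groups.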
6. Generate Knowledge Index Document
Compile all validated chunks into the knowledge index file:
Document Structure:
```markdown
# Knowledge Index for {{project_name}}

_RAG-optimized knowledge base for AI agent retrieval. Each chunk is self-contained and tagged for semantic search._

---

## Index Summary

- **Total Chunks:** {{total_count}}
- **Critical:** {{critical_count}} | **High:** {{high_count}} | **Standard:** {{standard_count}} | **Reference:** {{ref_count}}
- **Sources Indexed:** {{source_count}}
- **Last Synced:** {{date}}

---

## Critical Knowledge
{{critical_chunks}}

## Architecture Knowledge
{{architecture_chunks}}

## Requirements Knowledge
{{requirements_chunks}}

## Implementation Knowledge
{{implementation_chunks}}

## Domain Knowledge
{{domain_chunks}}

## Operations Knowledge
{{operations_chunks}}

## Quality Knowledge
{{quality_chunks}}
```
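Compiling the document is mostly string assembly. A sketch, assuming each chunk has already been rendered to markdown and is passed as a dict with `category`, `priority`, and `markdown` keys; it also assumes critical chunks appear only under Critical Knowledge rather than again in their category section, which the structure above leaves open. `build_index` is a hypothetical name:

```python
from datetime import date

CATEGORY_ORDER = ["architecture", "requirements", "implementation",
                  "domain", "operations", "quality"]
PRIORITY_ORDER = ["critical", "high", "standard", "reference"]


def build_index(project_name: str, chunks: list[dict], sources: set[str],
                synced: date) -> str:
    """Assemble the knowledge index document from pre-rendered chunks."""
    counts = {p: sum(c["priority"] == p for c in chunks) for p in PRIORITY_ORDER}
    lines = [
        f"# Knowledge Index for {project_name}",
        "",
        "_RAG-optimized knowledge base for AI agent retrieval. Each chunk is "
        "self-contained and tagged for semantic search._",
        "",
        "---",
        "",
        "## Index Summary",
        f"- **Total Chunks:** {len(chunks)}",
        f"- **Critical:** {counts['critical']} | **High:** {counts['high']} | "
        f"**Standard:** {counts['standard']} | **Reference:** {counts['reference']}",
        f"- **Sources Indexed:** {len(sources)}",
        f"- **Last Synced:** {synced.isoformat()}",
        "",
        "---",
    ]
    sections = [("Critical Knowledge", [c for c in chunks if c["priority"] == "critical"])]
    for cat in CATEGORY_ORDER:
        # Assumption: critical chunks are listed only under "Critical Knowledge".
        members = [c for c in chunks
                   if c["category"] == cat and c["priority"] != "critical"]
        sections.append((f"{cat.capitalize()} Knowledge", members))
    for heading, members in sections:
        lines += ["", f"## {heading}"]
        # Within a section, higher-priority chunks come first.
        for c in sorted(members, key=lambda c: PRIORITY_ORDER.index(c["priority"])):
            lines += ["", c["markdown"]]
    return "\n".join(lines) + "\n"
```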
7. Present Indexing Summary
"Knowledge indexing complete for {{project_name}}!
Chunks Created:
| Category | Critical | High | Standard | Reference | Total |
|---|---|---|---|---|---|
| Architecture | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
| Requirements | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
| Implementation | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
| Domain | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
| Operations | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
| Quality | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
Deduplication: Removed {{removed_count}} redundant chunks
Cross-References: {{xref_count}} chunk relationships mapped
[C] Continue to optimization"
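The counts table in the summary can be generated from the final chunk list. A sketch, assuming each chunk is reduced to a `(category, priority)` pair; `summary_table` is a hypothetical helper:

```python
from collections import Counter

CATEGORIES = ["Architecture", "Requirements", "Implementation",
              "Domain", "Operations", "Quality"]
PRIORITIES = ["critical", "high", "standard", "reference"]


def summary_table(chunks: list[tuple[str, str]]) -> str:
    """Render the category-by-priority count table from (category, priority) pairs."""
    counts = Counter((cat.lower(), pri) for cat, pri in chunks)
    rows = ["| Category | Critical | High | Standard | Reference | Total |",
            "|---|---|---|---|---|---|"]
    for cat in CATEGORIES:
        # Counter returns 0 for missing keys, so empty cells print as 0.
        cells = [counts[(cat.lower(), p)] for p in PRIORITIES]
        rows.append(f"| {cat} | " + " | ".join(map(str, cells)) + f" | {sum(cells)} |")
    return "\n".join(rows)
```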
SUCCESS METRICS:
- ✅ All discovered artifacts indexed into self-contained chunks
- ✅ Each chunk has proper metadata tags and source tracing
- ✅ No redundant or overlapping chunks remain
- ✅ Cross-references between related chunks are mapped
- ✅ A/P/C menu presented and handled correctly for each category
- ✅ Knowledge index document properly structured
FAILURE MODES:
- ❌ Creating chunks that require reading the full source document
- ❌ Missing semantic tags that prevent accurate retrieval
- ❌ Not deduplicating overlapping chunks from different sources
- ❌ Not cross-referencing related knowledge units
- ❌ Not getting user validation for each category
- ❌ Creating overly large chunks that reduce retrieval precision
NEXT STEP:
After completing all categories and user selects [C], load {project-root}/_bmad/bmm/workflows/4-implementation/genai-knowledge-sync/steps/step-03-optimize.md to optimize the knowledge base for retrieval quality.
Remember: Do NOT proceed to step-03 until all categories are indexed and the user explicitly selects [C]!