Step 2: Knowledge Indexing & Chunking

MANDATORY EXECUTION RULES (READ FIRST):

🛑 NEVER generate content without user input
✅ ALWAYS treat this as collaborative indexing between technical peers
📋 YOU ARE A FACILITATOR, not a content generator
💬 FOCUS on creating self-contained, retrievable knowledge chunks
🎯 EACH CHUNK must be independently useful without requiring full document context
⚠️ ABSOLUTELY NO TIME ESTIMATES - AI development speed has fundamentally changed
✅ ALWAYS write output in your agent communication style, using the language configured in {communication_language}

EXECUTION PROTOCOLS:

🎯 Show your analysis before taking any action
📝 Focus on creating atomic, self-contained knowledge chunks
⚠️ Present A/P/C menu after each major category
💾 ONLY save when user chooses C (Continue)
📖 Update frontmatter with completed categories
🚫 FORBIDDEN to load next step until all categories are indexed

COLLABORATION MENUS (A/P/C):

This step will generate content and present choices for each knowledge category:

A (Advanced Elicitation): Use discovery protocols to explore nuanced knowledge relationships
P (Party Mode): Bring multiple perspectives to identify missing knowledge connections
C (Continue): Save the current chunks and proceed to next category

PROTOCOL INTEGRATION:

When 'A' selected: Execute {project-root}/_bmad/core/workflows/advanced-elicitation/workflow.xml
When 'P' selected: Execute {project-root}/_bmad/core/workflows/party-mode/workflow.md
PROTOCOLS always return to this step's A/P/C menu once A or P has completed
User accepts/rejects protocol changes before proceeding
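The menu routing described above can be sketched as a small dispatch function. This is an illustration only: `Action`, `PROTOCOLS`, and `handle_choice` are hypothetical names, and the workflow paths are copied verbatim from this step with `{project-root}` left unresolved.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    RUN_PROTOCOL = "run_protocol"            # execute A or P, then show the menu again
    SAVE_AND_CONTINUE = "save_and_continue"  # C: save chunks, move to the next category
    REPROMPT = "reprompt"                    # unrecognized input: show the menu again


# Paths exactly as written in this step; the placeholder is not expanded here.
PROTOCOLS = {
    "A": "{project-root}/_bmad/core/workflows/advanced-elicitation/workflow.xml",
    "P": "{project-root}/_bmad/core/workflows/party-mode/workflow.md",
}


def handle_choice(choice: str) -> Tuple[Action, Optional[str]]:
    """Map a menu choice to an action and, for A or P, the workflow to execute."""
    key = choice.strip().upper()
    if key in PROTOCOLS:
        return Action.RUN_PROTOCOL, PROTOCOLS[key]
    if key == "C":
        return Action.SAVE_AND_CONTINUE, None
    return Action.REPROMPT, None


print(handle_choice("a"))
print(handle_choice("C"))
```

Only `SAVE_AND_CONTINUE` leaves the menu; after `RUN_PROTOCOL` the caller would ask the user to accept or reject the protocol's changes and then display the A/P/C menu again.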

CONTEXT BOUNDARIES:

Discovery catalog from step-1 is available
All artifact paths and classifications are identified
Focus on creating chunks optimized for embedding and retrieval
Each chunk must carry enough context to be useful in isolation

YOUR TASK:

Index each discovered artifact into self-contained knowledge chunks with metadata tags, source tracing, and retrieval-optimized formatting.

CHUNKING PRINCIPLES:

Chunk Design Rules

Self-Contained: Each chunk must be understandable without reading the source document
Tagged: Every chunk has category, priority, source path, and semantic tags
Atomic: One concept or decision per chunk - no compound knowledge
Traceable: Every chunk links back to its source artifact and section
Contextual: Include enough surrounding context for accurate retrieval
Deduplicated: Avoid redundant chunks across different source artifacts
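Some of these rules can be checked mechanically before chunks are shown to the user. The sketch below is one possible check, not part of the workflow: `check_chunk`, the dictionary keys, and the 200-word limit are assumptions, while the allowed category and priority values are taken from the Chunk Format section of this step. Rules such as "Atomic" and "Self-Contained" still need human judgment; the word limit is only a crude proxy.

```python
REQUIRED_FIELDS = ("source", "category", "priority", "tags", "context", "content")
CATEGORIES = {"architecture", "requirements", "implementation",
              "domain", "operations", "quality"}
PRIORITIES = {"critical", "high", "standard", "reference"}


def check_chunk(chunk: dict, max_words: int = 200) -> list:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    # Tagged / Traceable: every metadata field must be present and non-empty.
    for name in REQUIRED_FIELDS:
        if not chunk.get(name):
            problems.append(f"missing or empty field: {name}")
    if chunk.get("category") and chunk["category"] not in CATEGORIES:
        problems.append(f"unknown category: {chunk['category']!r}")
    if chunk.get("priority") and chunk["priority"] not in PRIORITIES:
        problems.append(f"unknown priority: {chunk['priority']!r}")
    # Atomic (proxy): long content often bundles several concepts.
    # The 200-word default is an arbitrary assumption.
    if len(str(chunk.get("content", "")).split()) > max_words:
        problems.append("content exceeds max_words; consider splitting")
    return problems


# Invented example chunk, for illustration only.
good = {
    "source": "docs/architecture.md",
    "category": "architecture",
    "priority": "critical",
    "tags": ["auth", "jwt"],
    "context": "Relevant when changing calls between internal services.",
    "content": "Internal services authenticate with JWTs issued by the gateway.",
}
print(check_chunk(good))
print(check_chunk(dict(good, category="misc", tags=[])))
```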

Chunk Format

Each chunk follows this standard format:

### [CHUNK-ID] Chunk Title

- **Source:** `{relative_path_to_source_file}`
- **Category:** architecture | requirements | implementation | domain | operations | quality
- **Priority:** critical | high | standard | reference
- **Tags:** comma-separated semantic tags for retrieval matching

**Context:** One-line description of when this knowledge is relevant.

**Content:**
The actual knowledge content - specific, actionable, self-contained.
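For tooling that produces chunks programmatically, the format above could be rendered from a small record type. This is a minimal sketch under assumptions: the `Chunk` class, its field names, `render_chunk`, and the `ARCH-001` ID scheme are hypothetical, and the example values are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical in-memory form of one knowledge chunk."""
    chunk_id: str
    title: str
    source: str    # relative path to the source file
    category: str  # architecture | requirements | implementation | domain | operations | quality
    priority: str  # critical | high | standard | reference
    context: str   # one line: when this knowledge is relevant
    content: str   # the knowledge itself
    tags: list = field(default_factory=list)


def render_chunk(chunk: Chunk) -> str:
    """Render a chunk in the standard chunk format."""
    return "\n".join([
        f"### [{chunk.chunk_id}] {chunk.title}",
        "",
        f"- **Source:** `{chunk.source}`",
        f"- **Category:** {chunk.category}",
        f"- **Priority:** {chunk.priority}",
        f"- **Tags:** {', '.join(chunk.tags)}",
        "",
        f"**Context:** {chunk.context}",
        "",
        "**Content:**",
        chunk.content,
    ])


# Invented example values, not taken from any real project.
example = Chunk(
    chunk_id="ARCH-001",
    title="Service-to-service authentication",
    source="docs/architecture.md",
    category="architecture",
    priority="critical",
    context="Relevant when adding or changing calls between internal services.",
    content="Internal services authenticate to each other with JWTs issued by the API gateway.",
    tags=["auth", "jwt", "gateway"],
)
print(render_chunk(example))
```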

INDEXING SEQUENCE:

1. Index Critical-Priority Artifacts

Process all artifacts marked as critical priority first:

For each critical artifact:

Read the complete source file
Identify distinct knowledge units (decisions, rules, constraints)
Create one chunk per knowledge unit
Apply semantic tags for retrieval matching
Present chunks to user for validation

Present results: "I've created {{chunk_count}} critical-priority chunks from {{source_count}} sources:

These chunks will be prioritized in every retrieval query.

[A] Advanced Elicitation - Explore deeper knowledge connections
[P] Party Mode - Review from multiple implementation perspectives
[C] Continue - Save these chunks and proceed"

2. Index High-Priority Artifacts

Process all high-priority artifacts:

For each high-priority artifact:

Read source file and identify knowledge units
Create chunks with appropriate tags
Cross-reference with critical chunks for consistency
Identify any overlaps and deduplicate

3. Index Standard-Priority Artifacts

Process all standard-priority artifacts:

For each standard artifact:

Read source file for domain-specific knowledge
Create chunks focused on contextual information
Tag for specific retrieval scenarios

4. Index Reference-Priority Artifacts

Process all reference-priority artifacts:

For each reference artifact:

Extract background context and terminology
Create lighter-weight chunks for supplementary retrieval
Tag for broad topic matching
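The order of steps 1 to 4 amounts to processing the discovery catalog by descending priority. A minimal sketch, assuming the step-1 catalog is available as a list of dictionaries with `path` and `priority` keys (that shape, the function name, and the example paths are assumptions):

```python
# Processing order used by steps 1 to 4 of the indexing sequence.
PRIORITY_ORDER = ["critical", "high", "standard", "reference"]


def order_artifacts(catalog: list) -> list:
    """Sort artifacts so critical ones are indexed first.

    sorted() is stable, so artifacts with the same priority keep their
    catalog order. An unknown priority raises KeyError rather than being
    silently placed somewhere.
    """
    rank = {name: position for position, name in enumerate(PRIORITY_ORDER)}
    return sorted(catalog, key=lambda artifact: rank[artifact["priority"]])


# Invented catalog entries, for illustration only.
catalog = [
    {"path": "docs/glossary.md", "priority": "reference"},
    {"path": "docs/architecture.md", "priority": "critical"},
    {"path": "docs/runbook.md", "priority": "standard"},
    {"path": "docs/prd.md", "priority": "high"},
    {"path": "docs/security.md", "priority": "critical"},
]
for artifact in order_artifacts(catalog):
    print(artifact["priority"], artifact["path"])
```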

5. Cross-Reference and Deduplicate

After all categories are indexed:

Deduplication Analysis:

Identify chunks with overlapping content across sources
Merge or consolidate redundant chunks
Ensure cross-references between related chunks are tagged
Present deduplication summary to user
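One way to surface candidate duplicates for the user to review is word-overlap (Jaccard) similarity between chunk contents. This is a swapped-in heuristic, not a method this step prescribes; the function names, the 0.6 threshold, and the example chunks are assumptions. Flagged pairs are candidates only, since the merge decision stays with the user.

```python
import re
from itertools import combinations


def tokens(text: str) -> set:
    """Lowercased alphanumeric word set of a chunk's content."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_overlaps(chunks: dict, threshold: float = 0.6) -> list:
    """Return (id, id, score) for every pair at or above the threshold.

    Compares all pairs, which is fine for small indexes; the threshold
    is an arbitrary starting point.
    """
    token_sets = {chunk_id: tokens(text) for chunk_id, text in chunks.items()}
    pairs = []
    for left, right in combinations(sorted(token_sets), 2):
        score = jaccard(token_sets[left], token_sets[right])
        if score >= threshold:
            pairs.append((left, right, round(score, 2)))
    return pairs


# Invented chunk contents keyed by hypothetical chunk IDs.
chunks = {
    "ARCH-001": "All services authenticate with JWT tokens issued by the gateway",
    "IMPL-004": "Services authenticate with JWT tokens issued by the API gateway",
    "OPS-002": "Deployments run nightly from the main branch",
}
print(find_overlaps(chunks))  # [('ARCH-001', 'IMPL-004', 0.82)]
```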

Relationship Mapping:

Identify chunks that frequently co-occur in implementation contexts
Tag related chunks for retrieval grouping
Create chunk clusters for common query patterns
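Co-occurrence in implementation contexts is hard to measure directly, so a simple stand-in is to relate chunks that share several semantic tags. The sketch below assumes that proxy; `related_by_tags`, the `min_shared` default, and the example tags are hypothetical.

```python
from itertools import combinations


def related_by_tags(chunk_tags: dict, min_shared: int = 2) -> dict:
    """Map each related pair of chunk IDs to the tags they share.

    Two chunks count as related when they have at least `min_shared`
    tags in common. This is a proxy for co-occurrence, not a measurement.
    """
    related = {}
    for left, right in combinations(sorted(chunk_tags), 2):
        shared = chunk_tags[left] & chunk_tags[right]
        if len(shared) >= min_shared:
            related[(left, right)] = shared
    return related


# Invented tags keyed by hypothetical chunk IDs.
chunk_tags = {
    "ARCH-001": {"auth", "jwt", "gateway"},
    "IMPL-004": {"auth", "jwt", "middleware"},
    "OPS-002": {"deploy", "ci"},
}
for pair, shared in related_by_tags(chunk_tags).items():
    print(pair, sorted(shared))
```

The shared tags of each pair can then be written back as a grouping tag, so that retrieving one chunk of a cluster makes the others easy to pull in.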

6. Generate Knowledge Index Document

Compile all validated chunks into the knowledge index file:

Document Structure:

# Knowledge Index for {{project_name}}

_RAG-optimized knowledge base for AI agent retrieval. Each chunk is self-contained and tagged for semantic search._

---

## Index Summary

- **Total Chunks:** {{total_count}}
- **Critical:** {{critical_count}} | **High:** {{high_count}} | **Standard:** {{standard_count}} | **Reference:** {{ref_count}}
- **Sources Indexed:** {{source_count}}
- **Last Synced:** {{date}}

---

## Critical Knowledge

{{critical_chunks}}

## Architecture Knowledge

{{architecture_chunks}}

## Requirements Knowledge

{{requirements_chunks}}

## Implementation Knowledge

{{implementation_chunks}}

## Domain Knowledge

{{domain_chunks}}

## Operations Knowledge

{{operations_chunks}}

## Quality Knowledge

{{quality_chunks}}
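The placeholders in the Index Summary section can be derived from the validated chunks rather than counted by hand. A minimal sketch, assuming each chunk is a dictionary with `priority` and `source` keys and that `{{date}}` is written in ISO format (both assumptions; `index_summary` is a hypothetical helper):

```python
from collections import Counter
from datetime import date

PRIORITIES = ["critical", "high", "standard", "reference"]


def index_summary(chunks: list, synced: date) -> str:
    """Render the Index Summary section from a list of chunk dictionaries."""
    by_priority = Counter(chunk["priority"] for chunk in chunks)
    sources = {chunk["source"] for chunk in chunks}  # distinct source files
    priority_line = " | ".join(
        f"**{name.capitalize()}:** {by_priority[name]}" for name in PRIORITIES
    )
    return "\n".join([
        "## Index Summary",
        "",
        f"- **Total Chunks:** {len(chunks)}",
        f"- {priority_line}",
        f"- **Sources Indexed:** {len(sources)}",
        f"- **Last Synced:** {synced.isoformat()}",
    ])


# Invented chunks, for illustration only.
chunks = [
    {"priority": "critical", "source": "docs/architecture.md"},
    {"priority": "critical", "source": "docs/architecture.md"},
    {"priority": "standard", "source": "docs/prd.md"},
]
print(index_summary(chunks, date(2024, 1, 15)))
```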

7. Present Indexing Summary

"Knowledge indexing complete for {{project_name}}!

Chunks Created:

| Category | Critical | High | Standard | Reference | Total |
| --- | --- | --- | --- | --- | --- |
| Architecture | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
| Requirements | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
| Implementation | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
| Domain | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
| Operations | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |
| Quality | {{n}} | {{n}} | {{n}} | {{n}} | {{n}} |

Deduplication: Removed {{removed_count}} redundant chunks
Cross-References: {{xref_count}} chunk relationships mapped

[C] Continue to optimization"
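The `{{n}}` cells of the summary table are a category-by-priority cross-tabulation of the final chunk set. A minimal sketch of computing the rows, assuming chunks are dictionaries with `category` and `priority` keys (`summary_rows` and the example data are hypothetical):

```python
from collections import Counter

CATEGORIES = ["architecture", "requirements", "implementation",
              "domain", "operations", "quality"]
PRIORITIES = ["critical", "high", "standard", "reference"]


def summary_rows(chunks: list) -> list:
    """Build one row per category: name, four priority counts, row total."""
    counts = Counter((chunk["category"], chunk["priority"]) for chunk in chunks)
    rows = []
    for category in CATEGORIES:
        per_priority = [counts[(category, priority)] for priority in PRIORITIES]
        rows.append([category.capitalize(), *per_priority, sum(per_priority)])
    return rows


# Invented chunks, for illustration only.
chunks = [
    {"category": "architecture", "priority": "critical"},
    {"category": "architecture", "priority": "high"},
    {"category": "operations", "priority": "standard"},
]
for row in summary_rows(chunks):
    print("\t".join(str(cell) for cell in row))
```

Categories with no chunks still produce a row of zeros, so the table always has the same six rows.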

SUCCESS METRICS:

✅ All discovered artifacts indexed into self-contained chunks
✅ Each chunk has proper metadata tags and source tracing
✅ No redundant or overlapping chunks remain
✅ Cross-references between related chunks are mapped
✅ A/P/C menu presented and handled correctly for each category
✅ Knowledge index document properly structured

FAILURE MODES:

❌ Creating chunks that require reading the full source document
❌ Missing semantic tags that prevent accurate retrieval
❌ Not deduplicating overlapping chunks from different sources
❌ Not cross-referencing related knowledge units
❌ Not getting user validation for each category
❌ Creating overly large chunks that reduce retrieval precision

NEXT STEP:

After completing all categories and user selects [C], load {project-root}/_bmad/bmm/workflows/4-implementation/genai-knowledge-sync/steps/step-03-optimize.md to optimize the knowledge base for retrieval quality.

Remember: Do NOT proceed to step-03 until all categories are indexed and user explicitly selects [C]!
