docs: add core tools reference and apply Diataxis style fixes
Add comprehensive reference doc for all 11 built-in core tools (tasks and workflows) that ship with every BMad installation — bmad-help, brainstorming, party-mode, distillator, advanced-elicitation, both review tools, both editorial tools, shard-doc, and index-docs. Each entry follows the Configuration Reference structure with purpose, use cases, how it works, inputs, and outputs.

Style fixes across existing docs:

- reference/commands.md: convert #### headers to bold text, replace sparse task table with link to new core-tools reference
- how-to/get-answers-about-bmad.md: remove horizontal rule between sections (Diataxis violation)
- how-to/project-context.md: consolidate 4 consecutive tip admonitions into single admonition with bullet list, add AGENTS.md reference

Also includes:

- Add bmad-distillator task to core module with compression agents, format reference, splitting strategy, and analysis scripts
- Add Distillator entry to module-help.csv
- Rename supports-autonomous to supports-headless in product-brief manifest

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent 474ef8f128
commit 2e7a07fdc9
how-to/get-answers-about-bmad.md

```diff
@@ -42,8 +42,6 @@ BMad-Help responds with:
 - What the first required task is
 - What the rest of the process looks like
-
----
 
 ## When to Use This Guide
 
 Use this section when:
```
how-to/project-context.md

```diff
@@ -5,7 +5,7 @@ sidebar:
   order: 7
 ---
 
-Use the `project-context.md` file to ensure AI agents follow your project's technical preferences and implementation rules throughout all workflows.
+Use the `project-context.md` file to ensure AI agents follow your project's technical preferences and implementation rules throughout all workflows. To make sure this is always available, you can also add the line `Important project context and conventions are located in [path to project context]/project-context.md` to your tool's context or always-rules file (such as `AGENTS.md`).
 
 :::note[Prerequisites]
 - BMad Method installed
```
```diff
@@ -114,20 +114,11 @@ A `project-context.md` file that:
 
 ## Tips
 
-:::tip[Focus on the Unobvious]
-Document patterns agents might miss such as "Use JSDoc style comments on every public class, function and variable", not universal practices like "use meaningful variable names" which LLMs know at this point.
-:::
-
-:::tip[Keep It Lean]
-This file is loaded by every implementation workflow. Long files waste context. Do not include content that only applies to narrow scope or specific stories or features.
-:::
-
-:::tip[Update as Needed]
-Edit manually when patterns change, or re-generate after significant architecture changes.
-:::
-
-:::tip[Works for All Project Types]
-Just as useful for Quick Flow as for full BMad Method projects.
+:::tip[Best Practices]
+- **Focus on the unobvious** — Document patterns agents might miss (e.g., "Use JSDoc on every public class"), not universal practices like "use meaningful variable names."
+- **Keep it lean** — This file is loaded by every implementation workflow. Long files waste context. Exclude content that only applies to narrow scope or specific stories.
+- **Update as needed** — Edit manually when patterns change, or re-generate after significant architecture changes.
+- Works for Quick Flow and full BMad Method projects alike.
 :::
 
 ## Next Steps
```
reference/commands.md

```diff
@@ -105,32 +105,21 @@ See [Workflow Map](./workflow-map.md) for the complete workflow reference organi
 
 Tasks and tools are standalone operations that do not require an agent or workflow context.
 
-#### BMad-Help: Your Intelligent Guide
+**BMad-Help: Your Intelligent Guide**
 
-**`bmad-help`** is your primary interface for discovering what to do next. It's not just a lookup tool — it's an intelligent assistant that:
+`bmad-help` is your primary interface for discovering what to do next. It inspects your project, understands natural language queries, and recommends the next required or optional step based on your installed modules.
 
-- **Inspects your project** to see what's already been done
-- **Understands natural language queries** — ask questions in plain English
-- **Varies by installed modules** — shows options based on what you have
-- **Auto-invokes after workflows** — every workflow ends with clear next steps
-- **Recommends the first required task** — no guessing where to start
-
-**Examples:**
-
+:::note[Example]
 ```
 bmad-help
 bmad-help I have a SaaS idea and know all the features. Where do I start?
 bmad-help What are my options for UX design?
-bmad-help I'm stuck on the PRD workflow
 ```
+:::
 
-#### Other Tasks and Tools
+**Other Core Tasks and Tools**
 
-| Example skill | Purpose |
-| --- | --- |
-| `bmad-shard-doc` | Split a large markdown file into smaller sections |
-| `bmad-index-docs` | Index project documentation |
-| `bmad-editorial-review-prose` | Review document prose quality |
+The core module includes 11 built-in tools — reviews, compression, brainstorming, document management, and more. See [Core Tools](./core-tools.md) for the complete reference.
 
 ## Naming Convention
```
reference/core-tools.md (new file)

@@ -0,0 +1,293 @@
---
title: Core Tools
description: Reference for all built-in tasks and workflows available in every BMad installation without additional modules.
sidebar:
  order: 2
---

Every BMad installation includes a set of core skills that can be used in conjunction with anything you are doing — standalone tasks and workflows that work across all projects, all modules, and all phases. These are always available regardless of which optional modules you install.

:::tip[Quick Path]
Run any core tool by typing its skill name (e.g., `bmad-help`) in your IDE. No agent session required.
:::

## Overview

| Tool | Type | Purpose |
| --- | --- | --- |
| [`bmad-help`](#bmad-help) | Task | Get context-aware guidance on what to do next |
| [`bmad-brainstorming`](#bmad-brainstorming) | Workflow | Facilitate interactive brainstorming sessions |
| [`bmad-party-mode`](#bmad-party-mode) | Workflow | Orchestrate multi-agent group discussions |
| [`bmad-distillator`](#bmad-distillator) | Task | Lossless LLM-optimized compression of documents |
| [`bmad-advanced-elicitation`](#bmad-advanced-elicitation) | Task | Push LLM output through iterative refinement methods |
| [`bmad-review-adversarial-general`](#bmad-review-adversarial-general) | Task | Cynical review that finds what's missing and what's wrong |
| [`bmad-review-edge-case-hunter`](#bmad-review-edge-case-hunter) | Task | Exhaustive branching-path analysis for unhandled edge cases |
| [`bmad-editorial-review-prose`](#bmad-editorial-review-prose) | Task | Clinical copy-editing for communication clarity |
| [`bmad-editorial-review-structure`](#bmad-editorial-review-structure) | Task | Structural editing — cuts, merges, and reorganization |
| [`bmad-shard-doc`](#bmad-shard-doc) | Task | Split large markdown files into organized sections |
| [`bmad-index-docs`](#bmad-index-docs) | Task | Generate or update an index of all docs in a folder |

## bmad-help

**Your intelligent guide to what comes next.** Inspects your project state, detects what's been done, and recommends the next required or optional step.

**Use it when:**

- You finished a workflow and want to know what's next
- You're new to BMad and need orientation
- You're stuck and want context-aware advice
- You installed new modules and want to see what's available

**How it works:**

1. Scans your project for existing artifacts (PRD, architecture, stories, etc.)
2. Detects which modules are installed and their available workflows
3. Recommends next steps in priority order — required steps first, then optional
4. Presents each recommendation with the skill command and a brief description

**Input:** Optional query in natural language (e.g., `bmad-help I have a SaaS idea, where do I start?`)

**Output:** Prioritized list of recommended next steps with skill commands

## bmad-brainstorming

**Generate diverse ideas through interactive creative techniques.** A facilitated brainstorming session that loads proven ideation methods from a technique library and guides you toward 100+ ideas before organizing.

**Use it when:**

- You're starting a new project and need to explore the problem space
- You're stuck generating ideas and need structured creativity
- You want to use proven ideation frameworks (SCAMPER, reverse brainstorming, etc.)

**How it works:**

1. Sets up a brainstorming session with your topic
2. Loads creative techniques from a method library
3. Guides you through technique after technique, generating ideas
4. Applies an anti-bias protocol — shifts creative domain every 10 ideas to prevent clustering
5. Produces an append-only session document with all ideas organized by technique

**Input:** Brainstorming topic or problem statement, optional context file

**Output:** `brainstorming-session-{date}.md` with all generated ideas

:::note[Quantity Target]
The magic happens in ideas 50–100. The workflow encourages generating 100+ ideas before organization.
:::

## bmad-party-mode

**Orchestrate multi-agent group discussions.** Loads all installed BMad agents and facilitates a natural conversation where each agent contributes from their unique expertise and personality.

**Use it when:**

- You need multiple expert perspectives on a decision
- You want agents to challenge each other's assumptions
- You're exploring a complex topic that spans multiple domains

**How it works:**

1. Loads the agent manifest with all installed agent personalities
2. Analyzes your topic to select the 2–3 most relevant agents
3. Agents take turns contributing, with natural cross-talk and disagreements
4. Rotates agent participation to ensure diverse perspectives over time
5. Exit with `goodbye`, `end party`, or `quit`

**Input:** Discussion topic or question, plus an optional list of personas you would like to participate

**Output:** Real-time multi-agent conversation with maintained agent personalities

## bmad-distillator

**Lossless LLM-optimized compression of source documents.** Produces dense, token-efficient distillates that preserve all information for downstream LLM consumption. Verifiable through round-trip reconstruction.

**Use it when:**

- A document is too large for an LLM's context window
- You need token-efficient versions of research, specs, or planning artifacts
- You want to verify no information is lost during compression
- Agents will need to frequently reference and find information in it

**How it works:**

1. **Analyze** — Reads source documents, identifies information density and structure
2. **Compress** — Converts prose to dense bullet-point format, strips decorative formatting
3. **Verify** — Checks completeness to ensure all original information is preserved
4. **Validate** (optional) — Round-trip reconstruction test proves lossless compression

**Input:**

- `source_documents` (required) — File paths, folder paths, or glob patterns
- `downstream_consumer` (optional) — What consumes this (e.g., "PRD creation")
- `token_budget` (optional) — Approximate target size
- `--validate` (flag) — Run round-trip reconstruction test

**Output:** Distillate markdown file(s) with compression ratio report (e.g., "3.2:1")
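Invocation follows the same skill-name pattern as the other core tools. For instance (the paths here are purely illustrative):

```
bmad-distillator docs/research/*.md
bmad-distillator --validate docs/prd-distillate.md
```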
## bmad-advanced-elicitation

**Push LLM output through iterative refinement methods.** Selects from a library of elicitation techniques to systematically improve content through multiple passes.

**Use it when:**

- LLM output feels shallow or generic
- You want to explore a topic from multiple analytical angles
- You're refining a critical document and want deeper thinking

**How it works:**

1. Loads a method registry with 5+ elicitation techniques
2. Selects the 5 best-fit methods based on content type and complexity
3. Presents an interactive menu — pick a method, reshuffle, or list all
4. Applies the selected method to enhance the content
5. Re-presents options for iterative improvement until you select "Proceed"

**Input:** Content section to enhance

**Output:** Enhanced version of the content with improvements applied

## bmad-review-adversarial-general

**Cynical review that assumes problems exist and searches for them.** Takes a skeptical, jaded reviewer perspective with zero patience for sloppy work. Looks for what's missing, not just what's wrong.

**Use it when:**

- You need quality assurance before finalizing a deliverable
- You want to stress-test a spec, story, or document
- You want to find gaps in coverage that optimistic reviews miss

**How it works:**

1. Reads the content with a cynical, critical perspective
2. Identifies issues across completeness, correctness, and quality
3. Searches specifically for what's missing — not just what's present and wrong
4. Finds a minimum of 10 issues, re-analyzing deeper if necessary

**Input:**

- `content` (required) — Diff, spec, story, doc, or any artifact
- `also_consider` (optional) — Additional areas to keep in mind

**Output:** Markdown list of 10+ findings with descriptions

## bmad-review-edge-case-hunter

**Walk every branching path and boundary condition, report only unhandled cases.** Pure path-tracing methodology that mechanically derives edge classes. Orthogonal to adversarial review — method-driven, not attitude-driven.

**Use it when:**

- You want exhaustive edge case coverage for code or logic
- You need a complement to adversarial review (different methodology, different findings)
- You're reviewing a diff or function for boundary conditions

**How it works:**

1. Enumerates all branching paths in the content
2. Derives edge classes mechanically: missing else/default, unguarded inputs, off-by-one, arithmetic overflow, implicit type coercion, race conditions, timeout gaps
3. Tests each path against existing guards
4. Reports only unhandled paths — silently discards handled ones

**Input:**

- `content` (required) — Diff, full file, or function
- `also_consider` (optional) — Additional areas to keep in mind

**Output:** JSON array of findings, each with `location`, `trigger_condition`, `guard_snippet`, and `potential_consequence`
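A single finding might look like the following. This is a hypothetical illustration: only the four field names come from the output contract above; the values are invented.

```json
[
  {
    "location": "parseTimeout(), line 42",
    "trigger_condition": "config value is a negative number",
    "guard_snippet": "if (timeoutMs < 0) throw new RangeError('timeout must be >= 0');",
    "potential_consequence": "negative delay is clamped to 0, silently disabling the retry backoff"
  }
]
```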
:::note[Complementary Reviews]
Run both `bmad-review-adversarial-general` and `bmad-review-edge-case-hunter` together for orthogonal coverage. The adversarial review catches quality and completeness issues; the edge case hunter catches unhandled paths.
:::

## bmad-editorial-review-prose

**Clinical copy-editing focused on communication clarity.** Reviews text for issues that impede comprehension. Applies the Microsoft Writing Style Guide as a baseline. Preserves author voice.

**Use it when:**

- You've drafted a document and want to polish the writing
- You need to ensure clarity for a specific audience
- You want communication fixes without style opinion changes

**How it works:**

1. Reads the content, skipping code blocks and frontmatter
2. Identifies communication issues (not style preferences)
3. Deduplicates the same issue across multiple locations
4. Produces a three-column fix table

**Input:**

- `content` (required) — Markdown, plain text, or XML
- `style_guide` (optional) — Project-specific style guide
- `reader_type` (optional) — `humans` (default) for clarity/flow, or `llm` for precision/consistency

**Output:** Three-column markdown table: Original Text | Revised Text | Changes

## bmad-editorial-review-structure

**Structural editing — proposes cuts, merges, moves, and condensing.** Reviews document organization and proposes substantive changes to improve clarity and flow before copy editing.

**Use it when:**

- A document was produced from multiple subprocesses and needs structural coherence
- You want to reduce document length while preserving comprehension
- You need to identify scope violations or buried critical information

**How it works:**

1. Analyzes the document against 5 structure models (Tutorial, Reference, Explanation, Prompt, Strategic)
2. Identifies redundancies, scope violations, and buried information
3. Produces prioritized recommendations: CUT, MERGE, MOVE, CONDENSE, QUESTION, PRESERVE
4. Estimates total reduction in words and percentage

**Input:**

- `content` (required) — Document to review
- `purpose` (optional) — Intended purpose (e.g., "quickstart tutorial")
- `target_audience` (optional) — Who reads this
- `reader_type` (optional) — `humans` or `llm`
- `length_target` (optional) — Target reduction (e.g., "30% shorter")

**Output:** Document summary, prioritized recommendation list, and estimated reduction

## bmad-shard-doc

**Split large markdown files into organized section files.** Uses level-2 headers as split points to create a folder of self-contained section files with an index.

**Use it when:**

- A markdown document has grown too large to manage effectively (500+ lines)
- You want to break a monolithic doc into navigable sections
- You need separate files for parallel editing or LLM context management

**How it works:**

1. Validates the source file exists and is markdown
2. Splits on level-2 (`##`) headers into numbered section files
3. Creates an `index.md` with a section manifest and links
4. Prompts you to delete, archive, or keep the original

**Input:** Source markdown file path, optional destination folder

**Output:** Folder with `index.md` and `01-{section}.md`, `02-{section}.md`, etc.
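The header-based split described above can be sketched in a few lines of Python. This is an illustration only, not the task's actual implementation: the real tool also generates the index and prompts about the original, and `shard` is a hypothetical name.

```python
import re
from pathlib import Path

def shard(source: str, dest: str) -> list[Path]:
    """Split a markdown file on level-2 headers into numbered section files."""
    text = Path(source).read_text(encoding="utf-8")
    # Split immediately before each "## " heading at the start of a line
    parts = re.split(r"(?m)^(?=## )", text)
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for n, part in enumerate((p for p in parts if p.strip()), start=1):
        # Derive a filename slug from the section's first line
        title = part.splitlines()[0].lstrip("# ").strip() or "intro"
        slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
        path = out_dir / f"{n:02d}-{slug}.md"
        path.write_text(part, encoding="utf-8")
        written.append(path)
    return written
```

The lookahead split keeps each `## ` heading attached to its own section, so every output file is self-contained.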
## bmad-index-docs

**Generate or update an index of all documents in a folder.** Scans a directory, reads each file to understand its purpose, and produces an organized `index.md` with links and descriptions.

**Use it when:**

- You need a lightweight index for quick LLM scanning of available docs
- A documentation folder has grown and needs an organized table of contents
- You want an auto-generated overview that stays current

**How it works:**

1. Scans the target directory for all non-hidden files
2. Reads each file to understand its actual purpose
3. Groups files by type, purpose, or subdirectory
4. Generates concise descriptions (3–10 words each)

**Input:** Target folder path

**Output:** `index.md` with organized file listings, relative links, and brief descriptions
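The mechanical half of this task (scan, skip hidden files, emit relative links) can be sketched as below; the descriptions themselves come from the LLM reading each file, so the sketch leaves placeholders. `build_index` is a hypothetical name, not part of the tool.

```python
from pathlib import Path

def build_index(folder: str) -> str:
    """List non-hidden files in a folder as a markdown index skeleton."""
    root = Path(folder)
    lines = ["# Index", ""]
    for path in sorted(root.rglob("*")):
        if path.is_dir():
            continue
        rel_parts = path.relative_to(root).parts
        # Skip hidden files and anything under a hidden subdirectory
        if any(p.startswith(".") for p in rel_parts):
            continue
        rel = "/".join(rel_parts)
        lines.append(f"- [{rel}](./{rel}) - {{description}}")
    return "\n".join(lines) + "\n"
```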
product-brief workflow manifest

```diff
@@ -6,7 +6,7 @@
   "name": "create-brief",
   "menu-code": "CB",
   "description": "Produces executive product brief and optional LLM distillate for PRD input.",
-  "supports-autonomous": true,
+  "supports-headless": true,
   "phase-name": "1-analysis",
   "after": ["brainstorming, perform-research"],
   "before": ["create-prd"],
```
module-help.csv

```diff
@@ -8,3 +8,4 @@ core,anytime,Editorial Review - Prose,EP,,skill:bmad-editorial-review-prose,bmad
 core,anytime,Editorial Review - Structure,ES,,skill:bmad-editorial-review-structure,bmad-editorial-review-structure,false,,,"Propose cuts, reorganization, and simplification while preserving comprehension. Use when doc produced from multiple subprocesses or needs structural improvement.",report located with target document,
 core,anytime,Adversarial Review (General),AR,,skill:bmad-review-adversarial-general,bmad-review-adversarial-general,false,,,"Review content critically to find issues and weaknesses. Use for quality assurance or before finalizing deliverables. Code Review in other modules run this automatically, but its useful also for document reviews",,
 core,anytime,Edge Case Hunter Review,ECH,,skill:bmad-review-edge-case-hunter,bmad-review-edge-case-hunter,false,,,"Walk every branching path and boundary condition in code, report only unhandled edge cases. Use alongside adversarial review for orthogonal coverage - method-driven not attitude-driven.",,
+core,anytime,Distillator,DG,,skill:bmad-distillator,bmad-distillator,false,,,"Lossless LLM-optimized compression of source documents. Use when you need token-efficient distillates that preserve all information for downstream LLM consumption.",adjacent to source document or specified output_path,distillate markdown file(s)
```
@ -0,0 +1,178 @@
|
||||||
|
---
|
||||||
|
name: bmad-distillator
|
||||||
|
description: Lossless LLM-optimized compression of source documents. Use when the user requests to 'distill documents' or 'create a distillate'.
|
||||||
|
argument-hint: "[to create provide input paths] [--validate distillate-path to confirm distillate is lossless and optimized]"
|
||||||
|
---
|
||||||
|
|
||||||
|
# Distillator: A Document Distillation Engine
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
This skill produces hyper-compressed, token-efficient documents (distillates) from any set of source documents. A distillate preserves every fact, decision, constraint, and relationship from the sources while stripping all overhead that humans need and LLMs don't. Act as an information extraction and compression specialist. The output is a single dense document (or semantically-split set) that a downstream LLM workflow can consume as sole context input without information loss.
|
||||||
|
|
||||||
|
This is a compression task, not a summarization task. Summaries are lossy. Distillates are lossless compression optimized for LLM consumption.
|
||||||
|
|
||||||
|
## On Activation
|
||||||
|
|
||||||
|
1. **Validate inputs.** The caller must provide:
|
||||||
|
- **source_documents** (required) — One or more file paths, folder paths, or glob patterns to distill
|
||||||
|
- **downstream_consumer** (optional) — What workflow/agent consumes this distillate (e.g., "PRD creation", "architecture design"). When provided, use it to judge signal vs noise. When omitted, preserve everything.
|
||||||
|
- **token_budget** (optional) — Approximate target size. When provided and the distillate would exceed it, trigger semantic splitting.
|
||||||
|
- **output_path** (optional) — Where to save. When omitted, save adjacent to the primary source document with `-distillate.md` suffix.
|
||||||
|
- **--validate** (flag) — Run round-trip reconstruction test after producing the distillate.
|
||||||
|
|
||||||
|
2. **Route** — proceed to Stage 1.
|
||||||
|
|
||||||
|
## Stages
|
||||||
|
|
||||||
|
| # | Stage | Purpose |
|
||||||
|
|---|-------|---------|
|
||||||
|
| 1 | Analyze | Run analysis script, determine routing and splitting |
|
||||||
|
| 2 | Compress | Spawn compressor agent(s) to produce the distillate |
|
||||||
|
| 3 | Verify & Output | Completeness check, format check, save output |
|
||||||
|
| 4 | Round-Trip Validate | (--validate only) Reconstruct and diff against originals |
|
||||||
|
|
||||||
|
### Stage 1: Analyze
|
||||||
|
|
||||||
|
Run `scripts/analyze_sources.py --help` then run it with the source paths. Use its routing recommendation and grouping output to drive Stage 2. Do NOT read the source documents yourself.
|
||||||
|
|
||||||
|
### Stage 2: Compress
|
||||||
|
|
||||||
|
**Single mode** (routing = `"single"`, ≤3 files, ≤15K estimated tokens):
|
||||||
|
|
||||||
|
Spawn one subagent using `agents/distillate-compressor.md` with all source file paths.
|
||||||
|
|
||||||
|
**Fan-out mode** (routing = `"fan-out"`):
|
||||||
|
|
||||||
|
1. Spawn one compressor subagent per group from the analysis output. Each compressor receives only its group's file paths and produces an intermediate distillate.
|
||||||
|
|
||||||
|
2. After all compressors return, spawn one final **merge compressor** subagent using `agents/distillate-compressor.md`. Pass it the intermediate distillate contents as its input (not the original files). Its job is cross-group deduplication, thematic regrouping, and final compression.
|
||||||
|
|
||||||
|
3. Clean up intermediate distillate content (it exists only in memory, not saved to disk).
|
||||||
|
|
||||||
|
**Graceful degradation:** If subagent spawning is unavailable, read the source documents and perform the compression work directly using the same instructions from `agents/distillate-compressor.md`. For fan-out, process groups sequentially then merge.
|
||||||
|
|
||||||
|
The compressor returns a structured JSON result containing the distillate content, source headings, named entities, and token estimate.
|
||||||
|
|
||||||
|
### Stage 3: Verify & Output
|
||||||
|
|
||||||
|
After the compressor (or merge compressor) returns:
|
||||||
|
|
||||||
|
1. **Completeness check.** Using the headings and named entities list returned by the compressor, verify each appears in the distillate content. If gaps are found, send them back to the compressor for a targeted fix pass — not a full recompression. Limit to 2 fix passes maximum.

2. **Format check.** Verify the output follows distillate format rules:

   - No prose paragraphs (only bullets)
   - No decorative formatting
   - No repeated information
   - Each bullet is self-contained
   - Themes are clearly delineated with `##` headings

3. **Determine output format.** Using the split prediction from Stage 1 and the actual distillate size:

   **Single distillate** (≤~5,000 tokens and token_budget not exceeded):

   Save as a single file with frontmatter:

   ```yaml
   ---
   type: bmad-distillate
   sources:
     - "{relative path to source file 1}"
     - "{relative path to source file 2}"
   downstream_consumer: "{consumer or 'general'}"
   created: "{date}"
   token_estimate: {approximate token count}
   parts: 1
   ---
   ```

   **Split distillate** (>~5,000 tokens, or token_budget requires it):

   Create a folder `{base-name}-distillate/` containing:

   ```
   {base-name}-distillate/
   ├── _index.md           # Orientation, cross-cutting items, section manifest
   ├── 01-{topic-slug}.md  # Self-contained section
   ├── 02-{topic-slug}.md
   └── 03-{topic-slug}.md
   ```

   The `_index.md` contains:

   - Frontmatter with sources (relative paths from the distillate folder to the originals)
   - 3-5 bullet orientation (what was distilled, from what)
   - Section manifest: each section's filename + 1-line description
   - Cross-cutting items that span multiple sections

   Each section file is self-contained — loadable independently. Include a 1-line context header: "This section covers [topic]. Part N of M."

   Source paths in frontmatter must be relative to the distillate's location.

4. **Measure the distillate.** Run `scripts/analyze_sources.py` on the final distillate file(s) to get accurate token counts for the output. Use the `total_estimated_tokens` from this analysis as `distillate_total_tokens`.
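
   The measurement step can be made concrete. Below is a minimal sketch of the kind of estimate `analyze_sources.py` might produce; the `chars / 4` heuristic and both function names are assumptions for illustration, not the script's actual implementation:

   ```python
   from pathlib import Path

   def estimate_tokens(text: str) -> int:
       # Rough heuristic: ~4 characters per token for English prose.
       # Assumption only; the real script may use a proper tokenizer.
       return max(1, len(text) // 4)

   def analyze_files(paths: list[str]) -> dict:
       """Return per-file and total token estimates for the given files."""
       per_file = {
           p: estimate_tokens(Path(p).read_text(encoding="utf-8"))
           for p in paths
       }
       return {"per_file": per_file, "total_estimated_tokens": sum(per_file.values())}
   ```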

5. **Report results.** Always return structured JSON output:

   ```json
   {
     "status": "complete",
     "distillate": "{path or folder path}",
     "section_distillates": ["{path1}", "{path2}"] or null,
     "source_total_tokens": N,
     "distillate_total_tokens": N,
     "compression_ratio": "X:1",
     "source_documents": ["{path1}", "{path2}"],
     "completeness_check": "pass" or "pass_with_additions"
   }
   ```

   Where `source_total_tokens` is from the Stage 1 analysis and `distillate_total_tokens` is from step 4. The `compression_ratio` is `source_total_tokens / distillate_total_tokens` formatted as "X:1" (e.g., "3.2:1").
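
   The ratio formatting can be pinned down with a tiny helper (illustrative; the function name is not part of the skill):

   ```python
   def compression_ratio(source_tokens: int, distillate_tokens: int) -> str:
       # Format as "X:1" with one decimal place, e.g. 48000 / 15000 -> "3.2:1".
       if distillate_tokens <= 0:
           raise ValueError("distillate_tokens must be positive")
       return f"{source_tokens / distillate_tokens:.1f}:1"
   ```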

6. If the `--validate` flag was set, proceed to Stage 4. Otherwise, done.

### Stage 4: Round-Trip Validation (--validate only)

This stage proves the distillate is lossless by reconstructing the source documents from the distillate alone. Use it for critical documents where information loss is unacceptable, or as a quality gate for high-stakes downstream workflows. It is not for routine use — it adds significant token cost.

1. **Spawn the reconstructor agent** using `agents/round-trip-reconstructor.md`. Pass it ONLY the distillate file path (or the `_index.md` path for split distillates) — it must NOT have access to the original source documents.

   For split distillates, spawn one reconstructor per section in parallel. Each receives its section file plus the `_index.md` for cross-cutting context.

   **Graceful degradation:** If subagent spawning is unavailable, this stage cannot be performed by the main agent (it has already seen the originals). Report that round-trip validation requires subagent support, and skip the stage.

2. **Receive reconstructions.** The reconstructor returns the paths of reconstruction files saved adjacent to the distillate.

3. **Perform a semantic diff.** Read both the original source documents and the reconstructions. For each section of the original, assess:

   - Is the core information present in the reconstruction?
   - Are specific details preserved (numbers, names, decisions)?
   - Are relationships and rationale intact?
   - Did the reconstruction add anything not in the original? (This indicates hallucination filling gaps.)

4. **Produce a validation report** saved adjacent to the distillate as `{distillate-basename}-validation-report.md`:

   ```markdown
   ---
   type: distillate-validation
   distillate: "{distillate path}"
   sources: ["{source paths}"]
   created: "{date}"
   ---

   ## Validation Summary

   - Status: PASS | PASS_WITH_WARNINGS | FAIL
   - Information preserved: {percentage estimate}
   - Gaps found: {count}
   - Hallucinations detected: {count}

   ## Gaps (information in originals but missing from reconstruction)

   - {gap description} — Source: {which original}, Section: {where}

   ## Hallucinations (information in reconstruction not traceable to originals)

   - {hallucination description} — appears to fill gap in: {section}

   ## Possible Gap Markers (flagged by reconstructor)

   - {marker description}
   ```

5. **If gaps are found**, offer to run a targeted fix pass on the distillate — adding the missing information without a full recompression. Limit to 2 fix passes maximum.

6. **Clean up** — delete the temporary reconstruction files after the report is generated.

@@ -0,0 +1,116 @@
# Distillate Compressor Agent

Act as an information extraction and compression specialist. Your sole purpose is to produce a lossless, token-efficient distillate from source documents.

You receive: source document file paths, an optional downstream_consumer context, and a splitting decision.

You must load and apply `resources/compression-rules.md` before producing output. Reference `resources/distillate-format-reference.md` for the expected output format.

## Compression Process

### Step 1: Read Sources

Read all source document files. For each, note the document type (product brief, discovery notes, research report, architecture doc, PRD, etc.) based on content and naming.

### Step 2: Extract

Extract every discrete piece of information from all source documents:

- Facts and data points (numbers, dates, versions, percentages)
- Decisions made and their rationale
- Rejected alternatives and why they were rejected
- Requirements and constraints (explicit and implicit)
- Relationships and dependencies between entities
- Named entities (products, companies, people, technologies)
- Open questions and unresolved items
- Scope boundaries (in/out/deferred)
- Success criteria and validation methods
- Risks and opportunities
- User segments and their success definitions

Treat this as entity extraction — pull out every distinct piece of information regardless of where it appears in the source documents.

### Step 3: Deduplicate

Apply the deduplication rules from `resources/compression-rules.md`.

### Step 4: Filter (only if downstream_consumer is specified)

For each extracted item, ask: "Would the downstream workflow need this?"

- Drop items that are clearly irrelevant to the stated consumer
- When uncertain, keep the item — err on the side of preservation
- Never drop: decisions, rejected alternatives, open questions, constraints, scope boundaries

### Step 5: Group Thematically

Organize items into coherent themes derived from the source content — not from a fixed template. The themes should reflect what the documents are actually about.

Common groupings (use what fits, omit what doesn't, add what's needed):

- Core concept / problem / motivation
- Solution / approach / architecture
- Users / segments
- Technical decisions / constraints
- Scope boundaries (in/out/deferred)
- Competitive context
- Success criteria
- Rejected alternatives
- Open questions
- Risks and opportunities

### Step 6: Compress Language

For each item, apply the compression rules from `resources/compression-rules.md`:

- Strip prose transitions and connective tissue
- Remove hedging and rhetoric
- Remove explanations of common knowledge
- Preserve specific details (numbers, names, versions, dates)
- Ensure the item is self-contained (understandable without reading the source)
- Make relationships explicit ("X because Y", "X blocks Y", "X replaces Y")

### Step 7: Format Output

Produce the distillate as dense, thematically grouped bullets:

- `##` headings for themes — no deeper heading levels needed
- `- ` bullets for items — every token must carry signal
- No decorative formatting (no bold for emphasis, no horizontal rules)
- No prose paragraphs — only bullets
- Semicolons to join closely related short items within a single bullet
- Each bullet self-contained — understandable without reading other bullets

Do NOT include frontmatter — the calling skill handles that.
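
These format rules are mechanical enough to lint automatically. A minimal sketch of such a check, assuming a distillate containing only `##` theme headings and `- ` bullets (the helper is hypothetical, not part of the agent spec):

```python
def check_distillate_format(text: str) -> list[str]:
    """Return a list of rule violations; an empty list means the format passes."""
    problems = []
    for i, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines separate themes
        if stripped.startswith("## "):
            continue  # theme heading
        if stripped.startswith("- "):
            continue  # bullet item
        if stripped.startswith(("***", "---", "___")):
            problems.append(f"line {i}: decorative horizontal rule not allowed")
        else:
            problems.append(f"line {i}: prose outside a bullet or `##` heading")
    return problems
```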

## Semantic Splitting

If the splitting decision indicates splitting is needed, load `resources/splitting-strategy.md` and follow it.

When splitting:

1. Identify natural semantic boundaries in the content — coherent topic clusters, not arbitrary size breaks.

2. Produce a **root distillate** containing:

   - 3-5 bullet orientation (what was distilled, for whom, how many parts)
   - Cross-references to section distillates
   - Items that span multiple sections

3. Produce **section distillates**, each self-sufficient. Include a 1-line context header: "This section covers [topic]. Part N of M from [source document names]."

## Return Format

Return a structured result to the calling skill:

```json
{
  "distillate_content": "{the complete distillate text without frontmatter}",
  "source_headings": ["heading 1", "heading 2"],
  "source_named_entities": ["entity 1", "entity 2"],
  "token_estimate": N,
  "sections": null or [{"topic": "...", "content": "..."}]
}
```

- **distillate_content**: The full distillate text
- **source_headings**: All Level 2+ headings found across source documents (for completeness verification)
- **source_named_entities**: Key named entities (products, companies, people, technologies, decisions) found in sources
- **token_estimate**: Approximate token count of the distillate
- **sections**: null for single distillates; an array of section objects if semantically split

Do not include conversational text, status updates, or preamble — return only the structured result.

@@ -0,0 +1,68 @@
# Round-Trip Reconstructor Agent

Act as a document reconstruction specialist. Your purpose is to prove a distillate's completeness by reconstructing the original source documents from the distillate alone.

**Critical constraint:** You receive ONLY the distillate file path. You must NOT have access to the original source documents. If you can see the originals, the test is meaningless.

## Process

### Step 1: Analyze the Distillate

Read the distillate file. Parse the YAML frontmatter to identify:

- The `sources` list — what documents were distilled
- The `downstream_consumer` — what filtering may have been applied
- The `parts` count — whether this is a single or split distillate
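
A minimal sketch of extracting those fields without a YAML library (the helper is hypothetical; a real implementation could simply use a YAML parser):

```python
import re

def read_frontmatter(text: str) -> dict:
    """Extract key fields from a distillate's '---'-delimited YAML frontmatter."""
    m = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not m:
        return {}
    block = m.group(1)
    # Quoted list entries under `sources:`
    fm = {"sources": re.findall(r'^\s*-\s*"(.*?)"', block, re.MULTILINE)}
    # Simple scalar keys, with or without surrounding quotes
    for key in ("downstream_consumer", "parts"):
        km = re.search(rf'^{key}:\s*"?([^"\n]+)"?\s*$', block, re.MULTILINE)
        if km:
            fm[key] = km.group(1)
    return fm
```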

### Step 2: Detect Document Types

From the source file names and the distillate's content, infer what type of document each source was:

- Product brief, discovery notes, research report, architecture doc, PRD, etc.
- Use the naming conventions and content themes to determine appropriate document structure

### Step 3: Reconstruct Each Source

For each source listed in the frontmatter, produce a full human-readable document:

- Use appropriate prose, structure, and formatting for the document type
- Include all sections the original document would have had based on the document type
- Expand compressed bullets back into natural language prose
- Restore section transitions and contextual framing
- Do NOT invent information — only use what is in the distillate
- Flag any places where the distillate felt insufficient with `[POSSIBLE GAP]` markers — these are critical quality signals

**Quality signals to watch for:**

- Bullets that feel like they're missing context → `[POSSIBLE GAP: missing context for X]`
- Themes that seem underrepresented given the document type → `[POSSIBLE GAP: expected more on X for a document of this type]`
- Relationships that are mentioned but not fully explained → `[POSSIBLE GAP: relationship between X and Y unclear]`

### Step 4: Save Reconstructions

Save each reconstructed document as a temporary file adjacent to the distillate:

- First source: `{distillate-basename}-reconstruction-1.md`
- Second source: `{distillate-basename}-reconstruction-2.md`
- And so on for each source

Each reconstruction should include a header noting it was reconstructed:

```markdown
---
type: distillate-reconstruction
source_distillate: "{distillate path}"
reconstructed_from: "{original source name}"
reconstruction_number: {N}
---
```

### Step 5: Return

Return a structured result to the calling skill:

```json
{
  "reconstruction_files": ["{path1}", "{path2}"],
  "possible_gaps": ["gap description 1", "gap description 2"],
  "source_count": N
}
```

Do not include conversational text, status updates, or preamble — return only the structured result.

@@ -0,0 +1,15 @@
type: skill
module: core
capabilities:
  - name: bmad-distillator
    menu-code: DSTL
    description: "Produces a lossless, LLM-optimized distillate from source documents. Use after producing large human-presentable documents that will later be consumed by LLMs."
    supports-headless: true
    input: source documents
    args: output, validate
    output: single distillate or folder of distillates next to the source input
    config-vars-used: null
    phase: anytime
    before: []
    after: []
    is-required: false

@@ -0,0 +1,51 @@
# Compression Rules

These rules govern how source text is compressed into distillate format. Apply as a final pass over all output.

## Strip — Remove entirely

- Prose transitions: "As mentioned earlier", "It's worth noting", "In addition to this"
- Rhetoric and persuasion: "This is a game-changer", "The exciting thing is"
- Hedging: "We believe", "It's likely that", "Perhaps", "It seems"
- Self-reference: "This document describes", "As outlined above"
- Common knowledge explanations: "Vercel is a cloud platform company", "MIT is an open-source license", "JSON is a data interchange format"
- Repeated introductions of the same concept
- Section transition paragraphs
- Formatting-only elements (decorative bold/italic for emphasis, horizontal rules for visual breaks)
- Filler phrases: "In order to", "It should be noted that", "The fact that"

## Preserve — Keep always

- Specific numbers, dates, versions, percentages
- Named entities (products, companies, people, technologies)
- Decisions made and their rationale (compressed: "Decision: X. Reason: Y")
- Rejected alternatives and why (compressed: "Rejected: X. Reason: Y")
- Explicit constraints and non-negotiables
- Dependencies and ordering relationships
- Open questions and unresolved items
- Scope boundaries (in/out/deferred)
- Success criteria and how they're validated
- User segments and what success means for each
- Risks with their severity signals
- Conflicts between source documents

## Transform — Change form for efficiency

- Long prose paragraphs → single dense bullet capturing the same information
- "We decided to use X because Y and Z" → "X (rationale: Y, Z)"
- Repeated category labels → group under a single heading, no per-item labels
- "Risk: ... Severity: high" → "HIGH RISK: ..."
- Conditional statements → "If X → Y" form
- Multi-sentence explanations → semicolon-separated compressed form
- Lists of related short items → single bullet with semicolons
- "X is used for Y" → "X: Y" when context is clear
- Verbose enumerations → parenthetical lists: "platforms (Cursor, Claude Code, Windsurf, Copilot)"

## Deduplication Rules

- Same fact in multiple documents → keep the version with most context
- Same concept at different detail levels → keep the detailed version
- Overlapping lists → merge into single list, no duplicates
- When source documents disagree → note the conflict explicitly: "Brief says X; discovery notes say Y — unresolved"
- Executive summary points that are expanded elsewhere → keep only the expanded version
- Introductory framing repeated across sections → capture once under the most relevant theme
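
The overlapping-list rule can be sketched as an order-preserving merge. Note that this sketch keeps the first occurrence of a duplicate, whereas the agent is expected to keep the most contextual version, a judgment call code cannot make (the helper is illustrative only):

```python
def merge_lists(*lists: list[str]) -> list[str]:
    # Merge overlapping bullet lists, keeping first occurrence and original order.
    # Comparison is case- and whitespace-insensitive so near-duplicates collapse.
    seen: set[str] = set()
    merged: list[str] = []
    for items in lists:
        for item in items:
            key = " ".join(item.lower().split())
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged
```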

@@ -0,0 +1,227 @@
# Distillate Format Reference

Examples showing the transformation from human-readable source content to distillate format.

## Frontmatter

Every distillate includes YAML frontmatter. Source paths are relative to the distillate's location so the distillate remains portable:

```yaml
---
type: bmad-distillate
sources:
  - "product-brief-example.md"
  - "product-brief-example-discovery-notes.md"
downstream_consumer: "PRD creation"
created: "2026-03-13"
token_estimate: 1200
parts: 1
---
```

## Before/After Examples

### Prose Paragraph to Dense Bullet

**Before** (human-readable brief excerpt):

```
## What Makes This Different

**The anti-fragmentation layer.** The AI tooling space is fracturing across 40+
platforms with no shared methodology layer. BMAD is uniquely positioned to be the
cross-platform constant — the structured approach that works the same in Cursor,
Claude Code, Windsurf, Copilot, and whatever launches next month. Every other
methodology or skill framework maintains its own platform support matrix. By
building on the open-source skills CLI ecosystem, BMAD offloads the highest-churn
maintenance burden and focuses on what actually differentiates it: the methodology
itself.
```

**After** (distillate):

```
## Differentiation
- Anti-fragmentation positioning: BMAD = cross-platform constant across 40+ fragmenting AI tools; no competitor provides shared methodology layer
- Platform complexity delegated to Vercel skills CLI ecosystem (MIT); BMAD maintains methodology, not platform configs
```

### Technical Details to Compressed Facts

**Before** (discovery notes excerpt):

```
## Competitive Landscape

- **Vercel Skills.sh**: 83K+ skills, 18 agents, largest curated leaderboard —
  but dev-only, skills trigger unreliably (20% without explicit prompting)
- **SkillsMP**: 400K+ skills directory, pure aggregator with no curation or CLI
- **ClawHub/OpenClaw**: ~3.2K curated skills with versioning/rollback, small ecosystem
- **Lindy**: No-code AI agent builder for business automation — closed platform,
  no skill sharing
- **Microsoft Copilot Studio**: Enterprise no-code agent builder — vendor-locked
  to Microsoft
- **MindStudio**: No-code AI agent platform — siloed, no interoperability
- **Make/Zapier AI**: Workflow automation adding AI agents — workflow-centric,
  not methodology-centric
- **Key gap**: NO competitor combines structured methodology with plugin
  marketplace — this is BMAD's whitespace
```

**After** (distillate):

```
## Competitive Landscape
- No competitor combines structured methodology + plugin marketplace (whitespace)
- Skills.sh (Vercel): 83K skills, 18 agents, dev-only, 20% trigger reliability
- SkillsMP: 400K skills, aggregator only, no curation/CLI
- ClawHub: 3.2K curated, versioning, small ecosystem
- No-code platforms (Lindy, Copilot Studio, MindStudio, Make/Zapier): closed/siloed, no skill portability, business-only
```

### Deduplication Across Documents

When the same fact appears in both a brief and discovery notes:

**Brief says:**

```
bmad-init must always be included as a base skill in every bundle
```

**Discovery notes say:**

```
bmad-init must always be included as a base skill in every bundle/install
(solves bootstrapping problem)
```

**Distillate keeps the more contextual version:**

```
- bmad-init: always included as base skill in every bundle (solves bootstrapping)
```

### Decision/Rationale Compression

**Before:**

```
We decided not to build our own platform support matrix going forward, instead
delegating to the Vercel skills CLI ecosystem. The rationale is that maintaining
20+ platform configs is the biggest maintenance burden and it's unsustainable
at 40+ platforms.
```

**After:**

```
- Rejected: own platform support matrix. Reason: unsustainable at 40+ platforms; delegate to Vercel CLI ecosystem
```

## Full Example

A complete distillate produced from a product brief and its discovery notes, targeted at PRD creation:

```markdown
---
type: bmad-distillate
sources:
  - "product-brief-bmad-next-gen-installer.md"
  - "product-brief-bmad-next-gen-installer-discovery-notes.md"
downstream_consumer: "PRD creation"
created: "2026-03-13"
token_estimate: 1450
parts: 1
---

## Core Concept
- BMAD Next-Gen Installer: replaces monolithic Node.js CLI with skill-based plugin architecture for distributing BMAD methodology across 40+ AI platforms
- Three layers: self-describing plugins (bmad-manifest.json), cross-platform install via Vercel skills CLI (MIT), runtime registration via bmad-init skill
- Transforms BMAD from dev-only methodology into open platform for any domain (creative, therapeutic, educational, personal)

## Problem
- Current installer maintains ~20 platform configs manually; each platform convention change requires installer update, test, release — largest maintenance burden on team
- Node.js/npm required — blocks non-technical users on UI-based platforms (Claude Co-Work, etc.)
- CSV manifests are static, generated once at install; no runtime scanning/registration
- Unsustainable at 40+ platforms; new tools launching weekly

## Solution Architecture
- Plugins: skill bundles with Anthropic plugin standard as base format + bmad-manifest.json extending for BMAD-specific metadata (installer options, capabilities, help integration, phase ordering, dependencies)
- Existing manifest example: `{"module-code":"bmm","replaces-skill":"bmad-create-product-brief","capabilities":[{"name":"create-brief","menu-code":"CB","supports-headless":true,"phase-name":"1-analysis","after":["brainstorming"],"before":["create-prd"],"is-required":true}]}`
- Vercel skills CLI handles platform translation; integration pattern (wrap/fork/call) is PRD decision
- bmad-init: global skill scanning installed bmad-manifest.json files, registering capabilities, configuring project settings; always included as base skill in every bundle (solves bootstrapping)
- bmad-update: plugin update path without full reinstall; technical approach (diff/replace/preserve customizations) is PRD decision
- Distribution tiers: (1) NPX installer wrapping skills CLI for technical users, (2) zip bundle + platform-specific README for non-technical users, (3) future marketplace
- Non-technical path has honest friction: "copy to right folder" requires knowing where; per-platform README instructions; improves over time as low-code space matures

## Differentiation
- Anti-fragmentation: BMAD = cross-platform constant; no competitor provides shared methodology layer across AI tools
- Curated quality: all submissions gated, human-reviewed by BMad + core team; 13.4% of community skills have critical vulnerabilities (Snyk 2026); quality gate value increases as ecosystem gets noisier
- Domain-agnostic: no competitor builds beyond software dev workflows; same plugin system powers any domain via BMAD Builder (separate initiative)

## Users (ordered by v1 priority)
- Module authors (primary v1): package/test/distribute plugins independently without installer changes
- Developers: single-command install on any of 40+ platforms via NPX
- Non-technical users: install without Node/Git/terminal; emerging segment including PMs, designers, educators
- Future plugin creators: non-dev authors using BMAD Builder; need distribution without building own installer

## Success Criteria
- Zero (or near-zero) custom platform directory code; delegated to skills CLI ecosystem
- Installation verified on top platforms by volume; skills CLI handles long tail
- Non-technical install path validated with non-developer users
- bmad-init discovers/registers all plugins from manifests; clear errors for malformed manifests
- At least one external module author successfully publishes plugin using manifest system
- bmad-update works without full reinstall
- Existing CLI users have documented migration path

## Scope
- In: manifest spec, bmad-init, bmad-update, Vercel CLI integration, NPX installer, zip bundles, migration path
- Out: BMAD Builder, marketplace web platform, skill conversion (prerequisite, separate), one-click install for all platforms, monetization, quality certification process (gated-submission principle is architectural requirement; process defined separately)
- Deferred: CI/CD integration, telemetry for module authors, air-gapped enterprise install, zip bundle integrity verification (checksums/signing), deeper non-technical platform integrations

## Current Installer (migration context)
- Entry: `tools/cli/bmad-cli.js` (Commander.js) → `tools/cli/installers/lib/core/installer.js`
- Platforms: `platform-codes.yaml` (~20 platforms with target dirs, legacy dirs, template types, special flags)
- Manifests: CSV files (skill/workflow/agent-manifest.csv) are current source of truth, not JSON
- External modules: `external-official-modules.yaml` (CIS, GDS, TEA, WDS) from npm with semver
- Dependencies: 4-pass resolver (collect → parse → resolve → transitive); YAML-declared only
- Config: prompts for name, communication language, document output language, output folder
- Skills already use directory-per-skill layout; bmad-manifest.json sidecars exist but are not source of truth
- Key shift: CSV-based static manifests → JSON-based runtime scanning

## Vercel Skills CLI
- `npx skills add <source>` — GitHub, GitLab, local paths, git URLs
- 40+ agents; per-agent path mappings; symlinks (recommended) or copies
- Scopes: project-level or global
- Discovery: `skills/`, `.agents/skills/`, agent-specific paths, `.claude-plugin/marketplace.json`
- Commands: add, list, find, remove, check, update, init
- Non-interactive: `-y`, `--all` flags for CI/CD

## Competitive Landscape
- No competitor combines structured methodology + plugin marketplace (whitespace)
- Skills.sh (Vercel): 83K skills, dev-only, 20% trigger reliability without explicit prompting
- SkillsMP: 400K skills, aggregator only, no curation
- ClawHub: 3.2K curated, versioning, small
- No-code platforms (Lindy, Copilot Studio, MindStudio, Make/Zapier): closed/siloed, no skill portability, business-only
- Market: $7.84B (2025) → $52.62B (2030); Agent Skills spec ~4 months old, 351K+ skills; standards converging under Linux Foundation AAIF (MCP, AGENTS.md, A2A)

## Rejected Alternatives
- Building own platform support matrix: unsustainable at 40+; delegate to Vercel ecosystem
- One-click install for non-technical v1: emerging space; guidance-based, improve over time
- Prior roadmap/brainstorming: clean start, unconstrained by previous planning

## Open Questions
- Vercel CLI integration pattern: wrap/fork/call/peer dependency?
- bmad-update mechanics: diff/replace? Preserve user customizations?
- Migration story: command/manual reinstall/compatibility shim?
- Cross-platform testing: CI matrix for top N? Community testing for rest?
- bmad-manifest.json as open standard submission to Agent Skills governance?
- Platforms NOT supported by Vercel skills CLI?
- Manifest versioning strategy for backward compatibility?
- Plugin author getting-started experience and tooling?

## Opportunities
- Module authors as acquisition channel: each published plugin distributes BMAD to creator's audience
- CI/CD integration: bmad-init as pipeline one-liner increases stickiness
- Educational institutions: structured methodology + non-technical install → university AI curriculum
- Skill composability: mixing BMAD modules with third-party skills for custom methodology stacks

## Risks
- Manifest format evolution creates versioning/compatibility burden once third-party authors publish
- Quality gate needs defined process, not just claim — gated review model addresses
- 40+ platform testing environments even with Vercel handling translation
- Scope creep pressure from marketplace vision (explicitly excluded but primary long-term value)
- Vercel dependency: minor supply-chain risk; MIT license allows fork if deprioritized
```
|
||||||
|
|
@@ -0,0 +1,78 @@
# Semantic Splitting Strategy

When the source content is large (exceeds ~15,000 tokens) or a token_budget requires it, split the distillate into semantically coherent sections rather than arbitrary size breaks.

## Why Semantic Over Size-Based

Arbitrary splits (every N tokens) break coherence. A downstream workflow loading "part 2 of 4" gets context fragments. Semantic splits produce self-contained topic clusters that a workflow can load selectively — "give me just the technical decisions section" — which is more useful and more token-efficient for the consumer.

## Splitting Process

### 1. Identify Natural Boundaries

After the initial extraction and deduplication (Steps 1-2 of the compression process), look for natural semantic boundaries:

- Distinct problem domains or functional areas
- Different stakeholder perspectives (users, technical, business)
- Temporal boundaries (current state vs future vision)
- Scope boundaries (in-scope vs out-of-scope vs deferred)
- Phase boundaries (analysis, design, implementation)

Choose boundaries that produce sections a downstream workflow might load independently.

### 2. Assign Items to Sections

For each extracted item, assign it to the most relevant section. Items that span multiple sections go in the root distillate.

Cross-cutting items (items relevant to multiple sections):

- Constraints that affect all areas → root distillate
- Decisions with broad impact → root distillate
- Section-specific decisions → section distillate

### 3. Produce Root Distillate

The root distillate contains:

- **Orientation** (3-5 bullets): what was distilled, from what sources, for what consumer, how many sections
- **Cross-references**: list of section distillates with 1-line descriptions
- **Cross-cutting items**: facts, decisions, and constraints that span multiple sections
- **Scope summary**: high-level in/out/deferred, if applicable

### 4. Produce Section Distillates

Each section distillate must be self-sufficient — a reader loading only one section should understand it without the others.

Each section includes:

- **Context header** (1 line): "This section covers [topic]. Part N of M from [source document names]."
- **Section content**: thematically grouped bullets following the same compression rules as a single distillate
- **Cross-references** (if needed): pointers to other sections for related content

### 5. Output Structure

Create a folder `{base-name}-distillate/` containing:

```
{base-name}-distillate/
├── _index.md           # Root distillate: orientation, cross-cutting items, section manifest
├── 01-{topic-slug}.md  # Self-contained section
├── 02-{topic-slug}.md
└── 03-{topic-slug}.md
```

Example:

```
product-brief-distillate/
├── _index.md
├── 01-problem-solution.md
├── 02-technical-decisions.md
└── 03-users-market.md
```

## Size Targets

When a token_budget is specified:

- Root distillate: ~20% of budget (orientation + cross-cutting items)
- Remaining budget split proportionally across sections based on content density
- If a section exceeds its proportional share, compress more aggressively or sub-split

When no token_budget is specified but splitting is needed:

- Aim for sections of 3,000-5,000 tokens each
- Keep the root distillate as small as possible while remaining useful standalone

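The consumer side of that selective loading can be sketched as follows. This is a hypothetical helper, not part of the task itself; the `_index.md` and `NN-{topic-slug}.md` filenames follow the output structure defined later in this document:

```python
from pathlib import Path


def load_sections(distillate_dir: str, wanted: list[str]) -> str:
    """Load the root distillate plus only the requested section files."""
    root = Path(distillate_dir)
    parts = [(root / "_index.md").read_text()]  # orientation always loads
    for section in sorted(root.glob("[0-9][0-9]-*.md")):
        # match requested topics against the section slug, e.g. "01-technical-decisions"
        if any(topic in section.stem for topic in wanted):
            parts.append(section.read_text())
    return "\n\n".join(parts)
```

A workflow that only needs technical decisions would call `load_sections(path, ["technical-decisions"])` and never pay the token cost of the other sections.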
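The assignment rule above can be sketched like this. The item shape is illustrative only; `sections` is a hypothetical list of section tags attached to each extracted item, not a field the task mandates:

```python
def assign_items(items: list[dict]) -> tuple[list[dict], dict[str, list[dict]]]:
    """Route cross-cutting items to the root distillate and single-section
    items to their most relevant section."""
    root: list[dict] = []
    sections: dict[str, list[dict]] = {}
    for item in items:
        if len(item["sections"]) != 1:
            root.append(item)  # spans multiple sections → root distillate
        else:
            sections.setdefault(item["sections"][0], []).append(item)
    return root, sections
```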
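The budgeted case above can be sketched as a simple allocation function. This is a minimal illustration of the ~20% root share plus proportional split; the section names and token counts in the example are invented:

```python
def allocate_budget(token_budget: int, section_tokens: dict[str, int]) -> dict[str, int]:
    """Split a token budget: ~20% to the root distillate, the rest
    proportionally across sections by their current content size."""
    root = token_budget // 5  # ~20% for orientation + cross-cutting items
    remaining = token_budget - root
    total = sum(section_tokens.values())
    shares = {
        name: remaining * tokens // total
        for name, tokens in section_tokens.items()
    }
    return {"_index": root, **shares}


budgets = allocate_budget(10_000, {"problem-solution": 6000, "technical-decisions": 3000})
# root gets 2000; the sections split the remaining 8000 in a 2:1 ratio
```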
@@ -0,0 +1,300 @@
# /// script
# requires-python = ">=3.10"
# dependencies = []
# ///
"""Analyze source documents for the distillation generator.

Enumerates files from paths/folders/globs, computes sizes and token estimates,
detects document types from naming conventions, and suggests groupings for
related documents (e.g., a brief paired with its discovery notes).

Accepts: file paths, folder paths (scans recursively for .md/.txt/.yaml/.yml/.json),
or glob patterns. Skips node_modules, .git, __pycache__, .venv, _bmad-output.

Output JSON structure:
    status: "ok" | "error"
    files[]: path, filename, size_bytes, estimated_tokens, doc_type
    summary: total_files, total_size_bytes, total_estimated_tokens
    groups[]: group_key, files[] with role (primary/companion/standalone)
        - Groups related docs by naming convention (e.g., brief + discovery-notes)
    routing: recommendation ("single" | "fan-out"), reason
        - single: ≤3 files AND ≤15K estimated tokens
        - fan-out: >3 files OR >15K estimated tokens
    split_prediction: prediction ("likely" | "unlikely"), reason, estimated_distillate_tokens
        - Estimates distillate at ~1/3 source size; splits if >5K tokens
"""

from __future__ import annotations

import argparse
import glob
import json
import os
import re
import sys
from pathlib import Path

# Extensions to include when scanning folders
INCLUDE_EXTENSIONS = {".md", ".txt", ".yaml", ".yml", ".json"}

# Directories to skip when scanning folders
SKIP_DIRS = {
    "node_modules", ".git", "__pycache__", ".venv", "venv",
    ".claude", "_bmad-output", ".cursor", ".vscode",
}

# Approximate chars per token for estimation
CHARS_PER_TOKEN = 4

# Thresholds
SINGLE_COMPRESSOR_MAX_TOKENS = 15_000
SINGLE_DISTILLATE_MAX_TOKENS = 5_000

# Naming patterns for document type detection (first match wins)
DOC_TYPE_PATTERNS = [
    (r"discovery[_-]notes", "discovery-notes"),
    (r"product[_-]brief", "product-brief"),
    (r"research[_-]report", "research-report"),
    (r"architecture", "architecture-doc"),
    (r"prd", "prd"),
    (r"distillate", "distillate"),
    (r"changelog", "changelog"),
    (r"readme", "readme"),
    (r"spec", "specification"),
    (r"requirements", "requirements"),
    (r"design[_-]doc", "design-doc"),
    (r"meeting[_-]notes", "meeting-notes"),
    (r"brainstorm", "brainstorming"),
    (r"interview", "interview-notes"),
]

# Patterns for grouping related documents: (companion pattern, base-name template)
GROUP_PATTERNS = [
    # base document + discovery notes
    (r"^(.+?)(?:-discovery-notes|-discovery_notes)\.(\w+)$", r"\1.\2"),
    # base document + appendix
    (r"^(.+?)(?:-appendix|-addendum)(?:-\w+)?\.(\w+)$", r"\1.\2"),
    # base document + review/feedback
    (r"^(.+?)(?:-review|-feedback)\.(\w+)$", r"\1.\2"),
]


def resolve_inputs(inputs: list[str]) -> list[Path]:
    """Resolve input arguments to a flat, deduplicated list of file paths."""
    files: list[Path] = []
    for inp in inputs:
        path = Path(inp)
        if path.is_file():
            files.append(path.resolve())
        elif path.is_dir():
            for root, dirs, filenames in os.walk(path):
                dirs[:] = [d for d in dirs if d not in SKIP_DIRS]
                for fn in sorted(filenames):
                    fp = Path(root) / fn
                    if fp.suffix.lower() in INCLUDE_EXTENSIONS:
                        files.append(fp.resolve())
        else:
            # Try as glob
            matches = glob.glob(inp, recursive=True)
            for m in sorted(matches):
                mp = Path(m)
                if mp.is_file() and mp.suffix.lower() in INCLUDE_EXTENSIONS:
                    files.append(mp.resolve())
    # Deduplicate while preserving order
    seen: set[Path] = set()
    deduped: list[Path] = []
    for f in files:
        if f not in seen:
            seen.add(f)
            deduped.append(f)
    return deduped


def detect_doc_type(filename: str) -> str:
    """Detect document type from filename."""
    name_lower = filename.lower()
    for pattern, doc_type in DOC_TYPE_PATTERNS:
        if re.search(pattern, name_lower):
            return doc_type
    return "unknown"


def suggest_groups(files: list[Path]) -> list[dict]:
    """Suggest document groupings based on naming conventions."""
    groups: dict[str, list[dict]] = {}
    ungrouped: list[dict] = []

    file_map = {f.name: f for f in files}

    assigned: set[str] = set()

    for f in files:
        if f.name in assigned:
            continue

        matched = False
        for pattern, base_pattern in GROUP_PATTERNS:
            m = re.match(pattern, f.name, re.IGNORECASE)
            if m:
                # This file is a companion — find its base
                base_name = re.sub(pattern, base_pattern, f.name, flags=re.IGNORECASE)
                group_key = base_name
                if group_key not in groups:
                    groups[group_key] = []
                # Add the base file if it exists
                if base_name in file_map and base_name not in assigned:
                    groups[group_key].append({
                        "path": str(file_map[base_name]),
                        "filename": base_name,
                        "role": "primary",
                    })
                    assigned.add(base_name)
                groups[group_key].append({
                    "path": str(f),
                    "filename": f.name,
                    "role": "companion",
                })
                assigned.add(f.name)
                matched = True
                break

        if not matched:
            # Check if this file is a base that already has companions
            if f.name in groups:
                continue  # Already added as primary
            ungrouped.append({
                "path": str(f),
                "filename": f.name,
            })

    result = []
    for group_key, members in groups.items():
        result.append({
            "group_key": group_key,
            "files": members,
        })
    for ug in ungrouped:
        if ug["filename"] not in assigned:
            result.append({
                "group_key": ug["filename"],
                "files": [{"path": ug["path"], "filename": ug["filename"], "role": "standalone"}],
            })

    return result


def analyze(inputs: list[str], output_path: str | None = None) -> None:
    """Main analysis function."""
    files = resolve_inputs(inputs)

    if not files:
        result = {
            "status": "error",
            "error": "No readable files found from provided inputs",
            "inputs": inputs,
        }
        output_json(result, output_path)
        return

    # Analyze each file
    file_details = []
    total_chars = 0
    for f in files:
        size = f.stat().st_size
        total_chars += size
        file_details.append({
            "path": str(f),
            "filename": f.name,
            "size_bytes": size,
            "estimated_tokens": size // CHARS_PER_TOKEN,
            "doc_type": detect_doc_type(f.name),
        })

    total_tokens = total_chars // CHARS_PER_TOKEN
    groups = suggest_groups(files)

    # Routing recommendation
    if len(files) <= 3 and total_tokens <= SINGLE_COMPRESSOR_MAX_TOKENS:
        routing = "single"
        routing_reason = (
            f"{len(files)} file(s), ~{total_tokens:,} estimated tokens — "
            f"within single compressor threshold"
        )
    else:
        routing = "fan-out"
        routing_reason = (
            f"{len(files)} file(s), ~{total_tokens:,} estimated tokens — "
            f"exceeds single compressor threshold "
            f"({'>' + str(SINGLE_COMPRESSOR_MAX_TOKENS) + ' tokens' if total_tokens > SINGLE_COMPRESSOR_MAX_TOKENS else '> 3 files'})"
        )

    # Split prediction
    estimated_distillate_tokens = total_tokens // 3  # rough: distillate is ~1/3 of source
    if estimated_distillate_tokens > SINGLE_DISTILLATE_MAX_TOKENS:
        split_prediction = "likely"
        split_reason = (
            f"Estimated distillate ~{estimated_distillate_tokens:,} tokens "
            f"exceeds {SINGLE_DISTILLATE_MAX_TOKENS:,} threshold"
        )
    else:
        split_prediction = "unlikely"
        split_reason = (
            f"Estimated distillate ~{estimated_distillate_tokens:,} tokens "
            f"within {SINGLE_DISTILLATE_MAX_TOKENS:,} threshold"
        )

    result = {
        "status": "ok",
        "files": file_details,
        "summary": {
            "total_files": len(files),
            "total_size_bytes": total_chars,
            "total_estimated_tokens": total_tokens,
        },
        "groups": groups,
        "routing": {
            "recommendation": routing,
            "reason": routing_reason,
        },
        "split_prediction": {
            "prediction": split_prediction,
            "reason": split_reason,
            "estimated_distillate_tokens": estimated_distillate_tokens,
        },
    }

    output_json(result, output_path)


def output_json(data: dict, output_path: str | None) -> None:
    """Write JSON to file or stdout."""
    json_str = json.dumps(data, indent=2)
    if output_path:
        Path(output_path).parent.mkdir(parents=True, exist_ok=True)
        Path(output_path).write_text(json_str + "\n")
        print(f"Results written to {output_path}", file=sys.stderr)
    else:
        print(json_str)


def main() -> None:
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument(
        "inputs",
        nargs="+",
        help="File paths, folder paths, or glob patterns to analyze",
    )
    parser.add_argument(
        "-o", "--output",
        help="Output JSON to file instead of stdout",
    )
    args = parser.parse_args()
    analyze(args.inputs, args.output)
    sys.exit(0)


if __name__ == "__main__":
    main()
@@ -0,0 +1,204 @@
"""Tests for analyze_sources.py"""

import json
import os
import sys
import tempfile
from pathlib import Path

import pytest

# Add parent dir to path so we can import the script
sys.path.insert(0, str(Path(__file__).parent.parent))

from analyze_sources import (
    resolve_inputs,
    detect_doc_type,
    suggest_groups,
    analyze,
)


@pytest.fixture
def temp_dir():
    """Create a temp directory with sample files."""
    with tempfile.TemporaryDirectory() as d:
        # Create sample files
        (Path(d) / "product-brief-foo.md").write_text("# Product Brief\nContent here")
        (Path(d) / "product-brief-foo-discovery-notes.md").write_text("# Discovery\nNotes")
        (Path(d) / "architecture-doc.md").write_text("# Architecture\nDesign here")
        (Path(d) / "research-report.md").write_text("# Research\nFindings")
        (Path(d) / "random.txt").write_text("Some text content")
        (Path(d) / "image.png").write_bytes(b"\x89PNG")
        # Create a subdirectory with more files
        sub = Path(d) / "subdir"
        sub.mkdir()
        (sub / "prd-v2.md").write_text("# PRD\nRequirements")
        # Create a skip directory
        skip = Path(d) / "node_modules"
        skip.mkdir()
        (skip / "junk.md").write_text("Should be skipped")
        yield d


class TestResolveInputs:
    def test_single_file(self, temp_dir):
        f = str(Path(temp_dir) / "product-brief-foo.md")
        result = resolve_inputs([f])
        assert len(result) == 1
        assert result[0].name == "product-brief-foo.md"

    def test_folder_recursion(self, temp_dir):
        result = resolve_inputs([temp_dir])
        names = {f.name for f in result}
        assert "product-brief-foo.md" in names
        assert "prd-v2.md" in names
        assert "random.txt" in names

    def test_folder_skips_excluded_dirs(self, temp_dir):
        result = resolve_inputs([temp_dir])
        names = {f.name for f in result}
        assert "junk.md" not in names

    def test_folder_skips_non_text_files(self, temp_dir):
        result = resolve_inputs([temp_dir])
        names = {f.name for f in result}
        assert "image.png" not in names

    def test_glob_pattern(self, temp_dir):
        pattern = str(Path(temp_dir) / "product-brief-*.md")
        result = resolve_inputs([pattern])
        assert len(result) == 2
        names = {f.name for f in result}
        assert "product-brief-foo.md" in names
        assert "product-brief-foo-discovery-notes.md" in names

    def test_deduplication(self, temp_dir):
        f = str(Path(temp_dir) / "product-brief-foo.md")
        result = resolve_inputs([f, f, f])
        assert len(result) == 1

    def test_mixed_inputs(self, temp_dir):
        file_path = str(Path(temp_dir) / "architecture-doc.md")
        folder_path = str(Path(temp_dir) / "subdir")
        result = resolve_inputs([file_path, folder_path])
        names = {f.name for f in result}
        assert "architecture-doc.md" in names
        assert "prd-v2.md" in names

    def test_nonexistent_path(self):
        result = resolve_inputs(["/nonexistent/path/file.md"])
        assert len(result) == 0


class TestDetectDocType:
    @pytest.mark.parametrize("filename,expected", [
        ("product-brief-foo.md", "product-brief"),
        ("product_brief_bar.md", "product-brief"),
        ("foo-discovery-notes.md", "discovery-notes"),
        ("foo-discovery_notes.md", "discovery-notes"),
        ("architecture-overview.md", "architecture-doc"),
        ("my-prd.md", "prd"),
        ("research-report-q4.md", "research-report"),
        ("foo-distillate.md", "distillate"),
        ("changelog.md", "changelog"),
        ("readme.md", "readme"),
        ("api-spec.md", "specification"),
        ("design-doc-v2.md", "design-doc"),
        ("meeting-notes-2026.md", "meeting-notes"),
        ("brainstorm-session.md", "brainstorming"),
        ("user-interview-notes.md", "interview-notes"),
        ("random-file.md", "unknown"),
    ])
    def test_detection(self, filename, expected):
        assert detect_doc_type(filename) == expected


class TestSuggestGroups:
    def test_groups_brief_with_discovery_notes(self, temp_dir):
        files = [
            Path(temp_dir) / "product-brief-foo.md",
            Path(temp_dir) / "product-brief-foo-discovery-notes.md",
        ]
        groups = suggest_groups(files)
        # Should produce one group with both files
        paired = [g for g in groups if len(g["files"]) > 1]
        assert len(paired) == 1
        filenames = {f["filename"] for f in paired[0]["files"]}
        assert "product-brief-foo.md" in filenames
        assert "product-brief-foo-discovery-notes.md" in filenames

    def test_standalone_files(self, temp_dir):
        files = [
            Path(temp_dir) / "architecture-doc.md",
            Path(temp_dir) / "research-report.md",
        ]
        groups = suggest_groups(files)
        assert len(groups) == 2
        for g in groups:
            assert len(g["files"]) == 1

    def test_mixed_grouped_and_standalone(self, temp_dir):
        files = [
            Path(temp_dir) / "product-brief-foo.md",
            Path(temp_dir) / "product-brief-foo-discovery-notes.md",
            Path(temp_dir) / "architecture-doc.md",
        ]
        groups = suggest_groups(files)
        paired = [g for g in groups if len(g["files"]) > 1]
        standalone = [g for g in groups if len(g["files"]) == 1]
        assert len(paired) == 1
        assert len(standalone) == 1


class TestAnalyze:
    def test_basic_analysis(self, temp_dir):
        f = str(Path(temp_dir) / "product-brief-foo.md")
        output_file = str(Path(temp_dir) / "output.json")
        analyze([f], output_file)
        result = json.loads(Path(output_file).read_text())
        assert result["status"] == "ok"
        assert result["summary"]["total_files"] == 1
        assert result["files"][0]["doc_type"] == "product-brief"
        assert result["files"][0]["estimated_tokens"] > 0

    def test_routing_single_small_input(self, temp_dir):
        f = str(Path(temp_dir) / "product-brief-foo.md")
        output_file = str(Path(temp_dir) / "output.json")
        analyze([f], output_file)
        result = json.loads(Path(output_file).read_text())
        assert result["routing"]["recommendation"] == "single"

    def test_routing_fanout_many_files(self, temp_dir):
        # Create enough files to trigger fan-out (> 3 files)
        for i in range(5):
            (Path(temp_dir) / f"doc-{i}.md").write_text("x" * 1000)
        output_file = str(Path(temp_dir) / "output.json")
        analyze([temp_dir], output_file)
        result = json.loads(Path(output_file).read_text())
        assert result["routing"]["recommendation"] == "fan-out"

    def test_folder_analysis(self, temp_dir):
        output_file = str(Path(temp_dir) / "output.json")
        analyze([temp_dir], output_file)
        result = json.loads(Path(output_file).read_text())
        assert result["status"] == "ok"
        assert result["summary"]["total_files"] >= 4  # at least the base files
        assert len(result["groups"]) > 0

    def test_no_files_found(self):
        output_file = "/tmp/test_analyze_empty.json"
        analyze(["/nonexistent/path"], output_file)
        result = json.loads(Path(output_file).read_text())
        assert result["status"] == "error"
        os.unlink(output_file)

    def test_stdout_output(self, temp_dir, capsys):
        f = str(Path(temp_dir) / "product-brief-foo.md")
        analyze([f])
        captured = capsys.readouterr()
        result = json.loads(captured.out)
        assert result["status"] == "ok"