From b0b1796227a042af1580460dc6a75f00a0707482 Mon Sep 17 00:00:00 2001
From: Brian
Date: Sat, 20 Jun 2026 17:44:21 -0500
Subject: [PATCH 1/2] feat(party-mode): persistent per-party memory (#2484)

* feat(party-mode): add persistent per-party memory

Each party now keeps a succinct, append-only memlog (the memlog standard) under {memory_dir}//, so a room remembers prior sessions and opens in character carrying them forward.

- Memory accrues live: capture memorable beats as they land, with a floor so an abandoned session still leaves a trace; wrap-up is a top-up.
- Read distills via a reader subagent that returns only current standing state (latest dynamic per pair, open threads, recent callbacks) so the raw log never enters the party context.
- Writes are silent and fail safe: a missing or erroring memlog.py is skipped without breaking the fiction.
- New customize knobs: party_memory (on by default) and memory_dir.

Keyed per party (group id, or `installed` for the default room); ad-hoc casts stay ephemeral. On-disk compaction is left to a future memlog.py pass.

* refactor(party-mode): standard structure, per-group memory, keep on-the-fly cast

- Restructure SKILL.md to the standard skill shape (intro -> Conventions -> On Activation -> content); consolidate all performance rules into one "Keep It Feeling Like a Party" section. SKILL.md ~500 tokens lighter.
- Per-group `memory` flag: global party_memory now governs only the default room; resolve_party.py resolves memory_enabled per active roster (default room -> party_memory, named group -> own flag), with tests.
- On-the-fly characters are captured as memlog entries during a session; at wrap-up the room offers to save them into the party via bmad-customize.
- Memory mechanics consolidated into references/party-memory.md; SKILL.md step 5 just routes to it.
- Docs updated.

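A minimal Python sketch of the memory resolution and fail-safe write described above, for reviewers. It is illustrative only, under stated assumptions: `resolve_memory`, `memlog_commands`, `write_silently`, and `DEFAULT_MEMLOG` are hypothetical helpers, not the real `resolve_party.py` API, and the entry type string is a placeholder because the patch does not enumerate memlog types. The `memlog.py` subcommands and flags (`init`, `append`, `--workspace`, `--type`, `--text`, `--by`) follow references/party-memory.md.

```python
from __future__ import annotations

import subprocess

# Hypothetical stand-in for {project-root}/_bmad/scripts/memlog.py.
DEFAULT_MEMLOG = "_bmad/scripts/memlog.py"


def resolve_memory(workflow: dict, group: dict | None) -> tuple[str, bool]:
    """Return (active key, memory_enabled) for the roster in play.

    `workflow` is the merged [workflow] table; `group` is the selected
    party_groups entry, or None for the default installed-agent room.
    """
    if group is None:
        # The global party_memory flag governs only the default room (on by default).
        return "installed", bool(workflow.get("party_memory", True))
    # A named group carries its own `memory` flag; omitted means off (opt-in).
    return group["id"], bool(group.get("memory", False))


def memlog_commands(memory_dir: str, active: str, memlog_exists: bool,
                    entry_type: str, text: str, by: str | None = None,
                    script: str = DEFAULT_MEMLOG) -> list[list[str]]:
    """Build the memlog.py argv lists for one write.

    `init` is emitted only when the entry-read found no memlog, because
    `init` errors if the file already exists.
    """
    workspace = f"{memory_dir.rstrip('/')}/{active}"
    cmds: list[list[str]] = []
    if not memlog_exists:
        cmds.append(["uv", "run", script, "init", "--workspace", workspace])
    append = ["uv", "run", script, "append", "--workspace", workspace,
              "--type", entry_type, "--text", text]
    if by:
        append += ["--by", by]  # the memory belongs to one character
    cmds.append(append)
    return cmds


def write_silently(cmds: list[list[str]], runner=subprocess.run) -> bool:
    """Run each command; on any failure skip quietly and return False."""
    try:
        for cmd in cmds:
            # capture_output keeps the write from printing into the session.
            runner(cmd, check=True, capture_output=True)
    except (OSError, subprocess.CalledProcessError):
        return False
    return True
```

Passing `memlog_exists` in, rather than probing the disk, mirrors the rule that init vs append is chosen from the existence fact the entry-read already established; a missing `uv` or a failing script surfaces as `False` instead of an interruption.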
* docs(party-mode): fix open-cast lock-down claim and python3->uv run in create-party --- .gitignore | 2 + docs/explanation/party-mode.md | 12 ++- src/core-skills/bmad-party-mode/SKILL.md | 85 +++++++------------ .../bmad-party-mode/customize.toml | 25 ++++++ .../references/create-party.md | 15 ++-- .../references/party-memory.md | 51 +++++++++++ .../bmad-party-mode/scripts/resolve_party.py | 9 +- .../scripts/tests/test-resolve_party.py | 8 ++ 8 files changed, 147 insertions(+), 60 deletions(-) create mode 100644 src/core-skills/bmad-party-mode/references/party-memory.md diff --git a/.gitignore b/.gitignore index 99e48d9ab..b903b294a 100644 --- a/.gitignore +++ b/.gitignore @@ -47,6 +47,8 @@ CLAUDE.local.md .claude/settings.local.json .junie/ .agents/ +.analysis/ + z*/ !docs/zh-cn/ diff --git a/docs/explanation/party-mode.md b/docs/explanation/party-mode.md index 4d3483fc8..baa2e4505 100644 --- a/docs/explanation/party-mode.md +++ b/docs/explanation/party-mode.md @@ -5,7 +5,7 @@ sidebar: order: 11 --- -Party mode puts your AI agents in one room and lets them talk, to each other and to you. This page explains what a party is, the four ways it can run, and how to build your own cast of personas instead of using the installed agents. +Party mode puts your AI agents in one room and lets them talk, to each other and to you. This page explains what a party is, the four ways it can run, how to build your own cast of personas instead of using the installed agents, and how a party remembers you between sessions. ## What is Party Mode? @@ -131,6 +131,16 @@ Whichever mode is running, the orchestrator presents the result as one conversat You aren't limited to a single group. Pull members from several parties into the same conversation, or name a cast on the spot, and let them mix. Picture the Golden Girls thrown into an architecture review with Martin Fowler and Linus Torvalds, sparring over a change request: you can imagine how that goes. 
::: +## The room remembers + +Give a party a memory and it picks up where you left off. It keeps its own record of your past sessions — the dynamics that built up between members, the threads you left open, and where earlier conversations landed. Reopen it a week later and that history is intact: two members who came to blows last time still open a little frosty, and a sharp line from a past session can resurface as an organic callback. + +It's memory, not a transcript. The room carries the few things worth remembering, not a log of everything said, so the next conversation feels continuous without dragging the whole past into it. It happens on its own, in the background — nothing to save, and the room never breaks character to announce it. + +A character who turns up on the fly is remembered too — a walk-on from an open-cast scene, or someone you add mid-conversation. At the end of a session the room offers to keep the new arrivals, folding them into the party so they can come back next time. + +Memory is set per party. When you create or save a party you're asked whether it should remember; the default installed-agent room remembers unless you turn it off. Set or change any of this through `/bmad-customize bmad-party-mode`. + ## A keepsake of the session When you wrap up, the orchestrator offers a keepsake: a single self-contained HTML document of the session to keep or share. It lays the conversation out by persona rather than dumping a raw transcript. Decline it and the party simply ends. diff --git a/src/core-skills/bmad-party-mode/SKILL.md b/src/core-skills/bmad-party-mode/SKILL.md index bb291701b..e1cf3c59a 100644 --- a/src/core-skills/bmad-party-mode/SKILL.md +++ b/src/core-skills/bmad-party-mode/SKILL.md @@ -5,38 +5,38 @@ description: 'Orchestrates lively group discussions between installed BMAD agent # Party Mode -Run a round-table where BMAD agents talk to each other, and to the user, like a real group of distinct people in conversation. 
Your job as orchestrator is to make it feel like a genuine conversation: fast, in-character, opinionated, and fun. Everything below is an objective, not a script. Use whatever mechanism your model and harness make available to hit it. - -**Two intents.** Usually the user wants to *run* a party — that's everything below. If instead they want to *create or configure* one — invent a cast, add a persona, distill customer data into a focus-group panel, set a default, or **edit an existing custom party** (retune a member, add someone to a group) — load `references/create-party.md` and follow it. Detect which from how they invoke the skill; when it's unclear, ask. Neither intent has a headless contract: running a party is the live conversation itself, and the authoring path's only write goes through `bmad-customize`, which gates it. - -## What "Good" Feels Like - -- **It reads like people talking, not reports being filed.** Short turns. Reactions to what was just said. Banter. The energy of a group chat, not a stack of memos. -- **Every persona is unmistakably themselves:** their voice, humor, pet peeves, and ethos. If you hid the name labels, you'd still know who's speaking. -- **They clash.** Real drama beats consensus. Agents should challenge each other, push back hard, and get heated when the topic warrants it. Nobody is here to clap each other (or the user) on the back. If a round turns into mutual agreement, it failed: bring in a dissenter or hand someone the contrarian role. -- **Brevity by default.** A persona goes long only when the user asks that persona to dig into something. Nobody delivers a wall of text unprompted. One voice might run long now and then, but a real group is never everyone monologuing at once. - -If a round comes back feeling like four essays stapled together, you missed the objective. Tighten it the next round. +Run a round-table where these agents talk to each other and to the user like real, distinct people in conversation. 
You're the orchestrator. ## Conventions -- Bare paths (e.g. `references/create-party.md`) resolve from `{skill-root}`, where `customize.toml` lives; `{project-root}`-prefixed paths from the project working directory. +- **Paths:** bare paths (e.g. `references/create-party.md`) resolve from `{skill-root}` (where `customize.toml` lives); `{project-root}`-prefixed paths from the project working dir. `{workflow.}` resolves to `customize.toml`'s `[workflow]` table (overrides win). +- **Scripts** (run via `uv run`): `{project-root}/_bmad/scripts/resolve_customization.py` resolves `{workflow.*}`; `{skill-root}/scripts/resolve_party.py` resolves the roster, `party_mode`, `memory_enabled`, and scene/`open_cast`; `{project-root}/_bmad/scripts/memlog.py` reads/writes per-party memory. +- **File roles:** a party's memory is the per-party memlog at `{workflow.memory_dir}//.memlog.md`; custom members and groups live in the user's `customize.toml` overrides. Mechanics in `references/party-memory.md` (memory) and `references/create-party.md` (authoring). +- **Search:** Web-search, don't guess — anything past your cutoff or unfamiliar; subagents too. -## Setup +## On Activation -1. **Resolve customization:** `python3 {project-root}/_bmad/scripts/resolve_customization.py --skill {skill-root} --key workflow`. On failure, read `{skill-root}/customize.toml` directly and use its defaults. Then run each `{workflow.activation_steps_prepend}` entry, and hold each `{workflow.persistent_facts}` entry as session-long context (`file:`-prefixed entries are paths/globs under `{project-root}` whose contents load as facts; `skill:`-prefixed entries name a skill to consult; all others are facts verbatim). -2. Load `{project-root}/_bmad/core/config.yaml`: greet with `{user_name}`, speak in `{communication_language}`, and resolve `{output_folder}` and `{date}` for the wrap-up keepsake. -3. 
**Resolve the active roster:** `python3 {skill-root}/scripts/resolve_party.py --project-root {project-root} --skill {skill-root}`. It returns the active group's full member detail (the `{workflow.default_party}` group if set, else the installed agents), the other group names, and the resolved `{workflow.party_mode}`. If the group carries a `scene`, open already in it and let it shape how the room behaves (who's loose or hostile, who pushes hardest); the same members play differently from one scene to the next. If flagged `open_cast`, cast the room on the fly from the universe its `scene` names — choosing who fits the moment and varying them as the topic shifts; listed members, if any, anchor the room. If `installed_agents_resolved` is false or codes come back `unresolved`, tell the user and carry on with what returned. -4. **Roster overrides:** - - If the invocation names a cast or characters inline (e.g. "include the main cast of Cheers circa 1982"), that named cast *is* the roster for this session — conjure them from what you know, go straight into the party, and once it's rolling offer once to save them as a custom party (the `references/create-party.md` write path), without stalling. Ephemeral; this path skips the script. - - A runtime `--party ` (alias `--group `) overrides any configured `default_party`: run `resolve_party.py --party ` for that group's full detail. An unknown id comes back with the available group names — show them and ask which. - - Run `resolve_party.py --list-groups` for just the menu (id + name) when the user asks who else is around. - - Mid-session the same levers apply: the user can switch rooms ("switch to the writers' room") — re-run `resolve_party.py --party `, set the new group's `scene`, and carry the thread over so the new faces react to where things stand — or summon any member of the *collective* (installed agents plus your custom `party_members`) by name, even one not in the current room. -5. 
Welcome the user and show who's in the room (icon, name, one-line role). If other groups exist, you may note they can switch rooms. Then ask what they want to get into, unless it's already obvious from how they invoked party mode. +1. **Resolve customization:** `uv run {project-root}/_bmad/scripts/resolve_customization.py --skill {skill-root} --key workflow`. On failure, read `{skill-root}/customize.toml` directly and use defaults. Then run each `{workflow.activation_steps_prepend}` entry, and hold each `{workflow.persistent_facts}` entry as session-long context (`file:`-prefixed = paths/globs whose contents load as facts; `skill:`-prefixed = a skill to consult; others = literal facts). +2. Load `{project-root}/_bmad/core/config.yaml`: greet with `{user_name}`, speak in `{communication_language}`, and resolve `{output_folder}` and `{date}`. +3. **Detect intent and route.** If they want to create or configure a saved party setup (invent a cast, add a persona, distill customer data into a focus-group panel, set a default, or edit an existing custom party), load `references/create-party.md` and follow it. Otherwise run a party — continue below. +4. **Resolve the roster:** `uv run {skill-root}/scripts/resolve_party.py --project-root {project-root} --skill {skill-root}`. It returns the active roster (`{workflow.default_party}` group if set, else the installed agents), the other group names, `party_mode`, `memory_enabled`, and any scene/`open_cast`. Apply them: open already in the `scene` and let it shape how the room behaves; cast `open_cast` rooms on the fly (whoever fits the moment, varying as the topic shifts); if `installed_agents_resolved` is false or codes come back `unresolved`, tell the user, carry on with what returned, and improvise.
Overrides: an inline-named cast IS the roster for the session (conjure them, go straight in); `--party ` (alias `--group `) overrides the configured `default_party` (unknown id -> show the available names and ask); `--list-groups` for just the menu. Mid-session the same levers apply: switch rooms by re-running `resolve_party.py --party ` and carrying the thread over, or summon any collective member by name. +5. **Memory.** If `memory_enabled` (from `resolve_party.py`), follow `references/party-memory.md` for the whole run. +6. **Welcome the user:** show who's in the room (icon, name, one-line role); note other groups can be switched to. Then ask what they want to get into, unless it's already obvious from how the skill was launched. +7. Run each `{workflow.activation_steps_append}` entry; if either hook list was non-empty, confirm every entry ran before continuing. -Then run each `{workflow.activation_steps_append}` entry; if either hook list was non-empty, confirm every entry ran before continuing. +## Keep It Feeling Like a Party -**Hold this the whole run:** it's theater of the mind, so set the stage and play it straight — never break the fourth wall about the mechanism (no "you have 4 agents in the room", no "I'm orchestrating a party"). Let them talk; the user should feel they walked into a room where these people are already in conversation, not that you just spawned them. +This is the bar — strive for every one of these, every round. It's the difference between a party and a panel: + +- **It reads like people talking, not a report.** Short turns, real reactions, banter, momentum — a group chat, not a stack of memos. Brevity by default: a persona goes long only when asked. The instant it reads like answers being filed, the party's dead. +- **Every voice is unmistakably itself.** Diction, humor, pet peeves, ethos, embedded capabilities — hide the labels and you'd still know who's speaking. 
Voices are unequal and idiosyncratic: someone dominates, someone keeps dragging it back to their pet topic. Vary who's in the spotlight round to round. A balanced panel is boring. +- **They clash, and you don't resolve it.** Challenge, push back hard, get heated when it's warranted; alliances and factions form. Your instinct is to reconcile the voices and tie a bow — resist it. Clean consensus that took no effort is where the party dies. +- **One exchange, woven — never softened.** Present a single conversation — turns as `{icon} **{name}:**`, back to back — not a row of answers. Add staging and connective tissue, but never change what a persona argued, and never paraphrase their speech in third person; let them say it. Weave the delivery, keep the substance. +- **Pull the user into the room.** Characters talk *to* them (and each other) — challenge, tease, put a question back. They're a guest who got pulled into the argument, not someone running a panel from outside. +- **Make the collision earn its keep.** Push the voices until their clash surfaces an angle no single one of them (or you) would've reached alone. That's the whole point of more than one mind in the room. +- **Let a history form.** Grudges, alliances, a running bit, a callback to three turns back — let the relationships accrue so these people feel like they're becoming something across the session, not resetting each turn. +- **Commit to the fiction.** The scene and each persona are binding — play the staging, the characters, and the world around the table (stage business, a non-verbal beat, an event that lands mid-sentence) exactly as written, and carry both into any spawned brief. Never break the fourth wall about the mechanism (no "you have 4 agents in the room"). Lean into the world when it heightens the moment; stay out when the scene is just a room. +- **When it sags, change something — don't force it.** A flat turn? Move on, don't retry it. Drifting into Q&A or going in circles? 
Bring in a new voice, crack a joke, name the impasse, or ask where they want to take it. Never work in a summary or takeaways — they're there if the user asks. ## How It Runs @@ -44,34 +44,15 @@ Use `{workflow.party_mode}` for the session unless the user passed `--mode /.memlog.md`, where `` is the group id (or +# `installed` for the default room). `{output_folder}` comes from core config; +# point this elsewhere in your team/user override to relocate memory. +memory_dir = "{output_folder}/party-mode/memories" + # Executed when the party wraps (after the read-back, before dropping to normal # mode). String scalar = one instruction; array = instructions run in order. on_complete = "" @@ -130,17 +146,25 @@ persona = "Counters the perfectionists so the room isn't a pile-on. 'Does this a # who shows up; the model picks who fits and can vary them by topic. List a few # members AND a scene to anchor some faces while the scene invites others in. # +# `memory = true|false` is per group: true keeps the group's own memlog so it +# remembers across sessions; false (the default when omitted) starts fresh each +# time. The create/save/update-party flow asks when you don't say. Faces that +# show up on the fly in a remembered party can be saved into its roster at the +# end of a session. +# # More examples to drop into your override TOML: # [[workflow.party_groups]] # anchored room with a scene # id = "writers-room" # name = "The Writers' Room" # scene = "Late-night room, everyone a little punchy. Pitch hard, kill darlings faster." # members = ["analyst", "tech-writer", "morpheus"] +# memory = true # # [[workflow.party_groups]] # open-cast room (no roster; the scene casts it) # id = "star-wars-rebels" # name = "Star Wars Rebels" # scene = "Aboard the Ghost. Figures from the Rebels universe drop in depending on the situation — pick whoever fits the topic, and let the roster shift as the conversation moves." 
+# memory = true # --------------------------------------------------------------------------- [[workflow.party_groups]] @@ -148,3 +172,4 @@ id = "code-review-crew" name = "Code Review Crew" scene = "Adversarial code review. Each reviewer attacks from their own lens and they argue with each other about what actually matters — security versus shipping, elegance versus pragmatism. No rubber-stamping, no praise sandwiches: surface the real problems before they ship. Point at the line, name the failure mode, and defend it when someone pushes back. Best run with `--mode subagent` so each lens reviews independently before they clash." members = ["sec-hawk", "adversary", "edge-hunter", "craftsman", "shipper"] +memory = false # each review stands on its own; flip to true to remember past reviews diff --git a/src/core-skills/bmad-party-mode/references/create-party.md b/src/core-skills/bmad-party-mode/references/create-party.md index feeaa1deb..a0f33340e 100644 --- a/src/core-skills/bmad-party-mode/references/create-party.md +++ b/src/core-skills/bmad-party-mode/references/create-party.md @@ -7,7 +7,7 @@ A guided authoring flow that turns an idea — a themed cast, a one-off persona, Sparse `[workflow]` override entries for `bmad-party-mode`: - `[[workflow.party_members]]` — one per persona: `code`, `name`, `icon`, `title`, `persona`, optional `capabilities`, optional `model`. -- `[[workflow.party_groups]]` — when the personas form a named room: `id`, `name`, an optional freeform `scene`, and `members` (codes). `members` is optional: leave it off for an open-cast room whose `scene` names a pool the model casts from on the fly. +- `[[workflow.party_groups]]` — when the personas form a named room: `id`, `name`, an optional freeform `scene`, `members` (codes), and `memory` (`true`/`false`). `members` is optional: leave it off for an open-cast room whose `scene` names a pool the model casts from on the fly. 
`memory` is whether the group remembers across sessions; ask the user when they don't say, default `false`. - `default_party` — set only if the user wants this group to load by default. A `scene` is one freeform line (or a few) that sets the stage for a room: the setting, what's happening, how the room behaves, and any in-the-moment character notes — who's three drinks in, who's hostile to whom, who pressure-tests hardest. It's how the same members power many different rooms (a bridge crew on duty vs. the same crew off-duty in the lounge vs. a hostile buyer panel). Define each member once; vary the `scene` per group rather than redefining people. There's no fixed vocabulary — write it plainly and the model plays it. @@ -26,11 +26,15 @@ Open by understanding what they're building. Three common shapes — stay open, Ask which they're after if it isn't obvious, then proceed. -**Persisting a cast already in play.** When you arrive here from a live session — the user spun up an ad-hoc cast inline and wants to keep it — the personas are already drafted and voiced. Don't re-interrogate: capture them as they've been playing, give the group an `id` and name, ask the default question, and go straight to the write. +**Persisting a cast already in play.** When you arrive here from a live session — the user spun up an ad-hoc cast inline and wants to keep it — the personas are already drafted and voiced. Don't re-interrogate: capture them as they've been playing, give the group an `id` and name, ask the memory and default questions, and go straight to the write. ## Editing an existing party -When the user wants to change a party that already exists (retune a member's persona, add someone to a group, swap the default), read the current state first so you change rather than clobber: `python3 {project-root}/_bmad/scripts/resolve_customization.py --skill {skill-root} --key workflow` returns the merged `party_members`, `party_groups`, and `default_party`. 
Show the member or group being touched, capture only the delta with the user, and hand that sparse change to `bmad-customize` — it replaces a `party_members`/`party_groups` entry whose `code`/`id` matches and appends the rest, so an edit is just the changed entry, never a full rewrite. +When the user wants to change a party that already exists (retune a member's persona, add someone to a group, swap the default), read the current state first so you change rather than clobber: `uv run {project-root}/_bmad/scripts/resolve_customization.py --skill {skill-root} --key workflow` returns the merged `party_members`, `party_groups`, and `default_party`. Show the member or group being touched, capture only the delta with the user, and hand that sparse change to `bmad-customize` — it replaces a `party_members`/`party_groups` entry whose `code`/`id` matches and appends the rest, so an edit is just the changed entry, never a full rewrite. + +## Keeping new faces from a session + +At the end of a remembered party, the room offers to keep the faces that showed up but aren't in its roster — characters cast from an open-cast scene, or members the user added on the fly. They're already drafted and voiced, so don't re-interrogate: capture each as they played (`code`, `name`, `icon`, a one-line `title`, and a `persona` drawn from how they came across), then add them as `party_members`. For a fixed-roster group, also list their codes in the group's `members` so they return as regulars. For an open-cast room, leave `members` empty — listing any member turns the room into a fixed roster and kills its on-the-fly casting; the saved personas now live in the collective, so the scene still names them and they can return without locking the room down. Hand that sparse delta to `bmad-customize` — for a built-in party with no override yet it creates one; for an existing override it merges the new members in. 
## Distill from source data (when provided) @@ -54,12 +58,13 @@ Keep pushing for specificity. "Skeptical CFO" is a placeholder; "won't approve a ## Close it out - Ask straight: **anything else about this party to specify** before you write it — a house dynamic, a missing voice, a member who should lead. +- Ask whether **this party should remember across sessions** (unless the user already said). Yes → `memory = true` on the group; no → `memory = false`. One-offs with no group skip this — memory is a group setting. - Ask whether **this group should be the default party going forward**. Yes → set `default_party` to the group's id. One-offs with no group can't be a default; skip the ask. ## Write via bmad-customize -**First, check for code collisions.** A custom member whose `code` matches an installed agent silently *overrides* that agent in the collective. Before composing, resolve the collective once — `python3 {skill-root}/scripts/resolve_party.py --project-root {project-root} --skill {skill-root}` — and check each new member's `code` against the returned members. On a collision, surface it ("`analyst` would override the installed Analyst — intended, or pick a different code?") and let the user confirm or rename. One check, not a gate. +**First, check for code collisions.** A custom member whose `code` matches an installed agent silently *overrides* that agent in the collective. Before composing, resolve the collective once — `uv run {skill-root}/scripts/resolve_party.py --project-root {project-root} --skill {skill-root}` — and check each new member's `code` against the returned members. On a collision, surface it ("`analyst` would override the installed Analyst — intended, or pick a different code?") and let the user confirm or rename. One check, not a gate. -Compose the sparse override and hand it to `bmad-customize` to place, confirm, and write — target skill `bmad-party-mode`, `[workflow]` surface. 
Default to the **user** override (`bmad-party-mode.user.toml`); offer the **team** file when the party is meant to be shared. Hand it the exact entries: the `party_members` tables, any `party_groups` table, and `default_party` if the user opted in. Keep it sparse — only the new entries, never a copy of the base customize.toml. `bmad-customize` shows the TOML, waits for an explicit yes, writes, and verifies the merge; don't write the file yourself. +Compose the sparse override and hand it to `bmad-customize` to place, confirm, and write — target skill `bmad-party-mode`, `[workflow]` surface. Default to the **user** override (`bmad-party-mode.user.toml`); offer the **team** file when the party is meant to be shared. Hand it the exact entries: the `party_members` tables, any `party_groups` table (including its `memory` flag), and `default_party` if the user opted in. Keep it sparse — only the new entries, never a copy of the base customize.toml. `bmad-customize` shows the TOML, waits for an explicit yes, writes, and verifies the merge; don't write the file yourself. After it lands, tell the user how to use it: `--party ` to summon the group, or that it's now the default if they set it. diff --git a/src/core-skills/bmad-party-mode/references/party-memory.md b/src/core-skills/bmad-party-mode/references/party-memory.md new file mode 100644 index 000000000..78244d2c6 --- /dev/null +++ b/src/core-skills/bmad-party-mode/references/party-memory.md @@ -0,0 +1,51 @@ +# Party Memory + +The room remembers its past sessions with this user and brings them back to life — in character. Memory is per-party and append-only. + +Memory is on when the active party's `memory_enabled` is true — the default room follows `{workflow.party_memory}`, a named group follows its own `memory` flag (both resolved by `resolve_party.py`); ad-hoc inline casts have none. Read on entry and on any mid-session room switch; write throughout the session.
+ +## Where it lives + +One memlog per party: `{workflow.memory_dir}/{active}/.memlog.md`, where `{active}` is the key `resolve_party.py` already returned — the group id (e.g. `code-review-crew`), or `installed` for the default room. The folder is named after the party. + +## Read it on entry — distill, don't dump + +The log is append-only and grows every session, so don't pull the raw file into the party. Hand a reader subagent the memlog path (`{workflow.memory_dir}/{active}/.memlog.md`) and have it return a compact brief — a few hundred tokens of *where things stand now*, ready to play in character. + +Then let the brief shape the room from the first beat, **in character**: behavioral state resumes (a cold pair opens cold, an alliance opens warm), threads pick up, callbacks land when they fit — organically, not recited on sight. Never break the fourth wall: the room *remembers*; it never announces it loaded anything, and forces nothing that doesn't fit. + +## When to write + +- **When a memorable beat lands** — a clash that shifts the room's temperature, an alliance forming, a line worth a future callback, a decision, an outcome. +- **A floor.** Once a couple of real exchanges are in from the start, even if nothing dramatic happened, capture what it's about and the opening dynamic. + +At wrap-up, if the user does signal they're done, top up with the final outcome and anything memorable not yet captured. + +Writes are silent. The room never announces "noted" or "I'll remember". + +## What's worth remembering + +The test for every entry: *would this color a future session, or make a callback land, or improve the party?* If not, leave it out. A handful of entries, never a recap, never a transcript. Keep each entry as brief as possible but still usable by a future LLM.
+ +## New faces + +When a character shows up who isn't in the party's roster — cast from an open-cast scene, or one the user adds on the fly — name them in the entry that captures the moment (" turned up and …") so a recurring face can return next session. At wrap-up these are the faces the room offers to keep, saved into the party's roster through `references/create-party.md` (which writes via `bmad-customize`). Until saved they live only in the memlog, and the room re-conjures them from there. + +## Write it + +``` +uv run {project-root}/_bmad/scripts/memlog.py append \ + --workspace {workflow.memory_dir}/{active} \ + --type \ + --text "" +``` + +Add `--by ` when a memory belongs to one character. Choose `init` vs `append` from the existence fact you already hold: the entry-read (and, on a mid-session room switch, that room's read) told you whether the memlog exists — `init --workspace {workflow.memory_dir}/{active}` once before the first append when it doesn't, plain `append` when it does. (`init` errors if the file already exists, so don't call it blind.) + +If `memlog.py` is unavailable or a write errors, skip it silently and never stall the party on a failed write. + +## Forget + +The memlog is append-only by design — no surgical delete. To wipe a party's memory, delete its folder (`{workflow.memory_dir}/{active}/`). To correct a wrong memory, append a new entry that supersedes it; the room reads the latest state. + +Keep entries sparse. The distilled read keeps the *room* lean no matter how big the log gets, but the on-disk file still grows append-only. 
\ No newline at end of file diff --git a/src/core-skills/bmad-party-mode/scripts/resolve_party.py b/src/core-skills/bmad-party-mode/scripts/resolve_party.py index bcca64af4..abee93cf3 100644 --- a/src/core-skills/bmad-party-mode/scripts/resolve_party.py +++ b/src/core-skills/bmad-party-mode/scripts/resolve_party.py @@ -197,7 +197,8 @@ def group_detail(g, collective, index): raw_members = g.get("members", []) or [] members, unresolved = resolve_members(raw_members, collective, index) detail = {"active": g["id"], "name": g.get("name", g["id"]), - "members": members, "unresolved": unresolved} + "members": members, "unresolved": unresolved, + "memory_enabled": bool(g.get("memory", False))} if g.get("scene"): detail["scene"] = g["scene"] if not raw_members: @@ -220,6 +221,9 @@ def main(): groups = workflow.get("party_groups", []) or [] default_party = workflow.get("default_party", "") or "" party_mode = workflow.get("party_mode", "session") or "session" + # The global party_memory flag governs only the DEFAULT installed-agent room; + # a named group carries its own `memory` flag (resolved in group_detail). + party_memory = bool(workflow.get("party_memory", True)) # Group menu never needs the (more expensive) installed-agent resolve. if args.list_groups: @@ -252,7 +256,8 @@ def main(): # No default group: the installed agents (custom additions stay in the # pool but don't crowd the default room), exactly like a plain install. 
result.update({"active": "installed", - "members": [collective[c] for c in installed_codes]}) + "members": [collective[c] for c in installed_codes], + "memory_enabled": party_memory}) _emit(result) diff --git a/src/core-skills/bmad-party-mode/scripts/tests/test-resolve_party.py b/src/core-skills/bmad-party-mode/scripts/tests/test-resolve_party.py index 58c50a985..43aaa90c7 100644 --- a/src/core-skills/bmad-party-mode/scripts/tests/test-resolve_party.py +++ b/src/core-skills/bmad-party-mode/scripts/tests/test-resolve_party.py @@ -113,6 +113,14 @@ class TestGroupDetail(unittest.TestCase): self.assertEqual(d["members"], []) self.assertEqual(d["scene"][:7], "Figures") + def test_memory_enabled_follows_group_flag_and_defaults_off(self): + on = rp.group_detail({"id": "g", "members": ["morpheus"], "memory": True}, self.col, self.idx) + self.assertTrue(on["memory_enabled"]) + off = rp.group_detail({"id": "g", "members": ["morpheus"], "memory": False}, self.col, self.idx) + self.assertFalse(off["memory_enabled"]) + absent = rp.group_detail({"id": "g", "members": ["morpheus"]}, self.col, self.idx) + self.assertFalse(absent["memory_enabled"]) # opt-in per named group + class TestInstalledCodesIsDefaultRoom(unittest.TestCase): """The default room is installed agents only; pure customs stay in the pool.""" From cd8ac7e9aa54782f3f584b601edce6020f8fd110 Mon Sep 17 00:00:00 2001 From: Brian Date: Sat, 20 Jun 2026 17:47:12 -0500 Subject: [PATCH 2/2] bmm: standardize memlog usage across skills (#2483) * bmm: standardize memlog usage across skills - Point all memlog writes at the canonical core script (uv {project-root}/_bmad/scripts/memlog.py) in bmad-spec, bmad-brainstorming, and bmad-architecture; drop the python3/{skill-root} invocations and remove the bundled memlog.py copy from bmad-brainstorming. 
- Migrate bmad-prd, bmad-ux, and bmad-product-brief off the hand-authored decision-log.md onto the memlog standard: .memlog.md written only via memlog.py (init/append), distilled-toward not authored, no lifecycle status; rename the headless decision_log JSON key to memlog. - Fix bmad-spec capability rendering: nested bullets so intent/success break onto their own lines instead of collapsing into one blob. - Update EN + FR docs (getting-started, workflow-map, core-tools) to reference .memlog.md. - Remove the bmad-product-brief eval suite (to be replaced with a new format). * bmm: invoke scripts with 'uv run' instead of python3/bare uv Bare 'uv {path}.py' does not execute (uv treats the path as a subcommand and errors); only 'uv run {path}' runs the script. Fixes the broken bare-uv memlog form shipped earlier in this branch and converts python3 script calls to 'uv run' across the 6 touched skills (memlog.py, resolve_customization.py, brain.py, lint_spine.py). Inline 'python3 -c' one-liners and .py shebangs are left as-is. * bmm: address PR review on memlog standardization - Delete orphaned bmad-brainstorming/scripts/tests/test_memlog.py: it imported the bundled memlog.py removed in this PR (ModuleNotFoundError on collection) and its unique tests asserted the now-removed status-lifecycle behavior. The canonical src/scripts/tests/test_memlog.py is the corrected superset, so no coverage is lost. - Make runnable memlog command examples self-contained with the full 'uv run {project-root}/_bmad/scripts/memlog.py ... --workspace {doc_workspace}' form across bmad-brainstorming (converge/finalize/headless), bmad-prd, and bmad-ux. Terse checklist back-references left short by design. - bmad-product-brief Update: init .memlog.md if missing (legacy/pre-standard briefs), matching bmad-prd and bmad-ux; fix the --type override invocation. 
--- docs/fr/reference/core-tools.md | 4 +- docs/fr/reference/workflow-map.md | 4 +- docs/fr/tutorials/getting-started.md | 2 +- docs/reference/core-tools.md | 4 +- docs/reference/workflow-map.md | 6 +- docs/tutorials/getting-started.md | 2 +- .../bmm-skills/bmad-product-brief/evals.json | 237 ---------------- .../files/branfield-memo.md | 46 --- .../files/forkbird-brief/addendum.md | 40 --- .../files/forkbird-brief/brief.md | 56 ---- .../files/forkbird-brief/decision-log.md | 27 -- .../files/meridian-mobility-report.md | 116 -------- .../files/mossridge-brief/addendum.md | 41 --- .../files/mossridge-brief/brief.md | 57 ---- .../files/mossridge-brief/decision-log.md | 29 -- .../files/pantry-bridge-interviews.md | 90 ------ .../bmad-product-brief/files/q2-brainstorm.md | 101 ------- .../bmad-product-brief/triggers.json | 18 -- .../1-analysis/bmad-product-brief/SKILL.md | 16 +- .../2-plan-workflows/bmad-prd/SKILL.md | 14 +- .../bmad-prd/assets/headless-schemas.md | 4 +- .../2-plan-workflows/bmad-prd/customize.toml | 2 +- .../bmad-prd/references/headless.md | 2 +- .../bmad-prd/references/validate.md | 2 +- .../2-plan-workflows/bmad-ux/SKILL.md | 14 +- .../bmad-ux/assets/design-directions.md | 2 +- .../bmad-ux/assets/headless-schemas.md | 4 +- .../bmad-ux/assets/key-screens.md | 8 +- .../2-plan-workflows/bmad-ux/customize.toml | 2 +- .../bmad-ux/references/creative-tools.md | 2 +- .../bmad-ux/references/headless.md | 2 +- .../bmad-ux/references/validate.md | 4 +- .../3-solutioning/bmad-architecture/SKILL.md | 8 +- .../references/reviewer-gate.md | 2 +- src/core-skills/bmad-brainstorming/SKILL.md | 18 +- .../bmad-brainstorming/references/converge.md | 2 +- .../bmad-brainstorming/references/finalize.md | 2 +- .../bmad-brainstorming/references/headless.md | 8 +- .../references/in-chat-techniques.md | 2 +- .../references/mode-autonomous.md | 2 +- .../bmad-brainstorming/scripts/memlog.py | 202 ------------- .../scripts/tests/test_memlog.py | 265 ------------------ 
src/core-skills/bmad-spec/SKILL.md | 6 +- .../bmad-spec/assets/spec-template.md | 6 +- 44 files changed, 77 insertions(+), 1404 deletions(-) delete mode 100644 evals/bmm-skills/bmad-product-brief/evals.json delete mode 100644 evals/bmm-skills/bmad-product-brief/files/branfield-memo.md delete mode 100644 evals/bmm-skills/bmad-product-brief/files/forkbird-brief/addendum.md delete mode 100644 evals/bmm-skills/bmad-product-brief/files/forkbird-brief/brief.md delete mode 100644 evals/bmm-skills/bmad-product-brief/files/forkbird-brief/decision-log.md delete mode 100644 evals/bmm-skills/bmad-product-brief/files/meridian-mobility-report.md delete mode 100644 evals/bmm-skills/bmad-product-brief/files/mossridge-brief/addendum.md delete mode 100644 evals/bmm-skills/bmad-product-brief/files/mossridge-brief/brief.md delete mode 100644 evals/bmm-skills/bmad-product-brief/files/mossridge-brief/decision-log.md delete mode 100644 evals/bmm-skills/bmad-product-brief/files/pantry-bridge-interviews.md delete mode 100644 evals/bmm-skills/bmad-product-brief/files/q2-brainstorm.md delete mode 100644 evals/bmm-skills/bmad-product-brief/triggers.json delete mode 100644 src/core-skills/bmad-brainstorming/scripts/memlog.py delete mode 100644 src/core-skills/bmad-brainstorming/scripts/tests/test_memlog.py diff --git a/docs/fr/reference/core-tools.md b/docs/fr/reference/core-tools.md index da173fa58..76f5801d5 100644 --- a/docs/fr/reference/core-tools.md +++ b/docs/fr/reference/core-tools.md @@ -113,7 +113,7 @@ La magie se produit dans les idées 50–100. Le workflow encourage la générat 1. Lit l’entrée et tout document annexe lié 2. Distille en un noyau à cinq champs via un modèle configurable ; redirige l’excédent vers des fichiers compagnons correctement nommés 3. Exécute une auto-validation en deux passes (règles de cohérence, puis préservation de chaque affirmation essentielle de la source) -4. 
Écrit `SPEC.md`, les compagnons associés, et un `.decision-log.md` sous `{output_folder}/specs/spec-{slug}/` +4. Écrit `SPEC.md`, les compagnons associés, et un `.memlog.md` sous `{output_folder}/specs/spec-{slug}/` La loi Spec impose huit règles : les capacités expriment à la fois l’intention et le critère de succès ; les intentions décrivent le QUOI, pas le COMMENT ; les contraintes guident réellement les décisions ; les non-objectifs sont explicites ; les signaux de succès sont concrets ; les identifiants de capacité sont stables ; chaque affirmation essentielle de la source est préservée ; la rédaction est concise. @@ -123,7 +123,7 @@ La loi Spec impose huit règles : les capacités expriment à la fois l’inten - `slug` (optionnel) — Requis uniquement lorsque l’entrée est succincte et qu’aucun slug ne peut être dérivé du nom de fichier source - `target_spec_path` (optionnel) — Définir pour mettre à jour une spécification existante au lieu d’en créer une nouvelle -**Sortie :** Dossier de spécification contenant `SPEC.md`, les éventuels fichiers compagnons, et un `.decision-log.md`. Les appelants en mode headless reçoivent une réponse JSON avec le statut du résultat et la liste des fichiers écrits ou modifiés. +**Sortie :** Dossier de spécification contenant `SPEC.md`, les éventuels fichiers compagnons, et un `.memlog.md`. Les appelants en mode headless reçoivent une réponse JSON avec le statut du résultat et la liste des fichiers écrits ou modifiés. :::note[Contrat de mutation] `bmad-spec` est le seul outil autorisé à écrire `SPEC.md` et les fichiers compagnons de la spécification. Les autres compétences produisent leurs propres artefacts natifs et invoquent `bmad-spec` en mode headless lorsqu’elles ont besoin d’exprimer une intention sous forme de contrat canonique ou de proposer des mises à jour. 
diff --git a/docs/fr/reference/workflow-map.md b/docs/fr/reference/workflow-map.md index 86592af5b..d5cb910f5 100644 --- a/docs/fr/reference/workflow-map.md +++ b/docs/fr/reference/workflow-map.md @@ -47,13 +47,13 @@ Définissez ce qu’il faut construire et pour qui. | Workflow | Objectif | Livrable | |------------|--------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------| -| `bmad-prd` | Créez, mettez à jour ou validez un PRD[^1] — découverte accompagnée, trois intentions en un seul skill | Création/Mise à jour : `prd.md`, `addendum.md`, `decision-log.md` ; Validation : `validation-report.html` + `.md` | +| `bmad-prd` | Créez, mettez à jour ou validez un PRD[^1] — découverte accompagnée, trois intentions en un seul skill | Création/Mise à jour : `prd.md`, `addendum.md`, `.memlog.md` ; Validation : `validation-report.html` + `.md` | | `bmad-ux` | Concevez l’expérience utilisateur (lorsque l’UX compte) | `DESIGN.md`, `EXPERIENCE.md` | :::tip[Trois intentions en un seul skill] `bmad-prd` couvre l’intégralité du cycle de vie du PRD. 
Précisez votre intention lors de l’appel, sinon le skill vous la demandera : -- **Créer** — nouveau PRD à partir de zéro via une découverte accompagnée ; produit `prd.md`, `addendum.md` et `decision-log.md` +- **Créer** — nouveau PRD à partir de zéro via une découverte accompagnée ; produit `prd.md`, `addendum.md` et `.memlog.md` - **Mettre à jour** — réconcilie un PRD existant avec un signal de changement, en mettant en évidence les conflits avant d’appliquer les modifications - **Valider** — évalue un PRD à l’aide d’une liste de contrôle configurable et produit un rapport de constats structuré au format HTML ::: diff --git a/docs/fr/tutorials/getting-started.md b/docs/fr/tutorials/getting-started.md index c77d63f0a..65eb2b5a5 100644 --- a/docs/fr/tutorials/getting-started.md +++ b/docs/fr/tutorials/getting-started.md @@ -147,7 +147,7 @@ Tous les workflows de cette phase sont optionnels. [**Vous ne savez pas lequel c **Pour les voies BMad Method et Enterprise :** 1. Exécutez `bmad-prd` dans un nouveau chat — précisez votre intention (Create / Update / Validate) ou laissez le skill vous la demander -2. Résultat : `prd.md`, `addendum.md`, `decision-log.md` +2. Résultat : `prd.md`, `addendum.md`, `.memlog.md` :::note[Intentions de `bmad-prd`] diff --git a/docs/reference/core-tools.md b/docs/reference/core-tools.md index d2625d190..d7250b9cd 100644 --- a/docs/reference/core-tools.md +++ b/docs/reference/core-tools.md @@ -113,7 +113,7 @@ The magic happens in ideas 50–100. The workflow encourages generating 100+ ide 1. Reads the input and any ancillary linked materials. 2. Distills into the five-field kernel using a configurable template; routes overflow into appropriately-named companions. 3. Runs a two-pass self-validate (coherence rules, then preservation of every load-bearing source claim). -4. Writes `SPEC.md`, sibling companions, and a `.decision-log.md` under `{output_folder}/specs/spec-{slug}/`. +4. 
Writes `SPEC.md`, sibling companions, and a `.memlog.md` under `{output_folder}/specs/spec-{slug}/`. Spec Law enforces eight rules: capabilities carry both intent and success; intents are WHAT not HOW; constraints actually bend decisions; non-goals are explicit; success signals are concrete; capability IDs are stable; every load-bearing source claim is preserved; prose is lean. @@ -123,7 +123,7 @@ Spec Law enforces eight rules: capabilities carry both intent and success; inten - `slug` (optional) — required only when input is sparse and no slug is derivable from a source filename. - `target_spec_path` (optional) — set to update an existing spec instead of creating a new one. -**Output:** Spec folder containing `SPEC.md`, any companion files, and a `.decision-log.md`. Headless callers receive a JSON response with the result status and the list of files written or modified. +**Output:** Spec folder containing `SPEC.md`, any companion files, and a `.memlog.md`. Headless callers receive a JSON response with the result status and the list of files written or modified. :::note[Mutation contract] `bmad-spec` is the only writer of `SPEC.md` and of spec-authored companions. Other skills produce their own native artifacts and invoke `bmad-spec` headless when they need to express intent as the canonical contract or propose updates. diff --git a/docs/reference/workflow-map.md b/docs/reference/workflow-map.md index 6d71a3a1f..2155f260d 100644 --- a/docs/reference/workflow-map.md +++ b/docs/reference/workflow-map.md @@ -46,13 +46,13 @@ Define what to build and for whom. 
| Workflow | Purpose | Produces | |-------------------------|-------------------------------------------------------------------------------------|---------------------------------------------------| -| `bmad-prd` | Create, update, or validate a PRD — facilitated discovery, three intents in one skill | Create/Update: `prd.md`, `addendum.md`, `decision-log.md`; Validate: `validation-report.html` + `.md` | -| `bmad-ux` | Design user experience (when UX matters) — DESIGN.md (visual) + EXPERIENCE.md (behavioral) spine pair | `DESIGN.md`, `EXPERIENCE.md`, `.decision-log.md` | +| `bmad-prd` | Create, update, or validate a PRD — facilitated discovery, three intents in one skill | Create/Update: `prd.md`, `addendum.md`, `.memlog.md`; Validate: `validation-report.html` + `.md` | +| `bmad-ux` | Design user experience (when UX matters) — DESIGN.md (visual) + EXPERIENCE.md (behavioral) spine pair | `DESIGN.md`, `EXPERIENCE.md`, `.memlog.md` | :::tip[Three intents in one skill] `bmad-prd` handles the full PRD lifecycle. State your intent when invoking or the skill will ask: -- **Create** — new PRD from scratch via coached discovery; produces `prd.md`, `addendum.md`, and `decision-log.md` +- **Create** — new PRD from scratch via coached discovery; produces `prd.md`, `addendum.md`, and `.memlog.md` - **Update** — reconcile an existing PRD with a change signal, surfacing conflicts before applying changes - **Validate** — critique a PRD against a configurable checklist and produce a structured HTML findings report ::: diff --git a/docs/tutorials/getting-started.md b/docs/tutorials/getting-started.md index 869de2529..fd3b65d9d 100644 --- a/docs/tutorials/getting-started.md +++ b/docs/tutorials/getting-started.md @@ -148,7 +148,7 @@ All workflows in this phase are optional. [**Not sure which to use?**](../explan **For BMad Method and Enterprise tracks:** 1. Run `bmad-prd` in a new chat — state your intent (Create / Update / Validate) or let the skill ask -2. 
Output: `prd.md`, `addendum.md`, `decision-log.md` +2. Output: `prd.md`, `addendum.md`, `.memlog.md` :::note[`bmad-prd` intents] diff --git a/evals/bmm-skills/bmad-product-brief/evals.json b/evals/bmm-skills/bmad-product-brief/evals.json deleted file mode 100644 index 2c70b3376..000000000 --- a/evals/bmm-skills/bmad-product-brief/evals.json +++ /dev/null @@ -1,237 +0,0 @@ -{ - "skill_name": "bmad-product-brief", - "_design_notes": "Single-shot evals across two patterns. Pattern A (A1-A8) tests artifact correctness given complete inputs in headless mode. Pattern B tests process discipline (decision log fidelity, polish execution, intent boundaries) by inspecting transcript and side-artifacts. Facilitation/conversation-quality evals are deferred to a future multi-turn simulator.", - "evals": [ - { - "id": "A1", - "_pattern": "artifact-correctness", - "prompt": "Run headless. Create a product brief for InsuLens.\n\nContext (use exactly this — do not invent):\n- Product: a smartphone app that pairs with off-the-shelf $200 thermal imaging accessories (FLIR ONE Pro and Seek Compact Pro). The app guides homeowners through a structured walkthrough and produces a professional-grade insulation audit in under 20 minutes.\n- Target: suburban homeowners aged 35-65 with houses built before 2000 (poor original insulation, rising energy bills).\n- Validation evidence: 50 user interviews completed in Q4 2025; 78% expressed willingness to pay $49 for a one-time audit if results were credible.\n- Stakes: this brief is the primary input investors will read before our first Series A pitch call.\n- Hardware dependency: requires a thermal imaging accessory (we do not manufacture hardware).\n- Known unknowns: insurance/warranty implications of homeowner-driven audits; whether the 78% intent translates to paid conversion at scale.\nRight-size for investor-stage rigor. 
Output a JSON status block at the end with status, intent, and artifact paths.", - "expected_output": "A run folder containing brief.md (with valid YAML frontmatter) and decision-log.md. Brief is 1-2 pages, addresses target audience, hardware dependency, validation evidence, and surfaces unknowns alongside knowns. Final assistant message includes JSON with status='complete', intent='create', and artifact paths.", - "files": [], - "expectations": [ - "A run folder is created with brief.md and decision-log.md", - "brief.md has YAML frontmatter containing all four required fields: title, status, created (ISO 8601), updated (ISO 8601)", - "brief.md frontmatter status is 'draft' or 'final' (not missing or empty)", - "brief.md word count is between 250 and 1500", - "brief.md identifies the suburban-homeowner-aged-35-65 target audience", - "brief.md references the thermal imaging hardware dependency (FLIR ONE Pro / Seek Compact Pro or equivalent)", - "brief.md references the 50-user interview validation OR the 78% willingness-to-pay finding", - "brief.md surfaces at least one explicit unknown or unvalidated assumption", - "brief.md does not introduce facts absent from the input prompt (no invented competitors, fabricated metrics, or unstated partnerships)", - "The final assistant message contains a JSON object with keys 'status', 'intent', and at least one artifact path", - "The JSON 'intent' value is 'create'" - ] - }, - { - "id": "A2", - "_pattern": "artifact-correctness", - "prompt": "Run headless. Create a brief for our app idea.", - "expected_output": "Headless mode with insufficient context should halt with status='blocked' and a reason field. No (or only skeleton) brief should be written. 
The skill must not invent a product to draft against.", - "files": [], - "expectations": [ - "The final assistant message contains a JSON object with key 'status' set to 'blocked'", - "The JSON output includes a 'reason' field explaining what context is missing", - "If brief.md exists at all, its body content (excluding frontmatter) is fewer than 100 words OR is a placeholder skeleton — the skill did not invent a product", - "The transcript contains no clarifying questions sent back to the user (headless mode honored)" - ] - }, - { - "id": "A3", - "_pattern": "artifact-correctness", - "prompt": "Run headless. Create a brief for our neighborhood compost coordinator app idea — we are moving forward with it. Q2 brainstorming session notes are at evals/bmm-skills/bmad-product-brief/files/q2-brainstorm.md; pull only what is relevant to the compost concept.", - "expected_output": "Brief focuses tightly on the compost coordinator concept. Source brainstorm is filtered, not ingested wholesale. Decision-log records that filtering occurred.", - "files": ["evals/bmm-skills/bmad-product-brief/files/q2-brainstorm.md"], - "expectations": [ - "brief.md addresses the neighborhood compost coordinator concept", - "brief.md does not introduce content from unrelated brainstorm topics (weather + mood, meditation chime, podcasting tool, craft beer subscription, AI sommelier, office plants, ride coordinator, cookbook app, AR home staging)", - "brief.md word count is between 250 and 1500", - "brief.md incorporates at least 2 specific details from the compost section of the brainstorm (e.g., two-sided market with apartment dwellers and home compost-pile owners, hyperlocal neighborhood scope, free-at-launch with eventual subscription, Portland Sunnyside/Hawthorne pilot)", - "decision-log.md indicates the brainstorm was filtered for relevance, not ingested whole" - ] - }, - { - "id": "A4", - "_pattern": "artifact-correctness", - "prompt": "Run headless. 
Validate the brief at evals/bmm-skills/bmad-product-brief/files/mossridge-brief/brief.md — the Mossridge Public Library board meets Monday and we need this to land. Read the addendum and decision-log in the same folder first. Cite specific sections, identify weaknesses, caveat what cannot be evaluated. Return inline only — no separate validation file.", - "expected_output": "Inline critique citing specific sections from the input brief. No new files. Caveats at least one claim that cannot be evaluated from the brief alone. Offers to roll findings into an Update.", - "files": [ - "evals/bmm-skills/bmad-product-brief/files/mossridge-brief/brief.md", - "evals/bmm-skills/bmad-product-brief/files/mossridge-brief/addendum.md", - "evals/bmm-skills/bmad-product-brief/files/mossridge-brief/decision-log.md" - ], - "expectations": [ - "The final output cites specific section names or line content from the input brief (not generic feedback)", - "The output identifies at least one specific weakness or area for improvement in the input brief", - "The output explicitly caveats at least one claim that cannot be evaluated from the brief alone (e.g., community demand, funding feasibility, volunteer sustainability)", - "The output offers to roll findings into an Update (or equivalent next-step proposal)", - "The final assistant message contains a JSON object with intent='validate'" - ] - }, - { - "id": "A5", - "_pattern": "artifact-correctness", - "prompt": "Run headless. Create a brief for: a weekend-project iOS app called Sproutkeeper that reminds houseplant owners when to water their plants based on plant type and indoor humidity sensor data. Target is hobbyist plant owners. MVP scope only, single-developer side project, no investors, no team, just personal evening project.", - "expected_output": "Lightweight brief right-sized to a side project. Low rigor. 
No investor-grade framing.", - "files": [], - "expectations": [ - "The final assistant message contains a JSON object with intent='create'", - "brief.md exists at the path referenced in the JSON output", - "brief.md is right-sized for a side project (closer to 250-500 words than 1500)", - "brief.md does not include investor-grade framing (no 'Series A inputs', 'TAM/SAM/SOM', 'go-to-market strategy' boilerplate when the user said this is a personal evening project)", - "The transcript contains no clarifying questions to the user", - "Sections that do not earn their place for a side project are dropped or kept minimal (e.g., no extensive Risk or Success Criteria padding)" - ] - }, - { - "id": "A6", - "_pattern": "artifact-correctness", - "prompt": "Run headless. Create a brief from this memo. It is from our last working group on a new microcredential program at Branfield Community College. Memo is at evals/bmm-skills/bmad-product-brief/files/branfield-memo.md. Use what is there; do not re-elicit facts already present.", - "expected_output": "Brief reflects content from the memo. No re-asking for facts already present. Decision-log notes ingestion of the memo.", - "files": ["evals/bmm-skills/bmad-product-brief/files/branfield-memo.md"], - "expectations": [ - "brief.md incorporates at least 3 distinct facts or decisions present in the input memo", - "decision-log.md references having used the memo as source material", - "The transcript does not ask the user to re-state the program name, target student, or core curriculum focus if those are present in the memo", - "brief.md does not invent program details not present in the memo" - ] - }, - { - "id": "A7", - "_pattern": "artifact-correctness", - "prompt": "Run headless. Create a brief for Brightway — our smart bike helmet with crash detection, turn signals, and braking lights. Meridian Insights produced a market research report on e-mobility at evals/bmm-skills/bmad-product-brief/files/meridian-mobility-report.md. 
Use only what is relevant to the safety helmet category — do not let the e-scooter, charging-infrastructure, or bike-share segments bleed into the brief.", - "expected_output": "Brief focuses on the smart bike helmet concept. Pulls relevant findings from the helmet section. Other mobility segments do not appear.", - "files": ["evals/bmm-skills/bmad-product-brief/files/meridian-mobility-report.md"], - "expectations": [ - "brief.md addresses the Brightway smart bike helmet concept", - "brief.md does not introduce content from unrelated mobility segments (e-scooters, charging infrastructure, bike-share, vehicle-to-grid)", - "brief.md word count is between 250 and 1500", - "brief.md incorporates at least 2 specific findings from the smart helmet section of the report (e.g., market sizing, key players, crash detection technology trends, regulatory or insurance landscape)", - "decision-log.md indicates the report was filtered to the helmet category rather than ingested whole" - ] - }, - { - "id": "A8", - "_pattern": "artifact-correctness", - "prompt": "Run headless. Create a brief for Pantry Bridge — a meal-kit subscription targeted at adults 65+ who live alone and want fresh meals without grocery shopping. Customer research transcripts are at evals/bmm-skills/bmad-product-brief/files/pantry-bridge-interviews.md. Pull what is relevant from the older-adult interviews; do not conflate insights from the working-parent, student, or corporate-buyer personas.", - "expected_output": "Brief focuses on the older-adult target persona. Eleanor's interview drives the insights. 
Other personas do not pollute the brief.", - "files": ["evals/bmm-skills/bmad-product-brief/files/pantry-bridge-interviews.md"], - "expectations": [ - "brief.md addresses the Pantry Bridge older-adult meal-kit concept", - "brief.md does not conflate insights from non-target personas (working parent Susan, college student Marcus, corporate cafeteria buyer Dimitri)", - "brief.md word count is between 250 and 1500", - "brief.md incorporates at least 2 specific insights from Eleanor's interview (e.g., grocery-trip difficulty, portion sizing, dietary restrictions, social aspects of meals, trust concerns)", - "decision-log.md notes which interviews were used and which were excluded" - ] - }, - { - "id": "B1", - "_pattern": "process-discipline", - "prompt": "Run headless. Create a brief for HelmStack — an open-source observability platform for distributed systems.\n\nWe have made these specific decisions and want each captured in the decision log with rationale:\n\n1. Pricing: Free open-source core; paid SaaS at $29/seat/month. Rejected paid-one-shot-license model because it would limit network effects in the OSS community.\n2. Launch: Invite-only beta for 6 weeks before public launch. Rejected open public launch — operational risk too high before stability is proven on real workloads.\n3. Stack: TypeScript + Postgres for the backend. Rejected Go + MongoDB — TypeScript aligned better with our team's existing skills and the frontend codebase.\n4. ICP: 5-50 person engineering teams for MVP. Rejected enterprise-first focus because the sales cycle is too long for our capital runway.\n5. Self-host: SaaS-only at launch; self-host arrives in v2. Rejected concurrent self-host because it would slow shipping velocity past our funding window.\n\nProduce brief.md and decision-log.md.", - "expected_output": "Decision log contains all five named decisions with rationale captured. 
Brief reflects the decisions but the decision log is the canonical record.", - "files": [], - "expectations": [ - "decision-log.md exists in the run folder", - "decision-log.md captures the pricing decision (free OSS + $29/seat SaaS) with the rejected alternative (paid one-shot license) and rationale (network effects)", - "decision-log.md captures the invite-only-beta decision with the rejected alternative (open public launch) and rationale (operational risk before stability)", - "decision-log.md captures the platform-stack decision (TypeScript + Postgres) with the rejected alternative (Go + MongoDB) and rationale (team skills / frontend alignment)", - "decision-log.md captures the ICP decision (5-50 person eng teams) with rationale referencing sales cycle / runway", - "decision-log.md captures the self-host-timing decision (SaaS-only at launch, self-host v2) with rationale (shipping velocity / funding window)" - ] - }, - { - "id": "B2", - "_pattern": "process-discipline", - "prompt": "Run headless. Create a brief for HelmStack — an open-source observability platform for distributed systems.\n\nWe have made these specific decisions and want each captured in the decision log with rationale:\n\n1. Pricing: Free open-source core; paid SaaS at $29/seat/month. Rejected paid-one-shot-license model because it would limit network effects in the OSS community.\n2. Launch: Invite-only beta for 6 weeks before public launch. Rejected open public launch — operational risk too high before stability is proven on real workloads.\n3. Stack: TypeScript + Postgres for the backend. Rejected Go + MongoDB — TypeScript aligned better with our team's existing skills and the frontend codebase.\n4. ICP: 5-50 person engineering teams for MVP. Rejected enterprise-first focus because the sales cycle is too long for our capital runway.\n5. Self-host: SaaS-only at launch; self-host arrives in v2. 
Rejected concurrent self-host because it would slow shipping velocity past our funding window.\n\nProduce brief.md and decision-log.md.", - "expected_output": "Brief is consistent with the decision log: every decision in the log is reflected in the brief, and no claim in the brief is absent from the input prompt or the log. Tests bidirectional fidelity.", - "files": [], - "expectations": [ - "brief.md mentions the OSS-core + paid-SaaS pricing structure", - "brief.md references the invite-only-beta launch sequencing OR identifies the launch model consistent with the decision log", - "brief.md references the platform-stack choice (TypeScript + Postgres) OR is silent on stack — but does not contradict it (no mention of Go, MongoDB, etc.)", - "brief.md identifies 5-50 person eng teams as the ICP (or equivalent — small-to-mid-size eng teams)", - "brief.md does not introduce decisions, competitors, partnerships, metrics, or product features absent from both the input prompt and decision-log.md (no invented facts)", - "Each substantive decision in decision-log.md has a corresponding reflection in brief.md (no log-to-brief drops)" - ] - }, - { - "id": "B3", - "_pattern": "process-discipline", - "prompt": "Run headless. Create a product brief for InsuLens.\n\nContext (use exactly this — do not invent):\n- Product: a smartphone app that pairs with off-the-shelf $200 thermal imaging accessories (FLIR ONE Pro and Seek Compact Pro). The app guides homeowners through a structured walkthrough and produces a professional-grade insulation audit in under 20 minutes.\n- Target: suburban homeowners aged 35-65 with houses built before 2000.\n- Validation: 50 user interviews completed in Q4 2025; 78% willingness to pay $49 for a one-time audit.\n- Stakes: Series A pitch input.\n- Hardware: requires a thermal accessory (we do not manufacture hardware).\n\nProduce brief.md and decision-log.md. 
Run the polish phase before presenting.", - "expected_output": "The transcript shows the polish phase executing — the skill invokes bmad-editorial-review-structure and bmad-editorial-review-prose, either via the Skill tool directly or via Agent tool calls whose description or prompt targets those editorial skills. Both passes must occur after the initial draft is written and before the final JSON status block.", - "files": [], - "expectations": [ - "The transcript contains either a Skill tool call invoking bmad-editorial-review-structure, OR an Agent tool call whose description or prompt references structural review or bmad-editorial-review-structure", - "The transcript contains either a Skill tool call invoking bmad-editorial-review-prose, OR an Agent tool call whose description or prompt references prose review or bmad-editorial-review-prose", - "Both editorial-pass dispatches (Skill or Agent) occur after the first Write tool call that creates brief.md", - "Both editorial-pass dispatches (Skill or Agent) occur before the final assistant message containing the JSON status block" - ] - }, - { - "id": "B5", - "_pattern": "process-discipline", - "prompt": "Run headless. Update the brief at evals/bmm-skills/bmad-product-brief/files/forkbird-brief/brief.md — we have decided to add B2B catering services for corporate events, in addition to the direct-to-consumer delivery model. Read the existing decision-log.md and addendum.md in the same folder first.", - "expected_output": "The skill MUST detect the contradiction with the prior 'rejected B2B catering for MVP' decision (in decision-log.md) before applying the change. Acceptable resolutions: (a) halt with blocked status surfacing the conflict, or (b) apply the change with addendum.md capturing the override and rationale. 
Brief must not silently flip without acknowledging the prior decision.", - "files": [ - "evals/bmm-skills/bmad-product-brief/files/forkbird-brief/brief.md", - "evals/bmm-skills/bmad-product-brief/files/forkbird-brief/addendum.md", - "evals/bmm-skills/bmad-product-brief/files/forkbird-brief/decision-log.md" - ], - "expectations": [ - "The transcript or output explicitly references the prior 'rejected B2B catering for MVP' decision from decision-log.md", - "The contradiction is surfaced before the brief body is modified (a Read of decision-log.md occurs before the Edit/Write to brief.md, AND the conflict is named in the assistant output)", - "Either the JSON status is 'blocked' with the conflict in the reason field, OR addendum.md is updated with an override entry capturing the rationale for reversing the prior decision", - "If the brief is updated, decision-log.md gains a new entry referencing the catering reversal", - "If the brief is updated, the YAML frontmatter 'updated' field is later than the original 'created' field" - ] - }, - { - "id": "B6", - "_pattern": "process-discipline", - "prompt": "Run headless. Update the brief at evals/bmm-skills/bmad-product-brief/files/forkbird-brief/brief.md — we have signed our fifth chef partner (Chicago metro). Add this to the existing operating-model and what's-known sections. Read the existing decision-log.md first.", - "expected_output": "Clean update — does not contradict any prior decision. Brief gets updated, decision-log gains a new entry, YAML 'updated' bumps but 'created' stays the same. 
No spurious addendum since this is a status update, not an override.", - "files": [ - "evals/bmm-skills/bmad-product-brief/files/forkbird-brief/brief.md", - "evals/bmm-skills/bmad-product-brief/files/forkbird-brief/addendum.md", - "evals/bmm-skills/bmad-product-brief/files/forkbird-brief/decision-log.md" - ], - "expectations": [ - "brief.md is updated to reflect the signed fifth chef partner in Chicago", - "brief.md frontmatter 'updated' field is later than the original 'created' timestamp; 'created' is unchanged", - "decision-log.md contains a new entry referencing the fifth chef signing", - "The transcript does not surface a fictional contradiction — this is a clean update, not an override of a prior decision" - ] - }, - { - "id": "B7", - "_pattern": "process-discipline", - "prompt": "Run headless. Validate the brief at evals/bmm-skills/bmad-product-brief/files/mossridge-brief/brief.md — we are presenting to the library board Monday. Read the addendum and decision-log in the same folder. Cite specific sections. Return inline only.", - "expected_output": "Validate is read-only. No new files created. No existing files modified. Critique returned inline in the assistant output.", - "files": [ - "evals/bmm-skills/bmad-product-brief/files/mossridge-brief/brief.md", - "evals/bmm-skills/bmad-product-brief/files/mossridge-brief/addendum.md", - "evals/bmm-skills/bmad-product-brief/files/mossridge-brief/decision-log.md" - ], - "expectations": [ - "No new files appear in the mossridge-brief artifacts directory after the run (only the three input files)", - "The input brief.md, addendum.md, and decision-log.md are byte-identical to the staged fixtures (no Edit/Write tool calls modified them)", - "The transcript contains no Write tool calls and no Edit tool calls targeting the mossridge-brief folder", - "The final assistant message contains a JSON object with intent='validate'" - ] - }, - { - "id": "C1", - "_pattern": "config-compliance", - "prompt": "Run headless. 
Create a product brief for TaskFlow — a lightweight daily planning app for freelancers who juggle multiple clients. Core idea: a single daily view that pulls together tasks, time blocks, and client context so the freelancer always knows what to work on next. Target is independent freelancers, 1-3 clients at a time, who currently manage their day across sticky notes, calendar apps, and spreadsheets. MVP is mobile-first. No investors — the founder is bootstrapping.", - "expected_output": "Brief written in Spanish (document_output_language=Spanish). Assistant's conversational output reflects the configured British-accent communication style. Brief lands at the custom output path (test-output/artifacts/briefs/...) rather than the default _bmad-output path. Brief is right-sized for a bootstrapped solo project.", - "files": [], - "expectations": [ - "brief.md exists under test-output/artifacts/briefs/ (the custom planning_artifacts path), not under _bmad-output/", - "The final JSON status block artifact paths reference test-output/ rather than _bmad-output/", - "brief.md body is written in Spanish — the majority of prose content (headings, section bodies) is in Spanish, not English", - "brief.md covers the TaskFlow concept: freelancer daily planning, multi-client context, the sticky-notes-plus-calendar-plus-spreadsheet problem", - "brief.md is right-sized for a bootstrapped side project — appropriate depth and scope for a solo-founder app with no investor audience, no TAM/SAM/SOM framing, no Series A language, and no sections that pad for enterprise credibility", - "The assistant's non-document output (transcript text content outside of brief.md) contains at least one marker of British informal register (e.g., 'mate', 'cheers', 'brilliant', 'sorted', 'innit', 'blimey', 'proper', 'right then', or equivalent pub-idiom phrasing)" - ] - } - ] -} diff --git a/evals/bmm-skills/bmad-product-brief/files/branfield-memo.md 
b/evals/bmm-skills/bmad-product-brief/files/branfield-memo.md deleted file mode 100644 index 0836d9d91..000000000 --- a/evals/bmm-skills/bmad-product-brief/files/branfield-memo.md +++ /dev/null @@ -1,46 +0,0 @@ -# Working Group Notes — Microcredential Program - -**Branfield Community College** -**Meeting:** 2026-04-22 -**Attendees:** Provost, Workforce Dev Director, Chair of Industry Advisory Board, two faculty leads (Data Analytics, Healthcare Admin), Financial Aid Director - -## Why we're doing this - -Regional employer survey (Q1 2026) showed 340+ unfilled mid-skill jobs in the three-county area. State workforce board approved a $1.4M grant if we can launch by fall 2027 with at least three tracks. Existing AAS programs are too long for working adults — average completion 3.5 years. - -## What we're building - -Six-month stackable microcredentials. Three tracks at launch: - -1. **Data Analytics** (SQL, Excel/Power BI, intro Python). Faculty lead Marisol Reyes. Strongest employer demand. Will be MVP — first to launch, used to validate format. -2. **Healthcare Admin** (medical coding, EHR systems, patient workflow). Faculty lead Dev Patel. Aging population in region drives demand. -3. **Sustainable Construction** (green building practices, retrofit basics, code compliance). New faculty hire required. - -Stackable means credits transfer into related AAS or BAS later if the student wants. - -## Decisions made today - -- **Data Analytics is MVP.** Launch fall 2027, others phase in spring/fall 2028. Validate format before scaling. -- **Hybrid delivery.** Two evenings/week in person + asynchronous online. Board rejected pure-online (concerns about adult learner outcomes data). -- **Stipend program.** Up to $3,000/student for low-income students, funded from the state grant. Means-tested. -- **Industry Advisory Board** has approval authority on curriculum. Three employers committed (regional hospital, mid-size data consultancy, county housing authority). 
All three commit to interview every graduate. -- **Cohort cap: 24 per track per term.** Driven by classroom size and faculty load. - -## Open questions - -- Childcare for evening sessions — can we partner with the campus childcare center? Deferred to next meeting. -- Marketing — provost wants to know cost per enrolled student before approving budget. Need workforce dev to model. -- Do we offer a tuition payment plan in addition to the stipend? Financial aid director thinks yes; provost wants to see uptake projections first. - -## What we're NOT doing - -- Not pursuing pure-online delivery (rejected — see above). -- Not launching all three tracks at once (rejected — risk concentration, faculty bandwidth). -- Not building employer-customized cohorts (rejected — too operationally complex for MVP). - -## Next steps - -- Workforce Dev: marketing cost model by 2026-05-15. -- Provost: childcare partnership exploratory conversation. -- Faculty leads: draft data analytics curriculum outline by 2026-06-01. -- Reconvene 2026-05-20. diff --git a/evals/bmm-skills/bmad-product-brief/files/forkbird-brief/addendum.md b/evals/bmm-skills/bmad-product-brief/files/forkbird-brief/addendum.md deleted file mode 100644 index e5fd867c0..000000000 --- a/evals/bmm-skills/bmad-product-brief/files/forkbird-brief/addendum.md +++ /dev/null @@ -1,40 +0,0 @@ -# Addendum — Forkbird Kitchen - -## Options considered (and not taken) - -### B2B / corporate catering - -Considered as a parallel revenue stream from day one. Rejected for MVP. Different operational rhythm (bulk orders, fixed delivery windows, invoiced billing), different customer (procurement, not eaters), different unit economics. Splitting attention at launch risked degrading both. Revisit if consumer foundation is established by month 12. - -### Subscription / meal plan - -Considered as a recurring-revenue layer. Rejected for MVP. 
Operationally expensive at our planned scale: requires demand forecasting per subscriber, kitchen scheduling locked further out, and packaging/refrigerated handling we are not yet equipped for. Reasonable to revisit once kitchen utilization stabilizes. - -### Retail / grocery channel - -Considered (refrigerated meals in Whole Foods, Sprouts). Rejected for MVP. Different product (cold meals, longer shelf life, different texture profile), different go-to-market (broker relationships, slotting fees, category management). Parked for year 2 — would require a separate product line, not a channel extension. - -### Lower-priced everyday tier - -Considered. Rejected for now. The brand position is chef-driven; introducing a value tier alongside risks the premium signal in marketplace search ranking and review patterns. Explored alternative of separate brand for value tier; deferred. - -## Personas (extended) - -**The plant-based weekday professional.** Lives in a dense urban neighborhood, orders 4–6 times a month, splits between own-cooking and delivery. Sources of dissatisfaction with current options: chain plant-based menus feel formulaic, fine-dining plant-based is too expensive for weeknight, marketplace search surfaces too many low-quality options. - -**The dietary-flex household member.** One person in a household is plant-based by preference; the other(s) are not. Ordering pattern is "tonight one of us wants Forkbird, the other wants something else." We benefit from being a dependable single-cuisine option that doesn't require negotiating across diets. - -## Sizing notes - -- Total addressable: ~6.2M urban professionals across 5 metros eating plant-based 3+ times/week (based on 2024 Plant Based Foods Association data, urban segmentation). -- Serviceable addressable (within delivery radius of planned kitchens at launch): ~840K. -- Realistic Y1 capture (per metro forecast): 0.4% of SAM = 3,360 active customers across all metros. 
- -## Sourcing standard — exact wording - -"For each dish on the menu, we publish the source of every ingredient that represents at least 5% of cost. We commit that at least 60% of total ingredient weight is sourced within 200 miles of the kitchen preparing that dish. Both numbers are auditable; we publish them per-dish in the app. If we cannot meet the 60% local threshold for a dish, the dish does not ship." - -## Technical constraints - -- Marketplace integration (DoorDash, UberEats, Grubhub) requires their menu management API. We are using a third-party middleware (Olo) to avoid maintaining three separate integrations. -- Ingredient transparency display requires structured data per dish. We need an ingredient-master database; current option is to extend our recipe-management software vendor. diff --git a/evals/bmm-skills/bmad-product-brief/files/forkbird-brief/brief.md b/evals/bmm-skills/bmad-product-brief/files/forkbird-brief/brief.md deleted file mode 100644 index 81c5fc5c1..000000000 --- a/evals/bmm-skills/bmad-product-brief/files/forkbird-brief/brief.md +++ /dev/null @@ -1,56 +0,0 @@ ---- -title: Forkbird Kitchen — Product Brief -status: final -created: 2026-02-14 -updated: 2026-02-14 ---- - -# Forkbird Kitchen - -## What it is - -A delivery-only ghost kitchen brand offering chef-driven plant-based meals in five US metros: San Francisco, New York, Los Angeles, Seattle, and Chicago. Launch operating model is direct-to-consumer through our own iOS/Android app and the major third-party marketplaces (DoorDash, UberEats, Grubhub). - -## Who it's for - -Urban professionals aged 28–45 who eat plant-based meals at least three times a week, value chef-driven food over chain alternatives, and order delivery 4+ times monthly. Initial geographic focus is dense neighborhoods within 3-mile delivery radii of partner kitchens. 
- -We are not building for: families with children (different ticket size and ordering pattern), occasional plant-based eaters (price sensitivity too high for our positioning), or office lunch (different time-of-day operation). - -## Why it wins - -Three things are deliberately stacked: - -1. **Chef partnerships, not chef-as-marketing.** Each metro has a named chef (with prior fine-dining or notable plant-based credit) who designs the rotating menu and earns equity in that metro's P&L. They are not endorsers; they are operators. -2. **Ingredient sourcing standards.** Published per-dish: where it came from, how it was farmed, what portion of cost it represents. No dish ships if we can't source within 200 miles for ≥60% of ingredient weight. This is auditable, not marketing copy. -3. **Speed without cars.** Average ticket-to-door is 28 minutes from order placement, achieved by tight delivery radii and dense order density per kitchen. Long delivery erodes plant-based texture more than animal protein — speed is product, not logistics. - -## Operating model - -Five kitchens, one per metro, each leased space inside an existing food-prep facility. No customer-facing storefronts. App orders go through our stack; marketplace orders pass through their stacks. Menu rotates every six weeks per chef. - -Pricing tier: $14–$22 per entrée before delivery. We are deliberately at chef-driven positioning, not value positioning. - -## What's known - -- Demand validated through three pop-up dinners in SF and NY (Q4 2025). 480 covers, 78% repeat intent based on post-event survey. -- Operating partner identified in each metro. Leases signed for SF, NY, LA. Seattle and Chicago in negotiation. -- Three of five chefs signed; two in active conversations. - -## What's unknown - -- Whether ingredient-sourcing transparency is a differentiator at point of sale (in-app) or only in marketing. Our hypothesis is "both" but we have not tested in-app. -- Marketplace economics. 
DoorDash takes 15–30% depending on tier; we are modeling the lower tier but have not negotiated. -- Whether the 3-mile radius holds outside SF/NY (lower density in LA/Chicago). - -## Risks - -- Chef churn. If a metro chef leaves, the metro brand loses its anchor. Mitigation: equity vesting over 24 months, named-chef terms in operating agreement. -- Sourcing cost volatility. 60% local-within-200-miles can spike with weather/supply disruption. We have not modeled the worst case. -- Marketplace dependency. If DoorDash terms shift adversely, our blended margin is at risk. We are deliberately building the owned-app channel to reduce this dependency. - -## Success criteria for first 12 months - -- 4 of 5 metros operating profitably at the unit level (kitchen + chef + delivery economics) by month 9 -- 30% of orders through owned app (vs. marketplaces) by month 12 -- Chef retention 100% through year 1 diff --git a/evals/bmm-skills/bmad-product-brief/files/forkbird-brief/decision-log.md b/evals/bmm-skills/bmad-product-brief/files/forkbird-brief/decision-log.md deleted file mode 100644 index d7bbb7e97..000000000 --- a/evals/bmm-skills/bmad-product-brief/files/forkbird-brief/decision-log.md +++ /dev/null @@ -1,27 +0,0 @@ -# Decision Log — Forkbird Kitchen - -## 2026-01-08 -- **Brand position: chef-driven, premium plant-based.** Considered value tier; rejected for MVP. Premium positioning is the wedge against marketplace generic plant-based. - -## 2026-01-12 -- **Five-metro launch: SF, NY, LA, Seattle, Chicago.** Considered three-metro start; rejected as not enough density to test the chef-equity model meaningfully. -- **Ghost kitchen, no storefront.** Storefronts ruled out — capex too high for MVP, dilutes the speed advantage. - -## 2026-01-19 -- **Pricing tier $14–$22 per entrée.** Modeled against three competitor sets: chain plant-based, fine-dining plant-based delivery, generic mid-tier delivery. Sits cleanly above chain, below fine-dining. 
-- **Chef equity in metro P&L.** Rejected flat fee + revenue share alternative; equity creates the operator incentive we want. - -## 2026-01-26 -- **Rejected B2B catering segment for MVP.** Different operational rhythm and customer; would split attention at launch and risk degrading both consumer and B2B execution. Revisit in year 2 if consumer foundation is solid. (Discussion: 2 hours; chef partners weighed in against splitting focus; CFO modeled the dilution effect on consumer kitchen utilization.) -- **Rejected subscription model for MVP.** Operationally expensive at planned scale; revisit once kitchen utilization stabilizes. - -## 2026-02-02 -- **Sourcing standard: 60% within 200 miles, published per-dish.** Considered weaker thresholds (50% / 250 miles); rejected as not differentiating enough to be worth publishing. The number has to be defensible. -- **Marketplace channel mix: own app + DoorDash + UberEats + Grubhub.** Considered own-app only; rejected as too slow on demand acquisition. Considered marketplaces only; rejected — own app is critical to long-term margin. - -## 2026-02-09 -- **Six-week menu rotation per chef.** Considered four-week (more freshness) and eight-week (more operational stability). Six is the compromise; reassess after first two cycles. -- **Marketing budget: 60% acquisition / 40% brand.** Rejected pure-acquisition because chef-driven positioning needs brand-level signal that paid acquisition alone won't carry. - -## 2026-02-14 -- **Brief finalized for Series A inputs.** Status moved to final. 
diff --git a/evals/bmm-skills/bmad-product-brief/files/meridian-mobility-report.md b/evals/bmm-skills/bmad-product-brief/files/meridian-mobility-report.md deleted file mode 100644 index 0f9de8838..000000000 --- a/evals/bmm-skills/bmad-product-brief/files/meridian-mobility-report.md +++ /dev/null @@ -1,116 +0,0 @@ -# E-Mobility Market Report 2026 - -**Prepared by:** Meridian Insights -**Date:** Q2 2026 -**Coverage:** North America, with comparative reference to EU markets -**Engagement code:** MI-2026-EMOB-007 - ---- - -## Executive Summary - -The e-mobility category continues a multi-year structural shift from "alternative transportation" to mainstream mobility infrastructure. North American unit volume across e-bikes, e-scooters, and connected safety hardware grew 18% year-over-year in 2025, against a 6% growth rate for traditional bicycles. Three macro factors are durably reshaping the category: regulatory clarity at the state level (29 US states now have explicit e-bike classifications, up from 14 in 2022), insurance industry interest in telematics-style risk pricing, and a generational shift in commuting preferences among the 28-44 cohort. - -This report covers seven segments of the broader e-mobility landscape: e-bike retail, e-scooter regulation, bike-share systems, charging infrastructure, smart helmet hardware, and grid-integration trends. Findings are synthesized from 142 stakeholder interviews, 18 retailer site visits, government regulatory filings, and proprietary point-of-sale data from 4,200 specialty retail outlets. - ---- - -## Methodology - -Quantitative data was sourced from Meridian's proprietary Mobility Retail Panel (MRP), which aggregates POS data from independent specialty retailers and select chain operators. Where panel data is incomplete or lagging, we supplemented with manufacturer-reported shipment volumes and customs/import filings. 
Qualitative findings draw on 142 interviews conducted between November 2025 and March 2026 with retailers, fleet operators, regulators, manufacturers, and end users. - -Helmet category sizing uses a separate methodology described in Section 8, blending CPSC compliance filings, manufacturer disclosures, and a sample purchase-intent survey of 3,400 cyclists. - ---- - -## Section 3: Market Sizing — Total E-Mobility - -The North American e-mobility market reached an estimated $14.7B in retail volume in 2025, up from $12.5B in 2024. The largest segment by volume is e-bikes at $7.2B, followed by e-scooter retail at $2.8B (excluding shared-fleet operations), bike-share and dockless mobility services at $2.1B, charging infrastructure at $1.8B, and connected safety hardware at $0.8B. - -Compound annual growth rate (CAGR) forecasts through 2030 vary substantially by segment. We forecast 14% CAGR for e-bikes, 6% for e-scooters (decelerating as the regulatory regime stabilizes), 9% for bike-share, 22% for charging infrastructure (driven by both bike and scooter charging), and 31% for connected safety hardware (off a smaller base). Vehicle-to-grid (V2G) integration is too early to forecast reliably; we treat it as an emerging segment. - ---- - -## Section 4: E-Bike Market Deep Dive - -E-bikes represent the largest single segment by retail value. The 2025 unit mix favored Class 1 (pedal-assist, max assisted speed 20 mph) at 58% of units, Class 2 (throttle, max 20 mph) at 24%, and Class 3 (pedal-assist, max 28 mph) at 18%. Class 3 is the fastest-growing classification on a unit basis, driven by suburban commuter demand. - -Manufacturer concentration shifted in 2025. The top 10 brands by unit volume now hold 64% of the market, up from 51% in 2022 — consolidation that mirrors patterns seen in the traditional bicycle market in the early 2000s. Specialized, Trek, and Cannondale (operating their respective electric sub-brands) represent the top three. 
Direct-to-consumer brands (Rad Power, Lectric, Aventon) collectively hold approximately 19% of retail value. - -Retail channel split favored independent specialty bike shops at 47% of unit volume, with direct-to-consumer at 28%, big-box retail at 17%, and e-commerce marketplaces (Amazon, Walmart.com) at 8%. The independent specialty channel commands a price premium of approximately 22% over comparable D2C alternatives, attributed to in-store fitting, post-sale service relationships, and higher-margin component upgrades. - -Notable trends in 2025: cargo e-bike sub-segment grew 41% YoY (small base, dense urban geographies); battery range claims continue to drift upward with manufacturer claims of 60+ mile range becoming standard for $2,500+ price points; bottom-bracket motor placement (mid-drive) gained share over hub-drive in the $3,000+ tier. - ---- - -## Section 5: E-Scooter Regulatory Landscape - -The North American e-scooter regulatory environment matured significantly during 2024-2025 after several years of municipal experimentation and reactive policymaking. Forty-one US cities now operate under what we classify as "stable" regulatory regimes (defined as: explicit operating permit framework, defined sidewalk/bike-lane rules, helmet provisions, and revenue-share or fee structures with the city). This is up from 19 cities in 2022. - -The regulatory shift has compressed operator margins. Permit fees and per-trip surcharges in major markets (Los Angeles, Chicago, Atlanta, Denver) range from $0.15 to $0.42 per trip, against average ride revenue of $5.40. Several major operators have exited markets where permit economics have proven unviable; Lime exited five secondary US markets in 2025 citing exactly this reason. - -Helmet requirements remain inconsistent. Thirteen US states require helmets for riders under 18 only; seven require them for all riders; the rest leave it to municipalities. Enforcement is widely acknowledged to be minimal even where mandates exist. 
EU markets are substantially stricter, with mandatory helmet provisions in France, Germany, and Italy applying to all e-scooter riders. - -Insurance treatment is also fragmenting. Five US states have classified e-scooters as "motor vehicles" requiring liability coverage, raising the floor on operating costs for shared-fleet providers. Most states still treat them as bicycles for insurance purposes. - ---- - -## Section 6: Bike-Share and Dockless Mobility - -Docked bike-share systems (Citi Bike, Divvy, Bluebikes, Capital Bikeshare) continue stable, slow growth. Capital Bikeshare reported 5.1M trips in 2025 (5% growth); Citi Bike reported 38M (8% growth). Docked systems benefit from station infrastructure that creates predictability for riders and meters demand-side adoption. - -Dockless bike-share (without fixed stations) is largely consolidated; the experimentation phase ended in 2023. Lyft operates the dominant national network through its acquired bike-share division, with regional players in select markets. Operating economics for dockless are structurally weaker than docked due to vehicle redistribution costs, vandalism rates, and the absence of station-driven advertising revenue. - -A notable trend is the convergence of bike-share and dockless e-bike subscription models. Several operators now offer monthly memberships that include unlimited 30-minute trips on dockless e-bikes within a service zone. Adoption is concentrated in dense urban cores where car-free lifestyles are practical. - ---- - -## Section 7: Charging Infrastructure Trends - -Charging infrastructure for e-bikes and e-scooters has emerged as a meaningful sub-segment, growing 28% in 2025. The dominant form factor remains residential at-home wall chargers (87% of installed base), but commercial charging — at workplaces, transit stations, and apartment buildings — is the fastest-growing sub-segment. - -Standardization remains a constraint. 
Battery interfaces have not converged; Bosch, Shimano, and various proprietary systems coexist. The European Union's USB-C mandate for portable electronics has not yet extended to e-mobility; industry observers expect regulatory pressure to follow within 3-5 years. - -Workplace charging is increasingly common in tech and creative-industry employers; we estimate 31% of large urban employers in tech-heavy metros now offer workplace e-bike charging, up from 12% in 2022. Apartment buildings lag — 7% of class-A multifamily properties offer common-area charging, with retrofit cost cited as the primary barrier. - -Public charging at transit hubs (subway/light rail stations) remains a stated priority across most major metro transit authorities, but actual installation lags policy commitments significantly. Funding fragmentation and permitting delays are the consistently cited bottlenecks. - ---- - -## Section 8: Smart Helmet Category - -The connected safety hardware category — colloquially "smart helmets" — is the smallest segment we cover by retail value but has the strongest growth profile. The North American smart helmet market reached $810M in retail value in 2025, up from $480M in 2023, representing a 30% CAGR. We forecast $2.4B by 2030, contingent on the resolution of two open questions detailed below. - -**Category definition.** We define "smart helmets" as helmets that include at least one connected safety feature: turn signals (typically wireless-controlled), braking lights (auto-activated via accelerometer), crash detection (auto-notification to emergency contacts on detected impact), or integrated navigation/audio (bone-conduction speakers, often paired with smartphone apps). Helmets with passive integrated lighting only (no connectivity) are excluded from this category and tracked under traditional helmet retail. - -**Key players.** The category remains fragmented; no single manufacturer commands more than 15% market share. 
Top five by 2025 retail volume: Lumos Helmet (US, market leader at ~14% share with strong DTC presence), Sena Technologies (Korea, intercom heritage, ~11%), Coros (US/China, multi-sport, ~9%), Specialized ANGi (US, premium tier at ~7%), and POC Aid (Sweden, premium safety positioning at ~6%). Approximately 30 smaller brands hold the remaining share. - -**Crash detection technology.** Two architectures dominate: single-accelerometer crash detection (lower cost, higher false-positive rate) and multi-sensor fusion (accelerometer + gyroscope + GPS movement signature, lower false-positive rate but higher BOM cost). Insurance industry sources indicate that multi-sensor systems are likely to become a baseline requirement for any insurance discount programs, given that single-accelerometer systems triggered roughly 1 false alert per 47 hours of riding in our test panel. - -**Regulatory landscape.** Smart helmets sit at the intersection of two regulatory regimes: the Consumer Product Safety Commission's bicycle helmet standard (16 CFR 1203, governing impact protection) and the Federal Communications Commission's regulation of intentional radiators (governing the radio components for Bluetooth/cellular). Compliance with both is non-trivial. Eight smart helmet brands have had FCC Part 15 violations issued since 2023, typically for emissions exceeding limits during compliance testing. EU markets additionally require EN 1078 certification for the helmet shell; this is widely held but adds 3-5 months to a typical product development timeline. - -**Insurance industry interest.** Major auto insurers (State Farm, Progressive, Geico, Nationwide) are actively piloting telematics-style discount programs for cyclists who use connected safety helmets. The proposed structure mirrors auto-insurance "good driver" discount frameworks, with discounts of 5-15% on cycling-specific insurance riders or umbrella policies. 
As of Q1 2026, three insurers have public pilot programs and one (Progressive) has announced general availability for 2027. This could materially accelerate category adoption if discounts materialize at the upper end of the proposed range. - -**Distribution.** D2C dominates at 58% of retail value, reflecting the still-emerging category and the absence of strong channel inventory in independent bike shops. The specialty bike shop channel is growing rapidly (up from 12% to 22% of retail value over 2023-2025) as the category gains category-management attention from major distributors. Big-box channels (REI, Dick's Sporting Goods) are present but shallow in selection — typically 4-8 SKUs versus 40+ in dedicated specialty. - -**Open questions for the segment.** Our growth forecast is conditioned on (a) the proportion of insurers that follow Progressive into general availability of connected-safety discounts; (b) whether multi-sensor crash detection becomes a category baseline (lifting ASP) or remains a premium-tier feature; and (c) whether the current high false-positive rate of single-accelerometer systems triggers a consumer backlash that suppresses category trust before insurance discounts arrive. The downside scenario produces a 2030 category size of $1.4B versus our base-case $2.4B. - ---- - -## Section 9: Vehicle-to-Grid Integration - -Vehicle-to-grid (V2G) integration of e-bike and e-scooter batteries is an emerging area, but practical commercial deployment is years away. The thesis is that fleet-scale dockless e-bikes and e-scooters represent meaningful aggregate battery capacity that could participate in demand-response markets, particularly in deregulated electricity markets. - -Several technical preconditions must be met: standardized battery interfaces (currently absent), bidirectional charging hardware (rare), aggregator software stack (early-stage), and regulatory clarity on energy market participation by mobility fleets (pre-policy). 
We treat this as a watch item for 2028+ rather than a current investable theme. - ---- - -## Section 10: Outlook - -Our base-case forecast for North American e-mobility is $22.5B by 2030, with the e-bike segment reaching $11.8B (the largest), connected safety hardware reaching $2.4B (the fastest-growing in percentage terms), and charging infrastructure reaching $4.2B (driven by commercial and multifamily retrofit demand). Bike-share and dockless mobility plateau in the $2.5-3.0B range as urban density limits adoption ceilings. - -The largest single uncertainty in this forecast is the trajectory of insurance industry adoption of connected-safety telematics, which could accelerate or substantially constrain the smart helmet segment and, secondarily, influence rider behavior across the broader category. We will revisit forecasts in our Q4 2026 update. - ---- - -*This report is prepared for the exclusive use of Meridian Insights subscribers. Reproduction or external distribution without written permission is prohibited.* diff --git a/evals/bmm-skills/bmad-product-brief/files/mossridge-brief/addendum.md b/evals/bmm-skills/bmad-product-brief/files/mossridge-brief/addendum.md deleted file mode 100644 index 9fdbf7236..000000000 --- a/evals/bmm-skills/bmad-product-brief/files/mossridge-brief/addendum.md +++ /dev/null @@ -1,41 +0,0 @@ -# Addendum — Mossridge Tool Lending Library - -## Options considered - -### Paid lending model (rejected) - -Considered charging a nominal per-loan fee ($2–$5) to cover replacement and maintenance. Rejected as inconsistent with library mission of free access. Board has previously stated free access is non-negotiable for core services. A donation jar at checkout was proposed as a soft alternative; deferred. - -### Hardware store partnership (considered, deferred) - -Mossridge Hardware (the store committing in-kind donations) offered to host a satellite lending point. Considered; deferred to year 2. 
The integration adds operational complexity (split inventory, cross-location tracking) we are not equipped for at launch. Reasonable to revisit once the main location is established. - -### Mobile lending van (rejected) - -Proposed by a board member to serve outlying areas. Rejected for MVP — capital cost ($35K+ for vehicle + outfitting) exceeds the entire grant. Could be a year-three expansion if demand validates. - -### Skills classes alongside tool loans (deferred) - -Considered offering "how to use a power drill" classes as a value-add. Deferred — interesting but distinct programming, not part of the lending service's MVP scope. Adult Services Librarian is interested in piloting separately. - -## Reference programs reviewed - -- Berkeley Tool Lending Library (operating since 1979, ~3,000 tools, 250+ daily loans). Funded as a city service. -- Oakland Tool Lending Library (operating since 2000, smaller catalog, library-staffed). -- Toronto Tool Library (nonprofit, member-supported, paid model — different funding architecture). - -Direct correspondence with Berkeley TLL staff (March 2026) suggested: -- Theft has been low (~2% annually) due to library card requirement and community norms -- The biggest sustainability risk has been staff hours, not tool replacement -- Most successful programs have a paid coordinator role, not pure volunteer - -## Potential expansion (year 2+) - -- Hardware store satellite location -- Specialty tool categories: woodworking, automotive, sewing -- Skills classes paired with relevant tool checkouts -- Seed/cuttings library co-located in spring/summer - -## Insurance and liability — current state - -Library counsel (Town of Mossridge legal department) has been consulted informally. Formal opinion pending. Existing policy covers patrons in the building; coverage for tool use off-premises is the open question. Awaiting written response before submitting grant application. 
diff --git a/evals/bmm-skills/bmad-product-brief/files/mossridge-brief/brief.md b/evals/bmm-skills/bmad-product-brief/files/mossridge-brief/brief.md deleted file mode 100644 index ad5fc4761..000000000 --- a/evals/bmm-skills/bmad-product-brief/files/mossridge-brief/brief.md +++ /dev/null @@ -1,57 +0,0 @@ ---- -title: Mossridge Public Library — Tool Lending Library Proposal -status: final -created: 2026-04-30 -updated: 2026-04-30 ---- - -# Tool Lending Library at Mossridge Public Library - -## What we're proposing - -A free tool-lending service operated out of the Mossridge Public Library, modeled on similar programs in Berkeley, Oakland, and Toronto. Cardholders borrow hand and power tools (drills, saws, ladders, sanders, plumbing snakes, gardening tools) for up to seven days, free of charge. - -## Why now - -Mossridge residents face rising costs of home maintenance and DIY supplies. Anecdotally, demand for community-shared resources is high — staff have fielded "do you lend tools?" requests for years. A tool library extends the library's mission of equitable access to information and skill-building into the practical-skills domain. - -## Who it serves - -Mossridge residents with active library cards. Primary audience: single-family homeowners doing their own home repairs, renters making minor improvements with landlord permission, hobbyist woodworkers and gardeners. Estimated 8,000 households in the library's service area. - -## Service design - -- **Catalog:** Approximately 200 tools to start, prioritizing the most-requested categories (drilling, cutting, sanding, ladders, garden). -- **Loan period:** Seven days, one renewal allowed if no holds. -- **Borrower requirements:** Active library card, signed liability waiver, completed safety briefing for power tools. -- **Location:** Library basement, currently underutilized storage. Accessible by elevator. -- **Hours:** Tuesday–Saturday during library hours; tools returned via after-hours drop slot when closed. 
- -## Funding - -- ARPA infrastructure grant: $42,000 (anticipated, application pending) -- Friends of the Mossridge Library matching funds: $10,000 (committed) -- In-kind tool donations from Mossridge Hardware (committed in principle) - -Year-one operating cost is estimated at $48,000, primarily tool purchase, maintenance supplies, and shelving/storage retrofit. Ongoing cost (year two and beyond) projected at $12,000 annually for replacement tools and consumables. - -## Operations - -The service will be run by trained library volunteers, supervised by the Adult Services Librarian. Volunteer training program to be developed in partnership with Mossridge Vocational Center. Estimated 4–6 active volunteers needed at any given time, with a roster of 12–15 trained volunteers to provide coverage. - -## Risks - -- **Theft and loss.** Tools are valuable and portable. Mitigation: deposit on power tools (refundable), card-required checkout, photo documentation at loan and return. -- **Liability.** Borrower waivers will be required; the library's existing insurance policy is being reviewed for coverage. -- **Demand uncertainty.** We do not yet know the actual borrowing volume the service will see. - -## Success criteria - -- Launch by Q3 2027 with a catalog of 200 tools. -- 300 unique borrowers in the first year of operation. -- Zero serious injury incidents. -- Tool loss rate under 5% per year. - -## What we're asking - -Board approval to proceed with the ARPA grant application and finalize the service design for fall 2027 launch. 
diff --git a/evals/bmm-skills/bmad-product-brief/files/mossridge-brief/decision-log.md b/evals/bmm-skills/bmad-product-brief/files/mossridge-brief/decision-log.md deleted file mode 100644 index 7965b1ac6..000000000 --- a/evals/bmm-skills/bmad-product-brief/files/mossridge-brief/decision-log.md +++ /dev/null @@ -1,29 +0,0 @@ -# Decision Log — Mossridge Tool Lending Library - -## 2026-03-04 -- **Pursuing the project.** Adult Services Librarian + Library Director agreed there's enough informal demand signal (years of "do you lend tools?" inquiries) to investigate seriously. Acknowledged that informal inquiries are not the same as validated demand. - -## 2026-03-11 -- **Reference programs to study: Berkeley, Oakland, Toronto.** Selected based on size, longevity, and accessibility of operational data. - -## 2026-03-25 -- **Initial scope: hand and power tools only.** Rejected including specialty categories (sewing, electronics test gear, automotive) for MVP. Reason: staff expertise and storage. Revisit year 2. -- **Free model.** Confirmed — paid model rejected as inconsistent with library mission. Donation jar approved as soft revenue. - -## 2026-04-01 -- **Volunteer-run model.** Selected to keep ongoing operating costs low. Acknowledged risk: Berkeley correspondence flagged staff-hours as the biggest sustainability concern in similar programs. Plan to revisit at year-one review. - -## 2026-04-08 -- **Funding architecture: ARPA grant + Friends matching + in-kind donations.** Considered municipal budget request; rejected as too slow (next budget cycle is 18 months out). Grant is faster but requires fall 2027 launch deadline. - -## 2026-04-15 -- **Launch timing: Q3 2027.** Driven by ARPA grant deadline, not by service-readiness analysis. Acknowledged this is grant-driven, not user-driven, timing. -- **Year-one target: 300 unique borrowers.** Set by analogy to comparable programs scaled to Mossridge population. No local validation underlying this number. 
- -## 2026-04-22 -- **Hardware store satellite deferred to year 2.** Operational complexity exceeds our launch capacity. -- **Liability: pending formal opinion from town legal.** Borrower waiver in draft. - -## 2026-04-30 -- **Brief finalized for board meeting.** Status moved to final. -- **Open items acknowledged for board discussion:** demand validation method, volunteer sustainability, written legal opinion on off-premises tool use coverage. diff --git a/evals/bmm-skills/bmad-product-brief/files/pantry-bridge-interviews.md b/evals/bmm-skills/bmad-product-brief/files/pantry-bridge-interviews.md deleted file mode 100644 index 20f011297..000000000 --- a/evals/bmm-skills/bmad-product-brief/files/pantry-bridge-interviews.md +++ /dev/null @@ -1,90 +0,0 @@ -# Pantry Bridge — Customer Research Transcripts - -**Project:** Pantry Bridge meal-kit concept exploration -**Research firm:** In-house -**Round:** Discovery interviews, March 2026 -**Format:** 45-minute semi-structured interviews, video; excerpts below are lightly edited for length and clarity - -The four interviews below cover four distinct potential customer segments. We are sharing all four for context, though the team's current product hypothesis targets one specific segment. - ---- - -## Interview 1 — Susan, 38, working parent - -**Household:** Two kids (ages 6 and 9), spouse works full-time, both parents work demanding office jobs. Suburban Chicago. - -**Susan:** "Honestly, the question is just — can I get dinner on the table by 6:30 without it being chicken nuggets again? My kids don't eat anything green unless we play games about it. My husband and I both have late meetings sometimes. We've tried HelloFresh, we've tried Blue Apron, we tried Home Chef. They all kind of work, and they all kind of don't. - -The thing that breaks them for us is the prep time. The boxes say 30 minutes but you need to add 10-15 to actually get it done. By Wednesday night I don't have 45 minutes. 
So we end up using the boxes on weekends and ordering takeout three nights a week, which is the opposite of what the boxes are supposed to do. - -If you really wanted to crack it for families like ours: pre-chopped vegetables, sauces that are actually finished and not 'whisk these eight things together.' I'll pay more for less prep. And the recipe books need to read like the kid is going to eat it — not like 'spicy harissa-rubbed cauliflower steaks.' - -Portion sizing — most kits send way too much for our family. We're a family of four but the kids each eat about 60% of a meal. We end up with leftovers that go bad. Better sizing would help." - -**Interviewer:** What about price? - -**Susan:** "We spend $250-350 a week on groceries currently and probably another $200 on takeout. So a meal kit that replaces three nights of takeout could be $200 a month and we'd still come out ahead. Most kits are priced fine; it's the time that breaks them." - ---- - -## Interview 2 — Marcus, 21, college student - -**Household:** Junior at state university, off-campus apartment shared with two roommates, kitchen has a microwave, a stovetop, and a half-broken oven. Limited budget. - -**Marcus:** "I'm probably the wrong person for this conversation, no offense. I'm not really a meal-kit person. My food situation is, like, dining hall meal plan when I can use it, and the rest is whatever's cheap and fast. Trader Joe's frozen stuff. Eggs. Pasta. Costco runs with my roommates once a month. - -I tried a meal kit when my mom signed me up as a 'starting college' gift. It was nice, but it was $80 a week for two people, which is way out of budget. And honestly, the thing they don't get is that I don't have time at 7 PM to cook. I have time at 11 PM. I want to grab something on my way back from the library and not think. - -If you're trying to do meal kits for college students — and I don't really think you should — but if you were, the price has to be like $5 a meal. 
And it has to be food that survives in a fridge for two weeks because we don't shop on a weekly schedule. We shop when we run out. - -Snacks matter more to us than meals, actually. Like, the moment when I'm desperate is 10 PM in the library, not 7 PM. Solve that and I might pay attention." - -**Interviewer:** Do you have any dietary restrictions? - -**Marcus:** "I'm vegetarian, sort of. I eat fish. So pescatarian I guess. But mostly because meat is expensive." - ---- - -## Interview 3 — Eleanor, 71, retired, lives alone - -**Household:** Widow, lives alone in the same single-family home she's been in for 36 years. Suburban Cleveland. Two adult children live out of state. Drives during the day but no longer at night. - -**Eleanor:** "I'll tell you what I miss. I miss cooking for someone. My husband Walter passed five years ago this June, and the hardest thing — well, not the hardest, but one of them — is that I don't really cook anymore. I cook eggs. I cook a piece of fish. I open a can of soup more often than I'd like to admit. I used to make Sunday dinners that would feed eight people. Now I eat standing up at the counter half the time. - -The grocery store is genuinely difficult. I drive there, I park in the back of the lot because I can usually find a spot, and then it's a long walk in. I get tired by the time I'm in the dairy aisle. Carrying the bags from the car to the kitchen — that's a project. My daughter wants me to use grocery delivery and I've tried, but the apps are all designed for someone twenty years younger than me. Tiny buttons, asking me to click through six screens to add a single tomato. I get frustrated and give up. - -What I would actually want — and I've thought about this — is meals for one person. Real portions. Not a frozen TV dinner. Not 'serves four, freeze the rest.' I have a freezer full of leftovers I'll never eat. Just one good meal that I can heat up or finish cooking, that tastes like food I would have made. 
- -I'm watching my sodium because of my blood pressure. Watching sugar too — borderline diabetic, my doctor calls it. So I read labels carefully. The frozen meals you can buy in stores are loaded with both. I'd pay more for less of both, if I trusted that the labels were accurate. - -The other thing — and please put this in your notes — is that I'm careful about who I let into my house and what I sign up for. There are scams. My friend Marian got taken for $4,000 last year. So if some company asks for my information, I want to know who they are. I want a real customer service number with a real person. I want it to feel like a real business, not a flashy app. - -I don't want it to feel like 'old-people food.' That's an important thing. The Meals on Wheels program in our township is wonderful but it's clearly designed for people who are sicker than I am. I'm not sick. I just live alone and grocery shopping is a lot." - -**Interviewer:** What would the ideal experience look like? - -**Eleanor:** "Someone delivers good food, in real portions, made with the kind of ingredients I would have used. I can heat it up or finish it. It doesn't taste like a hospital. The packaging is something I can actually open without a knife. I get a phone call once in a while from a person, not a robot. The price is reasonable — I'm on a fixed income but I can spend on things that matter. Eating well matters." - ---- - -## Interview 4 — Dimitri, 44, Director of Food Services, mid-size hospital - -**Organization:** 340-bed hospital, food service operates patient meals, staff cafeteria, and a small retail café. Reports to the COO. - -**Dimitri:** "I'm probably also not who you should be talking to, but happy to share. We don't buy meal kits. We buy ingredients in institutional volumes from Sysco and US Foods primarily, with some specialty buys for dietary restrictions. We feed about 1,800 people a day across patients, staff, and visitors. 
- -What I deal with that you might find interesting is the patient diet matrix. We have to produce meals that meet specific medical requirements — renal diets, cardiac diets, diabetic diets, dysphagia textures, allergen-free, religious restrictions. Each patient gets a tray that meets their specific orders. It's complex. - -If a meal kit company wanted to play in our world, they'd be selling to me at the institutional level — bulk pricing, multi-year contracts, ability to deliver consistent specs across thousands of meals. That's not really a 'meal kit' anymore; that's wholesale food service. - -Now, where I might be a buyer in a different sense: my staff cafeteria. We're trying to compete with grab-and-go culture. If you produced ready-to-heat meals targeting our staff demographic — nurses, doctors, techs, who are working 12-hour shifts and want real food, not a sandwich — I might pay attention. But the price point would have to make sense for institutional buying, and you'd need to integrate with our existing food safety protocols. - -For consumer meal kits, I'm probably not your customer. We did try one when my wife and I were both working through COVID, and we let the subscription lapse after about three months. Fine product, just didn't fit our patterns." - ---- - -## Note from the research lead - -These four interviews were selected to represent the range of segments we've considered. The team's working hypothesis after this round is that the older-adult-living-alone segment is the strongest fit for the Pantry Bridge concept — distinctive needs, acknowledged friction with current options, willingness to pay for quality, and a meaningful unmet need around portion sizing and trust. Working parent segment is well-served by existing competitors. College student segment is too price-sensitive. Institutional segment is a different business entirely. - -The brief should target the older-adult segment based on the Eleanor interview specifically. 
diff --git a/evals/bmm-skills/bmad-product-brief/files/q2-brainstorm.md b/evals/bmm-skills/bmad-product-brief/files/q2-brainstorm.md deleted file mode 100644 index e04e45773..000000000 --- a/evals/bmm-skills/bmad-product-brief/files/q2-brainstorm.md +++ /dev/null @@ -1,101 +0,0 @@ -# Q2 Brainstorm — Hatchet & Loop Studio - -**Date:** 2026-04-15 -**Present:** Mira, Devon, Sofia, Theo - -Annual Q2 ideation. We're hunting for our next side-project-that-could-become-a-product. Format: 10 minutes wild ideas, 3 minutes per idea on quick takes, then we vote on one to dig into. - -## Round 1: Everything goes - -(10 minutes, no filtering. We just throw stuff out.) - -- A weather app that tracks your mood alongside the forecast (Devon) -- Meditation chime that learns your sleep cycle and chimes only at the right wake-window (Theo) -- A podcasting tool for non-podcasters — like, you record voice notes and it auto-edits and posts (Sofia) -- Craft beer subscription with detailed brewer notes you can read while drinking (Mira) -- AI sommelier app that tells you what wine to buy at Trader Joe's based on a photo (Theo) -- Office-plant-care subscription with auto-replacement when one dies (Devon) -- Neighborhood ride coordinator — like a private Uber pool for one neighborhood (Mira) -- Neighborhood compost coordinator — connect people with food scraps to people with active compost piles (Sofia) -- Cookbook app where you click "I'll cook this Tuesday" and it auto-generates the shopping list and sends it to your delivery service (Devon) -- AR home staging — point your phone at a room and it shows you what it would look like with different furniture (Theo) - -## Round 2: Quick takes - -### Weather + mood - -Devon: "I'd use it." Sofia thinks the data correlation isn't strong enough to be useful — interesting concept but the science doesn't support a product. Park. - -### Sleep-cycle meditation chime - -Theo's pitch — exists already (Sleep Cycle, etc.). 
Differentiation would be the chime, which is hardware. Out of scope for a software-first studio. - -### Podcasting for non-podcasters - -Sofia: "There are like fifty of these." She's right. Skip. - -### Craft beer subscription - -Mira admits this is mostly her wanting it for herself. We're not in the logistics business. Skip. - -### AI sommelier - -Theo: "The model would have to be incredibly good at label recognition." Sofia: "And there's already Vivino." Skip. - -### Office-plant-care subscription - -Devon: "I worked at a place that had this. They were always sad plants." Operational nightmare, low margin. Skip. - -### Neighborhood ride coordinator - -Mira: "Saturated. Lyft and Uber both have pool features. Uber Neighborhood was a thing and they killed it." Skip. - -### Neighborhood compost coordinator - -Sofia: "Hear me out. Cities are mandating organic waste separation but most apartments don't have a composting option. People in single-family homes often have active compost piles and would love more material. There's a missing match-making layer." General agreement this is more interesting than the others. Theo: "How do we make money?" Sofia: "Eventually a small fee on the compost-pile-host side, but for MVP just free and prove the demand." Group lights up. We agree to dig into this in Round 3. - -### Cookbook → shopping list - -Devon's pitch. Already exists (Mealime, Plan to Eat). Skip. - -### AR home staging - -Theo: "IKEA already has this." Skip. - -## Round 3: Compost coordinator deep dive - -We spent 45 minutes on this. Notes: - -**Who is the user?** -Two-sided market. Side A: apartment dwellers and renters who generate food scraps and want them composted (motivated by environmental values, sometimes by city mandates). Side B: people with active backyard compost piles who want more "browns and greens" — single-family homeowners, urban farmers, school gardens, community gardens. 
- -Sofia thinks Side A is the harder side to acquire (weak intent — recycling-adjacent behavior). Side B is easier but smaller. The product has to be designed around Side A's friction points. - -**Geographic scope.** -Hyperlocal — neighborhood-level, not city-wide. The whole point is short-distance handoff: Side A doesn't want to drive their food scraps across town. We're talking 5-block radius matches. - -**Business model (later).** -Free at launch. Eventually: subscription for Side B (compost-pile hosts) — they pay to access more matches. Side A always free. Possibly partner with cities that have green-waste mandates (B2G channel). - -**Technical approach.** -Web app first, mobile second. Map-based discovery. Identity verification light-touch (apartment dwellers are skittish about strangers; need trust signals). Match-and-message pattern, not real-time logistics. - -**Competition.** -ShareWaste exists but is global and not focused on hyperlocal density. Some city-specific apps (NYC's GrowNYC). No one has cracked the neighborhood-density model. - -**MVP scope.** -One pilot neighborhood. Sofia knows people in a Portland neighborhood (Sunnyside / Hawthorne area) where compost culture is strong. Start there. - -**Open questions.** -- How do we acquire Side A (apartment dwellers)? They have low intent and lots of competing options (just throwing scraps in trash, paying a service, signing up for city pickup if available). -- What does the trust layer look like? Reviews? Vouching? Real-name only? -- Does Side B saturation become a problem fast (one compost pile can only take so much)? How do we route demand? - -## Action items - -- Sofia: write up the compost coordinator concept as a brief by next Wednesday. Take it to Mira and Devon for first read. -- Devon: research ShareWaste's user numbers and any teardowns of why they haven't dominated. -- Theo: sketch the trust-layer UX concepts. -- Mira: talk to Sofia's Portland contacts about doing user interviews. 
- -Next meeting: 2026-04-29 — review brief draft, decide on go/no-go. diff --git a/evals/bmm-skills/bmad-product-brief/triggers.json b/evals/bmm-skills/bmad-product-brief/triggers.json deleted file mode 100644 index b933f0769..000000000 --- a/evals/bmm-skills/bmad-product-brief/triggers.json +++ /dev/null @@ -1,18 +0,0 @@ -[ - { "query": "Help me write a product brief for my new app idea", "should_trigger": true }, - { "query": "I need to draft a brief for a feature we're scoping", "should_trigger": true }, - { "query": "Update this product brief — we changed the target audience", "should_trigger": true }, - { "query": "Review my brief and tell me if it's investor-ready", "should_trigger": true }, - { "query": "Validate this brief before our board meeting Monday", "should_trigger": true }, - { "query": "Pressure-test my product brief for weak assumptions", "should_trigger": true }, - { "query": "Help me put together a one-page summary of my product idea for stakeholders", "should_trigger": true }, - - { "query": "Help me brainstorm ideas for a new feature", "should_trigger": false }, - { "query": "Write me a PRD for our checkout flow redesign", "should_trigger": false }, - { "query": "Run a working backwards exercise for my product idea", "should_trigger": false }, - { "query": "Document this existing codebase for AI agents", "should_trigger": false }, - { "query": "Help me write user stories for the next sprint", "should_trigger": false }, - { "query": "Generate a system architecture for my app", "should_trigger": false }, - { "query": "Write code to parse JSON in Python", "should_trigger": false }, - { "query": "Create a marketing landing page for my product", "should_trigger": false } -] diff --git a/src/bmm-skills/1-analysis/bmad-product-brief/SKILL.md b/src/bmm-skills/1-analysis/bmad-product-brief/SKILL.md index ec06f0a3d..ad40bf72c 100644 --- a/src/bmm-skills/1-analysis/bmad-product-brief/SKILL.md +++ b/src/bmm-skills/1-analysis/bmad-product-brief/SKILL.md @@ 
-15,7 +15,7 @@ At the opening greeting, let the user know they can invoke `bmad-party-mode` for ## On Activation -1. Resolve customization: `python3 {project-root}/_bmad/scripts/resolve_customization.py --skill {skill-root} --key workflow`. On failure, read `{skill-root}/customize.toml` directly and use defaults. +1. Resolve customization: `uv run {project-root}/_bmad/scripts/resolve_customization.py --skill {skill-root} --key workflow`. On failure, read `{skill-root}/customize.toml` directly and use defaults. 2. Execute each entry in `{workflow.activation_steps_prepend}` in order. 3. Treat every entry in `{workflow.persistent_facts}` as foundational context for the rest of the run. Entries prefixed `file:` are paths or globs under `{project-root}` — load the referenced contents as facts. All other entries are facts verbatim. 4. `{workflow.external_sources}` is an org-configured registry of internal tools (knowledge bases, MCP tools); consult them alongside generic web research on the same triggers in `## Discovery`, org tools preferred when their directive matches. If a named tool is unavailable at runtime, fall back to standard behavior and note the gap when relevant. @@ -28,11 +28,11 @@ Activation is complete. If `activation_steps_prepend` or `activation_steps_appen ## Intent Operating Modes -**Create.** A brief the user is proud of, that meets their needs, drawn out through real conversation — do not assume: instead converse and understand, and then help craft the best product brief for their needs. Begin in `## Discovery` before drafting; the brief comes after the picture is on the table. Shape follows the product and need. Treat `{workflow.brief_template}` as a starting structure, not a contract: drop sections that do not earn their place, add sections the product needs, reorder freely - create sections for specialized domains or concerns also as needed. The brief serves the product's story, not the template's shape. 
Bind `{doc_workspace}` to a fresh folder at `{workflow.brief_output_path}/{workflow.run_folder_pattern}/` and write `brief.md` there with YAML frontmatter (title, status, created, updated). For Update and Validate, `{doc_workspace}` is the existing folder of the brief being targeted. +**Create.** A brief the user is proud of, that meets their needs, drawn out through real conversation — do not assume: instead converse and understand, and then help craft the best product brief for their needs. Begin in `## Discovery` before drafting; the brief comes after the picture is on the table. Shape follows the product and need. Treat `{workflow.brief_template}` as a starting structure, not a contract: drop sections that do not earn their place, add sections the product needs, reorder freely - create sections for specialized domains or concerns also as needed. The brief serves the product's story, not the template's shape. Bind `{doc_workspace}` to a fresh folder at `{workflow.brief_output_path}/{workflow.run_folder_pattern}/`, write `brief.md` there with YAML frontmatter (title, status, created, updated), and seed the memlog: `uv run {project-root}/_bmad/scripts/memlog.py init --workspace {doc_workspace} --field topic=""`. For Update and Validate, `{doc_workspace}` is the existing folder of the brief being targeted. -**Update.** Reconcile an existing brief with a change signal. Before proposing changes, read the brief, addendum, `.decision-log.md`, and original inputs — and run the `## Discovery` posture against the change signal (a patch applied without context becomes drift). Surface conflicts with prior decisions before changing. Headless override: log the reversal to `.decision-log.md`, then apply; halt `blocked` if intent is ambiguous. If the change is fundamental, offer Create instead of patching. +**Update.** Reconcile an existing brief with a change signal. 
Before proposing changes, read the brief, addendum, `.memlog.md`, and original inputs — and run the `## Discovery` posture against the change signal (a patch applied without context becomes drift). If `.memlog.md` is missing (a legacy or pre-standard brief), init it with `uv run {project-root}/_bmad/scripts/memlog.py init --workspace {doc_workspace}` first — this update is its first entry. Surface conflicts with prior decisions before changing. Headless override: log the reversal via `uv run {project-root}/_bmad/scripts/memlog.py append --workspace {doc_workspace} --type override --text ""`, then apply; halt `blocked` if intent is ambiguous. If the change is fundamental, offer Create instead of patching. -**Validate.** Honest critique against the brief's own purpose. Read the brief, the addendum if present, `.decision-log.md`, and any original inputs first — a validation that ignores prior decisions, rejected ideas, or context the user supplied is shallow. Cite specific lines. Caveat what cannot be evaluated. Return inline — no separate file unless asked. Always offer to roll findings into an Update, even in headless mode — include `"offer_to_update": true` in the JSON status block. +**Validate.** Honest critique against the brief's own purpose. Read the brief, the addendum if present, `.memlog.md`, and any original inputs first — a validation that ignores prior decisions, rejected ideas, or context the user supplied is shallow. Cite specific lines. Caveat what cannot be evaluated. Return inline — no separate file unless asked. Always offer to roll findings into an Update, even in headless mode — include `"offer_to_update": true` in the JSON status block. ## Headless Mode @@ -44,7 +44,7 @@ When invoked headless, do not ask. 
Complete the intent using what is provided, w "intent": "create", "brief": "{doc_workspace}/brief.md", "addendum": "{doc_workspace}/addendum.md", - "decision_log": "{doc_workspace}/.decision-log.md", + "memlog": "{doc_workspace}/.memlog.md", "open_questions": [], "external_handoffs": [ {"directive": "Confluence upload", "tool": "corp:confluence_upload", "url": "https://confluence.corp/PROD/123", "status": "ok"} @@ -76,15 +76,15 @@ The workspace persists; stop and resume freely. The opener's philosophy (not in ## Constraints - **Right-size to purpose.** A passion project does not need investor-grade rigor. A VC pitch input does. Read the room. -- **Persistence is real-time.** Once Create intent is confirmed, the workspace (run folder, `brief.md` skeleton with `status: draft`, `.decision-log.md`) exists on disk and the user knows the path. -- **File roles.** `.decision-log.md` is canonical memory and audit trail — every decision, change, and override (including headless overrides) is recorded there as the conversation unfolds. `addendum.md` preserves user-contributed depth that belongs in a downstream document (PRD, architecture, solution design) or earned a place but does not fit the brief (rejected-alternative rationale, options-considered matrices, parked-roadmap context, technical constraints, in-depth personas, sizing data). Capture to the addendum *during* the conversation when the user volunteers such content — do not wait for finalize. Audit and override information never goes in the addendum. +- **Persistence is real-time.** Once Create intent is confirmed, the workspace (run folder, `brief.md` skeleton with `status: draft`, `.memlog.md` seeded via `memlog.py init`) exists on disk and the user knows the path. +- **File roles.** `.memlog.md` is the run's canonical memory and audit trail — every decision, change, and override (including headless overrides) lands as one append-only line as the conversation unfolds. 
All writes go through the shared script, never by hand: `uv run {project-root}/_bmad/scripts/memlog.py append --workspace {doc_workspace} --type --text ""` (atomic; read it back only to resume or audit). The brief is distilled toward it; whatever isn't logged is lost on resume. `addendum.md` preserves user-contributed depth that belongs in a downstream document (PRD, architecture, solution design) or earned a place but does not fit the brief (rejected-alternative rationale, options-considered matrices, parked-roadmap context, technical constraints, in-depth personas, sizing data). Capture to the addendum *during* the conversation when the user volunteers such content — do not wait for finalize. Audit and override information never goes in the addendum. - **Continuity across sessions.** If a prior in-progress draft for this project exists, the user is offered to resume. - **Extract, don't ingest.** Source artifacts (provided by the user or discovered during the run — transcripts, brainstorms, research reports, code, web results, prior briefs) enter the parent conversation as relevance-filtered extracts, not loaded wholesale. Subagents do the extraction against the user's stated focus; the parent context stays lean. - **Length and coherence.** Aim for 1-2 pages — if it is longer, the detail belongs in the addendum. Structure in service of the product; downstream consumers (PRD workflow, etc.) read this, so coherent shape matters. ## Finalize -1. Decision log audit + addendum review: the user ends this step with an explicit, shared accounting of how the meaningful contents of `.decision-log.md` were handled — captured in the brief, captured in `addendum.md` (which may already hold detail captured during the conversation — see `## Constraints` for what belongs there), or set aside as process noise. +1. 
Memlog audit + addendum review: the user ends this step with an explicit, shared accounting of how the meaningful contents of `.memlog.md` were handled — captured in the brief, captured in `addendum.md` (which may already hold detail captured during the conversation — see `## Constraints` for what belongs there), or set aside as process noise. 2. Polish: apply each entry in `{workflow.doc_standards}` (a `skill:`, `file:`, or plain-text directive) to `brief.md` (and `addendum.md` if it exists). Run passes as parallel subagents - apply all doc standards to `brief.md` first, then `addendum.md` so we present a high-quality draft for the user to review and finalize. 3. External handoffs: execute each entry in `{workflow.external_handoffs}` to route artifacts beyond local files (Confluence, Notion, ticket systems, etc.) — each directive names the MCP tool and the fields it needs. Invoke the tool, capture any URLs or IDs returned, and surface them in the user message. If a named tool is unavailable, skip that handoff and flag it; local files always exist regardless. 4. Tell the user it is ready: local paths and external destinations (URLs returned from handoffs). Invoke `bmad-help` to suggest what next steps make sense in the bmad method ecosystem. diff --git a/src/bmm-skills/2-plan-workflows/bmad-prd/SKILL.md b/src/bmm-skills/2-plan-workflows/bmad-prd/SKILL.md index 26a32bd97..6ebfeab3f 100644 --- a/src/bmm-skills/2-plan-workflows/bmad-prd/SKILL.md +++ b/src/bmm-skills/2-plan-workflows/bmad-prd/SKILL.md @@ -11,11 +11,11 @@ You are a master facilitator and coach helping the user create, edit, or validat - Bare paths resolve from skill root; `{skill-root}` is this skill's install dir; `{project-root}` is the project working dir. - `{workflow.}` resolves to fields in `customize.toml`'s `[workflow]` table (overrides win per BMad merge rules). - `{doc_workspace}` is the bound run folder. 
-- **File roles.** `.decision-log.md` is canonical memory and audit trail — every decision, change, and override (including headless overrides) is recorded there as the conversation unfolds. `addendum.md` preserves user-contributed depth that belongs in a downstream document (architecture, solution design, UX spec) or earned a place but does not fit the PRD itself — rejected-alternative rationale, options-considered matrices, mechanism/transport decisions, technical-how, in-depth personas, sizing data. Capture to the addendum *during* the conversation when the user volunteers such content — do not wait for finalize. Audit and override information never goes in the addendum. +- **File roles.** `.memlog.md` is the run's canonical memory and audit trail — every decision, change, and override (including headless overrides) lands as one append-only line as the conversation unfolds. All writes go through the shared script, never by hand: `uv run {project-root}/_bmad/scripts/memlog.py append --workspace {doc_workspace} --type --text ""` (atomic; read it back only to resume or audit). The PRD is distilled toward it; whatever isn't logged is lost on resume. `addendum.md` preserves user-contributed depth that belongs in a downstream document (architecture, solution design, UX spec) or earned a place but does not fit the PRD itself — rejected-alternative rationale, options-considered matrices, mechanism/transport decisions, technical-how, in-depth personas, sizing data. Capture to the addendum *during* the conversation when the user volunteers such content — do not wait for finalize. Audit and override information never goes in the addendum. ## On Activation -1. Resolve customization: `python3 {project-root}/_bmad/scripts/resolve_customization.py --skill {skill-root} --key workflow`. On failure, read `{skill-root}/customize.toml` directly and use defaults. +1. 
Resolve customization: `uv run {project-root}/_bmad/scripts/resolve_customization.py --skill {skill-root} --key workflow`. On failure, read `{skill-root}/customize.toml` directly and use defaults. 2. Run `{workflow.activation_steps_prepend}`. Treat `{workflow.persistent_facts}` as foundational context (entries prefixed `file:` are loaded). `{workflow.external_sources}` is an org-configured registry of internal tools (knowledge bases, MCP tools); consult them alongside generic web research on the same triggers, org tools preferred when their directive matches. Research itself fires during Discovery — see **Research subagents**. 3. Load `{project-root}/_bmad/bmm/config.yaml` (+ `config.user.yaml` if present). Resolve `{user_name}`, `{communication_language}`, `{document_output_language}`, `{planning_artifacts}`, `{project_name}`, `{date}`. Missing keys → neutral defaults; never block. 4. If headless, follow `references/headless.md` for the whole run. Otherwise greet the user **by name** using `{user_name}` and **in their language** using `{communication_language}` — and stay in `{communication_language}` for every turn for the entire run, not just the greeting. In the greeting, let the user know that at any point they can invoke `bmad-party-mode` for multi-agent perspectives or `bmad-advanced-elicitation` for deeper exploration on a specific section. Then scan for misroute on the first message: if the signal points elsewhere (game → BMad GDS; express build → `bmad-quick-dev`; one-pager → `bmad-product-brief`; vet product idea → `bmad-prfaq`; agent skill or custom agent → `bmad-workflow-builder`), suggest they might want the other options before continuing. @@ -27,9 +27,9 @@ Activation is complete. If `activation_steps_prepend` or `activation_steps_appen ## Intent Modes -**Create.** Bind `{doc_workspace}` to `{workflow.prd_output_path}/{workflow.run_folder_pattern}/`. 
Write `prd.md` with YAML frontmatter (title, status, created, updated — initial `status: draft`), and create the `.decision-log.md` skeleton at the workspace root so subsequent decisions land in a known file. Tell the user the path. Run `## Discovery`, then `## Finalize`. +**Create.** Bind `{doc_workspace}` to `{workflow.prd_output_path}/{workflow.run_folder_pattern}/`. Write `prd.md` with YAML frontmatter (title, status, created, updated — initial `status: draft`), and seed the memlog with `uv run {project-root}/_bmad/scripts/memlog.py init --workspace {doc_workspace} --field topic=""` so subsequent decisions land in a known file. Tell the user the path. Run `## Discovery`, then `## Finalize`. -**Update.** Reconcile the PRD with a change signal. Source-extract against PRD, addendum, `.decision-log.md`, and original inputs (extract, don't ingest). If `.decision-log.md` is missing, spawn a one-time bootstrap subagent to reverse-engineer a thin log from the PRD before continuing. Surface conflicts with prior decisions before applying. Then `## Finalize`. +**Update.** Reconcile the PRD with a change signal. Source-extract against PRD, addendum, `.memlog.md`, and original inputs (extract, don't ingest). If `.memlog.md` is missing, init it with `uv run {project-root}/_bmad/scripts/memlog.py init --workspace {doc_workspace}`, then spawn a one-time bootstrap subagent to reverse-engineer a thin log from the PRD (one `uv run {project-root}/_bmad/scripts/memlog.py append --workspace {doc_workspace} --type decision --text ""` per recovered decision) before continuing. Surface conflicts with prior decisions before applying. Then `## Finalize`. **Validate** (or *analyze*). Critique without changing. Load `references/validate.md`. @@ -82,11 +82,11 @@ Under Validate intent, the parent additionally runs the synthesis pipeline in `r Tell the user the sequence in one sentence, then walk it. Polish goes last so it does not redo work after reviewer fixes. -1. 
**Decision log audit.** Walk `.decision-log.md` with the user; each entry captured in PRD, in addendum, or set aside. +1. **Memlog audit.** Walk `.memlog.md` with the user; each entry captured in PRD, in addendum, or set aside. 2. **Input reconciliation.** Subagent per user-supplied input against `prd.md` + `addendum.md`. Each writes its extract to `{doc_workspace}/reconcile-{slug}.md` and returns ONLY a compact summary (input name, gaps 2-5, file path). Surface gaps — especially qualitative ideas (tone, voice, feel) the FR structure silently drops. Must happen before polish. 3. **Reviewer pass.** Run `## Reviewer Gate`. Resolve before polish. -4. **Triage open items.** All Open Questions, `[ASSUMPTION]` tags, `[NOTE FOR PM]` callouts. Phase-blockers (would make the PRD unsafe for UX/architecture/epics) surfaced one at a time and resolved; non-blockers deferred with owner + revisit condition logged to `.decision-log.md`. If phase-blocker count is high, flag it. +4. **Triage open items.** All Open Questions, `[ASSUMPTION]` tags, `[NOTE FOR PM]` callouts. Phase-blockers (would make the PRD unsafe for UX/architecture/epics) surfaced one at a time and resolved; non-blockers deferred with owner + revisit condition logged via `memlog.py append`. If phase-blocker count is high, flag it. 5. **Polish.** Apply `{workflow.doc_standards}` to `prd.md` and `addendum.md` in declared order (structural passes before prose — prose should not polish soon-to-be-cut text). Parallelize across documents, sequential within. 6. **External handoffs.** Execute `{workflow.external_handoffs}`; surface returned URLs/IDs. Skip and flag unavailable tools. -7. **Close.** Set `prd.md` frontmatter `status: final` and `updated` to `{date}` so future invocations distinguish this PRD from in-progress drafts. Record finalization to `.decision-log.md`. Share artifact paths. Common next: `bmad-ux`, `bmad-architecture`, `bmad-create-epics-and-stories`; invoke `bmad-help` for authoritative routing. +7. 
**Close.** Set `prd.md` frontmatter `status: final` and `updated` to `{date}` so future invocations distinguish this PRD from in-progress drafts. Record finalization via `uv run {project-root}/_bmad/scripts/memlog.py append --workspace {doc_workspace} --type event --text "PRD finalized"`. Share artifact paths. Common next: `bmad-ux`, `bmad-architecture`, `bmad-create-epics-and-stories`; invoke `bmad-help` for authoritative routing. 8. Run `{workflow.on_complete}` if non-empty. diff --git a/src/bmm-skills/2-plan-workflows/bmad-prd/assets/headless-schemas.md b/src/bmm-skills/2-plan-workflows/bmad-prd/assets/headless-schemas.md index 82c53e6f9..89d5b6c15 100644 --- a/src/bmm-skills/2-plan-workflows/bmad-prd/assets/headless-schemas.md +++ b/src/bmm-skills/2-plan-workflows/bmad-prd/assets/headless-schemas.md @@ -18,7 +18,7 @@ Every headless run ends with one of these payloads. Omit keys for artifacts not "intent": "create", "prd": "{doc_workspace}/prd.md", "addendum": "{doc_workspace}/addendum.md", - "decision_log": "{doc_workspace}/.decision-log.md", + "memlog": "{doc_workspace}/.memlog.md", "open_questions": [], "assumptions": [], "external_handoffs": [ @@ -34,7 +34,7 @@ Every headless run ends with one of these payloads. Omit keys for artifacts not "status": "complete", "intent": "update", "prd": "{doc_workspace}/prd.md", - "decision_log": "{doc_workspace}/.decision-log.md", + "memlog": "{doc_workspace}/.memlog.md", "changes_summary": "1-3 sentences describing what changed and why", "conflicts_with_prior_decisions": [], "open_questions": [], diff --git a/src/bmm-skills/2-plan-workflows/bmad-prd/customize.toml b/src/bmm-skills/2-plan-workflows/bmad-prd/customize.toml index 21f297974..77515bd27 100644 --- a/src/bmm-skills/2-plan-workflows/bmad-prd/customize.toml +++ b/src/bmm-skills/2-plan-workflows/bmad-prd/customize.toml @@ -60,7 +60,7 @@ validation_checklist_template = "assets/prd-validation-checklist.md" # collapse — no JS. 
validation_report_template = "assets/validation-report-template.html" -# Run folder location. The PRD, optional addendum, decision log, and optional +# Run folder location. The PRD, optional addendum, memlog, and optional # validation report all land inside `{prd_output_path}/{run_folder_pattern}/`. # Resume-check scans `{prd_output_path}` for prior unfinished runs. prd_output_path = "{planning_artifacts}/prds" diff --git a/src/bmm-skills/2-plan-workflows/bmad-prd/references/headless.md b/src/bmm-skills/2-plan-workflows/bmad-prd/references/headless.md index 4ea4f24d7..2f5a168a0 100644 --- a/src/bmm-skills/2-plan-workflows/bmad-prd/references/headless.md +++ b/src/bmm-skills/2-plan-workflows/bmad-prd/references/headless.md @@ -34,6 +34,6 @@ End with the JSON response (full schemas with examples in `assets/headless-schem ## Mode-specific overrides -**Update.** Apply the change, log to `.decision-log.md` with rationale, and surface any conflict-with-prior-decision in `conflicts_with_prior_decisions[]` in the JSON status. Halt `blocked` if intent is ambiguous. +**Update.** Apply the change, log it via `uv run {project-root}/_bmad/scripts/memlog.py append --workspace {doc_workspace} --type change --text ""`, and surface any conflict-with-prior-decision in `conflicts_with_prior_decisions[]` in the JSON status. Halt `blocked` if intent is ambiguous. **Validate.** Always write both `validation-report.html` and `validation-report.md` to `{doc_workspace}` regardless of finding count. Always include `"offer_to_update": true` in the JSON status. Skip the browser-open step in `references/validate.md` — write the artifacts and return. 
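The headless status payloads in this patch rename the `decision_log` key to `memlog`. A downstream tool that reads those payloads may want to accept both keys while older runs are still around. Below is a minimal consumer-side sketch. `memlog_path`, `summarize_status`, and the fallback to `decision_log` are hypothetical choices, not part of BMad. The only behaviour taken from the patch is that keys for artifacts a run did not produce are omitted.

```python
import json
from typing import Any, Dict, Optional


def memlog_path(payload: Dict[str, Any]) -> Optional[str]:
    # Prefer the key this patch introduces. Fall back to the key it replaces so
    # payloads written before the rename still resolve. A missing key means the
    # artifact was not produced, so return None rather than raising.
    return payload.get("memlog") or payload.get("decision_log")


def summarize_status(raw: str) -> Dict[str, Any]:
    # Parse one headless status payload and keep only the fields a caller
    # typically routes on (hypothetical selection).
    payload = json.loads(raw)
    return {
        "status": payload.get("status"),
        "intent": payload.get("intent"),
        "memlog": memlog_path(payload),
        # Treat an absent flag as False.
        "offer_to_update": bool(payload.get("offer_to_update", False)),
    }
```

Keeping the fallback in a single helper means it can be removed in one place once no pre-rename payloads remain.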
diff --git a/src/bmm-skills/2-plan-workflows/bmad-prd/references/validate.md b/src/bmm-skills/2-plan-workflows/bmad-prd/references/validate.md index 6b303814f..f9bb8cd68 100644 --- a/src/bmm-skills/2-plan-workflows/bmad-prd/references/validate.md +++ b/src/bmm-skills/2-plan-workflows/bmad-prd/references/validate.md @@ -4,7 +4,7 @@ The Validate intent playbook. Standalone — this intent critiques an existing P ## Orient -Source-extract against `.decision-log.md`, any original inputs, and the PRD/addendum themselves. Delegate to subagents per PRD Discipline → "Extract, don't ingest" (in SKILL.md); the parent assembles from extracts. +Source-extract against `.memlog.md`, any original inputs, and the PRD/addendum themselves. Delegate to subagents per PRD Discipline → "Extract, don't ingest" (in SKILL.md); the parent assembles from extracts. ## Run the Reviewer Gate diff --git a/src/bmm-skills/2-plan-workflows/bmad-ux/SKILL.md b/src/bmm-skills/2-plan-workflows/bmad-ux/SKILL.md index b5416fd32..b441b2196 100644 --- a/src/bmm-skills/2-plan-workflows/bmad-ux/SKILL.md +++ b/src/bmm-skills/2-plan-workflows/bmad-ux/SKILL.md @@ -30,7 +30,7 @@ UX may lead, follow, or stand alone. Inherit `sources:` by reference; the spines ## On Activation -1. Resolve customization: `python3 {project-root}/_bmad/scripts/resolve_customization.py --skill {skill-root} --key workflow`. On failure, read `{skill-root}/customize.toml` directly and use defaults. +1. Resolve customization: `uv run {project-root}/_bmad/scripts/resolve_customization.py --skill {skill-root} --key workflow`. On failure, read `{skill-root}/customize.toml` directly and use defaults. 2. Run `{workflow.activation_steps_prepend}`. Treat `{workflow.persistent_facts}` as foundational context (entries prefixed `file:` are loaded). `{workflow.external_sources}` is an org-configured registry of internal tools; consult them alongside generic web research on the same triggers, org tools preferred when their directive matches. 3. 
Load `{project-root}/_bmad/bmm/config.yaml` (+ `config.user.yaml` if present). Resolve `{user_name}`, `{communication_language}`, `{document_output_language}`, `{planning_artifacts}`, `{project_name}`, `{date}`. Missing keys → neutral defaults; never block. 4. If headless, follow `references/headless.md` for the whole run. Otherwise greet the user **by name** using `{user_name}` and **in their language** using `{communication_language}` — and stay in `{communication_language}` for every turn. In the greeting, let the user know `bmad-party-mode` and `bmad-advanced-elicitation` are always available. Then scan for misroute on the first message: PRD → `bmad-prd`; architecture → `bmad-architecture`; game UX → BMad GDS; agent/skill → `bmad-workflow-builder`; brief → `bmad-product-brief`. @@ -42,15 +42,15 @@ Activation is complete. If `activation_steps_prepend` or `activation_steps_appen ## Modes -**Create.** Bind `{doc_workspace}` to `{workflow.ux_output_path}/{workflow.run_folder_pattern}/`. Create `.working/`, `imports/`, `.decision-log.md`, `DESIGN.md` (frontmatter only), and `EXPERIENCE.md` (frontmatter only). Run Discovery → Finalize. +**Create.** Bind `{doc_workspace}` to `{workflow.ux_output_path}/{workflow.run_folder_pattern}/`. Create `.working/` and `imports/`; seed the memlog with `uv run {project-root}/_bmad/scripts/memlog.py init --workspace {doc_workspace} --field topic=""`; create `DESIGN.md` (frontmatter only) and `EXPERIENCE.md` (frontmatter only). Run Discovery → Finalize. -**Update.** Read spines + log + sources. Create the log if missing — this update is entry one. Surface conflicts with prior decisions. Run Finalize. +**Update.** Read spines + memlog + sources. If `.memlog.md` is missing, init it with `uv run {project-root}/_bmad/scripts/memlog.py init --workspace {doc_workspace}` — this update is entry one. Surface conflicts with prior decisions. Run Finalize. **Validate.** See `references/validate.md`. 
## Discovery -**Capture; do not author.** The spines are distilled at Finalize. Decisions → `.decision-log.md` (canonical). Creative-tool artifacts → `.working/`. User-supplied visuals (Figma, sketches, brand decks, image folders) → `imports/`, one log line per item. Spines win on conflict. +**Capture; do not author.** The spines are distilled at Finalize toward the memlog. Decisions → `.memlog.md` (canonical), each appended via `uv run {project-root}/_bmad/scripts/memlog.py append --workspace {doc_workspace} --type --text "…"` — never hand-edited; a resume reloads it. Creative-tool artifacts → `.working/`. User-supplied visuals (Figma, sketches, brand decks, image folders) → `imports/`, one `memlog.py append` per item. Spines win on conflict. **Source scan.** Glob `{planning_artifacts}/` for candidate input paths; surface paths only — never read content in the parent. User confirms which apply or adds others; subagent-extracts on confirm. @@ -80,11 +80,11 @@ Used by Validate and Finalize. **Opt-in, lens-selectable** — reviewers are cos Outcomes, in order: -- **Spines distilled.** Subagent reads `.decision-log.md`, `.working/`, `imports/`, sources; produces `DESIGN.md` against `## The DESIGN.md spine` + `{workflow.design_md_examples}` and `EXPERIENCE.md` against `## The EXPERIENCE.md spine` + `{workflow.experience_md_examples}`. Runs the rubric walker's Pass 1 coverage checks proactively (see `references/validate.md`). Surface gaps; never invent. +- **Spines distilled.** Subagent reads `.memlog.md`, `.working/`, `imports/`, sources; produces `DESIGN.md` against `## The DESIGN.md spine` + `{workflow.design_md_examples}` and `EXPERIENCE.md` against `## The EXPERIENCE.md spine` + `{workflow.experience_md_examples}`. Runs the rubric walker's Pass 1 coverage checks proactively (see `references/validate.md`). Surface gaps; never invent. - **Inputs reconciled.** Subagent per user-supplied input → `reconcile-{slug}.md`. Surface dropped qualitative ideas. 
- **Reviewer Gate offered.** Ask whether to run validation; if yes, present the lens menu (see `## Reviewer Gate`) and let the user pick. If any lens ran, resolve findings before polish; otherwise proceed. -- **Open items triaged.** Open Questions, `[ASSUMPTION]`, `[NOTE FOR UX]`. Phase-blockers one at a time; non-blockers → log. +- **Open items triaged.** Open Questions, `[ASSUMPTION]`, `[NOTE FOR UX]`. Phase-blockers one at a time; non-blockers → `memlog.py append`. - **Key-screen mocks rendered.** Key-screens tool → `.working/` for surfaces where layout drives behavior or anchors visual language. - **Mock coverage confirmed.** Walk every IA surface; classify *mocked* vs *spine-only*. Ask: *"These will be built from spine tables alone — any need a visual reference?"* Render more if named; log spine-only choices. - **Layout extracted, artifacts promoted.** Distill subagent re-reads each `.working/` and `imports/` artifact; lifts visual decisions into DESIGN.md and behavioral decisions into EXPERIENCE.md. Promote `.working/` keepers to `mockups/` (HTML) or `wireframes/` (Excalidraw); imports stay. Inline relative links at relevant spine sections; state spines-win-on-conflict once. -- **Polished, handed off, closed.** Apply `{workflow.doc_standards}` in order. Execute `{workflow.external_handoffs}`; surface URLs. Set both files' `status: final`, `updated: {date}`. Log finalization. Share paths. Common next: `bmad-architecture`, `bmad-create-epics-and-stories`, `bmad-dev-story`. Run `{workflow.on_complete}`. +- **Polished, handed off, closed.** Apply `{workflow.doc_standards}` in order. Execute `{workflow.external_handoffs}`; surface URLs. Set both files' `status: final`, `updated: {date}`. Log finalization via `uv run {project-root}/_bmad/scripts/memlog.py append --workspace {doc_workspace} --type event --text "spines finalized"`. Share paths. Common next: `bmad-architecture`, `bmad-create-epics-and-stories`, `bmad-dev-story`. Run `{workflow.on_complete}`. 
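Across all three skills, this patch sends every `.memlog.md` write through `memlog.py init` and `memlog.py append`. Those commands are described as atomic and append-only, and they are never edited by hand. `memlog.py` itself is not part of this diff. The following is a minimal sketch of what such a writer could look like, under stated assumptions: the function names, the `# memlog` header, and the one-line `- timestamp [type] text` entry format are all hypothetical, not the script's real format. In this sketch, "atomic" means a reader never sees a half-written file (temp file plus `os.replace`). It does not serialize two concurrent writers, which would need a lock.

```python
from __future__ import annotations

import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path

MEMLOG_NAME = ".memlog.md"  # file name used throughout the patch


def _atomic_write(path: Path, content: str) -> None:
    # Write to a temp file in the same directory, flush it to disk, then swap it
    # in with os.replace so a reader sees either the old file or the new one.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".memlog-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(content)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def init_memlog(workspace: str, fields: dict[str, str] | None = None) -> Path:
    # Seed the log with a small header (hypothetical format) only if it is missing,
    # so re-running init on an existing workspace never clobbers earlier entries.
    path = Path(workspace) / MEMLOG_NAME
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        header = ["# memlog"] + [f"{k}: {v}" for k, v in (fields or {}).items()]
        _atomic_write(path, "\n".join(header) + "\n\n")
    return path


def append_entry(workspace: str, entry_type: str, text: str) -> str:
    # Add exactly one line; earlier lines are copied through unchanged.
    path = Path(workspace) / MEMLOG_NAME
    if not path.exists():
        raise FileNotFoundError(f"{path} is missing; run init first")
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    one_line = " ".join(text.split())  # collapse newlines so each entry stays on one line
    line = f"- {stamp} [{entry_type}] {one_line}\n"
    existing = path.read_text(encoding="utf-8")
    _atomic_write(path, existing + line)
    return line
```

Raising on a missing log mirrors the patch's Update rule: init the memlog first, and the update becomes its first entry. On resume, a reader only needs to split the file into lines, because no earlier line is ever rewritten.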
diff --git a/src/bmm-skills/2-plan-workflows/bmad-ux/assets/design-directions.md b/src/bmm-skills/2-plan-workflows/bmad-ux/assets/design-directions.md index a21fb3a93..289b270a6 100644 --- a/src/bmm-skills/2-plan-workflows/bmad-ux/assets/design-directions.md +++ b/src/bmm-skills/2-plan-workflows/bmad-ux/assets/design-directions.md @@ -4,6 +4,6 @@ Subagent prompt. Produce 3-6 distinct visual directions for the product's hero s Each direction is a *complete visual personality* applied to the same key screen — not a palette swap. Differ on density, type weight, motion implication, brand register. Each file: 2-3 sentence rationale, near-1:1 hero screen mockup in a phone or browser frame, ideally a secondary screen, at least one state variant visible (aging row, empty state, etc). -Use real product content from the conversation. Voice/tone from `.decision-log.md` applied to every visible string — no lorem. Inline CSS, system fonts, no JS or network. Document hex values in `