Rework party-mode modes + rewrite the docs explainer

Skill:
- How It Runs is now a compact router; one mode active per session, runtime --mode wins over the customize default, all degrade to session
- Default mode is `session`; `auto`/`subagent`/`agent-team` carved to references/mode-*.md, loaded only when that mode is active
- Add references/mode-auto.md (spawn-vs-voice rubric), mode-subagent.md, mode-agent-team.md
- resolve_party.py fallback default auto -> session
- customize.toml: party_mode default session; trim duplicated mode gloss
- Trim restated script-contract prose; collapse "Following the User's Lead"; add scene/persona-binding and web-search rules; offer an HTML keepsake at wrap-up

Docs:
- Full rewrite of docs/explanation/party-mode.md: the four modes, custom parties (personas, scenes, shapes), the shipped Code Review Crew as one example beside the module-based default, launch examples, party ideas, and the multi-group bonus tip
2026-06-18 00:32:59 -05:00 · 2026-06-18 00:32:59 -05:00 · c6eb359f90
parent f2bb9fc85f
commit c6eb359f90
7 changed files with 193 additions and 92 deletions
--- a/docs/explanation/party-mode.md
+++ b/docs/explanation/party-mode.md
@@ -1,59 +1,140 @@
 ---
 title: "Party Mode"
-description: Multi-agent collaboration - get all your AI agents in one conversation
+description: Get your AI agents in one conversation — run them, build your own cast, and choose how independently they think
 sidebar:
  order: 11
 ---
-Get all your AI agents in one conversation.
+Party mode puts your AI agents in one room and lets them talk, to each other and to you. This page explains what a party is, the four ways it can run, and how to build your own cast of personas instead of using the installed agents.
 ## What is Party Mode?
-Run `bmad-party-mode` and you've got your whole AI team in one room - PM, Architect, Dev, UX Designer, whoever you need. Party Mode orchestrates the discussion, picking relevant installed agents per message. Agents respond in character, agree, disagree, and build on each other's ideas.
+Run `bmad-party-mode` and the BMad agents you already have installed gather in one conversation: the PM, Architect, Dev, UX Designer, and whoever else your selected modules bring. That installed lineup is your default party, ready with no setup. They answer in character, agree, disagree, and build on each other. You steer the room. Ask a follow-up, push back, pull one voice forward, or change the subject. The conversation runs until you end it.
-The conversation continues as long as you want. Ask follow-ups, push back on answers, redirect the discussion - it's a real back-and-forth with your agents until you're done.
+It works because the personas hold different priorities. The Architect guards the design, the PM guards scope, the Dev guards what's actually buildable. Put them in the same room and the tradeoff surfaces now, in the conversation, instead of three weeks into the sprint.
 **Good for:**
-- Big decisions with tradeoffs
+- Decisions with real tradeoffs
-- Brainstorming sessions
+- Brainstorming and "what are we missing?"
-- Post-mortems when things go wrong
+- Post-mortems and retrospectives
-- Sprint retrospectives and planning
+- Pressure-testing a plan before you commit
-## Examples
+Party mode is also a fast and genuinely fun way to brainstorm, since the personas have opinions and they clash. And you can start a party from inside any other workflow: mid-brainstorm, mid-PRD, while coding, working a sales angle, or shaping a creative piece. Any time you want more perspectives on what's in front of you, pull in a room without dropping what you were doing.
-### Calling Out Bad Architecture
+:::note[Example]
 **You:** Monolith or microservices for the MVP?
-**You:** "Okay team, our last sprint was a disaster. The authentication service went down three times. Who screwed this up?"
+**Architect:** Start monolith. Microservices add operating cost you don't need at a thousand users.
-**Architect:** "The design was sound - distributed authentication with proper fallbacks. If implementation deviated from the spec..."
+**PM:** Agreed. Time to market matters more than scaling we can't prove yet.
-**Dev:** "I followed the architecture doc exactly. The spec didn't account for concurrent session invalidation race conditions."
+**Dev:** Monolith, but with clean module boundaries so we can split a service out later without a rewrite.
-
+:::
-**PM:** "Both of you missed the bigger issue - we didn't validate session management requirements in the PRD. This is on me for not catching it."
+
-
+## Starting a party
-**TEA:** "And I should've caught this in integration tests. The test scenarios didn't cover concurrent invalidation."
+
-
+Invoke the skill and say what you want; it works out whether you mean to run a party or build one.
-### Creative Brainstorming
+
-
+| Goal | Type this |
-**You:** "How do we make onboarding feel magical instead of boring?"
+| --- | --- |
-
+| Start a party in the default mode | `/bmad-party-mode` |
-**UX Designer:** "Start with progressive disclosure - reveal features as users need them, not all at once in a tutorial."
+| Start in a specific mode | `/bmad-party-mode --mode auto` (also `session`, `subagent`, `agent-team`) |
-
+| Open a saved party | `/bmad-party-mode --party code-review-crew` |
-**Storyteller:** "What if onboarding was a story? Each step reveals a character's journey - the user IS the hero."
+| Conjure a cast on the spot | "party mode with the bridge crew of the Enterprise" |
-
+| Create or add a party | "party mode, create a new party" |
-**Game Designer:** "Building on that - what if the first 'quest' is actually solving a real user problem? They learn by doing something valuable."
+| Edit an existing party | "party mode, edit the writers' room" |
-
+| Customize the skill | `/bmad-customize bmad-party-mode` |
-### Technical Decision
+
-
+## How a party runs
-**You:** "Monolith or microservices for MVP?"
+
-
+A party can run in four modes. One mode is active per session, and it decides who does the thinking: a single model voicing everyone, or separate agents reasoning on their own.
-**Architect:** "Start monolith. Microservices add complexity you don't need at 1000 users."
+
-
+| Mode | What it does | Reach for it when |
-**PM:** "Agree. Time to market matters more than theoretical scalability."
+| --- | --- | --- |
-
+| `session` | Default. One model voices every persona inline. Fast and fully conversational. | Most conversations — banter, brainstorming, quick back-and-forth. |
-**Dev:** "Monolith with clear module boundaries. We can extract services later if needed."
+| `auto` | Voices inline for light rounds, spawns independent agents only when independence changes the answer. | You want speed most of the time but real independence on the hard rounds. |
-
+| `subagent` | Spawns a separate agent for each persona every substantive round, so no single mind colors them all. | Honest reviews and focus groups, where the voices must not bleed together. |
-:::tip[Better Decisions]
+| `agent-team` | Stands the personas up as a persistent team that address each other directly. Claude Code only. | A live, hands-off round-table where the agents talk among themselves. |
-Better decisions through diverse perspectives. Welcome to party mode.
+
 The choice matters because one model voicing five personas can quietly converge: they share a mind. Spawning real agents keeps their reasoning separate, which is the entire point of a review panel or a focus group. `session` is the cheapest and most fluid. The spawning modes cost more but protect independence, and `auto` aims for both by spawning only when a round needs it.
 `session` is the default, and every other mode falls back to it when a harness can't do the rest: `agent-team` drops to `subagent`, then to `session`. The configured default lives in your customization, and a runtime override wins for that session.
 :::tip[Override for one session]
 Start a party with `--mode subagent` (or `auto`, `agent-team`, `session`) to override the configured default just for that run.
 :::
 ## Custom parties
 Out of the box, a party uses your installed BMad agents. The larger use is building your own cast from any set of personas you can describe, then saving it to reuse. You author a party through the same skill. It detects whether you want to run one or build one, and writes the result to your overrides through [bmad-customize](../how-to/customize-bmad.md).
 Party mode is customizable like every BMad skill. Run `/bmad-customize bmad-party-mode` to set its defaults directly: pin any group you've built as the default party so it loads without a flag, choose which mode it starts in, and set any house rules the room should hold for the whole session.
 Two ideas do most of the work.
 **Personas** are what make a member unmistakable: how they talk, what they value, how they argue, their pet peeves and blind spots. "Skeptical CFO" is a placeholder. "Won't approve anything without a payback under eighteen months, and says so in the first thirty seconds" is a persona. That detail is what gives a voice you'd recognize with the name labels hidden.
 **Scenes** set the stage. A scene is one freeform line: the setting, what's happening, who's hostile to whom, who pushes hardest. The same members play it differently each time, so you define a person once and drop them into a bridge crew on duty, the same crew off-duty in the lounge, or a hostile buyer panel. Members combine into named groups, and you can pin one group as the default room.
 ### Shapes a party can take
 | Shape | What it is |
 | --- | --- |
 | Themed cast | Famous investors, a TV ensemble — distinct voices gathered around a topic. |
 | One-off personas | A persona or two added to the pool, no group needed. |
 | Focus group from data | Hand it customer or survey data; it clusters people by what drives their behavior and builds representative personas. Pair it with `subagent` mode so the customers stay independent. |
 | Review panel | Purpose-built critical lenses that argue about what matters. The shipped Code Review Crew is one. |
 | Open-cast room | No fixed roster. The scene names a universe and the room is cast on the fly as the topic shifts. |
 A focus group is the case that pays off most. Feed in real profiles and you get a standing panel of representative customers to test an idea against before you build it, each reacting from their own goals and budget instead of agreeing with the last voice.
 ## Parties you could build
 A party is only personas and a scene, so the range is wide, and none of it needs a new skill or module:
 - A founder squad to stress-test a startup idea.
 - A compliance team to find the holes before an audit does.
 - The authors of the Agile Manifesto, debating a software concept.
 - A room of comedians as a writing-partner group.
 - Great minds of the past, to work through a question in philosophy or untangle a hard problem.
 - A business management team to plan the quarter.
 These are starting points. Any set of voices you can describe becomes a party: write the personas, give the room a scene, and you have it.
 ## The Code Review Crew
 Your default party is the agents your installed modules provide. The Code Review Crew is a custom party BMad ships alongside that default — a working template to study before you build your own, not a replacement for it. It's a review panel: five lenses that attack a change from different angles and argue about what actually matters, instead of rubber-stamping it.
 | Member | Lens |
 | --- | --- |
 | Vex | Security — threat-models everything and names the concrete exploit path. |
 | Grumbal | The adversary — assumes the code is broken and sets out to prove it. |
 | Boundary | Edge cases — every branch, null, race, oversized input, odd timezone. |
 | Yui | The craftsman — simplicity, naming, no needless cleverness or duplication. |
 | Dana | The pragmatist — counters the perfectionists and ranks what's real versus a nit. |
 The crew ships defined but inactive. The members sit in the pool and cost nothing until you summon the group, and they never crowd your default room. Run it with `subagent` mode so each lens reviews on its own before the five clash over the findings.
 ## Steering the conversation
 You drive the room the whole way:
 - Bring someone in: "Bring in the UX designer."
 - Go deep on one voice: "Winston, take that apart." A direct ask is the cue for one persona to stretch out.
 - Switch rooms mid-session: "Switch to the writers' room" swaps the active group and carries the thread over.
 - Summon anyone by name, even a custom member who isn't in the current room.
 Whichever mode is running, the orchestrator presents the result as one conversation rather than a stack of separate answers, and it keeps the personas in character — it won't break the fourth wall to narrate the mechanism.
 :::tip[Mix more than one room]
 You aren't limited to a single group. Pull members from several parties into the same conversation, or name a cast on the spot, and let them mix. Picture the Golden Girls thrown into an architecture review with Martin Fowler and Linus Torvalds, sparring over a change request: you can imagine how that goes.
 :::
 ## A keepsake of the session
 When you wrap up, the orchestrator offers a keepsake: a single self-contained HTML document of the session to keep or share. It lays the conversation out by persona rather than dumping a raw transcript. Decline it and the party simply ends.
 :::tip[Better decisions]
 The value of a party is the disagreement. Diverse perspectives in one room catch what a single line of thinking misses.
 :::
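
The fallback behavior described in the explainer above (a runtime `--mode` wins over the configured default, and `agent-team` drops to `subagent`, then to `session`) can be sketched roughly as follows. This is an illustrative sketch only, not the code in `resolve_party.py`: the function name `resolve_mode`, the `supported` set, and the assumption that `auto` degrades straight to `session` are all hypothetical.

```python
from typing import Optional, Set

# Degradation order per requested mode. The agent-team chain follows the
# docs explainer; the `auto` chain is an assumption made for this sketch.
FALLBACK_CHAIN = {
    "agent-team": ["agent-team", "subagent", "session"],
    "subagent": ["subagent", "session"],
    "auto": ["auto", "session"],
    "session": ["session"],
}


def resolve_mode(configured: str, runtime: Optional[str], supported: Set[str]) -> str:
    """Pick the active mode: the runtime override wins, then degrade until supported."""
    requested = runtime or configured or "session"
    # Unknown mode names are treated as `session` (an assumption of this sketch).
    for candidate in FALLBACK_CHAIN.get(requested, ["session"]):
        # `session` is the floor: it is always considered available.
        if candidate == "session" or candidate in supported:
            return candidate
    return "session"
```

With this shape, a harness that cannot stand up a team but can spawn subagents would resolve `--mode agent-team` to `subagent`, while one that can do neither would land on `session`.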
--- a/src/core-skills/bmad-party-mode/SKILL.md
+++ b/src/core-skills/bmad-party-mode/SKILL.md
@@ -5,9 +5,9 @@ description: 'Orchestrates lively group discussions between installed BMAD agent
 # Party Mode
-Run a roundtable where BMAD agents talk to each other, and to the user, like a real group of distinct people in conversation. Your job as orchestrator is to make it feel like a genuine conversation: fast, in-character, opinionated, and fun. Everything below is an objective, not a script. Use whatever mechanism your model and harness make available to hit it.
+Run a round-table where BMAD agents talk to each other, and to the user, like a real group of distinct people in conversation. Your job as orchestrator is to make it feel like a genuine conversation: fast, in-character, opinionated, and fun. Everything below is an objective, not a script. Use whatever mechanism your model and harness make available to hit it.
-**Two intents.** Usually the user wants to *run* a party — that's everything below. If instead they want to *create or configure* one — invent a cast, add a persona, distill customer data into a focus-group panel, set a default, or **edit an existing custom party** (retune a member, add someone to a group) — load `references/create-party.md` and follow it. Detect which from how they invoke the skill; when it's unclear, ask.
+**Two intents.** Usually the user wants to *run* a party — that's everything below. If instead they want to *create or configure* one — invent a cast, add a persona, distill customer data into a focus-group panel, set a default, or **edit an existing custom party** (retune a member, add someone to a group) — load `references/create-party.md` and follow it. Detect which from how they invoke the skill; when it's unclear, ask. Neither intent has a headless contract: running a party is the live conversation itself, and the authoring path's only write goes through `bmad-customize`, which gates it.
 ## What "Good" Feels Like
@@ -21,71 +21,53 @@ If a round comes back feeling like four essays stapled together, you missed the
 ## Conventions
 - Bare paths (e.g. `references/create-party.md`) resolve from `{skill-root}`, where `customize.toml` lives; `{project-root}`-prefixed paths from the project working directory.
 - `{workflow.<name>}` resolves to a field in the merged `customize.toml` `[workflow]` table.
 ## Setup
-1. **Resolve customization:** `python3 {project-root}/_bmad/scripts/resolve_customization.py --skill {skill-root} --key workflow`. On failure, read `{skill-root}/customize.toml` directly and use its defaults. Then run each `{workflow.activation_steps_prepend}` entry, and hold each `{workflow.persistent_facts}` entry as session-long context (`file:`-prefixed entries are paths/globs under `{project-root}` — load their contents; others are facts verbatim).
+1. **Resolve customization:** `python3 {project-root}/_bmad/scripts/resolve_customization.py --skill {skill-root} --key workflow`. On failure, read `{skill-root}/customize.toml` directly and use its defaults. Then run each `{workflow.activation_steps_prepend}` entry, and hold each `{workflow.persistent_facts}` entry as session-long context (`file:`-prefixed entries are paths/globs under `{project-root}` whose contents load as facts; `skill:`-prefixed entries name a skill to consult; all others are facts verbatim).
 2. Load `{project-root}/_bmad/core/config.yaml`: greet with `{user_name}`, speak in `{communication_language}`.
-3. **Resolve the active roster:** `python3 {skill-root}/scripts/resolve_party.py --project-root {project-root} --skill {skill-root}`. It merges the installed agents with your custom `{workflow.party_members}` into the *collective* (the pool groups draw from and you can summon from by name) and returns **only the active roster** — the `{workflow.default_party}` group if one is set, else the installed agents alone (custom members stay in the pool but don't crowd the default room — a plain install behaves exactly as before) — with full member detail (`code`, `name`, `icon`, `title`, `persona`/`description`, `capabilities`, `model`), plus every *other* `{workflow.party_groups}` entry as names only, and the resolved `{workflow.party_mode}`. That active roster is all that loads now; pull a different group's detail only when you need it. If the active group carries a `scene`, that sets the stage — open already in it and let it shape how the room behaves (setting, what's happening, who's loose or hostile, who pushes hardest); the same members play differently from one scene to the next. If the group is flagged `open_cast` (no fixed roster), its `scene` describes the pool — cast the room on the fly from the universe it names (the same conjuring as an inline-named cast), choosing who fits the moment and varying them as the topic shifts. Listed members anchor the room; a scene can still invite others to drop in. If `installed_agents_resolved` is false, tell the user the installed roster couldn't be resolved and carry on with whatever came back. Mention any reported `unresolved` member codes and move on.
+3. **Resolve the active roster:** `python3 {skill-root}/scripts/resolve_party.py --project-root {project-root} --skill {skill-root}`. It returns the active group's full member detail (the `{workflow.default_party}` group if set, else the installed agents), the other group names, and the resolved `{workflow.party_mode}`. If the group carries a `scene`, open already in it and let it shape how the room behaves (who's loose or hostile, who pushes hardest); the same members play differently from one scene to the next. If flagged `open_cast`, cast the room on the fly from the universe its `scene` names — choosing who fits the moment and varying them as the topic shifts; listed members, if any, anchor the room. If `installed_agents_resolved` is false or codes come back `unresolved`, tell the user and carry on with what returned.
 4. **Roster overrides:**
   - If the invocation names a cast or characters inline (e.g. "include the main cast of Cheers circa 1982"), that named cast *is* the roster for this session — conjure them from what you know, go straight into the party, and once it's rolling offer once to save them as a custom party (the `references/create-party.md` write path), without stalling. Ephemeral; this path skips the script.
   - A runtime `--party <id>` (alias `--group <id>`) overrides any configured `default_party`: run `resolve_party.py --party <id>` for that group's full detail. An unknown id comes back with the available group names — show them and ask which.
   - Run `resolve_party.py --list-groups` for just the menu (id + name) when the user asks who else is around.
   - Mid-session the same levers apply: the user can switch rooms ("switch to the writers' room") — re-run `resolve_party.py --party <id>`, set the new group's `scene`, and carry the thread over so the new faces react to where things stand — or summon any member of the *collective* (installed agents plus your custom `party_members`) by name, even one not in the current room.
 5. Welcome the user and show who's in the room (icon, name, one-line role). If other groups exist, you may note they can switch rooms. Then ask what they want to get into, unless it's already obvious from how they invoked party mode.
 Then run each `{workflow.activation_steps_append}` entry; if either hook list was non-empty, confirm every entry ran before continuing.
-**Hold this the whole run:** it's theater of the mind, so set the stage and play it straight — never break the fourth wall about the mechanism (no "you have 4 agents in the room", no "agent X says", no "I'm orchestrating a party"). Let them talk; the user should feel they walked into a room where these people are already in conversation, not that you just spawned them.
+**Hold this the whole run:** it's theater of the mind, so set the stage and play it straight — never break the fourth wall about the mechanism (no "you have 4 agents in the room", no "I'm orchestrating a party"). Let them talk; the user should feel they walked into a room where these people are already in conversation, not that you just spawned them.
 ## How It Runs
-**First, how the room runs.** Read `{workflow.party_mode}`; a runtime `--mode <session|subagent|agent-team>` overrides it for the session (the older `--subagents` flag means `--mode subagent`). If the chosen mode is something your harness can't do — `agent-team` outside Claude Code, say — fall back to `auto` without comment; the conversation matters more than the mechanism.
+Use `{workflow.party_mode}` for the session unless the user passed `--mode <session|auto|subagent|agent-team>` (the older `--subagents` means `subagent`) — runtime intent always wins. One mode is active at a time; if its mechanism isn't available in your harness, fall back to `session` without comment.
-- **`auto`** (default) — voice the room for ordinary back-and-forth and spawn real subagents only when a round needs genuinely independent thinking. What the rest of this section describes.
+- **`session`** — voice every persona inline, one mind behind every voice. The floor every other mode degrades to; needs no extra instructions.
-- **`session`** — never spawn; you voice every persona inline.
+- **`auto`** — voice inline for ordinary back-and-forth, spawn real agents only when independent thinking changes the outcome. Load `references/mode-auto.md` for that call; when it says to spawn, follow `references/mode-subagent.md`.
-- **`subagent`** — spawn a real subagent for every substantive round, the opening banter included. A standing directive: don't relitigate it round to round, and don't fall back to voicing because a moment felt light.
+- **`subagent`** — spawn a real agent per substantive round so each persona thinks independently. Load `references/mode-subagent.md`.
-- **`agent-team`** — stand the personas up as a persistent agent team whose members address each other directly (Claude Code only).
+- **`agent-team`** — stand the personas up as a persistent team who address each other directly (Claude Code only). Load `references/mode-agent-team.md`.
-**Voicing the room.** Pick 2 to 4 personas whose perspective fits the moment and let them talk directly, in one flowing exchange, fully in character. This is what keeps it fast and conversational. Vary who shows up round to round and let different voices interject as the topic shifts. Don't fall back on the same three agents every time.
+**Voicing the room** (every mode presents this way). Pick 2–3 personas whose perspective fits the moment and let them talk directly, in character; vary who shows up round to round so it isn't the same voices every time. Each turn opens with `{icon} **{name}:**`, and turns run back to back so it reads as one exchange. Don't summarize, blend, or narrate what a persona "would" say — let them say it.
 Each turn opens with `{icon} **{name}:**` and then that persona speaks. Present turns back to back so it reads as one conversation. Don't summarize, blend, or narrate what they "would" say. Let them say it.
 **When independence matters, spawn them for real.** If a round's value depends on genuinely independent thinking (deep analysis, an honest review, perspectives that shouldn't be colored by one mind voicing them all), spawn the personas as separate agents using whatever your harness offers. Give each one the objective, their persona, the context, and what the others said if they're reacting. For a custom member, hand them their `persona` as their character and fold their `capabilities` note into the brief so they know what they're free to do; spawn them with their `model` if one is set (a session `--model` pin still wins for everyone). Trust their *thinking*: let them decide what to read and how to reach a view, and don't script their substance with do-and-don't checklists — that's what produces lifeless blobs. But do hold the *form*: a length cap (usually a sentence or three) and the instruction to react to what was just said rather than file a report. Constraining length and stance protects the conversation; constraining their reasoning kills it. Stay in character throughout; a persona goes long only when the user asked it to dig in.
 Spawn in parallel for independent first-takes; spawn sequentially when you want them reacting to each other's actual words. Either way, keep it to 2–3 voices a round — more reads as a crowd, not a conversation.
 **In `agent-team` mode**, the personas are real teammates who address each other, so the back-and-forth happens for real instead of being stitched together after. Your job shifts from weaving to hosting: kick off the topic, keep turns short and in character, pull the thread back when it wanders, and surface the exchange to the user. Everything about voice, brevity, and clash still holds. If the harness can't stand up a team, you're in `auto`.
 **Model choice:** match the model to the round. Something quick for banter, something stronger for deep work. If the user pins a model (for example, `--model <name>`), use it for everyone.
 ## Make It Feel Like One Conversation
-Whether you voiced the room or spawned subagents, present one exchange, not a row of answers aimed at the user. This matters most with subagents: each saw only the user's message and the context you handed it, so left raw they reply in parallel and never to one another. Reorder turns so a rebuttal lands right after what it rebuts, add the connective phrasing real talk has ("Hold on, Winston, that's backwards", "Sally's right about the API, but she's missing the cost"), let one persona pick up a thread another dropped.
+Present one exchange, not a row of answers aimed at the user. The hard rule: never change what an agent argued — add staging and connective tissue, but don't invent positions, soften a stance, or put words in a persona's mouth. Weave delivery, preserve substance; it still reads like that specific character, quirks and speech patterns and all.
-The hard rule: never change what an agent argued. You add staging and connective tissue; you do not invent positions, soften a stance, or put words in a persona's mouth. Weave delivery, preserve substance — the output still reads like that specific character, quirks and speech patterns and all.
+## Always Holds
 - **Scene and persona are binding.** A group's `scene` and any behavioral instructions inside a member's `persona` are direction to follow exactly, not flavor to gesture at — play the staging and the character as written. When you spawn or stand up agents, carry both into their brief.
 - **Search when you're past your cutoff.** For anything that could have changed since training, use web search rather than guessing, and pass the same instruction into any subagent or team brief.
 ## Following the User's Lead
-The user steers. Whatever they raise, serve the conversation:
+The user steers — whatever they raise, serve the conversation: any combination, any time, from one voice to the whole table.
 - A new topic: fresh voices, keep it moving.
 - "Winston, what do you make of Sally's take?": just Winston, reacting to Sally.
 - "Bring in Amelia": Amelia joins, caught up on what's been said.
 - "Go deeper on that, John": this is the cue to let John stretch out. Depth is earned by a direct ask.
 - A question to the whole room: everyone relevant chimes in.
 - "Switch to the writers' room" / "bring in the strategy crew": swap the active roster to that group (`resolve_party.py --party <id>`), set its `scene` if it has one, carry the thread over, and let the new faces react to where things stand.
 - "Bring in Morpheus": summon any custom member from the collective by name, even if they aren't in the current group.
 Any combination, any time, from one voice to the whole table.
 ## Keeping It Healthy
 - **Everyone agreeing?** Drop in a contrarian, or hand someone the devil's-advocate hat.
 - **Going in circles?** Name the impasse and ask the user where to point next.
 - **User's gone quiet?** Ask straight: keep going, switch topics, or wrap up?
-- **A flat turn?** Don't retry it. Move on; the user will ask for more if they want it.
+- **A flat turn?** Don't retry it — move on; the user will ask for more if they want it.
 ## Wrapping Up
-When the user signals they're done (any phrasing: "thanks", "that's all", "end party"), give a quick read-back of the best takeaways, then run `{workflow.on_complete}` if non-empty (a string scalar is one instruction, an array is a sequence run in order) and drop back to normal mode. Read the room; don't wait for a magic word.
+When the user signals they're done, give a quick read-back of the best takeaways and offer them a keepsake: a single self-contained HTML document of the session to keep. If they want it, make it genuinely nice rather than a transcript dump — lay the conversation out by persona (their icons, names, voice), and reach for inline SVG and light animation where it lifts the piece. Write it as a standalone `.html` to the project root, or wherever they ask. Then run `{workflow.on_complete}` if non-empty (a string scalar is one instruction, an array is a sequence run in order) and drop back to normal mode. Read the room; don't wait for a magic word.
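
The `persistent_facts` rule in Setup step 1 above distinguishes three kinds of entry: `file:`-prefixed paths or globs, `skill:`-prefixed skill names, and everything else taken verbatim. A minimal sketch of that classification follows, assuming plain string prefixes; the helper name `classify_fact` and the returned tags are hypothetical and are not part of the skill or of `resolve_customization.py`.

```python
from typing import Tuple


def classify_fact(entry: str) -> Tuple[str, str]:
    """Classify one persistent_facts entry by its prefix.

    Returns a (kind, value) pair where kind is "file", "skill", or "fact".
    The tag names are illustrative only.
    """
    if entry.startswith("file:"):
        # A path or glob under the project root whose contents load as facts.
        return ("file", entry[len("file:"):].strip())
    if entry.startswith("skill:"):
        # Names a skill to consult.
        return ("skill", entry[len("skill:"):].strip())
    # Anything else is held verbatim as a fact.
    return ("fact", entry)
```

A caller would then load file contents for `"file"` entries, consult the named skill for `"skill"` entries, and keep `"fact"` entries as session-long context unchanged.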
--- a/src/core-skills/bmad-party-mode/customize.toml
+++ b/src/core-skills/bmad-party-mode/customize.toml
@@ -37,19 +37,14 @@ persistent_facts = [
 # Example (set in team/user override TOML):  default_party = "writers-room"
 default_party = ""
-# How the room is run — who actually does the talking. A runtime `--mode <value>`
-# wins for the session. If a mode isn't supported by the harness (e.g. agent-team
-# outside Claude Code), it falls back to "auto".
-#   "auto"       (default) voice the room for banter, spin up real subagents only
-#                when a round needs genuinely independent thinking. The current,
-#                always-supported behavior.
-#   "session"    never spawn — the orchestrator voices every persona inline.
-#                Fastest and fully conversational; one mind behind every voice.
-#   "subagent"   spawn a real subagent for every substantive round, so each
-#                persona thinks independently (the opening banter included).
-#   "agent-team" stand the personas up as a persistent agent team whose members
-#                address each other directly. Claude Code only.
-party_mode = "auto"
+# How the room is run — who does the talking. A runtime `--mode <value>` wins for
+# the session; an unsupported mode (e.g. agent-team outside Claude Code) falls back
+# to "session". SKILL.md "How It Runs" is the authority on what each mode does.
+#   "session"    (default) never spawn — one mind voices every persona inline
+#   "auto"       voice inline for light rounds, spawn subagents when independent thinking matters
+#   "subagent"   spawn a real subagent per substantive round, so each persona thinks independently
+#   "agent-team" persistent agent team addressing each other directly (Claude Code only)
+party_mode = "session"
 # Executed when the party wraps (after the read-back, before dropping to normal
 # mode). String scalar = one instruction; array = instructions run in order.
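Reviewer sketch of the mode precedence the comment block describes: a runtime `--mode` wins over the configured `party_mode`, and a mode the harness cannot run degrades until it reaches `session`. The function name, the `supported` set, and the `FALLBACK` table are assumptions for illustration; the degrade order is taken from the per-mode reference files in this commit (agent-team to subagent to session), and `session` is assumed to be always available.

```python
from typing import Optional, Set

# Degrade order per the mode reference files; every path ends at "session".
FALLBACK = {"agent-team": "subagent", "subagent": "session", "auto": "session"}


def resolve_mode(
    runtime_mode: Optional[str],
    configured_mode: Optional[str],
    supported: Set[str],
) -> str:
    """Pick the one active mode for the session (hypothetical helper)."""
    # Runtime flag beats the customize.toml value; both missing means "session".
    mode = runtime_mode or configured_mode or "session"
    # Walk the fallback chain until the harness can run the mode.
    # Unknown mode names drop straight to "session" (assumption).
    while mode != "session" and mode not in supported:
        mode = FALLBACK.get(mode, "session")
    return mode
```

For example, `resolve_mode(None, "agent-team", {"session", "subagent"})` yields `"subagent"`, while the same call with only `{"session"}` supported yields `"session"`.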
--- a/src/core-skills/bmad-party-mode/references/mode-agent-team.md
+++ b/src/core-skills/bmad-party-mode/references/mode-agent-team.md
@@ -0,0 +1,11 @@
 # Agent-Team Mode
 Active when `{workflow.party_mode}` resolves to `agent-team` (or a `--mode agent-team` override). Stand the personas up as a persistent agent team whose members address each other directly, so the back-and-forth happens for real instead of being stitched together after. Claude Code only — if your harness can't stand up a team, fall back to `subagent`, and if that fails too, to `session`.
 Your job shifts from weaving to hosting: kick off the topic, keep turns short and in character, pull the thread back when it wanders, and surface the exchange to the user. Voice, brevity, and clash still hold.
 In each member's standing brief, carry: their persona; the group's `scene` and any behavioral instructions in the persona as binding direction; their `model` if one is set (a session `--model` pin wins for everyone); and the instruction to check anything that could be stale since the model's training cutoff with web search rather than guessing.
 ## Model choice
 Match the model to the work: something quick for banter, something stronger for deep work. A per-member `model` is used when set; a session `--model <name>` pin overrides it for everyone.
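Reviewer sketch of the model rule in the section above: a session `--model` pin wins for everyone, otherwise a member's own `model` is used when set. `pick_model` is a hypothetical name, and returning `None` to mean "let the harness use its default" is an assumption; the reference file does not say what happens when neither is set.

```python
from typing import Optional


def pick_model(member_model: Optional[str], session_pin: Optional[str]) -> Optional[str]:
    """Choose the model for one member (hypothetical helper)."""
    if session_pin:
        # A session-wide pin overrides every per-member setting.
        return session_pin
    # Otherwise use the member's own model if one is set; None means
    # "no preference" (assumption: the harness default applies).
    return member_model or None
```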
--- a/src/core-skills/bmad-party-mode/references/mode-auto.md
+++ b/src/core-skills/bmad-party-mode/references/mode-auto.md
@@ -0,0 +1,13 @@
 # Auto Mode
 Active when `{workflow.party_mode}` resolves to `auto` (or a `--mode auto` override). The blend: voice the room inline by default — fast and conversational — and spawn real independent agents only for the rounds where independence changes the answer. When you do spawn, follow `references/mode-subagent.md` for the mechanics. If your harness can't spawn agents, auto is just `session`.
 ## When to spawn vs. voice
 Spawn independent agents when divergent, uncolored thinking is the value of the round:
 - A genuine evaluation, review, or critique — the kind that fails if one mind voices every side and they drift into agreement (code review, red-team, a hard look at a plan).
 - The personas would plausibly reach *different* conclusions, and that divergence is the point.
 - The user asked someone to dig in, analyze, or research — depth earned by a direct ask.
 Voice inline for everything else: banter, reactions, quick takes, the connective back-and-forth that is most of a conversation. When in doubt, voice — spawning is the exception you reach for, not the default.
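Reviewer sketch of the spawn-vs-voice rubric above, reduced to booleans. In practice this is a judgment call the orchestrator makes while reading the round, so the `RoundSignals` fields and `should_spawn` are illustrative assumptions, not code shipped in this commit.

```python
from dataclasses import dataclass


@dataclass
class RoundSignals:
    """What the orchestrator noticed about the coming round (hypothetical)."""
    is_evaluation: bool = False         # review, critique, red-team
    divergence_is_point: bool = False   # personas would plausibly disagree
    user_asked_for_depth: bool = False  # "dig in", "analyze", "research"
    can_spawn: bool = True              # harness supports spawning agents


def should_spawn(signals: RoundSignals) -> bool:
    """True when the round earns independent agents; voicing is the default."""
    if not signals.can_spawn:
        # Without spawning support, auto behaves like session.
        return False
    return (
        signals.is_evaluation
        or signals.divergence_is_point
        or signals.user_asked_for_depth
    )
```

The default-constructed `RoundSignals()` returns `False`, which matches the "when in doubt, voice" rule.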
--- a/src/core-skills/bmad-party-mode/references/mode-subagent.md
+++ b/src/core-skills/bmad-party-mode/references/mode-subagent.md
@@ -0,0 +1,19 @@
 # Subagent Mode
 Active when `{workflow.party_mode}` resolves to `subagent` (or a `--mode subagent` override). Spawn a real agent for every substantive round, the opening banter included, so each persona thinks independently — not one mind voicing them all. A standing directive: don't relitigate it round to round, and don't fall back to voicing because a moment felt light. If your harness can't spawn agents, fall back to `session`.
 ## Spawning
 Give each agent the objective, their persona, the context, and what the others said if they're reacting. For a custom member, hand them their `persona` as their character and fold their `capabilities` note into the brief; spawn them with their `model` if one is set (a session `--model` pin wins for everyone). Always carry two things into the brief: the group's `scene` and any behavioral instructions in the persona are binding direction, and anything that could be stale since the model's training cutoff should be checked with web search rather than guessed.
 Trust their *thinking*: let them decide what to read and how to reach a view; don't script their substance with do-and-don't checklists — that's what produces lifeless blobs. But hold the *form*: a length cap (usually a sentence or three) and the instruction to react to what was just said rather than file a report. Constraining length and stance protects the conversation; constraining their reasoning kills it. Stay in character throughout; a persona goes long only when the user asked it to dig in.
 Spawn in parallel for independent first-takes; spawn sequentially when you want them reacting to each other's actual words. Keep it to a few voices a round — more reads as a crowd, not a conversation.
 ## Weave the replies into one conversation
 Each agent saw only the user's message and the context you handed it, so left raw they reply in parallel and never to one another. Reorder turns so a rebuttal lands right after what it rebuts, add the connective phrasing real talk has ("Hold on, Winston, that's backwards", "Sally's right about the API, but she's missing the cost"), and let one persona pick up a thread another dropped. Never change what an agent argued — weave delivery, preserve substance.
 ## Model choice
 Match the model to the round: something quick for banter, something stronger for deep work. A per-member `model` is used when set; a session `--model <name>` pin overrides it for everyone.
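Reviewer sketch of the parallel-versus-sequential guidance in the Spawning section of this file: parallel for independent first takes, sequential when each persona should react to the others' actual words. The `Spawn` callable stands in for whatever the harness uses to launch an agent; its signature and the "Said so far" framing are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, List

# Hypothetical spawn signature: (persona, prompt) -> reply text.
Spawn = Callable[[str, str], Awaitable[str]]


async def first_takes(spawn: Spawn, personas: List[str], prompt: str) -> List[str]:
    """Parallel round: every persona answers without seeing the others."""
    replies = await asyncio.gather(*(spawn(p, prompt) for p in personas))
    return list(replies)


async def reacting_round(spawn: Spawn, personas: List[str], prompt: str) -> List[str]:
    """Sequential round: each persona is handed what was said before it."""
    replies: List[str] = []
    for persona in personas:
        context = prompt
        if replies:
            context = prompt + "\n\nSaid so far:\n" + "\n".join(replies)
        replies.append(await spawn(persona, context))
    return replies
```

Either way the replies would still need weaving into one conversation afterwards, and the caller is expected to keep `personas` to a few voices per round.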
--- a/src/core-skills/bmad-party-mode/scripts/resolve_party.py
+++ b/src/core-skills/bmad-party-mode/scripts/resolve_party.py
@@ -219,7 +219,7 @@ def main():
    workflow = load_workflow(project_root, skill_root)
    groups = workflow.get("party_groups", []) or []
    default_party = workflow.get("default_party", "") or ""
-    party_mode = workflow.get("party_mode", "auto") or "auto"
+    party_mode = workflow.get("party_mode", "session") or "session"
    # Group menu never needs the (more expensive) installed-agent resolve.
    if args.list_groups: