USP

Unlike traditional development that accumulates technical debt, Compound Engineering actively inverts this by codifying learnings and leveraging AI for thorough planning and review, ensuring each work unit makes subsequent ones easier.

Use cases

01Feature planning and ideation
02Automated code and doc review
03Systematic bug debugging
04Knowledge codification and learning
05Product strategy and performance reporting

Detected files (8)

plugins/compound-engineering/skills/ce-agent-native-audit/SKILL.mdskill

Show content (8394 bytes)

---
name: ce-agent-native-audit
description: Run comprehensive agent-native architecture review with scored principles
argument-hint: "[optional: specific principle to audit]"
disable-model-invocation: true
---

# Agent-Native Architecture Audit

Conduct a comprehensive review of the codebase against agent-native architecture principles, launching parallel sub-agents for each principle and producing a scored report.

## Core Principles to Audit

1. **Action Parity** - "Whatever the user can do, the agent can do"
2. **Tools as Primitives** - "Tools provide capability, not behavior"
3. **Context Injection** - "System prompt includes dynamic context about app state"
4. **Shared Workspace** - "Agent and user work in the same data space"
5. **CRUD Completeness** - "Every entity has full CRUD (Create, Read, Update, Delete)"
6. **UI Integration** - "Agent actions immediately reflected in UI"
7. **Capability Discovery** - "Users can discover what the agent can do"
8. **Prompt-Native Features** - "Features are prompts defining outcomes, not code"

## Workflow

### Step 1: Load the Agent-Native Skill

First, invoke the agent-native-architecture skill to understand all principles:

```
/ce-agent-native-architecture
```

Select option 7 (action parity) to load the full reference material.

### Step 2: Launch Parallel Sub-Agents

Launch 8 parallel sub-agents using the platform's subagent primitive (`Agent` with `subagent_type: Explore` in Claude Code, `spawn_agent` with `agent_type: "explorer"` in Codex, `subagent` with `agent: "scout"` in Pi via the `pi-subagents` extension), one for each principle. Each agent should:

1. Enumerate ALL instances in the codebase (user actions, tools, contexts, data stores, etc.)
2. Check compliance against the principle
3. Provide a SPECIFIC SCORE like "X out of Y (percentage%)"
4. List specific gaps and recommendations

<sub-agents>

**Agent 1: Action Parity**
```
Audit for ACTION PARITY - "Whatever the user can do, the agent can do."

Tasks:
1. Enumerate ALL user actions in frontend (API calls, button clicks, form submissions)
   - Search for API service files, fetch calls, form handlers
   - Check routes and components for user interactions
2. Check which have corresponding agent tools
   - Search for agent tool definitions
   - Map user actions to agent capabilities
3. Score: "Agent can do X out of Y user actions"

Format:
## Action Parity Audit
### User Actions Found
| Action | Location | Agent Tool | Status |
### Score: X/Y (percentage%)
### Missing Agent Tools
### Recommendations
```

**Agent 2: Tools as Primitives**
```
Audit for TOOLS AS PRIMITIVES - "Tools provide capability, not behavior."

Tasks:
1. Find and read ALL agent tool files
2. Classify each as:
   - PRIMITIVE (good): read, write, store, list - enables capability without business logic
   - WORKFLOW (bad): encodes business logic, makes decisions, orchestrates steps
3. Score: "X out of Y tools are proper primitives"

Format:
## Tools as Primitives Audit
### Tool Analysis
| Tool | File | Type | Reasoning |
### Score: X/Y (percentage%)
### Problematic Tools (workflows that should be primitives)
### Recommendations
```

**Agent 3: Context Injection**
```
Audit for CONTEXT INJECTION - "System prompt includes dynamic context about app state"

Tasks:
1. Find context injection code (search for "context", "system prompt", "inject")
2. Read agent prompts and system messages
3. Enumerate what IS injected vs what SHOULD be:
   - Available resources (files, drafts, documents)
   - User preferences/settings
   - Recent activity
   - Available capabilities listed
   - Session history
   - Workspace state

Format:
## Context Injection Audit
### Context Types Analysis
| Context Type | Injected? | Location | Notes |
### Score: X/Y (percentage%)
### Missing Context
### Recommendations
```

**Agent 4: Shared Workspace**
```
Audit for SHARED WORKSPACE - "Agent and user work in the same data space"

Tasks:
1. Identify all data stores/tables/models
2. Check if agents read/write to SAME tables or separate ones
3. Look for sandbox isolation anti-pattern (agent has separate data space)

Format:
## Shared Workspace Audit
### Data Store Analysis
| Data Store | User Access | Agent Access | Shared? |
### Score: X/Y (percentage%)
### Isolated Data (anti-pattern)
### Recommendations
```

**Agent 5: CRUD Completeness**
```
Audit for CRUD COMPLETENESS - "Every entity has full CRUD"

Tasks:
1. Identify all entities/models in the codebase
2. For each entity, check if agent tools exist for:
   - Create
   - Read
   - Update
   - Delete
3. Score per entity and overall

Format:
## CRUD Completeness Audit
### Entity CRUD Analysis
| Entity | Create | Read | Update | Delete | Score |
### Overall Score: X/Y entities with full CRUD (percentage%)
### Incomplete Entities (list missing operations)
### Recommendations
```

**Agent 6: UI Integration**
```
Audit for UI INTEGRATION - "Agent actions immediately reflected in UI"

Tasks:
1. Check how agent writes/changes propagate to frontend
2. Look for:
   - Streaming updates (SSE, WebSocket)
   - Polling mechanisms
   - Shared state/services
   - Event buses
   - File watching
3. Identify "silent actions" anti-pattern (agent changes state but UI doesn't update)

Format:
## UI Integration Audit
### Agent Action → UI Update Analysis
| Agent Action | UI Mechanism | Immediate? | Notes |
### Score: X/Y (percentage%)
### Silent Actions (anti-pattern)
### Recommendations
```

**Agent 7: Capability Discovery**
```
Audit for CAPABILITY DISCOVERY - "Users can discover what the agent can do"

Tasks:
1. Check for these 7 discovery mechanisms:
   - Onboarding flow showing agent capabilities
   - Help documentation
   - Capability hints in UI
   - Agent self-describes in responses
   - Suggested prompts/actions
   - Empty state guidance
   - Slash commands (/help, /tools)
2. Score against 7 mechanisms

Format:
## Capability Discovery Audit
### Discovery Mechanism Analysis
| Mechanism | Exists? | Location | Quality |
### Score: X/7 (percentage%)
### Missing Discovery
### Recommendations
```

**Agent 8: Prompt-Native Features**
```
Audit for PROMPT-NATIVE FEATURES - "Features are prompts defining outcomes, not code"

Tasks:
1. Read all agent prompts
2. Classify each feature/behavior as defined in:
   - PROMPT (good): outcomes defined in natural language
   - CODE (bad): business logic hardcoded
3. Check if behavior changes require prompt edit vs code change

Format:
## Prompt-Native Features Audit
### Feature Definition Analysis
| Feature | Defined In | Type | Notes |
### Score: X/Y (percentage%)
### Code-Defined Features (anti-pattern)
### Recommendations
```

</sub-agents>

### Step 3: Compile Summary Report

After all agents complete, compile a summary with:

```markdown
## Agent-Native Architecture Review: [Project Name]

### Overall Score Summary

| Core Principle | Score | Percentage | Status |
|----------------|-------|------------|--------|
| Action Parity | X/Y | Z% | ✅/⚠️/❌ |
| Tools as Primitives | X/Y | Z% | ✅/⚠️/❌ |
| Context Injection | X/Y | Z% | ✅/⚠️/❌ |
| Shared Workspace | X/Y | Z% | ✅/⚠️/❌ |
| CRUD Completeness | X/Y | Z% | ✅/⚠️/❌ |
| UI Integration | X/Y | Z% | ✅/⚠️/❌ |
| Capability Discovery | X/Y | Z% | ✅/⚠️/❌ |
| Prompt-Native Features | X/Y | Z% | ✅/⚠️/❌ |

**Overall Agent-Native Score: X%**

### Status Legend
- ✅ Excellent (80%+)
- ⚠️ Partial (50-79%)
- ❌ Needs Work (<50%)

### Top 10 Recommendations by Impact

| Priority | Action | Principle | Effort |
|----------|--------|-----------|--------|

### What's Working Excellently

[List top 5 strengths]
```

## Success Criteria

- [ ] All 8 sub-agents complete their audits
- [ ] Each principle has a specific numeric score (X/Y format)
- [ ] Summary table shows all scores and status indicators
- [ ] Top 10 recommendations are prioritized by impact
- [ ] Report identifies both strengths and gaps

## Optional: Single Principle Audit

If $ARGUMENTS specifies a single principle (e.g., "action parity"), only run that sub-agent and provide detailed findings for that principle alone.

Valid arguments:
- `action parity` or `1`
- `tools` or `primitives` or `2`
- `context` or `injection` or `3`
- `shared` or `workspace` or `4`
- `crud` or `5`
- `ui` or `integration` or `6`
- `discovery` or `7`
- `prompt` or `features` or `8`

plugins/compound-engineering/skills/ce-brainstorm/SKILL.mdskill

Show content (23515 bytes)

---
name: ce-brainstorm
description: 'Explore requirements and approaches through collaborative dialogue before writing a right-sized requirements document and planning implementation. Use for feature ideas, problem framing, when the user says ''let''s brainstorm'', or when they want to think through options before deciding what to build. Also use when a user describes a vague or ambitious feature request, asks ''what should we build'', ''help me think through X'', presents a problem with multiple valid solutions, or seems unsure about scope or direction — even if they don''t explicitly ask to brainstorm.'
argument-hint: "[feature idea or problem to explore]"
---

# Brainstorm a Feature or Improvement

**Note: The current year is 2026.** Use this when dating requirements documents.

Brainstorming helps answer **WHAT** to build through collaborative dialogue. It precedes `/ce-plan`, which answers **HOW** to build it.

The durable output of this workflow is a **requirements document**. In other workflows this might be called a lightweight PRD or feature brief. In compound engineering, keep the workflow name `brainstorm`, but make the written artifact strong enough that planning does not need to invent product behavior, scope boundaries, or success criteria.

This skill does not implement code. It explores, clarifies, and documents decisions for later planning or execution.

**IMPORTANT: All file references in generated documents must use repo-relative paths (e.g., `src/models/user.rb`), never absolute paths. Absolute paths break portability across machines, worktrees, and teammates.**

## Core Principles

1. **Assess scope first** - Match the amount of ceremony to the size and ambiguity of the work.
2. **Be a thinking partner** - Suggest alternatives, challenge assumptions, and explore what-ifs instead of only extracting requirements.
3. **Resolve product decisions here** - User-facing behavior, scope boundaries, and success criteria belong in this workflow. Detailed implementation belongs in planning.
4. **Keep implementation out of the requirements doc by default** - Do not include libraries, schemas, endpoints, file layouts, or code-level design unless the brainstorm itself is inherently about a technical or architectural change.
5. **Right-size the artifact** - Simple work gets a compact requirements document or brief alignment. Larger work gets a fuller document. Do not add ceremony that does not help planning.
6. **Apply YAGNI to carrying cost, not coding effort** - Prefer the simplest approach that delivers meaningful value. Avoid speculative complexity and hypothetical future-proofing, but low-cost polish or delight is worth including when its ongoing cost is small and easy to maintain.

## Interaction Rules

These rules apply to every brainstorm, including the universal (non-software) flow routed to `references/universal-brainstorming.md`.

1. **Ask one question at a time** - One question per turn, even when sub-questions feel related. Stacking several questions in a single message produces diluted answers; pick the single most useful one and ask it.
2. **Prefer single-select multiple choice** - Use single-select when choosing one direction, one priority, or one next step.
3. **Use multi-select rarely and intentionally** - Use it only for compatible sets such as goals, constraints, non-goals, or success criteria that can all coexist. If prioritization matters, follow up by asking which selected item is primary.
4. **Default to the platform's blocking question tool** - Use `AskUserQuestion` in Claude Code (call `ToolSearch` with `select:AskUserQuestion` first if its schema isn't loaded), `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi (requires the `pi-ask-user` extension). These tools include a free-text fallback (e.g., "Other" in Claude Code), so options scaffold the answer without confining it — well-chosen options surface dimensions the user may not have separated, and pick-plus-optional-note is lower activation energy than composing prose from scratch. This default holds for opening and elicitation questions too, not only narrowing. Fall back to numbered options in chat only when no blocking tool exists in the harness or the call errors (e.g., Codex edit modes) — not because a schema load is required. Never silently skip the question.
5. **Use prose only when the question is genuinely open** - Drop the blocking tool only when (a) the answer is inherently narrative ("walk me through how you got here"), (b) the question is diagnostic or introspective and presented options would leak your priors and bias the answer (e.g., "what concerns you most?" where a 4-option menu signals which axes matter), or (c) you cannot write 3-4 genuinely distinct, plausibly-correct options that cover the space without padding or strawmen. The test: if you'd be straining to fill the option slots, the question is open — use prose. Rule 1 still applies: still one question per turn.

## Output Guidance

- **Keep outputs concise** - Prefer short sections, brief bullets, and only enough detail to support the next decision.
- **Use repo-relative paths** - When referencing files, use paths relative to the repo root (e.g., `src/models/user.rb`), never absolute paths. Absolute paths make documents non-portable across machines and teammates.

## Feature Description

<feature_description> #$ARGUMENTS </feature_description>

**If the feature description above is empty, ask the user:** "What would you like to explore? Please describe the feature, problem, or improvement you're thinking about."

Do not proceed until you have a feature description from the user.

## Execution Flow

### Phase 0: Resume, Assess, and Route

#### 0.1 Resume Existing Work When Appropriate

If the user references an existing brainstorm topic or document, or there is an obvious recent matching `*-requirements.md` file in `docs/brainstorms/`:
- Read the document
- Confirm with the user before resuming: "Found an existing requirements doc for [topic]. Should I continue from this, or start fresh?"
- If resuming, summarize the current state briefly, continue from its existing decisions and outstanding questions, and update the existing document instead of creating a duplicate

#### 0.1b Classify Task Domain

Before proceeding to Phase 0.2, classify whether this is a software task. The key question is: **does the task involve building, modifying, or architecting software?** -- not whether the task *mentions* software topics.

**Software** (continue to Phase 0.2) -- the task references code, repositories, APIs, databases, or asks to build/modify/debug/deploy software.

**Non-software brainstorming** (route to universal brainstorming) -- BOTH conditions must be true:
- None of the software signals above are present
- The task describes something the user wants to explore, decide, or think through in a non-software domain

**Neither** (respond directly, skip all brainstorming phases) -- the input is a quick-help request, error message, factual question, or single-step task that doesn't need a brainstorm.

**If non-software brainstorming is detected:** Read `references/universal-brainstorming.md` and use those facilitation principles. Skip Phases 0.2–4 below — the **Core Principles and Interaction Rules above still apply unchanged**, including one-question-per-turn and the default to the platform's blocking question tool.

#### 0.2 Assess Whether Brainstorming Is Needed

**Clear requirements indicators:**
- Specific acceptance criteria provided
- Referenced existing patterns to follow
- Described exact expected behavior
- Constrained, well-defined scope

**If requirements are already clear:**
Keep the interaction brief. Confirm understanding and present concise next-step options rather than forcing a long brainstorm. Only write a short requirements document when a durable handoff to planning or later review would be valuable. Skip Phase 1.1 and 1.2 entirely — go straight to Phase 1.3 or Phase 2.5 in announce-mode (synthesis emitted for visibility, no blocking confirmation), then to Phase 3.

#### 0.3 Assess Scope

Use the feature description plus a light repo scan to classify the work:
- **Lightweight** - small, well-bounded, low ambiguity
- **Standard** - normal feature or bounded refactor with some decisions to make
- **Deep** - cross-cutting, strategic, or highly ambiguous

If the scope is unclear, ask one targeted question to disambiguate and then proceed.

**Deep sub-mode: feature vs product.** For Deep scope, also classify whether the brainstorm must establish product shape or inherit it:

- **Deep — feature** (default): existing product shape anchors decisions. Primary actors, core outcome, positioning, and primary flows are already established in the product or repo. The brainstorm extends or refines within that shape.
- **Deep — product**: the brainstorm must establish product shape rather than inherit it. Primary actors, core outcome, positioning against adjacent products, or primary end-to-end flows are materially unresolved. Existing code lowers the odds of product-tier but does not by itself rule it out — a half-built tool with ambiguous shape is still product-tier.

Product-tier triggers additional Phase 1.2 questions and additional sections in the requirements document. Feature-tier uses the current Deep behavior unchanged.

### Phase 1: Understand the Idea

#### 1.1 Existing Context Scan

Scan the repo before substantive brainstorming. Match depth to scope:

**Lightweight** — Search for the topic, check if something similar already exists, and move on.

**Standard and Deep** — Two passes:

*Constraint Check* — Check project instruction files (`AGENTS.md`, and `CLAUDE.md` only if retained as compatibility context) for workflow, product, or scope constraints that affect the brainstorm. Also read `STRATEGY.md` if it exists — the product's target problem, approach, persona, and active tracks are direct input to what this brainstorm should deliver and should shape scope, success criteria, and which approaches are aligned vs out-of-scope. If these add nothing, move on.

*Topic Scan* — Search for relevant terms. Read the most relevant existing artifact if one exists (brainstorm, plan, spec, skill, feature doc). Skim adjacent examples covering similar behavior.

If nothing obvious appears after a short scan, say so and continue. Two rules govern technical depth during the scan:

1. **Verify before claiming** — When the brainstorm touches checkable infrastructure (database tables, routes, config files, dependencies, model definitions), read the relevant source files to confirm what actually exists. Any claim that something is absent — a missing table, an endpoint that doesn't exist, a dependency not in the Gemfile, a config option with no current support — must be verified against the codebase first; if not verified, label it as an unverified assumption. This applies to every brainstorm regardless of topic.

2. **Defer design decisions to planning** — Implementation details like schemas, migration strategies, endpoint structure, or deployment topology belong in planning, not here — unless the brainstorm is itself about a technical or architectural decision, in which case those details are the subject of the brainstorm and should be explored.

**Slack context** (opt-in, Standard and Deep only) — never auto-dispatch. Route by condition:

- **Tools available + user asked**: Dispatch `ce-slack-researcher` with a brief summary of the brainstorm topic alongside Phase 1.1 work. Incorporate findings into constraint and context awareness.
- **Tools available + user didn't ask**: Note in output: "Slack tools detected. Ask me to search Slack for organizational context at any point, or include it in your next prompt."
- **No tools + user asked**: Note in output: "Slack context was requested but no Slack tools are available. Install and authenticate the Slack plugin to enable organizational context search."

#### 1.2 Product Pressure Test

Before generating approaches, scan the user's opening for rigor gaps. Match depth to scope.

This is agent-internal analysis, not a user-facing checklist. Read the opening, note which gaps actually exist, and raise only those as questions during Phase 1.3 — folded into the normal flow of dialogue, not fired as a pre-flight gauntlet. A fuzzy opening may earn three or four probes; a concrete, well-framed one may earn zero because no scope-appropriate gaps were found.

**Lightweight:**
- Is this solving the real user problem?
- Are we duplicating something that already covers this?
- Is there a clearly better framing with near-zero extra cost?

**Standard — scan for these gaps:**

- **Evidence gap.** The opening asserts want or need, but doesn't point to anything the would-be user has already done — time spent, money paid, workarounds built — that would make the want observable. When present, ask for the most concrete thing someone has already done about this.

- **Specificity gap.** The opening describes the beneficiary at a level of abstraction where the agent couldn't design without silently inventing who they are and what changes for them. When present, ask the user to name a specific person or narrow segment, and what changes for that person when this ships.

- **Counterfactual gap.** The opening doesn't make visible what users do today when this problem arises, nor what changes if nothing ships. When present, ask what the current workaround is, even if it's messy — and what it costs them.

- **Attachment gap.** The opening treats a particular solution shape as the thing being built, rather than the value that shape is supposed to deliver, and hasn't been examined against smaller forms that might deliver the same value. When present, ask what the smallest version that still delivers real value would look like.

Plus these synthesis questions — not gap lenses, product-judgment the agent weighs in its own reasoning:
- Is there a nearby framing that creates more user value without more carrying cost? If so, what complexity does it add?
- Given the current project state, user goal, and constraints, what is the single highest-leverage move right now: the request as framed, a reframing, one adjacent addition, a simplification, or doing nothing?

Favor moves that compound value, reduce future carrying cost, or make the product meaningfully more useful or compelling. Use the result to sharpen the conversation, not to bulldoze the user's intent.

**Deep** — Standard lenses and synthesis questions plus:
- Is this a local patch, or does it move the broader system toward where it wants to be?

**Deep — product** — Deep plus:

- **Durability gap.** The opening's value proposition rests on a current state of the world that may shift in predictable ways within the horizon the user cares about. When present, ask how the idea fares under the most plausible near-term shifts — and push past rising-tide answers every competitor could make.

- What adjacent product could we accidentally build instead, and why is that the wrong one?
- What would have to be true in the world for this to fail?

These questions force an explicit product thesis and feed the Scope Boundaries subsections ("Deferred for later" and "Outside this product's identity") and Dependencies / Assumptions in the requirements document.

#### 1.3 Collaborative Dialogue

Follow the Interaction Rules above. Use the platform's blocking question tool when available.

**Guidelines:**
- Ask what the user is already thinking before offering your own ideas. This surfaces hidden context and prevents fixation on AI-generated framings.
- Start broad (problem, users, value) then narrow (constraints, exclusions, edge cases)
- **Rigor probes fire before Phase 2 and are prose, not menus.** Narrowing is legitimate, but Phase 1 cannot end with un-probed rigor gaps. Each scope-appropriate gap from Phase 1.2 fires as a **separate** direct prose probe — one probe satisfies one gap, not multiple. Standard brainstorms scan four gap lenses (evidence, specificity, counterfactual, attachment); Deep-product adds durability (five total), but only the gaps actually present in the opening must be probed. Surface those probes progressively across the conversation — interleaving with narrowing moves is fine, as long as every scope-appropriate gap that was found in Phase 1.2 has been probed in prose before Phase 2. Rigor probes map to Interaction Rule 5(b): a 4-option menu signals which kinds of evidence count and lets the user pick rather than produce. Prose forces them to produce real observation or surface their uncertainty. Examples (one per gap): *evidence — "What's the most concrete thing someone's already done about this — paid, built a workaround, quit a tool over it?"* / *specificity — "Can you name a team you've actually watched hit this, or are you reasoning?"* / *counterfactual — "What do teams do today when this breaks — who reconciles?"* / *attachment — "Before we move to shapes or approaches — what's the smallest version that would still prove the bet right, and what's excluded?"* — **attachment is the final rigor probe before Phase 2 when the attachment gap is present. Fire it regardless of whether a specific shape has emerged through narrowing; its job is to pressure-test the user's implicit framing of the product before Phase 2 inherits it** / *durability — "Under the most plausible near-term shifts, how does this bet hold?"* If the answer reveals genuine uncertainty, record it as an explicit assumption in the requirements document rather than skipping the probe.
- Clarify the problem frame, validate assumptions, and ask about success criteria
- Make requirements concrete enough that planning will not need to invent behavior
- Surface dependencies or prerequisites only when they materially affect scope
- Resolve product decisions here; leave technical implementation choices for planning
- Bring ideas, alternatives, and challenges instead of only interviewing

**Exit condition:** Continue until the idea is clear OR the user explicitly wants to proceed.

### Phase 2: Explore Approaches

If multiple plausible directions remain, propose **2-3 concrete approaches** based on research and conversation. Otherwise state the recommended direction directly.

Use at least one non-obvious angle — inversion (what if we did the opposite?), constraint removal (what if X weren't a limitation?), or analogy from how another domain solves this. The first approaches that come to mind are usually variations on the same axis.

Present approaches first, then evaluate. Let the user see all options before hearing which one is recommended — leading with a recommendation before the user has seen alternatives anchors the conversation prematurely.

When useful, include one deliberately higher-upside alternative:
- Identify what adjacent addition or reframing would most increase usefulness, compounding value, or durability without disproportionate carrying cost. Present it as a challenger option alongside the baseline, not as the default. Omit it when the work is already obviously over-scoped or the baseline request is clearly the right move.

At product tier, alternatives should differ on *what* is built (product shape, actor set, positioning), not *how* it is built. Implementation-variant alternatives belong at feature tier.

For each approach, provide:
- Brief description (2-3 sentences)
- Pros and cons
- Key risks or unknowns
- When it's best suited

**Approach granularity: mechanism / product shape, not architecture.** Approach descriptions name mechanism-level distinctions ("pause as a rule property" vs "pause as an event filter" vs "pause as a separate entity") and product-relevant trade-offs (plan-tier coupling, complexity surface, migration difficulty). They do NOT name implementation specifics — column names, table names, file paths, service classes, JSON shapes, exact method names. Those are ce-plan's job. Bringing architecture forward at brainstorm time forces the user to make architectural decisions on ce-brainstorm's intentionally-shallow research, and the synthesis at Phase 2.5 then has to filter out the leak.

After presenting all approaches, state your recommendation and explain why. Prefer simpler solutions when added complexity creates real carrying cost, but do not reject low-cost, high-value polish just because it is not strictly necessary.

If one approach is clearly best and alternatives are not meaningful, skip the menu and state the recommendation directly.

If relevant, call out whether the choice is:
- Reuse an existing pattern
- Extend an existing capability
- Build something net new

### Phase 2.5: Synthesis Summary

**STOP. Before composing the synthesis, read `references/synthesis-summary.md`.** The discipline rules, prose-summary requirement, three-bucket structure, anti-pattern guidance, soft-cut behavior, self-redirect support, prose-feedback rules, and bucket-content routing into doc body sections all live there. Composing a synthesis without these rules loaded reliably produces malformed output — missing prose summary, implementation-detail leakage, the proposal-pitch anti-pattern. This is not optional supplementary reading; it is the source of truth for how the phase behaves.

Surface a synthesis to the user before Phase 3 writes the requirements doc — the user's last opportunity to correct scope before the artifact lands.

Fires for **all tiers** including Lightweight. Skip Phase 2.5 entirely on the Phase 0.1b non-software (universal-brainstorming) route.

**Headless mode** (LFG / `disable-model-invocation`): the synthesis is composed but not confirmed. Inferred bets route to a `## Assumptions` section in the doc (so downstream review can scrutinize them as un-validated), not into Key Decisions. See `references/synthesis-summary.md` Headless mode for the full routing.

**Announce-mode (Phase 0.2 fast path)**: on the "requirements already clear" fast path, Phase 2.5 fires in announce-mode — emit the synthesis (Stated / Inferred / Out) for visibility, then **end the turn**. Do NOT call the Write tool in the same turn as the synthesis emission. On the user's next message: if it's an acknowledgment, follow-up, or any non-correcting input, proceed to Phase 3 doc-write; if it indicates a correction (push-back on an Inferred bullet, scope adjustment), revise the synthesis and emit again. Lighter than full Phase 2.5 (no `AskUserQuestion` menu, no formal confirm option) but still gives the user a real interruption window before the doc lands. ce-brainstorm sits early in the workflow; a wrong-doc has downstream consequence (feeds ce-plan, then implementation), so the turn boundary is justified even on the fast path.

### Phase 3: Capture the Requirements

Write or update a requirements document only when the conversation produced durable decisions worth preserving. Read `references/requirements-capture.md` for the document template, formatting rules, visual aid guidance, and completeness checks.

For **Lightweight** brainstorms, keep the document compact. Skip document creation when the user only needs brief alignment and no durable decisions need to be preserved.

### Phase 4: Handoff

Present next-step options and execute the user's selection. Read `references/handoff.md` for the option logic, dispatch instructions, and closing summary format.

plugins/compound-engineering/skills/ce-clean-gone-branches/SKILL.mdskill

Show content (2462 bytes)

---
name: ce-clean-gone-branches
description: Clean up local branches whose remote tracking branch is gone. Use when the user says "clean up branches", "delete gone branches", "prune local branches", "clean gone", or wants to remove stale local branches that no longer exist on the remote. Also handles removing associated worktrees for branches that have them.
---

# Clean Gone Branches

Delete local branches whose remote tracking branch has been deleted, including any associated worktrees.

## Workflow

### Step 1: Discover gone branches

Run the discovery script to fetch the latest remote state and identify gone branches:

```bash
bash scripts/clean-gone
```

[scripts/clean-gone](./scripts/clean-gone)

The script runs `git fetch --prune` first, then parses `git branch -vv` for branches marked `: gone]`.

If the script outputs `__NONE__`, report that no stale branches were found and stop.

### Step 2: Present branches and ask for confirmation

Show the user the list of branches that will be deleted. Format as a simple list:

```
These local branches have been deleted from the remote:

  - feature/old-thing
  - bugfix/resolved-issue
  - experiment/abandoned

Delete all of them? (y/n)
```

Wait for the user's answer using the platform's blocking question tool: `AskUserQuestion` in Claude Code (call `ToolSearch` with `select:AskUserQuestion` first if its schema isn't loaded), `request_user_input` in Codex, `ask_user` in Gemini, `ask_user` in Pi (requires the `pi-ask-user` extension). Fall back to presenting the list in chat only when no blocking tool exists in the harness or the call errors (e.g., Codex edit modes) — not because a schema load is required. Never silently skip the question.

This is a yes-or-no decision on the entire list -- do not offer multi-selection or per-branch choices.

### Step 3: Delete confirmed branches

If the user confirms, delete each branch. For each branch:

1. Check if it has an associated worktree (`git worktree list | grep "\\[$branch\\]"`)
2. If a worktree exists and is not the main repo root, remove it first: `git worktree remove --force "$worktree_path"`
3. Delete the branch: `git branch -D "$branch"`

Report results as you go:

```
Removed worktree: .worktrees/feature/old-thing
Deleted branch: feature/old-thing
Deleted branch: bugfix/resolved-issue
Deleted branch: experiment/abandoned

Cleaned up 3 branches.
```

If the user declines, acknowledge and stop without deleting anything.

plugins/coding-tutor/skills/coding-tutor/SKILL.mdskill

Show content (12017 bytes)

---
name: coding-tutor
description: Personalized coding tutorials that build on your existing knowledge and use your actual codebase for examples. Creates a persistent learning trail that compounds over time using the power of AI, spaced repetition and quizes.
---

This skill creates personalized coding tutorials that evolve with the learner. Each tutorial builds on previous ones, uses real examples from the current codebase, and maintains a persistent record of concepts mastered.

The user asks to learn something - either a specific concept or an open "teach me something new" request.

## Welcome New Learners

If `~/coding-tutor-tutorials/` does not exist, this is a new learner. Before running setup, introduce yourself:

> I'm your personal coding tutor. I create tutorials tailored to you - using real code from your projects, building on what you already know, and tracking your progress over time.
>
> All your tutorials live in one central library (`~/coding-tutor-tutorials/`) that works across all your projects. Use `/teach-me` to learn something new, `/quiz-me` to test your retention with spaced repetition.

Then proceed with setup and onboarding.

## Setup: Ensure Tutorials Repo Exists

**Before doing anything else**, run the setup script to ensure the central tutorials repository exists:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/coding-tutor/scripts/setup_tutorials.py
```

This creates `~/coding-tutor-tutorials/` if it doesn't exist. All tutorials and the learner profile are stored there, shared across all your projects.

## First Step: Know Your Learner

**Always start by reading `~/coding-tutor-tutorials/learner_profile.md` if it exists.** This profile contains crucial context about who you're teaching - their background, goals, and personality. Use it to calibrate everything: what analogies will land, how fast to move, what examples resonate.

If no tutorials exist in `~/coding-tutor-tutorials/` AND no learner profile exists at `~/coding-tutor-tutorials/learner_profile.md`, this is a brand new learner. Before teaching anything, you need to understand who you're teaching.

**Onboarding Interview:**

Ask these three questions, one at a time. Wait for each answer before asking the next.

1. **Prior exposure**: What's your background with programming? - Understand if they've built anything before, followed tutorials, or if this is completely new territory.

2. **Ambitious goal**: This is your private AI tutor whose goal is to make you a top 1% programmer. Where do you want this to take you? - Understand what success looks like for them: a million-dollar product, a job at a company they admire, or something else entirely.

3. **Who are you**: Tell me a bit about yourself - imagine we just met at a coworking space. - Get context that shapes how to teach them.

4. **Optional**: Based on the above answers, you may ask upto one optional 4th question if it will make your understanding of the learner richer.

After gathering responses, create `~/coding-tutor-tutorials/learner_profile.md` and put the interview Q&A there (along with your commentary):

```yaml
---
created: DD-MM-YYYY
last_updated: DD-MM-YYYY
---

**Q1. <insert question you asked>**
**Answer**. <insert user's answer>
**your internal commentary**

**Q2. <insert question you asked>**
**Answer**. <insert user's answer>
**your internal commentary**

**Q3. <insert question you asked>**
**Answer**. <insert user's answer>
**your internal commentary**

**Q4. <optional>
```

## Teaching Philosophy

Our general goal is to take the user from newbie to a senior engineer in record time. One at par with engineers at companies like 37 Signals or Vercel.

Before creating a tutorial, make a plan by following these steps:

- **Load learner context**: Read `~/coding-tutor-tutorials/learner_profile.md` to understand who you're teaching - their background, goals, and personality.
- **Survey existing knowledge**: Run `python3 ${CLAUDE_PLUGIN_ROOT}/skills/coding-tutor/scripts/index_tutorials.py` to understand what concepts have been covered, at what depth, and how well they landed (understanding scores). Optionally, dive into particular tutorials in `~/coding-tutor-tutorials/` to read them.
- **Identify the gap**: What's the next concept that would be most valuable? Consider both what they've asked for AND what naturally follows from their current knowledge. Think of a curriculum that would get them from their current point to Senior Engineer - what should be the next 3 topics they need to learn to advance their programming knowledge in this direction?
- **Find the anchor**: Locate real examples in the codebase that demonstrate this concept. Learning from abstract examples is forgettable; learning from YOUR code is sticky.
- **(Optional) Use ask-user-question tool**: Ask clarifying questions to the learner to understand their intent, goals or expectations if it'll help you make a better plan.

Then show this curriculum plan of **next 3 TUTORIALS** to the user and proceed to the tutorial creation step only if the user approves. If the user rejects, create a new plan using steps mentioned above.

## Tutorial Creation

Each tutorial is a markdown file in `~/coding-tutor-tutorials/` with this structure:
```yaml
---
concepts: [primary_concept, related_concept_1, related_concept_2]
source_repo: my-app  # Auto-detected: which repo this tutorial's examples come from
description: One-paragraph summary of what this tutorial covers
understanding_score: null  # null until quizzed, then 1-10 based on quiz performance
last_quizzed: null  # null until first quiz, then DD-MM-YYYY
prerequisites: [~/coding-tutor-tutorials/tutorial_1_name.md, ~/coding-tutor-tutorials/tutorial_2_name.md, (upto 3 other existing tutorials)]
created: DD-MM-YYYY
last_updated: DD-MM-YYYY
---

Full contents of tutorial go here

---

## Q&A

Cross-questions during learning go here.

## Quiz History

Quiz sessions recorded here.
```

Run `scripts/create_tutorial.py` like this to create a new tutorial with template:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/coding-tutor/scripts/create_tutorial.py "Topic Name" --concepts "Concept1,Concept2"
```

This creates an empty template of the tutorial. Then you should edit the newly created file to write in the actual tutorial.
Qualities of a great tutorial should:

- **Start with the "why"**: Not "here's how callbacks work" but "here's the problem in your code that callbacks solve"
- **Use their code**: Every concept demonstrated with examples pulled from the actual codebase. Reference specific files and line numbers.
- **Build mental models**: Diagrams, analogies, the underlying "shape" of the concept - not just syntax, ELI5
- **Predict confusion**: Address the questions they're likely to ask before they ask them, don't skim over things, don't write in a notes style
- **End with a challenge**: A small exercise they could try in this codebase to cement understanding

### Tutorial Writing Style

Write personal tutorials like the best programming educators: Julia Evans, Dan Abramov. Not like study notes or documentation. There's a difference between a well-structured tutorial and one that truly teaches.

- Show the struggle - "Here's what you might try... here's why it doesn't work... here's the insight that unlocks it."
- Fewer concepts, more depth - A tutorial that teaches 3 things deeply beats one that mentions 10 things.
- Tell stories - a great tutorial is one coherent story, dives deep into a single concept, using storytelling techniques that engage readers

We should make the learner feel like Julia Evans or Dan Abramov is their private tutor.

Note: If you're not sure about a fact or capability or new features/APIs, do web research, look at documentation to make sure you're teaching accurate up-to-date things. NEVER commit the sin of teaching something incorrect.

## The Living Tutorial

Tutorials aren't static documents - they evolve:

- **Q&A is mandatory**: When the learner asks ANY clarifying question about a tutorial, you MUST append it to the tutorial's `## Q&A` section. This is not optional - these exchanges are part of their personalized learning record and improve future teaching.
- If the learner says they can't follow the tutorial or need you to take a different approach, update the tutorial like they ask
- Update `last_updated` timestamp
- If a question reveals a gap in prerequisites, note it for future tutorial planning

Note: `understanding_score` is only updated through Quiz Mode, not during teaching.

## What Makes Great Teaching
**DO**: Meet them where they are. Use their vocabulary. Reference their past struggles. Make connections to concepts they already own. Be encouraging but honest about complexity.

**DON'T**: Assume knowledge not demonstrated in previous tutorials. Use generic blog-post examples when codebase examples exist. Overwhelm with every edge case upfront. Be condescending about gaps.

**CALIBRATE**: A learner with 3 tutorials is different from one with 30. Early tutorials need more scaffolding and encouragement. Later tutorials can move faster and reference the shared history you've built.

Remember: The goal isn't to teach programming in the abstract. It's to teach THIS person, using THEIR code, building on THEIR specific journey. Every tutorial should feel like it was written specifically for them - because it was.

## Quiz Mode

Tutorials teach. Quizzes verify. The score should reflect what the learner actually retained, not what was presented to them.

**Triggers:**
- Explicit: "Quiz me on React hooks" → quiz that specific concept
- Open: "Quiz me on something" → run `python3 ${CLAUDE_PLUGIN_ROOT}/skills/coding-tutor/scripts/quiz_priority.py` to get a prioritized list based on spaced repetition, then choose what to quiz

**Spaced Repetition:**

When the user requests an open quiz, the priority script uses spaced repetition intervals to surface:
- Never-quizzed tutorials (need baseline assessment)
- Low-scored concepts that are overdue for review
- High-scored concepts whose review interval has elapsed

The script uses Fibonacci-ish intervals: score 1 = review in 2 days, score 5 = 13 days, score 8 = 55 days, score 10 = 144 days. This means weak concepts get drilled frequently while mastered ones fade into long-term review.

The script gives you an ordered list with `understanding_score` and `last_quizzed` for each tutorial. Use this to make an informed choice about what to quiz, and explain to the learner why you picked that concept ("You learned callbacks 5 days ago but scored 4/10 - let's see if it's sticking better now").

**Philosophy:**

A quiz isn't an exam - it's a conversation that reveals understanding. Ask questions that expose mental models, not just syntax recall. The goal is to find the edges of their knowledge: where does solid understanding fade into uncertainty?

**Ask only 1 question at a time.** Wait for the learner's answer before asking the next question.

Mix question types based on what the concept demands:
- Conceptual ("when would you use X over Y?")
- Code reading ("what does this code in your app do?")
- Code writing ("write a scope that does X")
- Debugging ("what's wrong here?")

Use their codebase for examples whenever possible. "What does line 47 of `app/models/user.rb` do?" is more valuable than abstract snippets.

**Scoring:**

After the quiz, update `understanding_score` honestly:
- **1-3**: Can't recall the concept, needs re-teaching
- **4-5**: Vague memory, partial answers
- **6-7**: Solid understanding, minor gaps
- **8-9**: Strong grasp, handles edge cases
- **10**: Could teach this to someone else

Also update `last_quizzed: DD-MM-YYYY` in the frontmatter.

**Recording:**

Append to the tutorial's `## Quiz History` section:
```
### Quiz - DD-MM-YYYY
**Q:** [Question asked]
**A:** [Brief summary of their response and what it revealed about understanding]
Score updated: 5 → 7
```

This history helps future quizzes avoid repetition and track progression over time.

plugins/compound-engineering/skills/ce-agent-native-architecture/SKILL.mdskill

Show content (22433 bytes)

---
name: ce-agent-native-architecture
description: Build applications where agents are first-class citizens. Use this skill when designing autonomous agents, creating MCP tools, implementing self-modifying systems, or building apps where features are outcomes achieved by agents operating in a loop.
---

<why_now>
## Why Now

Software agents work reliably now. Claude Code demonstrated that an LLM with access to bash and file tools, operating in a loop until an objective is achieved, can accomplish complex multi-step tasks autonomously.

The surprising discovery: **a really good coding agent is actually a really good general-purpose agent.** The same architecture that lets Claude Code refactor a codebase can let an agent organize your files, manage your reading list, or automate your workflows.

The Claude Code SDK makes this accessible. You can build applications where features aren't code you write—they're outcomes you describe, achieved by an agent with tools, operating in a loop until the outcome is reached.

This opens up a new field: software that works the way Claude Code works, applied to categories far beyond coding.
</why_now>

<core_principles>
## Core Principles

### 1. Parity

**Whatever the user can do through the UI, the agent should be able to achieve through tools.**

This is the foundational principle. Without it, nothing else matters.

Imagine you build a notes app with a beautiful interface for creating, organizing, and tagging notes. A user asks the agent: "Create a note summarizing my meeting and tag it as urgent."

If you built UI for creating notes but no agent capability to do the same, the agent is stuck. It might apologize or ask clarifying questions, but it can't help—even though the action is trivial for a human using the interface.

**The fix:** Ensure the agent has tools (or combinations of tools) that can accomplish anything the UI can do.

This isn't about creating a 1:1 mapping of UI buttons to tools. It's about ensuring the agent can **achieve the same outcomes**. Sometimes that's a single tool (`create_note`). Sometimes it's composing primitives (`write_file` to a notes directory with proper formatting).

**The discipline:** When adding any UI capability, ask: can the agent achieve this outcome? If not, add the necessary tools or primitives.

A capability map helps:

| User Action | How Agent Achieves It |
|-------------|----------------------|
| Create a note | `write_file` to notes directory, or `create_note` tool |
| Tag a note as urgent | `update_file` metadata, or `tag_note` tool |
| Search notes | `search_files` or `search_notes` tool |
| Delete a note | `delete_file` or `delete_note` tool |

**The test:** Pick any action a user can take in your UI. Describe it to the agent. Can it accomplish the outcome?

---

### 2. Granularity

**Prefer atomic primitives. Features are outcomes achieved by an agent operating in a loop.**

A tool is a primitive capability: read a file, write a file, run a bash command, store a record, send a notification.

A **feature** is not a function you write. It's an outcome you describe in a prompt, achieved by an agent that has tools and operates in a loop until the outcome is reached.

**Less granular (limits the agent):**
```
Tool: classify_and_organize_files(files)
→ You wrote the decision logic
→ Agent executes your code
→ To change behavior, you refactor
```

**More granular (empowers the agent):**
```
Tools: read_file, write_file, move_file, list_directory, bash
Prompt: "Organize the user's downloads folder. Analyze each file,
        determine appropriate locations based on content and recency,
        and move them there."
Agent: Operates in a loop—reads files, makes judgments, moves things,
       checks results—until the folder is organized.
→ Agent makes the decisions
→ To change behavior, you edit the prompt
```

**The key shift:** The agent is pursuing an outcome with judgment, not executing a choreographed sequence. It might encounter unexpected file types, adjust its approach, or ask clarifying questions. The loop continues until the outcome is achieved.

The more atomic your tools, the more flexibly the agent can use them. If you bundle decision logic into tools, you've moved judgment back into code.

**The test:** To change how a feature behaves, do you edit prose or refactor code?

---

### 3. Composability

**With atomic tools and parity, you can create new features just by writing new prompts.**

This is the payoff of the first two principles. When your tools are atomic and the agent can do anything users can do, new features are just new prompts.

Want a "weekly review" feature that summarizes activity and suggests priorities? That's a prompt:

```
"Review files modified this week. Summarize key changes. Based on
incomplete items and approaching deadlines, suggest three priorities
for next week."
```

The agent uses `list_files`, `read_file`, and its judgment to accomplish this. You didn't write weekly-review code. You described an outcome, and the agent operates in a loop until it's achieved.

**This works for developers and users.** You can ship new features by adding prompts. Users can customize behavior by modifying prompts or creating their own. "When I say 'file this,' always move it to my Action folder and tag it urgent" becomes a user-level prompt that extends the application.

**The constraint:** This only works if tools are atomic enough to be composed in ways you didn't anticipate, and if the agent has parity with users. If tools encode too much logic, or the agent can't access key capabilities, composition breaks down.

**The test:** Can you add a new feature by writing a new prompt section, without adding new code?

---

### 4. Emergent Capability

**The agent can accomplish things you didn't explicitly design for.**

When tools are atomic, parity is maintained, and prompts are composable, users will ask the agent for things you never anticipated. And often, the agent can figure it out.

*"Cross-reference my meeting notes with my task list and tell me what I've committed to but haven't scheduled."*

You didn't build a "commitment tracker" feature. But if the agent can read notes, read tasks, and reason about them—operating in a loop until it has an answer—it can accomplish this.

**This reveals latent demand.** Instead of guessing what features users want, you observe what they're asking the agent to do. When patterns emerge, you can optimize them with domain-specific tools or dedicated prompts. But you didn't have to anticipate them—you discovered them.

**The flywheel:**
1. Build with atomic tools and parity
2. Users ask for things you didn't anticipate
3. Agent composes tools to accomplish them (or fails, revealing a gap)
4. You observe patterns in what's being requested
5. Add domain tools or prompts to make common patterns efficient
6. Repeat

This changes how you build products. You're not trying to imagine every feature upfront. You're creating a capable foundation and learning from what emerges.

**The test:** Give the agent an open-ended request relevant to your domain. Can it figure out a reasonable approach, operating in a loop until it succeeds? If it just says "I don't have a feature for that," your architecture is too constrained.

---

### 5. Improvement Over Time

**Agent-native applications get better through accumulated context and prompt refinement.**

Unlike traditional software, agent-native applications can improve without shipping code:

**Accumulated context:** The agent can maintain state across sessions—what exists, what the user has done, what worked, what didn't. A `context.md` file the agent reads and updates is layer one. More sophisticated approaches involve structured memory and learned preferences.

**Prompt refinement at multiple levels:**
- **Developer level:** You ship updated prompts that change agent behavior for all users
- **User level:** Users customize prompts for their workflow
- **Agent level:** The agent modifies its own prompts based on feedback (advanced)

**Self-modification (advanced):** Agents that can edit their own prompts or even their own code. For production use cases, consider adding safety rails—approval gates, automatic checkpoints for rollback, health checks. This is where things are heading.

The improvement mechanisms are still being discovered. Context and prompt refinement are proven. Self-modification is emerging. What's clear: the architecture supports getting better in ways traditional software doesn't.

**The test:** Does the application work better after a month of use than on day one, even without code changes?
</core_principles>

<intake>
## What aspect of agent-native architecture do you need help with?

1. **Design architecture** - Plan a new agent-native system from scratch
2. **Files & workspace** - Use files as the universal interface, shared workspace patterns
3. **Tool design** - Build primitive tools, dynamic capability discovery, CRUD completeness
4. **Domain tools** - Know when to add domain tools vs stay with primitives
5. **Execution patterns** - Completion signals, partial completion, context limits
6. **System prompts** - Define agent behavior in prompts, judgment criteria
7. **Context injection** - Inject runtime app state into agent prompts
8. **Action parity** - Ensure agents can do everything users can do
9. **Self-modification** - Enable agents to safely evolve themselves
10. **Product design** - Progressive disclosure, latent demand, approval patterns
11. **Mobile patterns** - iOS storage, background execution, checkpoint/resume
12. **Testing** - Test agent-native apps for capability and parity
13. **Refactoring** - Make existing code more agent-native

**Wait for response before proceeding.**
</intake>

<routing>
| Response | Action |
|----------|--------|
| 1, "design", "architecture", "plan" | Read `references/architecture-patterns.md`, then apply Architecture Checklist below |
| 2, "files", "workspace", "filesystem" | Read `references/files-universal-interface.md` and `references/shared-workspace-architecture.md` |
| 3, "tool", "mcp", "primitive", "crud" | Read `references/mcp-tool-design.md` |
| 4, "domain tool", "when to add" | Read `references/from-primitives-to-domain-tools.md` |
| 5, "execution", "completion", "loop" | Read `references/agent-execution-patterns.md` |
| 6, "prompt", "system prompt", "behavior" | Read `references/system-prompt-design.md` |
| 7, "context", "inject", "runtime", "dynamic" | Read `references/dynamic-context-injection.md` |
| 8, "parity", "ui action", "capability map" | Read `references/action-parity-discipline.md` |
| 9, "self-modify", "evolve", "git" | Read `references/self-modification.md` |
| 10, "product", "progressive", "approval", "latent demand" | Read `references/product-implications.md` |
| 11, "mobile", "ios", "android", "background", "checkpoint" | Read `references/mobile-patterns.md` |
| 12, "test", "testing", "verify", "validate" | Read `references/agent-native-testing.md` |
| 13, "review", "refactor", "existing" | Read `references/refactoring-to-prompt-native.md` |

**After reading the reference, apply those patterns to the user's specific context.**
</routing>

<architecture_checklist>
## Architecture Review Checklist

When designing an agent-native system, verify these **before implementation**:

### Core Principles
- [ ] **Parity:** Every UI action has a corresponding agent capability
- [ ] **Granularity:** Tools are primitives; features are prompt-defined outcomes
- [ ] **Composability:** New features can be added via prompts alone
- [ ] **Emergent Capability:** Agent can handle open-ended requests in your domain

### Tool Design
- [ ] **Dynamic vs Static:** For external APIs where agent should have full access, use Dynamic Capability Discovery
- [ ] **CRUD Completeness:** Every entity has create, read, update, AND delete
- [ ] **Primitives not Workflows:** Tools enable capability, don't encode business logic
- [ ] **API as Validator:** Use `z.string()` inputs when the API validates, not `z.enum()`

### Files & Workspace
- [ ] **Shared Workspace:** Agent and user work in same data space
- [ ] **context.md Pattern:** Agent reads/updates context file for accumulated knowledge
- [ ] **File Organization:** Entity-scoped directories with consistent naming

### Agent Execution
- [ ] **Completion Signals:** Agent has explicit `complete_task` tool (not heuristic detection)
- [ ] **Partial Completion:** Multi-step tasks track progress for resume
- [ ] **Context Limits:** Designed for bounded context from the start

### Context Injection
- [ ] **Available Resources:** System prompt includes what exists (files, data, types)
- [ ] **Available Capabilities:** System prompt documents tools with user vocabulary
- [ ] **Dynamic Context:** Context refreshes for long sessions (or provide `refresh_context` tool)

### UI Integration
- [ ] **Agent → UI:** Agent changes reflect in UI (shared service, file watching, or event bus)
- [ ] **No Silent Actions:** Agent writes trigger UI updates immediately
- [ ] **Capability Discovery:** Users can learn what agent can do

### Mobile (if applicable)
- [ ] **Checkpoint/Resume:** Handle iOS app suspension gracefully
- [ ] **iCloud Storage:** iCloud-first with local fallback for multi-device sync
- [ ] **Cost Awareness:** Model tier selection (Haiku/Sonnet/Opus)

**When designing architecture, explicitly address each checkbox in your plan.**
</architecture_checklist>

<quick_start>
## Quick Start: Build an Agent-Native Feature

**Step 1: Define atomic tools**
```typescript
const tools = [
  tool("read_file", "Read any file", { path: z.string() }, ...),
  tool("write_file", "Write any file", { path: z.string(), content: z.string() }, ...),
  tool("list_files", "List directory", { path: z.string() }, ...),
  tool("complete_task", "Signal task completion", { summary: z.string() }, ...),
];
```

**Step 2: Write behavior in the system prompt**
```markdown
## Your Responsibilities
When asked to organize content, you should:
1. Read existing files to understand the structure
2. Analyze what organization makes sense
3. Create/move files using your tools
4. Use your judgment about layout and formatting
5. Call complete_task when you're done

You decide the structure. Make it good.
```

**Step 3: Let the agent work in a loop**
```typescript
const result = await agent.run({
  prompt: userMessage,
  tools: tools,
  systemPrompt: systemPrompt,
  // Agent loops until it calls complete_task
});
```
</quick_start>

<reference_index>
## Reference Files

All references in `references/`:

**Core Patterns:**
- `references/architecture-patterns.md` - Event-driven, unified orchestrator, agent-to-UI
- `references/files-universal-interface.md` - Why files, organization patterns, context.md
- `references/mcp-tool-design.md` - Tool design, dynamic capability discovery, CRUD
- `references/from-primitives-to-domain-tools.md` - When to add domain tools, graduating to code
- `references/agent-execution-patterns.md` - Completion signals, partial completion, context limits
- `references/system-prompt-design.md` - Features as prompts, judgment criteria

**Agent-Native Disciplines:**
- `references/dynamic-context-injection.md` - Runtime context, what to inject
- `references/action-parity-discipline.md` - Capability mapping, parity workflow
- `references/shared-workspace-architecture.md` - Shared data space, UI integration
- `references/product-implications.md` - Progressive disclosure, latent demand, approval
- `references/agent-native-testing.md` - Testing outcomes, parity tests

**Platform-Specific:**
- `references/mobile-patterns.md` - iOS storage, checkpoint/resume, cost awareness
- `references/self-modification.md` - Git-based evolution, guardrails
- `references/refactoring-to-prompt-native.md` - Migrating existing code
</reference_index>

<anti_patterns>
## Anti-Patterns

### Common Approaches That Aren't Fully Agent-Native

These aren't necessarily wrong—they may be appropriate for your use case. But they're worth recognizing as different from the architecture this document describes.

**Agent as router** — The agent figures out what the user wants, then calls the right function. The agent's intelligence is used to route, not to act. This can work, but you're using a fraction of what agents can do.

**Build the app, then add agent** — You build features the traditional way (as code), then expose them to an agent. The agent can only do what your features already do. You won't get emergent capability.

**Request/response thinking** — Agent gets input, does one thing, returns output. This misses the loop: agent gets an outcome to achieve, operates until it's done, handles unexpected situations along the way.

**Defensive tool design** — You over-constrain tool inputs because you're used to defensive programming. Strict enums, validation at every layer. This is safe, but it prevents the agent from doing things you didn't anticipate.

**Happy path in code, agent just executes** — Traditional software handles edge cases in code—you write the logic for what happens when X goes wrong. Agent-native lets the agent handle edge cases with judgment. If your code handles all the edge cases, the agent is just a caller.

---

### Specific Anti-Patterns

**THE CARDINAL SIN: Agent executes your code instead of figuring things out**

```typescript
// WRONG - You wrote the workflow, agent just executes it
tool("process_feedback", async ({ message }) => {
  const category = categorize(message);      // Your code decides
  const priority = calculatePriority(message); // Your code decides
  await store(message, category, priority);   // Your code orchestrates
  if (priority > 3) await notify();           // Your code decides
});

// RIGHT - Agent figures out how to process feedback
tools: store_item, send_message  // Primitives
prompt: "Rate importance 1-5 based on actionability, store feedback, notify if >= 4"
```

**Workflow-shaped tools** — `analyze_and_organize` bundles judgment into the tool. Break it into primitives and let the agent compose them.

**Context starvation** — Agent doesn't know what resources exist in the app.
```
User: "Write something about Catherine the Great in my feed"
Agent: "What feed? I don't understand what system you're referring to."
```
Fix: Inject available resources, capabilities, and vocabulary into system prompt.

**Orphan UI actions** — User can do something through the UI that the agent can't achieve. Fix: maintain parity.

**Silent actions** — Agent changes state but UI doesn't update. Fix: Use shared data stores with reactive binding, or file system observation.

**Heuristic completion detection** — Detecting agent completion through heuristics (consecutive iterations without tool calls, checking for expected output files). This is fragile. Fix: Require agents to explicitly signal completion through a `complete_task` tool.

**Static tool mapping for dynamic APIs** — Building 50 tools for 50 API endpoints when a `discover` + `access` pattern would give more flexibility.
```typescript
// WRONG - Every API type needs a hardcoded tool
tool("read_steps", ...)
tool("read_heart_rate", ...)
tool("read_sleep", ...)
// When glucose tracking is added... code change required

// RIGHT - Dynamic capability discovery
tool("list_available_types", ...)  // Discover what's available
tool("read_health_data", { dataType: z.string() }, ...)  // Access any type
```

**Incomplete CRUD** — Agent can create but not update or delete.
```typescript
// User: "Delete that journal entry"
// Agent: "I don't have a tool for that"
tool("create_journal_entry", ...)  // Missing: update, delete
```
Fix: Every entity needs full CRUD.

**Sandbox isolation** — Agent works in separate data space from user.
```
Documents/
├── user_files/        ← User's space
└── agent_output/      ← Agent's space (isolated)
```
Fix: Use shared workspace where both operate on same files.

**Gates without reason** — Domain tool is the only way to do something, and you didn't intend to restrict access. The default is open. Keep primitives available unless there's a specific reason to gate.

**Artificial capability limits** — Restricting what the agent can do out of vague safety concerns rather than specific risks. Be thoughtful about restricting capabilities. The agent should generally be able to do what users can do.
</anti_patterns>

<success_criteria>
## Success Criteria

You've built an agent-native application when:

### Architecture
- [ ] The agent can achieve anything users can achieve through the UI (parity)
- [ ] Tools are atomic primitives; domain tools are shortcuts, not gates (granularity)
- [ ] New features can be added by writing new prompts (composability)
- [ ] The agent can accomplish tasks you didn't explicitly design for (emergent capability)
- [ ] Changing behavior means editing prompts, not refactoring code

### Implementation
- [ ] System prompt includes dynamic context about app state
- [ ] Every UI action has a corresponding agent tool (action parity)
- [ ] Agent tools are documented in system prompt with user vocabulary
- [ ] Agent and user work in the same data space (shared workspace)
- [ ] Agent actions are immediately reflected in the UI
- [ ] Every entity has full CRUD (Create, Read, Update, Delete)
- [ ] Agents explicitly signal completion (no heuristic detection)
- [ ] context.md or equivalent for accumulated knowledge

### Product
- [ ] Simple requests work immediately with no learning curve
- [ ] Power users can push the system in unexpected directions
- [ ] You're learning what users want by observing what they ask the agent to do
- [ ] Approval requirements match stakes and reversibility

### Mobile (if applicable)
- [ ] Checkpoint/resume handles app interruption
- [ ] iCloud-first storage with local fallback
- [ ] Background execution uses available time wisely
- [ ] Model tier matched to task complexity

---

### The Ultimate Test

**Describe an outcome to the agent that's within your application's domain but that you didn't build a specific feature for.**

Can it figure out how to accomplish it, operating in a loop until it succeeds?

If yes, you've built something agent-native.

If it says "I don't have a feature for that"—your architecture is still too constrained.
</success_criteria>

.agents/plugins/marketplace.jsonmarketplace

Show content (679 bytes)

{
  "name": "compound-engineering-plugin",
  "interface": {
    "displayName": "Compound Engineering"
  },
  "plugins": [
    {
      "name": "compound-engineering",
      "source": {
        "source": "local",
        "path": "./plugins/compound-engineering"
      },
      "policy": {
        "installation": "AVAILABLE",
        "authentication": "ON_INSTALL"
      },
      "category": "Coding"
    },
    {
      "name": "coding-tutor",
      "source": {
        "source": "local",
        "path": "./plugins/coding-tutor"
      },
      "policy": {
        "installation": "AVAILABLE",
        "authentication": "ON_INSTALL"
      },
      "category": "Coding"
    }
  ]
}

.cursor-plugin/marketplace.jsonmarketplace

Show content (745 bytes)

{
  "name": "compound-engineering",
  "owner": {
    "name": "Kieran Klaassen",
    "email": "kieran@every.to",
    "url": "https://github.com/kieranklaassen"
  },
  "metadata": {
    "description": "Cursor plugin marketplace for Every Inc plugins",
    "version": "1.0.1",
    "pluginRoot": "plugins"
  },
  "plugins": [
    {
      "name": "compound-engineering",
      "source": "compound-engineering",
      "description": "AI-powered development tools that get smarter with every use. Make each unit of engineering work easier than the last."
    },
    {
      "name": "coding-tutor",
      "source": "coding-tutor",
      "description": "Personalized coding tutorials with spaced repetition quizzes using your real codebase."
    }
  ]
}

.claude-plugin/marketplace.jsonmarketplace

Show content (1564 bytes)

{
  "name": "compound-engineering-plugin",
  "owner": {
    "name": "Kieran Klaassen",
    "url": "https://github.com/kieranklaassen"
  },
  "metadata": {
    "description": "Plugin marketplace for Claude Code and Codex extensions",
    "version": "1.0.2"
  },
  "plugins": [
    {
      "name": "compound-engineering",
      "description": "AI-powered development tools that get smarter with every use. Make each unit of engineering work easier than the last.",
      "author": {
        "name": "Kieran Klaassen",
        "url": "https://github.com/kieranklaassen",
        "email": "kieran@every.to"
      },
      "homepage": "https://github.com/EveryInc/compound-engineering-plugin",
      "tags": [
        "ai-powered",
        "compound-engineering",
        "workflow-automation",
        "code-review",
        "quality",
        "knowledge-management",
        "image-generation"
      ],
      "source": "./plugins/compound-engineering"
    },
    {
      "name": "coding-tutor",
      "description": "Personalized coding tutorials that build on your existing knowledge and use your actual codebase for examples. Includes spaced repetition quizzes to reinforce learning. Includes 3 commands and 1 skill.",
      "author": {
        "name": "Nityesh Agarwal"
      },
      "homepage": "https://github.com/EveryInc/compound-engineering-plugin",
      "tags": [
        "coding",
        "programming",
        "tutorial",
        "learning",
        "spaced-repetition",
        "education"
      ],
      "source": "./plugins/coding-tutor"
    }
  ]
}

README

Compound Engineering

AI skills and agents that make each unit of engineering work easier than the last.

Philosophy

Each unit of engineering work should make subsequent units easier -- not harder.

Traditional development accumulates technical debt. Every feature adds complexity. Every bug fix leaves behind a little more local knowledge that someone has to rediscover later. The codebase gets larger, the context gets harder to hold, and the next change becomes slower.

Compound engineering inverts this. 80% is in planning and review, 20% is in execution:

Plan thoroughly before writing code with /ce-brainstorm and /ce-plan
Review to catch issues and calibrate judgment with /ce-code-review and /ce-doc-review
Codify knowledge so it is reusable with /ce-compound
Keep quality high so future changes are easy

The point is not ceremony. The point is leverage. A good brainstorm makes the plan sharper. A good plan makes execution smaller. A good review catches the pattern, not just the bug. A good compound note means the next agent does not have to learn the same lesson from scratch.

Learn more

Full component reference - all agents and skills
Compound engineering: how Every codes with agents
The story behind compounding engineering

Workflow

/ce-strategy is upstream of the loop -- it captures the product's target problem, approach, persona, metrics, and tracks as a short durable anchor at STRATEGY.md. Ideate, brainstorm, and plan read it as grounding when present, so strategy choices flow into feature conception, prioritization, and spec.

The core loop is: brainstorm the requirements, plan the implementation, work through the plan, review the result, compound the learning, then repeat with better context.

Use /ce-ideate before the loop when you want the agent to generate and critique bigger ideas before choosing one to brainstorm. It produces a ranked ideation artifact, not requirements, plans, or code.

Skill	Purpose
`/ce-strategy`	Create or maintain `STRATEGY.md` -- the product's target problem, approach, persona, key metrics, and tracks. Read as grounding by ideate, brainstorm, and plan
`/ce-ideate`	Optional big-picture ideation: generate and critically evaluate grounded ideas, then route the strongest one into brainstorming
`/ce-brainstorm`	Interactive Q&A to think through a feature or problem and write a right-sized requirements doc before planning
`/ce-plan`	Turn feature ideas into detailed implementation plans
`/ce-work`	Execute plans with worktrees and task tracking
`/ce-debug`	Systematically reproduce failures, trace root cause, and implement fixes
`/ce-code-review`	Multi-agent code review before merging
`/ce-compound`	Document learnings to make future work easier
`/ce-product-pulse`	Generate a single-page, time-windowed pulse report on usage, performance, errors, and followups. Saves to `docs/pulse-reports/`

/ce-product-pulse is the read-side companion -- a time-windowed report on what users actually experienced and how the product performed over a given window (24h, 7d, etc.), saved to docs/pulse-reports/ so past pulses form a browseable timeline of user outcomes. The next strategy update and the next brainstorm get real signal to anchor to.

Each cycle compounds: brainstorms sharpen plans, plans inform future plans, reviews catch more issues, patterns get documented.

Quick Example

A typical cycle starts by turning a rough idea into a requirements doc, then planning from that doc before handing execution to /ce-work:

/ce-brainstorm "make background job retries safer"
/ce-plan docs/brainstorms/background-job-retry-safety-requirements.md
/ce-work
/ce-code-review
/ce-compound

For a focused bug investigation:

/ce-debug "the checkout webhook sometimes creates duplicate invoices"
/ce-code-review
/ce-compound

Getting Started

After installing, run /ce-setup in any project. It checks your environment, installs missing tools, and bootstraps project config.

The compound-engineering plugin currently ships 37 skills and 51 agents. See the full component reference for the complete inventory.

Install

Claude Code

/plugin marketplace add EveryInc/compound-engineering-plugin
/plugin install compound-engineering

Cursor

In Cursor Agent chat, install from the plugin marketplace:

/add-plugin compound-engineering

Or search for "compound engineering" in the plugin marketplace.

Codex

Three steps: register the marketplace, install the agent set, then install the plugin through Codex's TUI.

Register the marketplace with Codex:

codex plugin marketplace add EveryInc/compound-engineering-plugin

Install the Compound Engineering agents (Codex's plugin spec does not register custom agents yet):
```
bunx @every-env/compound-plugin install compound-engineering --to codex
```
Install the plugin through Codex's TUI: launch codex, run /plugins, find the Compound Engineering marketplace, select the compound-engineering plugin, and choose Install. Restart Codex after install completes. Codex's CLI does not currently have a subcommand for installing a plugin from an added marketplace -- the /plugins TUI is the canonical flow.

All three steps are needed. The marketplace registration plus TUI install handles skills; the Bun step adds the review, research, and workflow agents that skills like $ce-code-review, $ce-plan, and $ce-work spawn in Codex. Without the agent step, delegating skills will report missing agents.

Heads up: once Codex's native plugin spec supports custom agents, the Bun agent step goes away. The TUI install alone will be sufficient.

If you previously used the Bun-only Codex install, back up stale CE artifacts before switching:

bunx @every-env/compound-plugin cleanup --target codex

GitHub Copilot

For VS Code Copilot Agent Plugins:

Run Chat: Install Plugin from Source from the VS Code command palette
Use EveryInc/compound-engineering-plugin for the repo
Select compound-engineering when VS Code shows the plugins in this repository

For Copilot CLI, use:

Inside Copilot CLI:

/plugin marketplace add EveryInc/compound-engineering-plugin
/plugin install compound-engineering@compound-engineering-plugin

From a shell with the copilot binary:

copilot plugin marketplace add EveryInc/compound-engineering-plugin
copilot plugin install compound-engineering@compound-engineering-plugin

Copilot CLI reads the existing Claude-compatible plugin manifests, so no separate Bun install step is needed.

If you previously used the old Bun Copilot install, back up stale CE artifacts before switching to the native plugin:

bunx @every-env/compound-plugin cleanup --target copilot

Factory Droid

From a shell with the droid binary:

droid plugin marketplace add https://github.com/EveryInc/compound-engineering-plugin
droid plugin install compound-engineering@compound-engineering-plugin

Droid uses plugin@marketplace plugin IDs; here compound-engineering is the plugin and compound-engineering-plugin is the marketplace name. Droid installs the existing Claude Code-compatible plugin and translates the format automatically, so no Bun install step is needed.

If you previously used the old Bun Droid install, back up stale CE artifacts before switching to the native plugin:

bunx @every-env/compound-plugin cleanup --target droid

Qwen Code

qwen extensions install EveryInc/compound-engineering-plugin:compound-engineering

Qwen Code installs Claude Code-compatible plugins directly from GitHub and converts the plugin format during install, so no Bun install step is needed.

If you previously used the old Bun Qwen install, back up stale CE artifacts before switching to the native extension:

bunx @every-env/compound-plugin cleanup --target qwen

OpenCode, Pi, Gemini, and Kiro

This repo includes a Bun/TypeScript installer that converts the Compound Engineering plugin to OpenCode, Pi, Gemini CLI, and Kiro CLI.

bunx @every-env/compound-plugin install compound-engineering --to opencode
bunx @every-env/compound-plugin install compound-engineering --to pi
bunx @every-env/compound-plugin install compound-engineering --to gemini
bunx @every-env/compound-plugin install compound-engineering --to kiro

Pi prerequisites. Pi does not ship a native subagent primitive, so the Pi install depends on nicobailon/pi-subagents (required) and recommends edlsh/pi-ask-user for richer blocking user questions:

pi install npm:pi-subagents    # required — provides the `subagent` tool used by skills that dispatch parallel agents
pi install npm:pi-ask-user     # recommended — provides the `ask_user` tool; skills fall back to numbered options in chat when it is missing

To auto-detect custom-install targets and install to all:

bunx @every-env/compound-plugin install compound-engineering --to all

The custom install targets run CE legacy cleanup during install. To run cleanup manually for a specific target:

bunx @every-env/compound-plugin cleanup --target codex
bunx @every-env/compound-plugin cleanup --target opencode
bunx @every-env/compound-plugin cleanup --target pi
bunx @every-env/compound-plugin cleanup --target gemini
bunx @every-env/compound-plugin cleanup --target kiro
bunx @every-env/compound-plugin cleanup --target copilot   # old Bun installs only
bunx @every-env/compound-plugin cleanup --target droid     # old Bun installs only
bunx @every-env/compound-plugin cleanup --target qwen      # old Bun installs only
bunx @every-env/compound-plugin cleanup --target windsurf  # deprecated legacy installs only

Cleanup moves known CE artifacts into a compound-engineering/legacy-backup/ directory under the target root.

Local Development

bun install
bun test
bun run release:validate

From your local checkout

For active development -- edits to the plugin source are reflected immediately.

Claude Code -- add a shell alias so your local copy loads alongside your normal plugins:

alias cce='claude --plugin-dir ~/Code/compound-engineering-plugin/plugins/compound-engineering'

Run cce instead of claude to test your changes. Your production install stays untouched.

Codex and other targets -- run the local CLI against your checkout:

# from the repo root
bun run src/index.ts install ./plugins/compound-engineering --to codex

# same pattern for other targets
bun run src/index.ts install ./plugins/compound-engineering --to opencode

From a pushed branch

For testing someone else's branch or your own branch from a worktree, without switching checkouts. Uses --branch to clone the branch to a deterministic cache directory.

Unpushed local branches: If the branch exists only in a local worktree and has not been pushed, point --plugin-dir directly at the worktree path instead (e.g. claude --plugin-dir /path/to/worktree/plugins/compound-engineering).

Claude Code -- use plugin-path to get the cached clone path:

# from the repo root
bun run src/index.ts plugin-path compound-engineering --branch feat/new-agents
# Output:
#   claude --plugin-dir ~/.cache/compound-engineering/branches/compound-engineering-feat~new-agents/plugins/compound-engineering

The cache path is deterministic. Re-running updates the checkout to the latest commit on that branch.

Codex, OpenCode, and other targets -- pass --branch to install:

# from the repo root
bun run src/index.ts install compound-engineering --to codex --branch feat/new-agents

# works with any target
bun run src/index.ts install compound-engineering --to opencode --branch feat/new-agents

# combine with --also for multiple targets
bun run src/index.ts install compound-engineering --to codex --also opencode --branch feat/new-agents

Both features use the COMPOUND_PLUGIN_GITHUB_SOURCE env var to resolve the repository, defaulting to https://github.com/EveryInc/compound-engineering-plugin.

Shell aliases

Add to ~/.zshrc or ~/.bashrc. All aliases use the local CLI so there is no dependency on npm publishing. plugin-path prints just the path to stdout, so it composes with $().

CE_REPO=~/Code/compound-engineering-plugin

ce-cli() { bun run "$CE_REPO/src/index.ts" "$@"; }

# --- Local checkout (active development) ---
alias cce='claude --plugin-dir $CE_REPO/plugins/compound-engineering'

codex-ce() {
  ce-cli install "$CE_REPO/plugins/compound-engineering" --to codex "$@"
}

# --- Pushed branch (testing PRs, worktree workflows) ---
ccb() {
  claude --plugin-dir "$(ce-cli plugin-path compound-engineering --branch "$1")" "${@:2}"
}

codex-ceb() {
  ce-cli install compound-engineering --to codex --branch "$1" "${@:2}"
}

Usage:

cce                              # local checkout with Claude Code
codex-ce                         # install local checkout to Codex
ccb feat/new-agents              # test a pushed branch with Claude Code
ccb feat/new-agents --verbose    # extra flags forwarded to claude
codex-ceb feat/new-agents        # install a pushed branch to Codex

Codex installs keep generated plugin skills isolated under ~/.codex/skills/compound-engineering/ and do not write new files into ~/.agents. The installer removes old CE-managed .agents/skills symlinks when it can prove they point back to CE's Codex-managed store, which prevents stale Codex installs from shadowing Copilot's native plugin install.

Troubleshooting

Codex skills work but review or research delegation fails

Run the agent install step:

bunx @every-env/compound-plugin install compound-engineering --to codex

Native Codex plugin install handles skills. The Bun step installs the custom agents those skills delegate to.

Codex shows stale or duplicate CE skills

Back up old Bun-installed artifacts before switching to the native Codex plugin flow:

bunx @every-env/compound-plugin cleanup --target codex

Copilot, Droid, or Qwen loads stale CE skills

Back up old Bun-installed artifacts before using the native plugin path:

bunx @every-env/compound-plugin cleanup --target copilot
bunx @every-env/compound-plugin cleanup --target droid
bunx @every-env/compound-plugin cleanup --target qwen

Limitations

Codex native plugin install currently handles skills, not custom agents. The documented Bun followup is required until Codex supports agents in its native plugin spec.

OpenCode, Pi, Gemini, and Kiro installs are converter-backed and may change as those target formats evolve.

Release versions are owned by release automation. Routine feature PRs should not hand-bump plugin or marketplace manifest versions.

FAQ

Do I need Bun for Claude Code?

No. Claude Code installs directly from the plugin marketplace. Bun is only needed for converter-backed targets, Codex's current agent followup, local development, and cleanup of old converted installs.

Why does Codex need a separate Bun step?

Codex's native plugin flow installs skills from the Codex plugin manifest. It does not currently install the custom reviewer, researcher, and workflow agents that Compound Engineering skills delegate to. The Bun step fills that gap.

Where do I see all available skills and agents?

Read the Compound Engineering plugin README. It lists the current skill and agent inventory.

Where is release history?

GitHub Releases are the canonical release-notes surface. The root CHANGELOG.md points to that history.

About Contributions

About Contributions: Please don't take this the wrong way, but I do not accept outside contributions for any of my projects. I simply don't have the mental bandwidth to review anything, and it's my name on the thing, so I'm responsible for any problems it causes; thus, the risk-reward is highly asymmetric from my perspective. I'd also have to worry about other "stakeholders," which seems unwise for tools I mostly make for myself for free. Feel free to submit issues, and even PRs if you want to illustrate a proposed fix, but know I won't merge them directly. Instead, I'll have Claude or Codex review submissions via gh and independently decide whether and how to address them. Bug reports in particular are welcome. Sorry if this offends, but I want to avoid wasted time and hurt feelings. I understand this isn't in sync with the prevailing open-source ethos that seeks community contributions, but it's the only way I can move at this velocity and keep my sanity.

License

MIT