Engineering cookbook

Five recipes (skill chains) cover most engineering work. Recipes are how skills compose into a finished result: fuzzy discovery, methodical shipping, design-aware frontend work that pairs Pace with impeccable, a pre-ship quality gate, and a disciplined bug hunt. Each step has a one-line description and a Why it matters note so you can tell when to skip it and when not to.

Install the engineering plugin

Pick the surface you actually work in. You can install in both; the plugin and your settings are the same either way.

In Cowork (claude.ai)

One-click install in the browser

Best if you mostly work in the claude.ai chat interface. Opens Cowork's plugin installer with the pace marketplace and the engineering plugin pre-selected.

Open Cowork installer ↗

In Claude Code (terminal)

For agents that touch your repo

For real coding work where the agent reads and writes files. Requires the claude CLI on PATH; the pace marketplace registers automatically the first time you install a plugin from it.

claude plugin install engineering@pace

How to run a recipe

The recipes below are skill chains. Each one ships with a kickoff prompt you can paste straight into a session.

1

Open a session where the plugin is installed.

Cowork (claude.ai) or Claude Code in your terminal. The agent needs the engineering plugin loaded; if it isn't, the slash commands the recipe uses won't resolve.
2

Copy the kickoff prompt from the recipe and paste it.

You don't type /prototype or /spec yourself. The natural-language kickoff prompt routes to the right skills; the slash-command chips in the chain show you what the agent will invoke under the hood.
3

Swap the example task for your real one.

Every kickoff prompt uses a placeholder task (e.g. bulk-CSV import). Replace it with whatever you're actually building before you send. Constraints, file paths, and acceptance criteria help the agent pick the right slice.
4

"Stop after step N" means the agent pauses and waits.

When the kickoff prompt says "stop after step 2 so I can review," the agent will run through step 2 and then wait for your reply. Type continue to proceed, redirect with a new instruction, or course-correct on the previous step's output.

Recipes

Skills compose. A real engineering task usually runs three to six skills in order, with the output of one feeding the next. The first three recipes cover the main shapes of engineering work; the last two tack onto the end of any of them: a quality gate before you ask a human to review, and a disciplined loop for hard bugs.

Recipe

From idea to first PR, with the thinking captured

Half a day of unstructured exploration becomes a tracked epic with one PR in the queue, ready to review.

When you're starting from something fuzzier than a spec. You'll prototype to learn, grill yourself on what you learned, write the PRD that captures the decision, split it into issues, and TDD the first one.

1

/prototype
Prototype

Build a throwaway version to feel the shape of the design. State, edges, UI variants, whichever is fuzzy.

Why it matters
You don't actually know what you want until you can play with it. A throwaway prototype turns vague intuitions into concrete reactions ("the modal flow is too many clicks"), and those reactions become real requirements. Skipping this step means specifying something you don't yet understand, and paying for the misunderstanding in PR-review churn.
2

/grill-me
Grill yourself

Force the questions you've been avoiding. Failure modes, partial state, concurrency, scale, telemetry, rollback. Every branch gets an answer.

Why it matters
Every plan has questions you've quietly skipped. Answering them now is a one-hour conversation; discovering them in production is an incident. The grilling is also context for the PRD: by the end you have explicit rationale for every decision, not just the decisions themselves.
3

/to-prd
Write the PRD

Turn the prototype and the grilling answers into a PRD: problem, constraints, decisions, non-goals. Saved to your project tracker.

Why it matters
Context dies fast. The prototype lives in a throwaway branch; the grilling lives in your chat transcript; both drop out of reach the moment you run /next or close the tab. Persisting the thinking to your tracker (Linear, GitHub, Notion) makes it durable, so the next agent, your teammate, or future-you can read it without rebuilding the mental model from scratch. The PRD is what survives the session.
4

/to-issues
Split into issues

Break the PRD into tracer-bullet vertical slices, one per issue. Each issue is sized to land as one PR.

Why it matters
One giant PR is unreviewable, unrevertable, and impossible to schedule around. Tracer-bullet slices (each end-to-end, each shippable) let you land value incrementally, catch design problems on slice one instead of slice five, and roll back surgically when a slice misbehaves in prod. Splitting also forces honesty about scope: anything that won't fit in a slice is a non-goal.
5

/tdd
TDD the first slice

Red-green-refactor on the first issue. Commits at every transition so the loop is visible in the diff.

Why it matters
Tests written first capture intent before implementation contaminates it; the red-green-refactor loop forces small, observable steps. Commits at each transition leave a navigable diff a reviewer can read like a story: "here's the failing test, here's the minimum code to pass it, here's the cleanup." Tests written after the fact almost always test what the code does, not what you wanted it to do.
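
As a rough illustration of the loop, here is a minimal Python sketch built on the cookbook's bulk-CSV example. The validate_row function and its rules (a non-empty email that contains "@") are hypothetical, not part of the plugin or of any real schema; the comments mark which commit each piece would land in.

```python
# Commit 1 (red): the three tests at the bottom are written first.
# With validate_row not yet defined, they fail with a NameError.

# Commit 2 (green): the minimum code that makes the tests pass.
# validate_row and its rules are hypothetical, for illustration only.
def validate_row(row: dict) -> list:
    """Return a list of error messages for one CSV row (empty means valid)."""
    errors = []
    email = (row.get("email") or "").strip()
    if not email:
        errors.append("email is required")
    elif "@" not in email:
        errors.append("email is malformed")
    return errors

# Commit 3 (refactor): cleanup only, with the tests still passing.


def test_missing_email_is_reported():
    assert validate_row({"name": "Ada"}) == ["email is required"]


def test_malformed_email_is_reported():
    assert validate_row({"email": "ada"}) == ["email is malformed"]


def test_valid_row_has_no_errors():
    assert validate_row({"email": "ada@example.com"}) == []
```

Each test states what the row validator should do, independent of how it ends up being written, which is the property the step is after.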

Discovery kickoff

I want to build the bulk-CSV import feature. We need to handle 50k rows, validate against the existing user schema, and produce a downloadable error report.

Let's go through the discovery chain:
1. Prototype both the streaming-parse approach and the chunked-batch approach so I can feel the difference.
2. Grill me on the design: failure modes, partial uploads, concurrent users, what error reporting actually looks like.
3. Write the PRD from what we landed on, save it to Linear.
4. Split into issues sized for one PR each.
5. TDD the first issue.

Stop after each step so I can review.

Use when: you have an ask, not a spec. You don't yet know the right design.
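
To make the two candidates named in the kickoff concrete, here is a small Python sketch of what such a prototype might compare. Everything in it is an assumption for illustration: the is_valid rule, the function names, and the batch size are hypothetical, and a real prototype would read from an upload stream rather than an in-memory string.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, TextIO, Tuple


def is_valid(row: dict) -> bool:
    # Hypothetical rule standing in for the real user-schema validation.
    return "@" in (row.get("email") or "")


def stream_errors(source: TextIO) -> Iterator[Tuple[int, dict]]:
    """Streaming parse: validate one row at a time, yielding errors as found."""
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(csv.DictReader(source), start=2):
        if not is_valid(row):
            yield line_no, row


def batch_errors(source: TextIO, batch_size: int = 1000) -> Iterator[List[Tuple[int, dict]]]:
    """Chunked batch: read batch_size rows, validate them, yield that batch's errors."""
    reader = enumerate(csv.DictReader(source), start=2)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield [(line_no, row) for line_no, row in batch if not is_valid(row)]


SAMPLE = "name,email\nAda,ada@example.com\nBob,bob\nCy,cy@example.com\nDee,\n"

print([n for n, _ in stream_errors(io.StringIO(SAMPLE))])
print([[n for n, _ in b] for b in batch_errors(io.StringIO(SAMPLE), batch_size=2)])
```

The streaming version can start writing the error report immediately and holds one row at a time; the batched version gives natural units for progress reporting and retries. Which of those matters more is the question the prototype exists to answer.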

Recipe

Stacked-PR shipping, end to end

A planned feature lands as a chain of small reviewable PRs, each rebased cleanly after the previous one merges.

When the spec already exists. You'll plan it as an epic, build it as a stack of PRs, address review comments, rebase the next one when the previous merges, and reset cleanly between tasks.

1

/spec
Spec the feature

Refine the ask into a parent epic with ordered sub-issues. Each sub-issue is scoped to a single PR. No code yet.

Why it matters
Planning before coding is cheaper than planning during coding. The spec turns a verbal ask into shared artifacts (a parent epic plus ordered sub-issues), so the dependency order is explicit and reviewers get a roadmap before the first PR shows up. No code yet is the point: design decisions are easier to change in an issue body than in a 600-line diff.
2

/implement
Implement the stack

Build the epic as stacked branches, one PR per sub-issue. Stops at PR boundaries so you can review before continuing.

Why it matters
Stacked PRs keep each diff small enough to review in one sitting. Stopping at PR boundaries is the safety net: you look at PR n before PR n+1 builds on top of any mistakes. The stack also surfaces design problems early: if slice two is suddenly hard, the slicing was wrong, and that's much cheaper to fix at slice two than at slice five.
3

/review
Address reviews

Walk PR comments grouped by file. Each one gets a fix-and-reply, a decline-with-rationale, or an ask-for-clarification.

Why it matters
Review comments rot if you batch them. Walking them one at a time forces an explicit decision per comment (fixed, declined with reason, or asking for more info), instead of a wall of unanswered threads. The reviewer sees their feedback being engaged with in real time, which makes the next review faster.
4

/topr
Rebase the next PR

After PR n merges via squash, rebase PR n+1 cleanly. The skill correctly drops the squash-merged commits.

Why it matters
After a squash-merge, main holds PR n's changes as one new commit, while the next PR's branch still carries PR n's original commits. A naive rebase tries to replay those originals, which produces duplicate commits or false conflicts because Git cannot match them to the squashed commit. /topr knows the squash pattern and rebases cleanly, so the next PR's diff actually reflects only that PR's changes.
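
A toy model may help show why the naive rebase goes wrong. The Python sketch below is not how /topr or Git is implemented; it represents commits as plain strings (all names hypothetical) and only illustrates which commits each strategy would replay onto main. In Git terms, the second strategy corresponds to something like git rebase --onto main <old tip of PR n>.

```python
from typing import List, Optional


def commits_to_replay(branch: List[str], old_parent_tip: Optional[str]) -> List[str]:
    """Return the commits a rebase would replay onto the new main.

    branch: every commit on the PR n+1 branch that main does not contain,
            oldest first. After a squash merge this still includes PR n's
            original commits, because main only holds the squashed copy.
    old_parent_tip: the last commit that belonged to PR n, or None to
            model a naive rebase that replays everything.
    """
    if old_parent_tip is None:
        return list(branch)
    cut = branch.index(old_parent_tip) + 1
    return branch[cut:]


# Hypothetical stack: PR n had commits a1 and a2; PR n+1 added b1 and b2.
pr_next_branch = ["a1", "a2", "b1", "b2"]

naive = commits_to_replay(pr_next_branch, old_parent_tip=None)
squash_aware = commits_to_replay(pr_next_branch, old_parent_tip="a2")

print(naive)         # ['a1', 'a2', 'b1', 'b2']: PR n's work is replayed again
print(squash_aware)  # ['b1', 'b2']: only PR n+1's own commits
```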
5

/next
Reset

Clean throwaway branch on the latest main. The next task starts from a known good state; the old branch stays in case you need it.

Why it matters
Starting the next task from a stale branch is a footgun: you inherit half-merged state, weird remote pointers, or accidentally commit unrelated changes from the previous task. /next gives you a known-clean starting point so what you build next is built on actual main, not on a snapshot of main from when you started two days ago.

Shipping kickoff

Spec the bulk-CSV import feature into a tracked epic with ordered sub-issues. Each sub-issue should be scoped to a single PR.

Once the spec is approved, implement it as a stack of PRs. Stop after PR #2 so I can review.

As review comments come in on each PR, address them before it merges. After each PR merges, rebase the next PR on main. When the stack is done, reset for the following task.

Use when: the design decisions are made. You know what to build and just need it shipped methodically.

Recipe

Frontend feature, design-aware

A UI feature ships with engineering discipline AND design craft, not one at the expense of the other.

When the work is user-facing. Pace owns the engineering loop; impeccable owns the visual + interaction craft. They pair through a Cowork session where you route between them.

1

/impeccable shape
Shape the UX

Plan the screens, states, copy, and edge cases before writing real code. Impeccable thinks in design; Pace will think in tests next.

Why it matters
If you start with code, you start with whichever screen happened to be easiest to scaffold, and the empty/loading/error states get bolted on later, badly. Shaping forces you to enumerate every state up front so the engineering work knows what it has to satisfy. Design decisions made here become test cases in the next step.
2

/prototype
Prototype

Build a runnable throwaway with several UI variants on one route. Toggle between them to feel which design holds up.

Why it matters
Designs in Figma look better than designs in the browser. A runnable prototype with two or three variants on the same route lets you click between them and feel which one survives contact with real data, real keyboard flow, and real width breakpoints. Cheaper to throw away a variant now than to argue about it in code review.
3

/tdd
TDD the logic underneath

Write the failing tests for the behavior the design implies. Keep the prototype open so the tests reflect what you actually want.

Why it matters
The design implies behavior: "saving optimistically rolls back on error," "hitting Escape closes the modal without losing form state." Capture each of those as a failing test before the real implementation exists. The prototype stays open as a reference so the tests describe the behavior you actually picked, not the behavior the implementation happens to ship with.
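
For instance, the first of those behaviors could be pinned down as follows. The sketch is in Python for brevity rather than in a frontend framework, and SettingsStore, save, and the injected send callable are hypothetical names, not part of Pace or impeccable. In a real TDD pass the two tests would be written, and seen failing, before the class exists.

```python
from typing import Callable, Optional


class SettingsStore:
    """Hypothetical store that applies edits optimistically."""

    def __init__(self, initial: dict, send: Callable[[dict], None]):
        self.state = dict(initial)
        self.last_error: Optional[Exception] = None
        self._send = send  # performs the real save; raises on failure

    def save(self, changes: dict) -> bool:
        previous = dict(self.state)
        self.state.update(changes)      # optimistic: show the edit right away
        try:
            self._send(changes)
        except Exception as exc:
            self.state = previous       # roll back so the UI matches the server
            self.last_error = exc       # keep the error so the UI can surface it
            return False
        self.last_error = None
        return True


def test_save_keeps_changes_on_success():
    store = SettingsStore({"name": "Ada"}, send=lambda changes: None)
    assert store.save({"name": "Grace"}) is True
    assert store.state == {"name": "Grace"}


def test_save_rolls_back_on_error():
    def failing_send(changes: dict) -> None:
        raise ConnectionError("server unavailable")

    store = SettingsStore({"name": "Ada"}, send=failing_send)
    assert store.save({"name": "Grace"}) is False
    assert store.state == {"name": "Ada"}
    assert isinstance(store.last_error, ConnectionError)
```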
4

/implement
Implement

Promote the prototype + tests into real code. The tests go from red to green as the code lands and stay green through any cleanup; the design choices land as code.

Why it matters
With both a prototype and a suite of failing tests in hand, the real implementation is mostly transcription. Once they pass, the tests act as guardrails so the design choices made in step 1 actually survive the rewrite. Refactors that drift away from the spec turn red immediately.
5

/impeccable polish
Polish

Typography, color, motion, accessibility, and spacing pass on the implemented feature. Catches what TDD cannot see.

Why it matters
Tests prove behavior; they don't prove a save button is the right size or that focus order is sane. The polish pass catches what assertions can't: typography hierarchy, motion that feels right, color contrast, keyboard a11y, spacing rhythm. Skipping it ships a feature that works and looks half-built.
6

/ship-pr
Ship the PR

Commit, push, open the PR. The PR description includes both the engineering test plan and the design rationale (impeccable's notes from the polish pass).

Why it matters
A PR description is documentation forever. Bundling the engineering test plan with the design rationale from the polish pass means a future reader (or auditor) can answer "why does this look the way it does?" without spelunking chat history.

Frontend feature kickoff

We're building the new account-settings page. Route between impeccable (for design) and the engineering plugin (for code) across the session:

1. /impeccable shape the page: which states exist (loading, empty, error, multi-account), what the copy says, what the keyboard flow is.
2. Prototype two layouts so I can pick: tabs on the left vs. cards in a grid.
3. TDD the underlying logic (account switching, edit-then-save, optimistic UI rollback on error).
4. Implement against the tests.
5. /impeccable polish: typography hierarchy, color, motion on the save action, a11y on the form.
6. Ship the PR with both the engineering test plan and the design rationale.

Stop after step 2 so I can pick the layout.

Use when: the work is user-facing. Don't skip the polish pass; that's where the design actually lands.

Recipe

Pre-ship quality gate, before you ask for human review

A PR that looks 'done' goes through a structured second-opinion pass (your own re-read, a Codex sweep, a security check, and a real e2e), so the human reviewer is reading polished work, not first-pass work.

When the feature is functionally complete and tests pass, but you have not yet asked anyone to review it. This recipe is the gate you run yourself first. It tacks onto the end of any other recipe.

1

/careful-review
Re-read with fresh eyes

Walk the whole diff slowly and look hard for dead code, misleading variable names, errors that swallow instead of surface, and tests that do not actually assert anything. Fix what you find.

Why it matters
You wrote it; you've blown past the same patterns dozens of times by the time it's complete. A deliberate re-read after you think you're done catches the embarrassing things: a stub left over from debugging, a try/catch that hides a real error, a test that asserts on a value that's always true. Better to find them now than have a reviewer find them.
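
Two of those patterns are easy to show side by side. The Python sketch below is illustrative only; parse_count and the test names are hypothetical.

```python
# Hypothetical helper, written twice to contrast the two patterns.

def parse_count_swallowing(text: str) -> int:
    try:
        return int(text)
    except ValueError:
        return 0  # hides bad input: "abc" and "0" become indistinguishable


def parse_count(text: str) -> int:
    try:
        return int(text)
    except ValueError as exc:
        # Surface the failure with context instead of inventing a value.
        raise ValueError(f"count must be an integer, got {text!r}") from exc


def test_that_asserts_nothing_useful():
    result = parse_count_swallowing("abc")
    assert result is not None  # always true, so this test can never fail


def test_that_pins_the_behavior():
    assert parse_count("42") == 42
    try:
        parse_count("abc")
    except ValueError as exc:
        assert "abc" in str(exc)
    else:
        raise AssertionError("expected ValueError for non-numeric input")
```

Both tests pass, which is the problem: only the second one would notice if the error handling regressed.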
2

/codex-review
Get a second opinion from Codex

Run an OpenAI Codex review on the branch diff against main. Triage P0/P1 findings against the actual code: confirmed bugs get fixed and committed, false positives get explicitly ignored with rationale.

Why it matters
A different model trained differently catches different things. Codex is good at spotting concrete correctness issues (off-by-ones, missing null checks, race conditions, leaks) that a single-author pass tends to miss. The triage step matters: not every finding is real, and 'ignored with rationale' is a better artifact than 'ignored silently' for the next person who reads the PR.
3

/security-review
Security pass

OWASP Top 10 + language-specific footguns on anything touching user input, auth, secrets, or external boundaries. Findings come prioritized P0/P1/P2 with concrete patches.

Why it matters
Most security bugs are boring (input validation, IDOR, secrets in code, unsafe deserialization) and a focused security pass catches them cheaply. Run it on any PR that touches a boundary; skip it on pure refactors that move existing code around without changing what reaches the outside world.
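
As an example of the boring kind, here is a hedged Python sketch of one input-validation check on a user-supplied filename. It shows a single guard under assumed requirements (the UPLOAD_DIR path, the .csv allow-list, and the safe_upload_path name are all hypothetical). It is not a complete upload defense, and it is not what /security-review itself emits.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

UPLOAD_DIR = Path("/srv/uploads")  # hypothetical storage root
ALLOWED_SUFFIXES = {".csv"}         # hypothetical allow-list


def safe_upload_path(user_filename: str) -> Path:
    """Map a user-supplied filename to a path inside UPLOAD_DIR, or raise."""
    # Drop any directory components, whichever separator the client used.
    name = PureWindowsPath(PurePosixPath(user_filename).name).name
    if not name or name in {".", ".."} or "\x00" in name:
        raise ValueError("invalid filename")
    if Path(name).suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError("unsupported file type")
    return UPLOAD_DIR / name


print(safe_upload_path("report.csv").name)          # report.csv
print(safe_upload_path("../../tmp/evil.csv").name)  # evil.csv
```

The design choice here is to discard the client's directory components rather than try to detect "bad" ones, so a traversal attempt ends up as an ordinary file inside the upload directory.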
4

/e2e-test
Real end-to-end sweep

Boot the actual backend and frontend dev servers and drive the critical user flow with no mocks. Get a PASS/FAIL summary with logs and screenshots on any failure.

Why it matters
Unit tests prove the parts work; e2e proves they work together. A full sweep before merge is cheap insurance against the regressions that only show up under real network: auth tokens that don't round-trip, race conditions between two services, an API change that the mock didn't model.
5

/ship-pr
Open the PR with everything baked in

Commit any final fixes, push, and open the PR. The description includes the test plan, the e2e PASS summary, and a one-line note on what the quality gate caught and corrected.

Why it matters
The PR description is the artifact reviewers and future-you actually read. Bake in what the gate caught so the reviewer doesn't re-litigate work that's already been pressure-tested, and so the next person debugging a regression knows what was checked before merge.

Quality gate kickoff

I'm done with the bulk-CSV import branch and tests pass locally. Before I push and request review, run the pre-ship quality gate:

1. /careful-review the diff. Fix obvious issues you find.
2. /codex-review against main. Triage findings: fix the real ones, ignore the false positives with one-line rationale.
3. /security-review the upload endpoint specifically, since it accepts user-controlled filenames and parses untrusted content.
4. /e2e-test the upload → validation → error-report-download flow with real dev servers, no mocks.
5. /ship-pr with a description that includes the test plan, the e2e summary, and what the gate caught.

Stop after step 2 so I can look at the Codex findings before we proceed.

Use when: the PR is functionally done and you want it pre-vetted before a human spends time on it. Stacks onto the end of any other recipe; don't skip the triage step in /codex-review.

Recipe

Hard bug hunt, with the diagnosis captured

A reliable reproduction, a real root-cause writeup, a regression test that would have caught it, and a PR description that explains why earlier tests missed the bug.

When something is broken and you don't yet know why. The recipe forces a reproduction before guessing at fixes, locks in a regression test before declaring victory, and ships the diagnosis as part of the PR so the next person hitting it has a signpost.

1

/diagnose
Reproduce → minimize → hypothesize

Run the disciplined diagnosis loop: reliable reproduction first, then a minimal failing case, then an evidence-backed hypothesis. No fix attempts until reproduction is reliable.

Why it matters
Ad-hoc debugging burns hours on the wrong hypothesis. Forcing reproduction first means you'll know whether your fix actually fixed anything. Without a repro, 'I think it's fixed' is the same as 'I have no idea.' Minimizing the case also makes the bug easier to write a regression test for in step three.
2

/debug
Walk the stack

Once you can reproduce, walk the specific stack trace or 'works locally, broken in prod' gap. Find the line where state diverges from what you assumed.

Why it matters
Reproduction tells you the bug is real; debug tells you which line is lying. The output is a writeup: where the value should have been set, why it wasn't, what the actual fix is. That writeup is half the value of fixing a hard bug, and it's what you'll paste into the PR.
3

/tdd
Write the regression test first

Write a failing test that reproduces the bug in the test suite, watch it fail for the right reason, then apply the fix and watch it pass. Commit the failing test and the fix separately so the diff tells the story.

Why it matters
Fixing first and 'adding a test after' almost always means the test is worthless: it'll pass with or without the fix because you wrote it knowing what the fix already did. Writing the test first proves you actually understand the bug, and locks in that this specific failure can't come back without somebody noticing.
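
Here is a small Python sketch of that sequence, using a made-up bug: a batching helper that silently drops the final partial batch. The function names and the bug itself are hypothetical; the point is only that the same test fails against the old code and passes against the fix.

```python
def chunk_buggy(rows: list, size: int) -> list:
    # Original code: integer division drops the last partial batch.
    return [rows[i * size:(i + 1) * size] for i in range(len(rows) // size)]


def chunk_fixed(rows: list, size: int) -> list:
    # The fix: step through the list so the remainder forms a final batch.
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def regression_test(chunk) -> None:
    """Commit 1: written against the buggy code and seen to fail."""
    batches = chunk(list(range(5)), 2)
    flattened = [row for batch in batches for row in batch]
    assert flattened == [0, 1, 2, 3, 4], f"rows were lost: {flattened}"


def fails(test, impl) -> bool:
    """Run test against impl and report whether it failed."""
    try:
        test(impl)
    except AssertionError:
        return True
    return False


print(fails(regression_test, chunk_buggy))  # True: red for the right reason
print(fails(regression_test, chunk_fixed))  # False: green once the fix lands
```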
4

/ship-pr
Ship with the diagnosis included

Open the PR. The description includes the reproduction steps, the root cause, why the existing tests missed it, and any guardrails added so a similar bug surfaces faster next time.

Why it matters
The diagnosis is the part that survives. Future-you, or a teammate, will hit something that looks similar a year from now, and a PR description that names the failure mode and the guardrails is the difference between five minutes of grep and five hours of repeated detective work.

Bug hunt kickoff

The /checkout endpoint is intermittently 10x slower in prod than it was two days ago. Run the bug-hunt recipe:

1. /diagnose: get a reliable reproduction (try synthetic load against staging), minimize it, and produce an evidence-backed hypothesis. Don't fix anything yet.
2. /debug the specific code path the hypothesis points at. Tell me where state diverges from what we assumed.
3. /tdd a regression test that reproduces the slowness as a failing test, then apply the fix.
4. /ship-pr with the reproduction, root cause, why our existing perf tests missed it, and what guardrail we're adding.

Stop after step 1 so I can sanity-check the hypothesis before you touch code.

Use when: production or staging is misbehaving and you don't yet know why. Don't shortcut into 'I'll just try a fix.' The loop is the value.

Want every skill spelled out?

The skill catalog lists every one of the 34 engineering skills with the SKILL.md frontmatter, trigger phrases, and source files. Use it as a reference; come back here for the recipes that compose them.

Not on engineering? The main cookbook has the per-team marquee recipes.
