Product P-03 / 08 / Downloadable $50 USD × 84 items

Production-readiness checklist.

An 84-item review covering SLOs, observability, on-call, capacity, security and data. The exact list we run before any launch we own — not a generic Internet checklist.

Buy & download › Back to catalog

USD 50

One-time

Items

Six chapters
before first traffic.

Each item names the owner role, the evidence to collect, and the failure mode it prevents — so the list teaches as it gates.

A / SLOs

SLOs & error budgets

A target leadership signs and engineering can defend.

Per-service SLO written
Burn-rate alerts tuned
Error budget policy agreed
Trend dashboard monthly
14 items total

B / Observability

Three signals

Logs, metrics, traces — wired and queryable.

Structured logs JSON
RED metrics per-endpoint
Distributed traces ≥ 10% sample
Runbook per alert enforced
17 items total

C / On-call

On-call readiness

A paging story, not “whoever sees it first.”

Pager rotation PagerDuty
Escalation policy tested
Onboarding doc < 1 day
Drill cadence quarterly
11 items total

D / Capacity

Capacity + cost

You know the breakpoint before you reach it.

Load test against SLO
Headroom ≥ 2× peak
Cost ceiling alerted
Right-sizing review monthly
12 items total

Sec. 02 Deliverables

Use it as a gate.
Use it as a teacher.

Most teams run it once. The best teams keep it open and update it as they learn.

Notion database

Pre-built — drop in, assign owners, watch items move from red to green.

Spreadsheet version

For teams not on Notion. Same fields, same flow. Google Sheets and Excel.

PDF print

8-page printable version for whiteboard reviews or hand-off to ops.

Severity rubric

Per-item: must-have / should-have / nice-to-have, with explicit launch criteria.

Quarterly review

A schedule + facilitator notes for revisiting the list quarterly post-launch.

Sec. 03 Frequently asked

Things buyers ask
on the first call.

If something isn’t answered here, ask in your intro email — we keep this list short on purpose.

We’re pre-launch. Should we run this?+

Yes — that’s the ideal time. Three weeks before launch is when you can still afford to address gaps. Two days before launch is when you discover you can’t.

We’re already in production. Still useful?+

Yes. Most clients use it as a maturity review every six months — score yourself, prioritize the gaps, knock them down. Trend the score over quarters.

Do you tailor it per service type?+

The base list is general. The pack includes notes on which items apply differently for: batch jobs, stateful services, public APIs, customer-facing UIs, and AI features.

What’s the difference vs Google SRE workbook?+

The SRE book is the canon. This is what we found we actually use day-to-day after building it for 30+ launches. Less theory, more lists with named owners.

Production-readiness checklist.

Six chaptersbefore first traffic.

SLOs & error budgets

Three signals

On-call readiness

Capacity + cost

Use it as a gate.Use it as a teacher.

Notion database

Spreadsheet version

PDF print

Severity rubric

Quarterly review

Things buyers askon the first call.

Other thingswe do well.

Postmortem template

K8s hardening kit

Cloud & DevOps

Need it customizedfor your team?

Six chapters
before first traffic.

Use it as a gate.
Use it as a teacher.

Things buyers ask
on the first call.

Other things
we do well.

Need it customized
for your team?