Service 04 / 10 / Run · From $48k/mo · 24×7 on-call optional

Cloud & DevOps
without fragile prod.

Kubernetes you can reason about. Terraform you don’t fear. GitOps from day one. SLOs as the contract between engineering and the business — not a wall of green dashboards no one reads.


$48k

From / month

99.99%

Achieved SLO

< 30m

P1 MTTR

Avg. deploy time

Sec. 01 What we put in your accounts

Infrastructure as code,
not as a screenshot.

Every resource declared. Every secret rotated. Every deployment traceable from commit to pod.

A / Compute

Kubernetes & Nomad

Right-sized clusters — not a fleet you can’t afford to operate.

EKS / GKE 1.30+
Karpenter / autoscale cost-aware
GitOps (Argo/Flux) mandatory
Spot mix 60–80%
Service mesh only if needed

B / IaC

Terraform & Pulumi

Reproducible from a clean account, end-to-end.

Per-env workspaces isolated
Module registry versioned
Drift detection nightly
OPA policy pre-apply
Atlantis / TFC PR-gated
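
A pre-apply policy gate can be sketched roughly as follows. In a real setup the rule would be written in Rego and evaluated by OPA; this Python stand-in only illustrates the idea against the JSON that `terraform show -json` produces for a saved plan. The rule itself (protected resource types must not be deleted) and the names `PROTECTED_TYPES` and `violations` are assumptions for illustration, not part of the service description.

```python
# Hypothetical rule: these resource types must never be deleted by a plan.
PROTECTED_TYPES = {"aws_db_instance", "aws_s3_bucket"}


def violations(plan: dict) -> list[str]:
    """Return addresses of protected resources that the plan would delete.

    `plan` is the parsed output of `terraform show -json <planfile>`.
    A replace shows up as both "delete" and "create" in `actions`,
    so it is caught by the same check.
    """
    found = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" in actions and rc.get("type") in PROTECTED_TYPES:
            found.append(rc["address"])
    return found
```

A CI job could fail the pull request whenever `violations(plan)` is non-empty, so that the apply step is never reached.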

C / CI/CD

Continuous delivery

Trunk-based, gated, observable end-to-end.

GitHub / Buildkite parallel
Mean lead time < 1 hr
Canary rollouts Argo Rollouts
Rollback < 90s
Change failure rate < 5%
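
The lead-time and change-failure targets above can be measured from deploy records. The following is a minimal sketch, assuming each deploy is logged with a commit timestamp, a deploy timestamp, and a flag for whether it caused an incident; the `Deploy` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deploy:
    committed_at: datetime   # when the change was merged to trunk
    deployed_at: datetime    # when it reached production
    caused_incident: bool    # whether it needed a rollback or hotfix


def mean_lead_time(deploys: list[Deploy]) -> timedelta:
    """Average time from commit to production across the given deploys."""
    total = sum((d.deployed_at - d.committed_at for d in deploys), timedelta())
    return total / len(deploys)


def change_failure_rate(deploys: list[Deploy]) -> float:
    """Fraction of deploys that caused an incident."""
    return sum(d.caused_incident for d in deploys) / len(deploys)
```

Both functions expect a non-empty list; a reporting job would compare the results with the `< 1 hr` and `< 5%` targets.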

D / Observability

Logs / metrics / traces

Three signals, one query language, no $1M Datadog bills.

OpenTelemetry native
SLO budgets paged
Grafana / Tempo if cost matters
Honeycomb / DD if it earns it
Runbook per alert enforced

Sec. 02 Deliverables

Infrastructure that survives
your tenure.

Built so the next team can read it, run it, and improve it without rewriting it.

Terraform monorepo

Modular, versioned, OPA-gated. Apply from CI only — no laptop applies. Drift detected nightly and ticketed.
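
One way a nightly drift check might work is to run `terraform plan -detailed-exitcode` and read the exit code: 0 means no changes, 2 means the plan contains changes, and 1 means the plan failed. The sketch below assumes Terraform is installed and the working directory is already initialized; the function names are hypothetical, and opening the ticket is left out because it depends on the tracker in use.

```python
import subprocess


def classify(exit_code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if exit_code == 0:
        return "in_sync"   # plan succeeded, no changes
    if exit_code == 2:
        return "drift"     # plan succeeded, changes present
    return "error"         # plan itself failed


def check_drift(workdir: str) -> str:
    """Run a read-only plan in `workdir` and report its drift status."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify(result.returncode)
```

A scheduled CI job could call `check_drift` once per environment and file a ticket for each `"drift"` result.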

GitOps delivery

Argo CD or Flux managing all clusters. Promotion via PR. Manual kubectl restricted to break-glass use only.

SLO catalog

Every user-facing service has a named SLO, an error budget, and a paged alert. Burn-rate alerts, not threshold alerts.
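
To make the difference between burn-rate and threshold alerts concrete: burn rate is the observed error rate divided by the error budget (1 minus the SLO target). A burn rate of 1 spends the budget exactly over the SLO period; higher values spend it faster. The sketch below is illustrative only. The two-window check and the default threshold of 14.4 follow a commonly cited convention from the Google SRE Workbook and are an assumption, not something this page specifies.

```python
def burn_rate(errors: int, requests: int, slo_target: float) -> float:
    """Observed error rate as a multiple of the error budget."""
    if requests == 0:
        return 0.0
    budget = 1.0 - slo_target          # e.g. 0.0001 for a 99.99% SLO
    return (errors / requests) / budget


def should_page(long_window_rate: float, short_window_rate: float,
                threshold: float = 14.4) -> bool:
    """Page only when both the long and the short window burn fast.

    The short window keeps the alert from firing long after the
    problem has already stopped.
    """
    return long_window_rate >= threshold and short_window_rate >= threshold
```

For a 99.99% SLO, 15 failed requests out of 10,000 gives a burn rate of about 15, which would page if the short window agrees.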

Incident process

Severity rubric, paging policy, incident commander rotation, postmortem template, action-item burndown. All in writing.

FinOps cycle

Monthly cost review. Per-team chargeback if useful. Targeted optimization PRs — not “please use fewer resources.”
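
Per-team chargeback usually comes down to grouping billing line items by an ownership tag. This is a minimal sketch, assuming each line item is a dict with a `cost` and an optional `tags` mapping; the field names and the `team` tag key are hypothetical and would need to match the actual billing export.

```python
from collections import defaultdict


def chargeback(line_items: list[dict], tag_key: str = "team") -> dict[str, float]:
    """Sum cost per owning team; items without the tag go to 'untagged'."""
    totals: dict[str, float] = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)
```

The size of the `untagged` bucket is often the first finding of a cost review, since those resources have no owner to send an optimization PR to.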

Disaster recovery

Documented RPO/RTO per service. Restore drill quarterly, results published. No DR plan that hasn’t been tested.
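
A restore drill can be scored against the documented objectives with a simple comparison. The sketch below assumes the drill records how much data was lost and how long the restore took; the `Objective` and `DrillResult` types are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Objective:
    rpo: timedelta   # maximum tolerable data loss
    rto: timedelta   # maximum tolerable time to restore


@dataclass
class DrillResult:
    data_loss: timedelta      # age of the newest recoverable data
    restore_time: timedelta   # time from start of drill to service restored


def evaluate(objective: Objective, result: DrillResult) -> list[str]:
    """Return the objectives the drill missed; an empty list means it passed."""
    failures = []
    if result.data_loss > objective.rpo:
        failures.append("RPO missed")
    if result.restore_time > objective.rto:
        failures.append("RTO missed")
    return failures
```

Publishing the output of `evaluate` per service each quarter gives a plain record of which DR plans have actually been tested.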

Sec. 03 Pricing — three tiers

Three shapes
of operate.

From “help us run prod” to “take the pager.” Same engineers, different scope of responsibility.

Hands-on coaching

Embed light

From $48k/mo · 2 engineers

Embedded with your team
IaC + CI/CD foundation
SLO catalog + alert hygiene
You keep the pager

Most common

Embed standard

From $84k/mo · 3 engineers

All of light, plus:
Shared on-call rotation
Monthly capacity + cost review
Quarterly DR drill

Full operate

Run for you

From $128k/mo · 4–5 engineers

24×7 on-call we own
Named SLO contract
P1 MTTR < 30 min
Monthly written ops review

Sec. 04 How the engagement unfolds

Eight weeks
to boring.

Boring infrastructure is the goal. The first eight weeks are the unglamorous reshaping that gets you there.

01 / Week 1

Audit & plan

Written audit of current state: IaC coverage, deploy path, secret hygiene, alert quality, RPO/RTO. Plan with named owners and timelines.

02 / Week 2–4

IaC baseline

Bring 80% of resources under Terraform. Set up Atlantis or Terraform Cloud. Remove clickops paths. Drift detection in place.

03 / Week 5–6

GitOps + SLOs

Argo or Flux owns deploys. Every user-facing service gets a written SLO and a burn-rate alert. Runbooks for each.

04 / Week 7+

Take the pager

We join the rotation, then take it. A postmortem for every incident. Monthly written ops review with engineering leadership.

Sec. 05 Frequently asked

Things buyers ask
on the first call.

If something isn’t answered here, ask in your intro email — we keep this list short on purpose.

We’re multi-cloud: is that a problem?

No. Most of our work is AWS or GCP; we know both deeply, plus Cloudflare and Fly. We’ll say honestly when a multi-cloud constraint is paying its way and when it’s a tax.

Do you do bare metal or on-prem?

For Kubernetes on bare metal: yes, with Talos/k0s. Pure on-prem with no cloud at all: usually not — let’s talk if it’s a real constraint.

Can you migrate us off Heroku / Render / Vercel?

Yes, and we’ll first ask if you should. Many teams are paying a managed-platform premium for genuinely good leverage. Migration only makes sense at specific scale or pricing inflections.

What’s your stance on service mesh?

Skeptical by default. A mesh such as Istio or Linkerd earns its place when you have cross-team mTLS, cross-cluster traffic, or genuine observability needs. For most teams it adds operational cost without ROI.

Sec. 06 Pairs well with

Other things
we do well.

09 / Security

Cybersecurity

Hardening that meets your compliance needs.

From $42k →

P-06 / Product

K8s hardening kit

Our day-one hardening kit, off-the-shelf.

$100 · one-time →

06 / Data

Data engineering

Pipelines that don’t silently lie.

From $54k/mo →

M-03 / Engagement

Operate & evolve

The long-term shape of running it together.

From $32k/mo →

Got something hard
that needs to be real?

Send a paragraph about the problem. We’ll come back inside 48 hours with a written take — team shape, cost envelope, riskiest assumptions.

hello@kvb.dev
