GKE · 17 May 2026 · 3 min read

The 20-Minute Monthly GKE Cost-Hygiene Checklist

Cloud waste isn't a one-time cleanup; it grows back. Here's a tight monthly routine that keeps your GKE bill honest in about twenty minutes, plus the order to work it in.

By The FeckBills team

Here's the uncomfortable truth about cloud cost optimisation: it's not a project, it's a habit. You can do a heroic cleanup, save £3k/mo, feel great, and watch it all creep back within two quarters as new deployments over-request, new orphans accumulate, and new namespaces go quiet. Waste regenerates.

The antidote isn't more heroics. It's a short, boring, recurring routine. Here's one that takes about twenty minutes a month and works top-down from biggest impact to smallest.

1. Over-provisioning, the big one (6 min)

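Pull CPU and memory requests and compare them with P95 usage for every workload, grouped by namespace and container. Anything running below ~40% of its request at P95 is a candidate. Lower CPU requests aggressively (CPU is compressible, so a tight request means throttling at worst), and memory cautiously (it isn't: a container that runs out of memory gets OOM-killed rather than slowed down). This is typically where 60-80% of your reclaimable spend lives, so it goes first.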
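
A minimal Python sketch of the requests-vs-P95 comparison, assuming you have already exported per-container CPU requests and usage samples from your metrics source. The ContainerUsage record, its field names and the 40% default are illustrative, not any tool's API. It uses the nearest-rank P95 and only covers CPU; the same ratio works for memory, where you would act on it more conservatively.

```python
from dataclasses import dataclass


@dataclass
class ContainerUsage:
    # Hypothetical record: populate it from your own metrics export.
    namespace: str
    container: str
    cpu_request_millicores: float
    cpu_samples_millicores: list[float]  # usage samples over the review window


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def overprovisioned(rows: list[ContainerUsage], threshold: float = 0.40):
    """Return (namespace, container, utilisation) where P95 / request < threshold."""
    flagged = []
    for r in rows:
        if r.cpu_request_millicores <= 0 or not r.cpu_samples_millicores:
            continue  # no request or no data, so there is nothing to compare
        utilisation = p95(r.cpu_samples_millicores) / r.cpu_request_millicores
        if utilisation < threshold:
            flagged.append((r.namespace, r.container, round(utilisation, 2)))
    return sorted(flagged, key=lambda t: t[2])  # lowest utilisation first
```

For example, a container requesting 1000m whose P95 usage is 250m comes out at 0.25 and is flagged; one requesting 500m with a P95 of 480m is left alone.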

2. Idle and zombie namespaces (4 min)

Scan namespace activity: used CPU, network, and request traffic over the last two weeks. Flat-line namespaces reserving compute but doing nothing are reclaim candidates. Verify nothing's a deliberate standby, then scale to zero and schedule the delete. Keep an ignore list for legitimately idle add-ons.
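
One way to encode the idle-namespace rule, as a sketch: it assumes you have already aggregated two weeks of CPU, network and request counts per namespace. The NamespaceActivity shape, the floor values and the example ignore list are hypothetical choices to tune for your own clusters.

```python
from dataclasses import dataclass


@dataclass
class NamespaceActivity:
    # Hypothetical two-week aggregates for one namespace.
    name: str
    requested_cpu_cores: float  # compute the namespace is reserving
    avg_used_cpu_cores: float
    network_bytes: int
    requests_served: int


# Example ignore list for add-ons that are idle by design; replace with your own.
DEFAULT_IGNORE = frozenset({"kube-system", "cert-manager"})


def idle_namespaces(
    rows: list[NamespaceActivity],
    ignore: frozenset[str] = DEFAULT_IGNORE,
    cpu_floor_cores: float = 0.01,
    network_floor_bytes: int = 1_000_000,
) -> list[str]:
    """Namespaces that reserve CPU while CPU, network and request traffic are all flat."""
    candidates = []
    for ns in rows:
        if ns.name in ignore or ns.requested_cpu_cores <= 0:
            continue  # idle by design, or reserving nothing worth reclaiming
        flat = (
            ns.avg_used_cpu_cores < cpu_floor_cores
            and ns.network_bytes < network_floor_bytes
            and ns.requests_served == 0
        )
        if flat:
            candidates.append(ns.name)
    return sorted(candidates)
```

The output is a candidate list only; the manual "is this a deliberate standby?" check still comes before scaling anything to zero.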

3. Orphaned disks and snapshots (3 min)

List unattached persistent disks and snapshots whose source disk no longer exists. These are high-confidence, low-risk reclaims. Snapshot each unattached disk, then delete it; bin the snapshots that are truly orphaned. While you're here, confirm your snapshot retention policy is actually expiring old snapshots.
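
A sketch of the orphan check in Python, assuming you feed it the JSON printed by gcloud compute disks list --format=json and gcloud compute snapshots list --format=json. It relies on the users, selfLink and sourceDisk fields of those resources, and assumes sourceDisk and selfLink use the same URL form (normalise them if your output differs). One caution: on GKE, a disk backing a PersistentVolume for a workload that is scaled to zero also shows as unattached, so check for a bound PV before deleting.

```python
import json


def orphaned_storage(disks_json: str, snapshots_json: str) -> tuple[list[str], list[str]]:
    """Return (unattached disk names, names of snapshots whose source disk is gone)."""
    disks = json.loads(disks_json)
    snapshots = json.loads(snapshots_json)

    # A disk with no "users" entry (or an empty list) is attached to no instance.
    unattached = sorted(d["name"] for d in disks if not d.get("users"))

    # A snapshot's "sourceDisk" points at the disk it was taken from; if that URL
    # no longer matches any live disk's "selfLink", the source has been deleted.
    live_disks = {d["selfLink"] for d in disks}
    source_gone = sorted(
        s["name"]
        for s in snapshots
        if s.get("sourceDisk") and s["sourceDisk"] not in live_disks
    )
    return unattached, source_gone
```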

4. Idle IPs and load balancers (2 min)

Look for reserved static IPs that aren't attached to anything: unused static addresses are billed at a higher rate than in-use ones, so the orphans cost the most. Then look for forwarding rules with zero healthy backends. Each one is small on its own, but worth sweeping across every project.
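
A sketch of both checks, assuming address JSON from gcloud compute addresses list --format=json, where an address attached to nothing reports status RESERVED and an attached one IN_USE. The healthy-backend counts are a hypothetical input you would assemble yourself, for example from gcloud compute backend-services get-health; the function and argument names are illustrative.

```python
def idle_network(
    addresses: list[dict], healthy_backends: dict[str, int]
) -> tuple[list[str], list[str]]:
    """Return (unused static IP names, forwarding rules with zero healthy backends)."""
    # Addresses attached to nothing report "RESERVED"; attached ones report "IN_USE".
    unused_ips = sorted(a["name"] for a in addresses if a.get("status") == "RESERVED")
    # healthy_backends is a hypothetical map: forwarding-rule name -> healthy backend count.
    dead_rules = sorted(name for name, healthy in healthy_backends.items() if healthy == 0)
    return unused_ips, dead_rules
```

Running it once per project and concatenating the results gives the cross-project sweep the step calls for.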

5. Node-pool shape and autoscaler health (3 min)

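Is any pool sitting flat at a high node count overnight? Check for pods pinning nodes: the cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation, local storage, PodDisruptionBudgets that allow no disruptions, or kube-system pods with no PDB at all. Unblocking the autoscaler is often a bigger win than any single delete.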
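
A heuristic sketch for finding node-pinning pods, assuming you pass in the items lists from kubectl get pods -A -o json and kubectl get pdb -A -o json. It flags the safe-to-evict annotation, emptyDir/hostPath volumes (whether local storage actually blocks depends on your autoscaler configuration), PDBs currently allowing zero disruptions, and kube-system pods that no PDB covers. PDB matching only checks matchLabels, so treat the output as leads to investigate rather than a verdict; the function names are illustrative.

```python
from typing import Optional

SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def _selects(selector: Optional[dict], labels: dict) -> bool:
    # policy/v1: a null selector matches nothing, an empty one matches every pod.
    # Only matchLabels is checked here; matchExpressions are ignored.
    if selector is None:
        return False
    return all(labels.get(k) == v for k, v in selector.get("matchLabels", {}).items())


def scale_down_blockers(pods: list[dict], pdbs: list[dict]) -> dict[str, list[str]]:
    """Map "namespace/pod" to reasons the pod may stop its node being drained."""
    blockers: dict[str, list[str]] = {}
    for pod in pods:
        meta, spec = pod["metadata"], pod.get("spec", {})
        if any(o.get("kind") == "DaemonSet" for o in meta.get("ownerReferences", [])):
            continue  # DaemonSet pods don't hold a node up during scale-down
        ns, labels = meta["namespace"], meta.get("labels", {})
        covering = [
            p for p in pdbs
            if p["metadata"]["namespace"] == ns and _selects(p["spec"].get("selector"), labels)
        ]
        reasons = []
        if meta.get("annotations", {}).get(SAFE_TO_EVICT) == "false":
            reasons.append("safe-to-evict: false")
        if any("emptyDir" in v or "hostPath" in v for v in spec.get("volumes", [])):
            reasons.append("local storage")
        # A missing status means it hasn't been computed yet, so don't assume a block.
        if any(p.get("status", {}).get("disruptionsAllowed", 1) == 0 for p in covering):
            reasons.append("PDB allows no disruptions")
        if ns == "kube-system" and not covering:
            reasons.append("kube-system pod without a PDB")
        if reasons:
            blockers[f"{ns}/{meta['name']}"] = reasons
    return blockers
```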

6. The "new since last month" diff (2 min)

The most valuable two minutes: what changed? A new namespace that's already idle, a new deployment that over-requests by 4x, a fresh batch of orphaned disks from a migration. Catching waste in the month it's born is ten times easier than excavating it a year later.
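
A minimal sketch of the month-over-month diff, assuming each scan is saved as a map from a stable finding ID (a hypothetical key such as "ns/legacy" or "disk/pd-old") to its estimated £/mo. It returns the new findings ranked by cost, plus the £/mo that appeared and the £/mo that was resolved since the previous scan.

```python
def new_since_last(
    previous: dict[str, float], current: dict[str, float]
) -> tuple[list[tuple[str, float]], float, float]:
    """Diff two scans keyed by finding ID -> estimated £/mo."""
    new = {k: v for k, v in current.items() if k not in previous}
    resolved = {k: v for k, v in previous.items() if k not in current}
    ranked = sorted(new.items(), key=lambda kv: kv[1], reverse=True)  # costliest first
    return ranked, round(sum(new.values()), 2), round(sum(resolved.values()), 2)
```

Keying on a stable ID rather than on cost is the design choice that matters: a finding whose estimate changes is still the same finding, so it doesn't reappear as "new".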

Make it stick

  • Same day each month. Put it in the calendar. Cost hygiene dies the moment it's "when I get around to it."
  • Track the trend, not just the number. Reclaimable-waste-over-time tells you whether your habit is winning or losing.
  • Share the £ saved. A visible monthly figure turns cost hygiene from a chore into a scoreboard.

How FeckBills makes it a 60-second job

Honestly, the twenty-minute version above is the manual path, and it's worth knowing how to do by hand. But the whole point of FeckBills is to collapse it: one read-only scan runs every detector across every project, ranks the waste in £/mo, tracks the trend month over month, and shows you the "new since last scan" diff automatically. Run it as a scheduled job and the checklist runs itself.

Automate your cost hygiene: read-only, your infra, your call on every fix.

#gke #finops #checklist #cost-optimization