GKE · 2 June 2026 · 2 min read

Zombie Namespaces: Finding the Idle GKE Workloads Eating Your Budget

Whole namespaces sit in production reserving compute while doing essentially nothing. Here's how to find these zombies without nuking something that's deliberately on standby.

By The FeckBills team

Every long-lived GKE cluster accumulates them: the legacy-search namespace from a project that shipped two years ago, the billing-v1 that got replaced by billing-v2 but never deleted, the proof-of-concept that became permanent. They still have pods. They still reserve CPU and memory. They do almost nothing.

We call them zombie namespaces, and they're worth real money once you add them up.

Why they survive

Nobody deletes them because nobody is sure. "Is anything still hitting this?" is a surprisingly hard question to answer confidently, so the safe move is to leave it running. Multiply that hesitation across a few dozen namespaces and a multi-team cluster, and you've got a meaningful chunk of your bill reserved for the dead.

The signature of a zombie

A namespace is a reclaim candidate when, over a 14-day window, it's:

  • Reserving compute, with non-zero requested cores.
  • Using almost none of it, with P95 used CPU near zero (say < 0.02 cores).
  • Quiet on the network, with negligible bytes in/out.
  • Getting no real traffic, with near-zero requests/sec across its load-balancer backends.

One signal alone can mislead. A batch job is idle most of the time but spikes nightly. A DR standby is quiet by design. That's why you want all four signals together, over a window long enough to catch periodic work: 14 days covers nightly and weekly jobs, but a month-end job needs a longer one.
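Here is a minimal sketch of the four-signal check, assuming you have already aggregated each namespace's metrics over the window. The `NamespaceStats` type, its field names, and the network and request thresholds are hypothetical; only the 0.02-core figure comes from the list above.

```python
from dataclasses import dataclass


@dataclass
class NamespaceStats:
    """Aggregates for one namespace over the observation window (e.g. 14 days)."""
    name: str
    requested_cores: float        # sum of CPU requests across its pods
    p95_used_cores: float         # 95th percentile of used CPU
    network_bytes: int            # bytes in + out over the whole window
    peak_requests_per_sec: float  # highest load-balancer request rate seen


# Illustrative thresholds (assumptions); tune them for your own cluster.
MAX_P95_CORES = 0.02
MAX_NETWORK_BYTES = 50 * 1024 * 1024
MAX_REQUESTS_PER_SEC = 0.01


def is_reclaim_candidate(ns: NamespaceStats) -> bool:
    """True only when all four signals agree; any single busy signal vetoes."""
    return (
        ns.requested_cores > 0
        and ns.p95_used_cores < MAX_P95_CORES
        and ns.network_bytes < MAX_NETWORK_BYTES
        and ns.peak_requests_per_sec < MAX_REQUESTS_PER_SEC
    )
```

Requiring all four with `and` is the point: a nightly batch job that moves real data fails the network check even if its P95 CPU rounds to zero.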

Don't trust a single number; look at the shape

The most useful view isn't a verdict, it's a heatmap: used CPU per time bucket, per namespace, across the window. A live workload looks busy and textured. A zombie is a flat, dark line. Your eye catches it instantly, and you keep the judgment call, which is exactly where it belongs, because only you know that dr-failover is supposed to look dead.
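To make the "shape" idea concrete, here is a rough sketch that buckets used-CPU samples by hour and prints one text row per namespace. The input format `(namespace, unix_timestamp, used_cores)`, the function names, and the character ramp are all assumptions for illustration, not how any particular tool renders it.

```python
from collections import defaultdict

SHADES = " .:-=+*#"  # index 0 (blank) is idle, the last character is busiest


def bucket_cpu(samples, bucket_seconds=3600):
    """Average used CPU per (namespace, time bucket).

    samples: iterable of (namespace, unix_timestamp, used_cores).
    Returns {namespace: {bucket_index: mean_used_cores}}.
    """
    sums = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for namespace, ts, cores in samples:
        cell = sums[namespace][int(ts // bucket_seconds)]
        cell[0] += cores
        cell[1] += 1
    return {
        ns: {b: total / count for b, (total, count) in buckets.items()}
        for ns, buckets in sums.items()
    }


def render_heatmap(grid, scale_cores=1.0):
    """One text row per namespace; a zombie shows up as a run of blanks."""
    if not grid:
        return ""
    all_buckets = sorted({b for buckets in grid.values() for b in buckets})
    width = max(len(ns) for ns in grid)
    rows = []
    for ns in sorted(grid):
        cells = []
        for b in all_buckets:
            value = grid[ns].get(b, 0.0)  # a missing bucket counts as idle
            level = min(int(value / scale_cores * len(SHADES)), len(SHADES) - 1)
            cells.append(SHADES[level])
        rows.append(f"{ns.ljust(width)} |{''.join(cells)}|")
    return "\n".join(rows)
```

Feed it 14 days of hourly samples and each row is 336 characters wide. The rows that stay blank from end to end are the ones worth a second look.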

Before you delete

  1. Check for a deliberate standby/DR purpose. Quiet is not the same as unused.
  2. Look at the owner labels and ask the team. A 30-second Slack message beats a postmortem.
  3. Scale to zero first, delete later. Set replicas to 0, wait a week, watch for screaming. If it's silent, remove it.
  4. Keep an ignore list for the namespaces that are supposed to idle (cert-manager, external-dns, and friends) so they stop showing up as noise.
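Steps 3 and 4 can be sketched as a small helper that filters candidates against an ignore list and prints the scale-to-zero commands for a human to review. The ignore list contents and the function name are hypothetical. The helper only builds strings and never runs `kubectl`, and it covers Deployments only, so StatefulSets and other controllers would need their own commands.

```python
import shlex

# Hypothetical ignore list: namespaces that are supposed to look idle.
IGNORE = {"kube-system", "cert-manager", "external-dns", "dr-failover"}


def scale_to_zero_commands(candidates, ignore=IGNORE):
    """Return kubectl commands as strings for review; nothing is executed."""
    commands = []
    for ns in candidates:
        if ns in ignore:
            continue  # idle by design, so never propose scaling it down
        commands.append(
            "kubectl scale deployment --all --replicas=0 "
            f"--namespace {shlex.quote(ns)}"
        )
    return commands


if __name__ == "__main__":
    for cmd in scale_to_zero_commands(["legacy-search", "cert-manager", "billing-v1"]):
        print(cmd)
```

Printing instead of executing keeps the decision with you: paste a command only after the owning team has confirmed the namespace is dead.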

How FeckBills helps

FeckBills maps every GKE namespace's real activity (used CPU, network, and request traffic joined from your HTTP(S) load balancers) into an activity heatmap, and tallies the reclaimable £/mo sitting in the idle ones. System namespaces and known idle-by-design add-ons are excluded automatically, and you can ignore anything that's a deliberate standby. It flags; you decide. Nothing is ever auto-killed.

See your namespace heatmap and find out which ones are pretending to work.

#gke #kubernetes #idle #namespaces #finops
