GKE · 29 May 2026 · 2 min read

Right-Sizing GKE Node Pools Without Taking Down Production

Pod requests are only half the story. If your node pools are the wrong shape, you pay for allocatable capacity nobody can schedule onto. Here's how to fix the pool, safely.

By The FeckBills team

You can right-size every pod request perfectly and still overpay, because the nodes are the wrong shape. Node-pool waste is sneakier than pod waste: it hides in the gap between what a node can allocate and what your pods actually fit into.

The two ways node pools waste money

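1. Low bin-packing density. If your pods request 2.5 vCPU each and your nodes have 4 allocatable, you fit one pod per node and strand 1.5 vCPU, or 37.5% of every box. Pick a machine type whose allocatable capacity divides cleanly into your typical pod size and the stranded fraction shrinks: with 8 allocatable vCPU, three of those pods fit and only 0.5 vCPU (about 6%) is left over. Run the same check for memory, since whichever resource fills first sets the pod count.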

2. Over-provisioned headroom that never scales down. The cluster autoscaler adds nodes under load but is conservative about removing them. A single pod it refuses to evict (one covered by a PodDisruptionBudget that allows no disruptions, a bare pod with no controller, a pod using local storage, or one carrying a safe-to-evict: false annotation) pins an entire node up indefinitely.
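The bin-packing arithmetic in point 1 can be checked for any candidate shape before you buy it. The sketch below is a minimal illustration, not a sizing tool: the shape names and allocatable figures are made-up inputs (real allocatable values are lower than the machine's nominal size and should be read from your own nodes), and it ignores DaemonSet overhead.

```python
from dataclasses import dataclass
from math import floor


@dataclass(frozen=True)
class NodeShape:
    name: str                # hypothetical label, not a real machine type
    allocatable_vcpu: float  # read this from your own nodes
    allocatable_gib: float


def packing(shape, pod_vcpu, pod_gib):
    """Return (pods_per_node, stranded_vcpu_fraction) for one node shape."""
    by_cpu = floor(shape.allocatable_vcpu / pod_vcpu)
    by_mem = floor(shape.allocatable_gib / pod_gib)
    # Whichever resource runs out first decides how many pods fit.
    pods = min(by_cpu, by_mem)
    stranded_vcpu = shape.allocatable_vcpu - pods * pod_vcpu
    return pods, stranded_vcpu / shape.allocatable_vcpu


if __name__ == "__main__":
    candidates = [
        NodeShape("small", allocatable_vcpu=4.0, allocatable_gib=13.0),
        NodeShape("large", allocatable_vcpu=8.0, allocatable_gib=28.0),
    ]
    for shape in candidates:
        pods, frac = packing(shape, pod_vcpu=2.5, pod_gib=4.0)
        print(f"{shape.name}: {pods} pod(s) per node, {frac:.1%} of vCPU stranded")
```

Comparing the stranded fraction across two or three candidate shapes is usually enough to spot the one that fits your typical pod.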

Diagnosing it

Look at each node pool over a couple of weeks:

  • Allocatable vs requested: how much of the pool's capacity is actually claimed by pods?
  • Requested vs used: how much of that is real work (this is the pod-level question)?
  • Node count over time: does it scale down at night/weekends, or sit flat at peak?

A pool that's flat at high node count but low utilisation is the prime target. Either the autoscaler can't downscale (something's pinning nodes) or the pool is simply over-sized for its workload.
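As a rough sketch of that check, the three questions above reduce to two ratios and a flatness test over a window of samples. Everything here is an assumption for illustration: the `PoolSample` fields, the idea that you have already exported such samples from your monitoring, and the `flat_ratio` and `low_claim` thresholds, which are placeholders rather than recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSample:
    """One observation of a node pool; capacity figures in vCPU."""
    nodes: int
    allocatable: float
    requested: float
    used: float


def diagnose(samples, flat_ratio=0.9, low_claim=0.6):
    """Summarise a window of samples and flag a flat, under-claimed pool."""
    if not samples:
        raise ValueError("need at least one sample")
    alloc = sum(s.allocatable for s in samples)
    req = sum(s.requested for s in samples)
    used = sum(s.used for s in samples)
    counts = [s.nodes for s in samples]

    claimed = req / alloc                 # allocatable vs requested
    worked = used / req if req else 0.0   # requested vs used
    # "Flat" here means the quietest sample kept nearly the peak node count.
    flat = min(counts) >= flat_ratio * max(counts)
    return {
        "claimed": claimed,
        "worked": worked,
        "flat": flat,
        "target": flat and claimed < low_claim,
    }


if __name__ == "__main__":
    window = [
        PoolSample(nodes=10, allocatable=40.0, requested=16.0, used=8.0),
        PoolSample(nodes=10, allocatable=40.0, requested=20.0, used=10.0),
    ]
    print(diagnose(window))
```

A pool that comes back with `target` set is the one to look at first; whether the cause is a pinned node or plain over-sizing still needs a human to check.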

The safe path to a smaller pool

Never resize a production pool in place. Create a new pool, drain onto it, delete the old one:

  1. Create a new node pool with the better machine type (and spot/preemptible where appropriate).
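  2. Cordon the old pool's nodes so nothing new schedules there. If the old pool autoscales, turn that off first so it cannot add fresh, uncordoned nodes mid-migration.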
  3. Drain them gracefully, respecting PodDisruptionBudgets, so pods reschedule onto the new pool.
  4. Watch through at least one full load cycle (a peak and a trough), confirm everything's healthy, then delete the old pool.

This is a zero-downtime migration when your workloads have sane PDBs. If they don't, fixing the PDBs first is the real prerequisite, and it's the same fix that lets the autoscaler downscale properly afterward.
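To make the four steps concrete, here is a minimal sketch that only builds the command list and runs nothing, so you can review the plan before touching a cluster. The function name, cluster name, pool names, and node names are hypothetical. The commands assume a default location is already set in your gcloud config, and `--delete-emptydir-data` is an assumption about your workloads: drop it if losing emptyDir contents is not acceptable. GKE labels each node with `cloud.google.com/gke-nodepool`, which is one way to collect the old pool's node names.

```python
import shlex


def migration_plan(cluster, old_pool, new_pool, machine_type, old_nodes):
    """Return the migration as an ordered list of argv lists (nothing is run)."""
    # Step 1: create the replacement pool.
    plan = [[
        "gcloud", "container", "node-pools", "create", new_pool,
        f"--cluster={cluster}", f"--machine-type={machine_type}",
    ]]
    # Step 2: cordon every old node before draining any of them, so
    # evicted pods cannot land back on another old node.
    plan += [["kubectl", "cordon", node] for node in old_nodes]
    # Step 3: drain uses the eviction API, so PodDisruptionBudgets are honoured.
    plan += [
        ["kubectl", "drain", node, "--ignore-daemonsets", "--delete-emptydir-data"]
        for node in old_nodes
    ]
    # Step 4: run this last command only after you have watched a full cycle.
    plan.append([
        "gcloud", "container", "node-pools", "delete", old_pool,
        f"--cluster={cluster}",
    ])
    return plan


if __name__ == "__main__":
    steps = migration_plan(
        cluster="prod-cluster",           # hypothetical names throughout
        old_pool="pool-old",
        new_pool="pool-new",
        machine_type="e2-standard-8",
        old_nodes=["node-a", "node-b"],
    )
    for argv in steps:
        print(shlex.join(argv))
```

Keeping the plan as data rather than executing it directly leaves the pause between draining and deleting in your hands, which is where the "watch for a cycle" step belongs.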

Unblock the autoscaler while you're in there

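  • Give workloads PodDisruptionBudgets that permit at least one disruption, so the autoscaler can evict and consolidate safely. A PDB that allows zero (for example, minAvailable equal to the replica count) blocks scale-down instead of enabling it.
  • Audit cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotations; each one keeps the node its pod runs on from being removed for as long as that pod lives.
  • Watch out for pods using local SSD / hostPath, which pin nodes by design.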
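The audit in the list above can be sketched as a pure function over the JSON that `kubectl get pods --all-namespaces -o json` returns. This is an illustration under assumptions, not the autoscaler's own logic: the helper names are made up, only three signals are checked (the safe-to-evict annotation, a missing controller, and hostPath volumes), DaemonSet pods are skipped on the assumption that the autoscaler does not count them, and which conditions actually block scale-down depends on your autoscaler version and settings.

```python
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def pinning_reasons(pod):
    """List reasons one pod manifest (a dict) may block node scale-down."""
    meta = pod.get("metadata") or {}
    spec = pod.get("spec") or {}
    owners = meta.get("ownerReferences") or []

    # Assumption: DaemonSet pods do not hold a node up, so skip them.
    if any(o.get("kind") == "DaemonSet" for o in owners):
        return []

    reasons = []
    if (meta.get("annotations") or {}).get(SAFE_TO_EVICT) == "false":
        reasons.append("safe-to-evict=false")
    if not owners:
        reasons.append("no controller")
    for vol in spec.get("volumes") or []:
        if "hostPath" in vol:
            reasons.append(f"hostPath volume {vol.get('name', '?')}")
    return reasons


def audit(pod_list):
    """Map 'namespace/name' to reasons, for pods with at least one reason."""
    findings = {}
    for pod in pod_list.get("items", []):
        reasons = pinning_reasons(pod)
        if reasons:
            meta = pod.get("metadata") or {}
            key = f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}"
            findings[key] = reasons
    return findings
```

Feeding it the parsed output of the kubectl command (for example via `json.load`) gives a short list of pods to review; each finding is a prompt to look, not proof that the node is stuck.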

How FeckBills helps

FeckBills surfaces the node-pool and workload signals side by side: reserved vs used compute per namespace, and the reclaimable capacity once you right-size. It won't run kubectl drain for you (that's your call, in your hands), but it tells you precisely which pools and workloads are worth the migration, and what each one saves.

Start a read-only scan and find the pools carrying dead weight.

#gke #node-pools #rightsizing #autoscaler
