Why Is My GKE Cluster So Expensive? 7 Places to Look First
A high GKE bill almost always comes down to the same handful of culprits, and the management fee isn't one of them. Here are the seven things to check, in order of how much money they usually hide.
By The FeckBills team
If your Google Kubernetes Engine bill is bigger than it should be, the cause is rarely exotic. In our own dogfooding and across the clusters we've scanned, the same seven culprits show up again and again. Here they are, ordered by how much money they typically hide.
1. Pods request far more than they use
This is the big one, usually 60-80% of reclaimable waste. Developers over-request CPU and memory to be safe, the scheduler reserves that capacity, and nobody ever dials it back. Compare requests against P95 usage per container. A container whose P95 usage sits under ~40% of its request is money sitting idle. Lower CPU aggressively (it's compressible, so a container that runs short is throttled), memory carefully (it isn't, so a container that runs short is OOM-killed).
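As a rough illustration of the requests-vs-usage comparison, here is a minimal Python sketch. The `ContainerStats` type, the sample data layout, and the nearest-rank percentile are assumptions made for this example; in practice the usage samples would come from whatever metrics source you already have.

```python
from dataclasses import dataclass


@dataclass
class ContainerStats:
    """Hypothetical record: one container's CPU request and observed usage."""
    name: str
    cpu_request_cores: float
    cpu_usage_samples: list[float]  # cores used, sampled over the review window


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of the samples."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def utilisation(c: ContainerStats) -> float:
    """P95 usage as a fraction of the CPU request."""
    return p95(c.cpu_usage_samples) / c.cpu_request_cores


def over_requested(containers: list[ContainerStats], threshold: float = 0.40) -> list[str]:
    """Names of containers whose P95 usage is below the threshold share of their request."""
    return [c.name for c in containers if utilisation(c) < threshold]
```

The same comparison applies to memory, though there it is safer to leave more headroom above P95 before lowering the request.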
2. Idle and zombie namespaces
Whole namespaces from old projects keep running pods that do nothing: legacy-v1, the proof-of-concept that became permanent, the service nobody calls anymore. They reserve compute around the clock. Look for namespaces with near-zero CPU usage, network traffic, and incoming requests over two weeks.
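A filter along those lines might look like the sketch below. The `NamespaceUsage` fields and every threshold are hypothetical placeholders, not recommended values; the totals are assumed to cover a two-week window.

```python
from dataclasses import dataclass


@dataclass
class NamespaceUsage:
    """Hypothetical two-week usage totals for one namespace."""
    name: str
    cpu_core_hours: float   # CPU actually consumed, not requested
    network_bytes: int
    inbound_requests: int


def idle_namespaces(
    usages: list[NamespaceUsage],
    max_cpu_core_hours: float = 1.0,       # assumed cut-off for "near-zero" CPU
    max_network_bytes: int = 10_000_000,   # assumed cut-off for "near-zero" traffic
    max_requests: int = 0,
) -> list[str]:
    """Namespaces that stay at or below every threshold over the window."""
    return [
        u.name
        for u in usages
        if u.cpu_core_hours <= max_cpu_core_hours
        and u.network_bytes <= max_network_bytes
        and u.inbound_requests <= max_requests
    ]
```

Treat the result as a list of candidates to ask owners about, not a delete list: a namespace can look idle for two weeks and still run a monthly job.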
3. Node pools that are the wrong shape
Even with right-sized pods, a badly shaped node pool strands capacity. If pods request 2.5 vCPU and nodes offer 4 allocatable, only one pod fits per node and the remaining 1.5 vCPU is wasted on every node. And if a single un-evictable pod pins a node, the autoscaler can never scale it down. Check whether your pools shrink overnight or sit flat at peak.
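The stranded-capacity arithmetic from the example above, as a small Python sketch. It considers CPU only and assumes every pod in the pool has the same request, which real pools rarely do; the function name is made up for this illustration.

```python
import math


def stranded_per_node(pod_request_vcpu: float, node_allocatable_vcpu: float) -> float:
    """vCPU left unusable on a node once no further whole pod fits."""
    pods_per_node = math.floor(node_allocatable_vcpu / pod_request_vcpu)
    return node_allocatable_vcpu - pods_per_node * pod_request_vcpu


# The case from the text: 2.5 vCPU pods on nodes with 4 allocatable vCPU.
wasted = stranded_per_node(2.5, 4.0)
wasted_share = wasted / 4.0
print(wasted, wasted_share)  # 1.5 0.375
```

In this case 37.5% of each node's CPU is paid for and never schedulable, which is why a different machine shape or a different pod request can matter as much as the pod count.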
4. Everything's on-demand, nothing's on spot
Interruption-tolerant workloads (stateless web tier, batch jobs, CI runners, dev/staging) can run on spot nodes at 60-91% off. If 100% of your compute is on-demand, you're leaving the single biggest discount in cloud computing untouched.
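To get a feel for what a partial move to spot is worth, here is a back-of-the-envelope Python sketch. The monthly spend and the share of compute moved are hypothetical inputs; the discount bounds are the 60-91% range quoted above.

```python
def blended_cost(on_demand_cost: float, spot_fraction: float, spot_discount: float) -> float:
    """Monthly compute cost after moving a fraction of the spend to spot nodes."""
    spot_part = on_demand_cost * spot_fraction * (1 - spot_discount)
    on_demand_part = on_demand_cost * (1 - spot_fraction)
    return spot_part + on_demand_part


# Hypothetical: 10,000/month of on-demand compute, half of it interruption-tolerant.
worst_case = blended_cost(10_000, 0.5, 0.60)  # about 7,000
best_case = blended_cost(10_000, 0.5, 0.91)   # about 5,450
```

The estimate ignores the cost of handling interruptions, so it is an upper bound on the saving rather than a forecast.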
5. Orphaned disks, IPs, and snapshots
Not strictly cluster cost, but it rides along: leaked pvc-* persistent disks from deleted PVCs, reserved static IPs billing at a premium for nothing, snapshots of disks that no longer exist. Individually small, collectively a real line item.
6. Cluster sprawl
Every GKE cluster carries a management fee of roughly $72/month ($0.10 per cluster-hour) before a single pod runs, and the free-tier credit covers only one zonal or Autopilot cluster per billing account. A graveyard of half-used dev/experiment clusters pays that floor over and over. Consolidate where you safely can.
7. No committed-use discounts on the steady baseline
Once you've right-sized (and only once you've right-sized), the stable floor of compute you run every hour can take a Committed Use Discount of 20-55%. Most teams never set them up. Just don't commit before cleaning up, or you'll lock in your waste for the full one- or three-year term.
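The cost of committing too early can be sketched in a few lines of Python. The vCPU counts, the unit price, and the 50% discount are hypothetical round numbers chosen for easy arithmetic, and the model assumes committed capacity is billed whether or not it is used.

```python
def monthly_cost(committed_vcpu: float, used_vcpu: float,
                 price_per_vcpu: float, discount: float) -> float:
    """Committed vCPUs are billed at the discounted rate even when unused;
    any usage above the commitment is billed at the on-demand rate."""
    committed_cost = committed_vcpu * price_per_vcpu * (1 - discount)
    overflow_cost = max(0.0, used_vcpu - committed_vcpu) * price_per_vcpu
    return committed_cost + overflow_cost


# Commit on a bloated 100 vCPU baseline, then right-size down to 60 vCPU.
commit_first = monthly_cost(100, 60, 20.0, 0.5)   # 1000.0
# Right-size to 60 vCPU first, then commit on that lean baseline.
clean_first = monthly_cost(60, 60, 20.0, 0.5)     # 600.0
```

Under these assumptions the early commitment costs 400 more every month for the whole term, for capacity nobody uses.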
The order matters
Work top-down. Fixing #1 and #2 usually moves the bill more than everything below them combined, and you want to right-size before you commit (#7) so you're discounting a lean baseline, not a bloated one.
How FeckBills helps
FeckBills runs every one of these checks in a single read-only scan: requests-vs-usage, idle namespaces, node-pool waste, orphaned resources, all ranked in £/mo so you know exactly which lever to pull first. It runs in your own infra and never makes a change; you decide on every fix.