Requests vs Usage: The GKE Mistake Quietly Costing You Thousands
The single biggest source of wasted GKE spend isn't exotic. It's the gap between what your pods request and what they actually use. Here's how to find it and claw it back.
By The FeckBills team
If you only audit one thing in your GKE clusters this quarter, make it this: the gap between requested CPU/memory and actual usage. It is, by a wide margin, the most expensive habit in Kubernetes, and almost nobody notices, because the cluster keeps running fine.
That's the trap. Over-provisioning doesn't page you at 3am. It just bleeds money, every hour, forever.
Why the gap exists
When a developer ships a deployment, they set resource requests. Under deadline pressure, the safe move is to over-request: pick a generous number, ship it, move on. Nobody gets fired for a pod that has too much headroom. So the numbers drift upward and never come back down.
Meanwhile, the GKE scheduler reserves capacity based on requests, not usage. Request 2 vCPU and use 0.2, and you're paying for 2. The other 1.8 is reserved, unschedulable by anything else, and invisible on the surface.
How to measure it properly
The mistake people make is comparing requests against average usage. Average hides spikes. The right comparison is requests vs P95 usage over a representative window (14 days is a good default):
- Requests: kubernetes.io/container/cpu/request_cores, grouped by namespace + container.
- Usage: kubernetes.io/container/cpu/core_usage_time, aligned to a rate, then taken at the 95th percentile.
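To make "aligned to a rate, then taken at the 95th percentile" concrete, here is a minimal Python sketch. It assumes you have already exported the cumulative CPU-seconds counter as (timestamp, value) pairs; the sample data, the function names, and the nearest-rank percentile method are illustrative choices, not how Cloud Monitoring computes it internally.

```python
import math

def usage_rates(samples):
    """Turn cumulative (timestamp_s, cpu_seconds) samples into cores used per interval."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        # Skip duplicate timestamps and counter resets (e.g. a container restart).
        if t1 > t0 and c1 >= c0:
            rates.append((c1 - c0) / (t1 - t0))
    return rates

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical data: cumulative CPU-seconds sampled every 60 s, with one busy minute.
samples = [(0, 0.0), (60, 6.0), (120, 12.0), (180, 48.0), (240, 54.0)]

rates = usage_rates(samples)
mean_cores = sum(rates) / len(rates)
p95_cores = percentile_nearest_rank(rates, 95)
```

In this toy series the mean comes out well below the busy minute, while the P95 lands on it. That is the reason to size against P95 rather than the average.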
P95 means you right-size against real peaks, not the mean. You never want to recommend cutting below what a workload actually hits under load.
The reclaimable figure is simply (requests - P95 usage) × the hourly rate for that capacity. Do this per container, sum it up, and the number is usually uncomfortable.
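As a worked example of that formula, the sketch below prices the gap for two containers. The container names, the per-vCPU hourly rate, and the 730-hour month are all assumptions for illustration; substitute the rate you actually pay.

```python
HOURS_PER_MONTH = 730       # assumption: average hours in a calendar month
RATE_PER_CORE_HOUR = 0.03   # hypothetical price per vCPU-hour, not a real GKE rate

def reclaimable_per_month(requested_cores, p95_cores, rate_per_core_hour):
    """(requests - P95 usage) x hourly rate, scaled to a month. Never negative."""
    gap = max(0.0, requested_cores - p95_cores)
    return gap * rate_per_core_hour * HOURS_PER_MONTH

# Hypothetical per-container figures, in cores.
containers = [
    {"name": "api", "requested": 2.0, "p95": 0.2},
    {"name": "worker", "requested": 1.0, "p95": 0.7},
]

savings = {
    c["name"]: reclaimable_per_month(c["requested"], c["p95"], RATE_PER_CORE_HOUR)
    for c in containers
}
total = sum(savings.values())

# Rank the biggest gaps first, since that is the order to tackle them in.
ranked = sorted(savings.items(), key=lambda item: item[1], reverse=True)
```

The same calculation applies to memory: swap cores for GiB and use the per-GiB rate.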
What "good" looks like
- A healthy workload sits around 60-80% utilisation against requests at P95.
- Under ~40% and you're leaving real money on the table.
- Under ~10% and the workload is barely doing anything. Verify it's not standby/DR, then cut hard.
Don't chase 100%. Headroom is insurance. The goal is to remove the waste, not the safety margin.
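The bands above translate into a simple classifier. This is a sketch: the 10%, 40%, and 60-80% thresholds come from the list, while the labels for the 40-60% and above-80% ranges are assumptions added so that every value gets a bucket.

```python
def classify(requested, p95):
    """Bucket a workload by P95 utilisation against its request."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    utilisation = p95 / requested
    if utilisation < 0.10:
        return "near-idle"         # verify it's not standby/DR, then cut hard
    if utilisation < 0.40:
        return "over-provisioned"  # real money on the table
    if utilisation < 0.60:
        return "some slack"        # assumed label: not covered by the bands above
    if utilisation <= 0.80:
        return "healthy"
    return "tight"                 # assumed label: little headroom left

# Hypothetical workloads: (requested cores, P95 cores).
workloads = {"batch": (2.0, 0.1), "api": (1.0, 0.3), "worker": (1.0, 0.7)}
report = {name: classify(req, p95) for name, (req, p95) in workloads.items()}
```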
The fix, in order of risk
- Lower CPU requests to P95 plus a buffer. CPU is compressible, so the worst case is a throttled pod, not a killed one.
- Lower memory requests more carefully. Memory is not compressible; under-request and you get OOM-killed. Leave a wider margin here.
- Run the Vertical Pod Autoscaler (VPA) in recommendation mode (updateMode "Off") so you always have a current target to right-size against, instead of letting the numbers drift back up.
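The first two steps can be sketched as a pair of helpers that turn observed usage into a proposed request. The 20% CPU buffer, the 40% memory buffer, and the choice to size memory against the observed peak rather than P95 are illustrative assumptions, not recommendations from this article; pick margins that match your own risk tolerance.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding surprises."""
    return -(-a // b)

def recommend_cpu_millicores(p95_millicores, buffer_pct=20):
    """P95 plus a buffer, rounded up to the next 10m. CPU is compressible, so a modest buffer."""
    return ceil_div(p95_millicores * (100 + buffer_pct), 1000) * 10

def recommend_memory_mib(peak_mib, buffer_pct=40):
    """Observed peak plus a wider buffer, rounded up to whole MiB. Under-requesting risks an OOM kill."""
    return ceil_div(peak_mib * (100 + buffer_pct), 100)

# Hypothetical container: P95 of 200m CPU, peak of 512 MiB memory.
cpu_request = recommend_cpu_millicores(200)
memory_request = recommend_memory_mib(512)
```

Both helpers round up, so the proposed request never falls below usage plus the chosen buffer.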
How FeckBills helps
This is the detector we built first, because it's where the money is. FeckBills reads requests and P95 usage straight from Cloud Monitoring (no kube API, no agent inside your cluster), groups them to the container level across replicas, and prices the gap as reclaimable capacity in £/mo. You get a ranked list of exactly which workloads to right-size and how much each one saves.
Run a read-only scan and see your number. Most teams find their first £400/mo in about 60 seconds.