FinOps · 23 May 2026 · 2 min read

P95, Not Average: How to Read Cloud Monitoring Like a FinOps Engineer

Averages lie about cloud usage. If you right-size on the mean you'll cause outages; if you right-size on the max you'll never save anything. The 95th percentile is where the truth lives.

By The FeckBills team

Every cost-optimisation decision rests on one question: how much does this workload actually use? Get the statistic wrong and you either cause an outage or save nothing. The single most useful habit you can build is to stop looking at averages and start looking at the 95th percentile.

Why the average betrays you

A workload that idles all night and spikes hard at 9am has a comfortable-looking average. Right-size to that average and you'll cap it below its morning peak: hello throttling and OOM kills. The mean smooths away exactly the moments that matter.

The max has the opposite problem: a single freak spike (a deploy, a backfill, a retry storm) sets a ceiling you'll provision against forever. You'll be "safe" and broke.

P95 is the Goldilocks number. It says: "95% of the time, usage is at or below this." You size against real, recurring peaks and ignore the once-a-fortnight anomaly. Headroom stays; waste goes.
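To make the contrast concrete, here is a minimal sketch on synthetic data. The sample values, the five-minute spacing and the nearest-rank `percentile` helper are illustrative assumptions, not output from any real workload or monitoring API.

```python
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # integer ceil(p * n / 100), p is a whole number
    return ordered[max(rank, 1) - 1]


# One synthetic day of CPU usage in cores, sampled every 5 minutes (288 samples).
usage = [0.2] * 288            # idle baseline
usage[108:144] = [2.0] * 36    # recurring 09:00-12:00 peak
usage[200] = 8.0               # one freak spike (say, a retry storm)

avg = mean(usage)
p95 = percentile(usage, 95)
peak = max(usage)

print(f"mean={avg:.2f}  p95={p95:.2f}  max={peak:.2f}")
```

On this data the mean sits near 0.45 cores, far below the 2.0-core morning peak, while the max would have you paying for 8 cores all day. P95 lands on the recurring peak. Note that the peak here covers more than 5% of the samples; a recurring burst shorter than that would sit above P95, so it is worth checking how long your real peaks last.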

Reading the metrics that matter

For GKE rightsizing, the two metrics to internalise:

  • kubernetes.io/container/cpu/request_cores: what you reserved (and pay for).
  • kubernetes.io/container/cpu/core_usage_time: what you used. It is a cumulative counter of CPU-seconds, so align it to a rate (cores) over a sensible window, then take the 95th percentile across the series.

Group both by namespace_name + container_name so you're comparing like with like across replicas. The reclaimable number falls straight out: (request - P95 usage) × price per core-hour.
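The arithmetic can be sketched in a few lines. This is an illustration only: the sample points, the 2.0-core request, the 0.03 price per core-hour and the 730-hour month are all made-up assumptions, and `usage_rates` and `reclaimable_cost` are hypothetical helper names rather than part of any Cloud Monitoring client.

```python
def percentile(values, p):
    """Nearest-rank percentile for a whole-number p."""
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # integer ceil(p * n / 100)
    return ordered[max(rank, 1) - 1]


def usage_rates(samples):
    """Turn (timestamp_s, cumulative_cpu_seconds) points into cores used per interval."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 > t0 and c1 >= c0:  # skip out-of-order points and counter resets
            rates.append((c1 - c0) / (t1 - t0))
    return rates


def reclaimable_cost(request_cores, rates, price_per_core_hour, hours):
    """(request - P95 usage) x price, floored at zero for under-provisioned containers."""
    spare_cores = max(0.0, request_cores - percentile(rates, 95))
    return spare_cores * price_per_core_hour * hours


# Hypothetical data for one (namespace, container) group.
request = 2.0
samples = [(0, 0.0), (300, 150.0), (600, 300.0), (900, 600.0), (1200, 750.0)]

rates = usage_rates(samples)  # cores per 5-minute interval
monthly = reclaimable_cost(request, rates, price_per_core_hour=0.03, hours=730)

print(f"rates={rates}  reclaimable per month={monthly:.2f}")
```

Flooring at zero is a design choice for this sketch: a container whose P95 exceeds its request is a risk to flag, not a negative saving.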

Windows and alignment: the details that trip people up

  • Window length: 14 days is a good default. It catches weekly patterns (weekday vs weekend) without drowning in a quarter's worth of noise.
  • Alignment period: too fine and you chase momentary spikes; too coarse and you smear peaks into the average. A few-minute alignment is usually right.
  • Beware cold workloads: a service deployed three days ago in a 14-day window will look artificially quiet. Sanity-check the data density before trusting the verdict.
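Two of these points are easy to check numerically. The sketch below uses made-up one-minute samples to show how a coarse alignment smears a short burst, and a simple density ratio for spotting cold workloads. The helper names and the 0.8 trust threshold are assumptions for illustration, and `align_mean` is a plain bucket average standing in for whatever mean-style alignment your monitoring tool applies.

```python
def align_mean(samples, bucket_size):
    """Average consecutive samples into buckets of bucket_size points."""
    buckets = []
    for i in range(0, len(samples), bucket_size):
        chunk = samples[i:i + bucket_size]
        buckets.append(sum(chunk) / len(chunk))
    return buckets


def data_density(observed_points, window_days, alignment_minutes):
    """Fraction of the expected aligned points that actually exist in the window."""
    expected = window_days * 24 * 60 // alignment_minutes
    return observed_points / expected


# One hour of 1-minute samples: 0.5 cores, with a 5-minute burst at 3.0 cores.
minute = [0.5] * 60
minute[20:25] = [3.0] * 5

fine = align_mean(minute, 5)     # 5-minute alignment keeps the burst visible
coarse = align_mean(minute, 60)  # 60-minute alignment smears it into the mean

# A service deployed 3 days ago, viewed in a 14-day window at 5-minute alignment.
observed = 3 * 24 * 12
density = data_density(observed, window_days=14, alignment_minutes=5)
trusted = density >= 0.8  # hypothetical threshold; tune to your own risk appetite

print(f"fine peak={max(fine):.2f}  coarse peak={max(coarse):.2f}")
print(f"density={density:.2f}  trusted={trusted}")
```

With these numbers the five-minute view still shows the 3.0-core burst, the hourly view flattens it to roughly 0.71 cores, and the three-day-old service has only about a fifth of the expected points.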

The mindset shift

FinOps isn't about a dashboard; it's about asking "what does this actually do, statistically?" and acting on the answer with appropriate caution. P95 is the tool that lets you be aggressive about waste and conservative about risk at the same time. That's the whole game.

A note on memory

Everything above applies to CPU, which is compressible: throttle it and a pod slows down. Memory is not compressible: under-provision and the kernel OOM-kills the pod. For memory, lean toward a higher percentile (P99) or P95 plus a wider buffer. The cost of being wrong is asymmetric, so size accordingly.
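As a rough illustration of the two options, the sketch below sizes a memory request from synthetic samples. The sample values, the 25% buffer and the `memory_request` helper are assumptions chosen for the example, not recommended settings.

```python
def percentile(values, p):
    """Nearest-rank percentile for a whole-number p."""
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # integer ceil(p * n / 100)
    return ordered[max(rank, 1) - 1]


def memory_request(samples_mib, p=99, buffer=0.0):
    """Size a memory request as a percentile of usage plus a fractional buffer."""
    return percentile(samples_mib, p) * (1 + buffer)


# Synthetic working-set samples in MiB: mostly 400, sometimes 600, rarely 700.
samples_mib = [400] * 90 + [600] * 8 + [700] * 2

cpu_style = memory_request(samples_mib, p=95)               # plain P95
p99 = memory_request(samples_mib, p=99)                     # higher percentile
buffered = memory_request(samples_mib, p=95, buffer=0.25)   # P95 plus 25% buffer

print(f"P95={cpu_style:.0f}  P99={p99:.0f}  P95+25%={buffered:.0f}  max={max(samples_mib)}")
```

Here plain P95 gives 600 MiB, which is below the 700 MiB the workload actually reaches and would risk an OOM kill. Both P99 (700 MiB) and P95 plus the buffer (750 MiB) cover the observed peak.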

How FeckBills helps

FeckBills uses P95, never the mean, for exactly these reasons, with both requests and usage pulled from Cloud Monitoring, grouped to the container level, priced as reclaimable capacity. You get the statistically honest version of "how much is this wasting," not a misleading average that gets someone paged.

Run a scan and see your P95 picture.

#finops #monitoring #metrics #rightsizing
