Kubernetes Memory Limits: The #1 Cause of Cloud Waste
Why over-provisioned memory limits cost enterprises millions, and how to fix it without risking OOMKilled pods.
The Hidden Cost of "Safe" Memory Limits
Every Kubernetes engineer has done it: set memory limits to 8Gi "just to be safe" for an application that actually uses 500Mi. It feels responsible. It prevents OOMKilled pods. It lets you sleep at night.
But it's costing you thousands of dollars per month.
In our analysis of 500+ Kubernetes clusters, we found that memory over-provisioning accounts for 60% of total cloud waste. The average cluster wastes $3,400/month on unused memory alone.
Quick Example
A pod with an 8Gi limit using 500Mi = 7.5Gi wasted
At $7.20/GB/month = $54/month per pod
With 50 similar pods = $2,700/month wasted
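If you want to run the same back-of-the-envelope math against your own workloads, here is a tiny shell sketch; the $7.20/GB/month rate, the 500Mi usage, and the 50-pod count are all assumptions to replace with your own numbers:

# Back-of-the-envelope waste estimate; every value below is a placeholder to adjust
limit_gib=8; used_gib=0.5; price_per_gb_month=7.20; pod_count=50
awk -v l="$limit_gib" -v u="$used_gib" -v p="$price_per_gb_month" -v n="$pod_count" \
  'BEGIN { printf "%.1f GiB wasted per pod, $%.2f/month per pod, $%.2f/month across %d pods\n", l-u, (l-u)*p, (l-u)*p*n, n }'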
Understanding Memory Requests vs Limits
Before optimizing, you need to understand what these values actually do:
Memory Requests
- What it is: The guaranteed memory your pod gets
- Used for: Kubernetes scheduling decisions
- Impact: Determines which node your pod lands on
Memory Limits
- What it is: Maximum memory your pod can use
- Used for: Preventing runaway memory consumption
- Impact: Pod gets OOMKilled if it exceeds this
resources:
  requests:
    memory: "512Mi"   # Guaranteed allocation
    cpu: "250m"
  limits:
    memory: "1Gi"     # Maximum allowed
cpu: "500m"The Fear-Based Provisioning Problem
The Fear-Based Provisioning Problem
Engineers set high memory limits because they're scared of production incidents. This fear is rational:
- OOMKilled pods cause downtime
- Memory leaks can spike usage unexpectedly
- Traffic spikes increase memory pressure
- Nobody gets fired for over-provisioning
But this fear leads to a 3-10x gap between what pods request and what they use.
How to Find Your Memory Waste
Run this command to list the heaviest memory consumers across all namespaces (kubectl top requires metrics-server):
kubectl top pods --all-namespaces | sort -k4 -h -r | head -20
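kubectl top only reports live usage, so to see the gap you also need the requests and limits you configured. One way to pull those side by side for comparison is custom columns; treat this as a sketch:

# Configured memory requests and limits per pod, to compare against the usage numbers above
kubectl get pods --all-namespaces -o custom-columns=\
'NAMESPACE:.metadata.namespace,POD:.metadata.name,MEM_REQUEST:.spec.containers[*].resources.requests.memory,MEM_LIMIT:.spec.containers[*].resources.limits.memory'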
Or use Wozz for an instant audit:
curl -sL wozz.io/audit.sh | bash
Best Practices for Memory Limits
1. Set Requests Based on P95 Usage
Look at your pod's memory usage over 7-30 days. Set requests to the 95th percentile, not the average.
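If you scrape cAdvisor metrics with Prometheus, a query along these lines can produce that P95 figure; the namespace, pod regex, and 7-day window are assumptions to adapt (and your retention must cover the window):

# P95 of working-set memory per container over the last 7 days (selectors are placeholders)
quantile_over_time(0.95, container_memory_working_set_bytes{namespace="default", pod=~"my-app-.*", container!=""}[7d])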
2. Set Limits at 1.5-2x Requests
This gives headroom for spikes without massive over-provisioning. A 2x ratio is safer; 1.5x is more efficient.
resources:
  requests:
    memory: "512Mi"   # Based on P95 actual usage
  limits:
    memory: "1Gi"     # 2x request for safety
3. Use Vertical Pod Autoscaler (VPA)
VPA automatically adjusts resource requests based on observed usage. Start with updateMode: "Off" (recommendations only) before letting it apply changes automatically.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # Start with recommendations only
4. Monitor and Iterate
Set up alerts for pods approaching their limits. Review and adjust monthly based on actual usage data.
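One way to wire up those alerts, if you run Prometheus with cAdvisor metrics and kube-state-metrics, is a rule that fires when a container sits above 90% of its memory limit; treat this as a sketch and tune the threshold and duration:

# Prometheus alerting rule sketch: warn when a container uses >90% of its memory limit for 15 minutes
groups:
- name: memory-limit-headroom
  rules:
  - alert: PodNearMemoryLimit
    expr: |
      max by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})
        / on (namespace, pod, container)
      max by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
        > 0.9
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.namespace }}/{{ $labels.pod }} is above 90% of its memory limit"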
Common Mistakes to Avoid
Setting Limits Equal to Requests
This puts the pod in the Guaranteed QoS class (when CPU requests and limits match too), but it leaves no room for legitimate spikes: size the limit to normal usage and you'll see frequent OOMKills during traffic peaks.
Not Setting Any Limits
Pods without limits can consume all node memory, causing evictions across multiple pods. Always set limits.
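If you want a guardrail so nothing ships without limits, a namespace-level LimitRange can inject defaults for containers that don't set their own; names and values here are placeholders:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-memory
  namespace: my-namespace        # placeholder namespace
spec:
  limits:
  - type: Container
    default:                     # injected as the limit when none is specified
      memory: "1Gi"
    defaultRequest:              # injected as the request when none is specified
      memory: "512Mi"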
Copying Limits from Stack Overflow
Every application is different. "4Gi" isn't a universal answer. Profile your actual application.
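A quick way to profile a specific workload rather than guessing is to watch its per-container usage for a while; the label selector is a placeholder:

# Sample live per-container memory for one workload every 30 seconds
watch -n 30 kubectl top pod -l app=my-app --containers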
The ROI of Right-Sizing
Teams that right-size memory typically see:
- 30-50% reduction in memory costs
- Better bin packing = fewer nodes needed
- Faster scheduling = quicker deployments
Find Your Memory Waste
Run a free audit to see exactly how much you're over-provisioning.
curl -sL wozz.io/audit.sh | bash
Summary
Memory over-provisioning is the #1 cause of Kubernetes cloud waste. By setting requests based on P95 usage and limits at 1.5-2x requests, you can cut memory costs by 30-50% without risking stability.
Start with visibility: audit your current usage, identify the worst offenders, and right-size incrementally.