Kubernetes Memory Limits: The #1 Cause of Cloud Waste
Why over-provisioned memory limits cost enterprises millions, and how to fix it without risking OOMKilled pods.
The Hidden Cost of "Safe" Memory Limits
Every Kubernetes engineer has done it: set memory limits to 8Gi "just to be safe" for an application that actually uses 500Mi. It feels responsible. It prevents OOMKilled pods. It lets you sleep at night.
But it's costing you thousands of dollars per month.
In our analysis of 500+ Kubernetes clusters, we found that memory over-provisioning accounts for 60% of total cloud waste. The average cluster wastes $3,400/month on unused memory alone.
Quick Example
A pod with an 8Gi limit using 500Mi = 7.5Gi wasted
At $7.20/GB/month = $54/month per pod
With 50 similar pods = $2,700/month wasted
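If you want to run the same back-of-the-envelope math against your own workloads, here is a tiny shell sketch; the $7.20/GB/month rate, the 500Mi usage, and the 50-pod count are all assumptions to replace with your own numbers:

# Back-of-the-envelope waste estimate; every value below is a placeholder to adjust
limit_gib=8; used_gib=0.5; price_per_gb_month=7.20; pod_count=50
awk -v l="$limit_gib" -v u="$used_gib" -v p="$price_per_gb_month" -v n="$pod_count" \
  'BEGIN { printf "%.1f GiB wasted per pod, $%.2f/month per pod, $%.2f/month across %d pods\n", l-u, (l-u)*p, (l-u)*p*n, n }'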
Understanding Memory Requests vs Limits
Before optimizing, you need to understand what these values actually do:
Memory Requests
- What it is: The guaranteed memory your pod gets
- Used for: Kubernetes scheduling decisions
- Impact: Determines which node your pod lands on
Memory Limits
- What it is: Maximum memory your pod can use
- Used for: Preventing runaway memory consumption
- Impact: Pod gets OOMKilled if it exceeds this
resources:
  requests:
    memory: "512Mi"   # Guaranteed allocation
    cpu: "250m"
  limits:
    memory: "1Gi"     # Maximum allowed
cpu: "500m"The Fear-Based Provisioning Problem
The Fear-Based Provisioning Problem
Engineers set high memory limits because they're scared of production incidents. This fear is rational:
- OOMKilled pods cause downtime
- Memory leaks can spike usage unexpectedly
- Traffic spikes increase memory pressure
- Nobody gets fired for over-provisioning
But this fear leads to a 3-10x gap between what pods request and what they use.
How to Find Your Memory Waste
Run this command to list the heaviest memory consumers across all namespaces (kubectl top requires metrics-server):
kubectl top pods --all-namespaces | sort -k4 -h -r | head -20
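kubectl top only reports live usage, so to see the gap you also need the requests and limits you configured. One way to pull those side by side for comparison is custom columns; treat this as a sketch:

# Configured memory requests and limits per pod, to compare against the usage numbers above
kubectl get pods --all-namespaces -o custom-columns=\
'NAMESPACE:.metadata.namespace,POD:.metadata.name,MEM_REQUEST:.spec.containers[*].resources.requests.memory,MEM_LIMIT:.spec.containers[*].resources.limits.memory'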
Or use Wozz for an instant audit:
curl -sL wozz.io/audit.sh | bash
Best Practices for Memory Limits
1. Set Requests Based on P95 Usage
Look at your pod's memory usage over 7-30 days. Set requests to the 95th percentile, not the average.
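If you scrape cAdvisor metrics with Prometheus, a query along these lines can produce that P95 figure; the namespace, pod regex, and 7-day window are assumptions to adapt (and your retention must cover the window):

# P95 of working-set memory per container over the last 7 days (selectors are placeholders)
quantile_over_time(0.95, container_memory_working_set_bytes{namespace="default", pod=~"my-app-.*", container!=""}[7d])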
2. Set Limits at 1.5-2x Requests
This gives headroom for spikes without massive over-provisioning. A 2x ratio is safer; 1.5x is more efficient.
resources:
  requests:
    memory: "512Mi"   # Based on P95 actual usage
  limits:
    memory: "1Gi"     # 2x request for safety
3. Use Vertical Pod Autoscaler (VPA)
VPA automatically adjusts resource requests based on observed usage. Start with updateMode: "Off" (recommendations only) before letting it apply changes automatically.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # Start with recommendations only
4. Monitor and Iterate
Set up alerts for pods approaching their limits. Review and adjust monthly based on actual usage data.
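One way to wire up those alerts, if you run Prometheus with cAdvisor metrics and kube-state-metrics, is a rule that fires when a container sits above 90% of its memory limit; treat this as a sketch and tune the threshold and duration:

# Prometheus alerting rule sketch: warn when a container uses >90% of its memory limit for 15 minutes
groups:
- name: memory-limit-headroom
  rules:
  - alert: PodNearMemoryLimit
    expr: |
      max by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})
        / on (namespace, pod, container)
      max by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
        > 0.9
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.namespace }}/{{ $labels.pod }} is above 90% of its memory limit"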
Common Mistakes to Avoid
Setting Limits Equal to Requests
This puts the pod in the Guaranteed QoS class (when CPU requests and limits match too), but it leaves no room for legitimate spikes: size the limit to normal usage and you'll see frequent OOMKills during traffic peaks.
Not Setting Any Limits
Pods without limits can consume all node memory, causing evictions across multiple pods. Always set limits.
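If you want a guardrail so nothing ships without limits, a namespace-level LimitRange can inject defaults for containers that don't set their own; names and values here are placeholders:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-memory
  namespace: my-namespace        # placeholder namespace
spec:
  limits:
  - type: Container
    default:                     # injected as the limit when none is specified
      memory: "1Gi"
    defaultRequest:              # injected as the request when none is specified
      memory: "512Mi"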
Copying Limits from Stack Overflow
Every application is different. "4Gi" isn't a universal answer. Profile your actual application.
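A quick way to profile a specific workload rather than guessing is to watch its per-container usage for a while; the label selector is a placeholder:

# Sample live per-container memory for one workload every 30 seconds
watch -n 30 kubectl top pod -l app=my-app --containers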
The ROI of Right-Sizing
Teams that right-size memory typically see:
- 30-50% reduction in memory costs
- Better bin packing = fewer nodes needed
- Faster scheduling = quicker deployments
Find Your Memory Waste
Run a free audit to see exactly how much you're over-provisioning.
curl -sL wozz.io/audit.sh | bash
Summary
Memory over-provisioning is the #1 cause of Kubernetes cloud waste. By setting requests based on P95 usage and limits at 1.5-2x requests, you can cut memory costs by 30-50% without risking stability.
Start with visibility: audit your current usage, identify the worst offenders, and right-size incrementally.