The Complete Guide to Kubernetes Cost Optimization
A comprehensive, actionable guide to reducing Kubernetes costs by 40%+. From resource requests to autoscaling, learn the exact techniques used by top engineering teams.
Why Kubernetes is So Expensive
The average company wastes 30-50% of its Kubernetes spend. That's $50K-$200K annually for mid-market companies, and millions for enterprises.
The root cause? Kubernetes makes it too easy to over-provision. Developers set resource requests and limits "to be safe," and those values stick around forever, even when actual usage is 10x lower.
The 6 Biggest Sources of Kubernetes Waste
1. Over-Provisioned Memory Limits (40% of waste)
This is the #1 cost killer. A pod requests 4GB of RAM but uses 400MB. The scheduler reserves the full 4GB on the node whether it's used or not, so you're paying for 3.6GB of nothing.
❌ Bad Example
resources:
  requests:
    memory: "4Gi"
    cpu: "2000m"
  limits:
    memory: "4Gi"
    cpu: "2000m"
# Actual usage: 400Mi memory, 200m CPU
# Waste: $1,200/year per pod
✅ Optimized
resources:
  requests:
    memory: "512Mi"
    cpu: "250m"
  limits:
    memory: "1Gi" # Headroom for spikes
    cpu: "500m"
# Aligned with actual usage
# Savings: $1,200/year per pod
2. Orphaned Load Balancers (20% of waste)
You deleted a Service but the cloud load balancer stayed alive. At $20-50/month per LB, 10 orphaned LBs add up to $2,400-6,000/year.
How to find them:
# List all LoadBalancer services
kubectl get svc --all-namespaces -o json | \
  jq '.items[] | select(.spec.type=="LoadBalancer") | .metadata.name'
# Cross-reference with your cloud provider console
# Delete orphaned LBs that aren't in kubectl output
3. Idle Development Clusters (15% of waste)
Dev/staging clusters running 24/7 when they're only used 9-5 weekdays.
Solution: Scale dev/staging workloads to zero outside working hours and let Karpenter or cluster-autoscaler remove the now-empty nodes (a minimal scheduled scale-down sketch follows). Or use tools like DevSpace/Tilt for local dev.
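One pragmatic way to automate this is a CronJob that zeroes out every Deployment in the dev namespace each weekday evening, with a mirror-image CronJob scaling things back up in the morning. This is a minimal sketch, not a drop-in solution: the dev namespace, the schedule, and the dev-downscaler ServiceAccount (which needs RBAC permission to scale Deployments, not shown here) are assumptions to adapt to your environment.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dev-downscale
  namespace: dev              # assumed dev/staging namespace
spec:
  schedule: "0 19 * * 1-5"    # 19:00 on weekdays
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: dev-downscaler   # assumed SA allowed to scale Deployments
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - scale
                - deployment
                - --all
                - --replicas=0
                - -n
                - dev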
4. No Autoscaling (10% of waste)
Fixed replica counts sized for peak load, running 24/7.
Implement HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
5. Oversized Nodes (10% of waste)
Running 8xlarge instances when 4xlarge would suffice. Bin-packing efficiency matters: oversized nodes leave big slices of idle, paid-for capacity and make scale-down coarser, since removing one node drops a large chunk of capacity at once.
6. No Spot/Preemptible Instances (5% of waste)
Spot instances are typically 60-90% cheaper than on-demand. Use them for stateless workloads, batch jobs, and dev/test; a sketch of steering a Deployment onto spot nodes follows.
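A common pattern is to label and taint the spot nodes, then opt workloads in explicitly. The names below (a node-type: spot label, a spot=true:NoSchedule taint, the batch-worker app and image) are placeholders for this sketch; in practice you would use whatever labels your provider or Karpenter exposes (Karpenter, for example, labels nodes with karpenter.sh/capacity-type).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker                    # hypothetical stateless workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      nodeSelector:
        node-type: spot                 # assumed label on spot nodes
      tolerations:
        - key: "spot"                   # assumed taint on spot nodes
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      containers:
        - name: worker
          image: my-registry/batch-worker:latest   # placeholder image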
Step-by-Step Optimization Plan
Week 1: Discovery
- Run a Wozz audit (2 min) to find all waste
- Identify top 10 over-provisioned pods (these are quick wins)
- Find orphaned load balancers and volumes
Week 2: Quick Wins
- Delete orphaned load balancers → save $500-2K/month
- Right-size top 10 pods → save $2-5K/month
- Scale down dev clusters at night → save $1-3K/month
Week 3-4: Systematic Optimization
- Implement HPA for top 20 services
- Enable cluster autoscaling
- Move batch jobs to spot instances
- Set up VPA (Vertical Pod Autoscaler) for automatic right-sizing (a minimal manifest sketch follows this list)
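VPA is a separate add-on you install in the cluster, not a built-in API. A minimal sketch targeting the my-app Deployment from the HPA example, starting in recommendation-only mode so it reports suggested requests without evicting pods:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # recommendation-only; switch to "Auto" once you trust the suggestions
Once it is running, kubectl describe verticalpodautoscaler my-app shows the recommended requests; compare them with what the Deployment currently asks for before letting VPA act automatically.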
Advanced Techniques
Use Karpenter for Better Node Provisioning
Karpenter is a major step up from cluster-autoscaler: instead of scaling fixed node groups, it provisions the exact instance types your pending pods need, when they need them, and consolidates underutilized nodes, reducing waste from poor bin-packing.
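For illustration, here is a minimal NodePool sketch assuming Karpenter v1 on AWS with an EC2NodeClass named default already configured; field names and defaults vary between Karpenter releases, so check the docs for your version.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:                      # assumes an EC2NodeClass named "default" exists
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  limits:
    cpu: "1000"                          # cap total CPU this pool may provision
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m                 # reclaim underutilized nodes quickly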
Implement Pod Disruption Budgets
PDBs let you aggressively scale down without risking availability. They ensure minimum replicas stay running during node drains.
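A minimal PDB for the my-app Deployment used above; the app label is assumed to match the Deployment's pod labels.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app
spec:
  minAvailable: 1          # keep at least one replica running during voluntary disruptions
  selector:
    matchLabels:
      app: my-app          # assumed pod label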
Use Namespace Resource Quotas
Prevent teams from over-provisioning by setting namespace-level CPU/memory quotas.
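A sketch of a per-namespace quota; the team-a namespace and the numbers are placeholders you would size per team based on actual usage.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: team-a        # hypothetical team namespace
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi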
Measuring Success
Track these metrics monthly:
- Total cluster cost (from cloud bill)
- Cost per pod (cluster cost / total pods)
- Resource utilization (actual usage / requested resources)
- Waste reduction % (vs. baseline month)
Target: 70%+ utilization and 40% cost reduction within 3 months.