Kubernetes Resource Requests vs Limits Explained
The complete guide to setting CPU and memory resources in Kubernetes. Includes YAML examples and common mistakes to avoid.
Quick Answer
- Requests = What your container is guaranteed to get
- Limits = The maximum your container can use
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
What Are Resource Requests?
Resource requests tell Kubernetes how much CPU and memory your container needs to run. The scheduler uses this to decide which node to place your pod on.
Key points:
- Requests are guaranteed — your container will always have this much available
- If a node can't satisfy the request, your pod won't be scheduled there
- Requests affect bin packing — how efficiently pods fit on nodes
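To make the scheduling point concrete, here is a minimal sketch of the capacity check, assuming a hypothetical node with 2000m CPU and 4096Mi of allocatable memory. The function name and the numbers are illustrative. The real scheduler applies many more filters, but the core idea is the same: it compares requests, not actual usage.

```python
def fits_on_node(allocatable_mcpu, allocatable_mib, scheduled, pod):
    """Return True if the pod's requests fit in the node's unreserved capacity.

    `scheduled` is a list of (cpu_millicores, memory_mib) requests already
    placed on the node; `pod` is one such tuple for the incoming pod.
    """
    used_cpu = sum(cpu for cpu, _ in scheduled)
    used_mem = sum(mem for _, mem in scheduled)
    cpu_req, mem_req = pod
    return (used_cpu + cpu_req <= allocatable_mcpu
            and used_mem + mem_req <= allocatable_mib)


# Hypothetical requests already reserved on the node: 1750m CPU, 3328Mi memory.
existing = [(250, 256), (500, 1024), (1000, 2048)]

print(fits_on_node(2000, 4096, existing, (250, 256)))  # True: exactly 2000m, 3584Mi
print(fits_on_node(2000, 4096, existing, (500, 256)))  # False: 2250m > 2000m
```

The second pod is rejected even if the running containers are idle, because the sum of requests is what counts.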
CPU Requests
CPU is measured in "millicores" (m). 1 CPU = 1000m.
- 100m = 0.1 CPU core
- 500m = 0.5 CPU core
- 1000m or 1 = 1 full CPU core
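The conversion can be sketched in a few lines. This is a simplified helper (the name `cpu_to_millicores` is made up for illustration) that handles only the two notations used in this article, not the full Kubernetes quantity grammar.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m", "0.5" or "1" to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])          # already in millicores
    return round(float(quantity) * 1000)   # whole or fractional cores


for q in ("100m", "500m", "0.5", "1000m", "1"):
    print(q, "->", cpu_to_millicores(q), "millicores")
```

Note that "1000m" and "1" resolve to the same value, so they are interchangeable in a manifest.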
Memory Requests
Memory is measured in bytes, with common suffixes:
- 128Mi = 128 mebibytes (about 134 MB)
- 1Gi = 1 gibibyte (about 1.07 GB)
- 256M = 256 megabytes (avoid mixing Mi and M)
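To see why mixing Mi and M matters, the following sketch converts a few quantities to bytes. The helper `memory_to_bytes` is illustrative and covers only integer values with the k/M/G and Ki/Mi/Gi suffixes.

```python
BINARY = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}   # powers of two
DECIMAL = {"k": 1000, "M": 1000**2, "G": 1000**3}      # powers of ten


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "256Mi" or "256M" to bytes."""
    quantity = quantity.strip()
    # Check the two-letter binary suffixes first so "Mi" is not read as "M".
    for suffix, factor in BINARY.items():
        if quantity.endswith(suffix):
            return int(quantity[:-2]) * factor
    for suffix, factor in DECIMAL.items():
        if quantity.endswith(suffix):
            return int(quantity[:-1]) * factor
    return int(quantity)  # no suffix: plain bytes


print(memory_to_bytes("128Mi"))  # 134217728
print(memory_to_bytes("1Gi"))    # 1073741824
print(memory_to_bytes("256Mi") - memory_to_bytes("256M"))  # 12435456
```

The last line shows that 256Mi is roughly 12.4 MB larger than 256M, which is enough to turn an intended Guaranteed pod into a Burstable one if the request uses one suffix and the limit the other.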
What Are Resource Limits?
Limits define the maximum resources your container can use. If your container tries to exceed these, Kubernetes takes action.
What Happens When You Exceed Limits?
| Resource | What Happens |
|---|---|
| CPU | Container is throttled (slowed down) |
| Memory | Container is OOMKilled (terminated) |
This is why memory limits are dangerous if set too low—your app crashes. CPU limits just make your app slower.
How Requests and Limits Work Together
resources:
  requests:
    memory: "256Mi"   # Guaranteed: 256Mi
    cpu: "100m"       # Guaranteed: 0.1 CPU
  limits:
    memory: "512Mi"   # Maximum: 512Mi (OOMKill if exceeded)
    cpu: "200m"       # Maximum: 0.2 CPU (throttled if exceeded)
In this example:
- Your container always has 256Mi memory and 0.1 CPU available
- It can burst up to 512Mi memory and 0.2 CPU if the node has capacity
- If it tries to use more than 512Mi memory, it gets killed
- If it tries to use more than 0.2 CPU, it gets throttled
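The four cases above can be written as a small lookup. This is only a sketch of the rules as described here, using the example's values. The function name and the idea of passing in an attempted demand are illustrative, not how the kubelet is implemented.

```python
REQUESTS = {"cpu_m": 100, "mem_mib": 256}   # guaranteed floor from the example
LIMITS = {"cpu_m": 200, "mem_mib": 512}     # hard ceiling from the example


def describe(cpu_m: int, mem_mib: int) -> str:
    """Describe what happens when the container tries to use this much."""
    if mem_mib > LIMITS["mem_mib"]:
        return "OOMKilled"                  # memory cannot be throttled
    if cpu_m > LIMITS["cpu_m"]:
        return "CPU throttled"              # slowed down, not killed
    if cpu_m > REQUESTS["cpu_m"] or mem_mib > REQUESTS["mem_mib"]:
        return "bursting above requests"    # allowed only if the node has spare capacity
    return "within requests"


for demand in [(80, 200), (150, 300), (300, 300), (150, 600)]:
    print(demand, "->", describe(*demand))
```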
Quality of Service (QoS) Classes
Kubernetes assigns a QoS class based on how you set requests and limits:
Guaranteed (Best)
Requests = Limits for both CPU and memory. These pods are the last to be evicted.
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"
Burstable (Middle)
Requests set but less than limits, or only some resources specified.
resources:
  requests:
    memory: "256Mi"
    cpu: "100m"
  limits:
    memory: "512Mi"
    cpu: "500m"
BestEffort (Worst)
No requests or limits set. First to be evicted when node runs low on resources.
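The three classes can be summarised as a decision function. The sketch below assumes a single-container pod and compares quantity strings literally (so "0.5" and "500m" would not be treated as equal). The function name `qos_class` is illustrative.

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Classify a single-container pod from its requests and limits."""
    if not requests and not limits:
        return "BestEffort"
    # When only a limit is set, Kubernetes defaults the request to the limit.
    effective_requests = {**limits, **requests}
    guaranteed = all(
        res in limits and effective_requests[res] == limits[res]
        for res in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class({"memory": "512Mi", "cpu": "500m"},
                {"memory": "512Mi", "cpu": "500m"}))   # Guaranteed
print(qos_class({"memory": "256Mi", "cpu": "100m"},
                {"memory": "512Mi", "cpu": "500m"}))   # Burstable
print(qos_class({"memory": "256Mi"}, {}))              # Burstable
print(qos_class({}, {}))                               # BestEffort
```

For pods with several containers, every container has to meet the Guaranteed condition for the pod to be Guaranteed.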
spec:
  containers:
  - name: my-app
    image: my-app:latest
Best Practices
1. Always Set Requests
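Without requests, the scheduler treats your pod as needing nothing, so it can be placed on an already crowded node. The pod then competes for whatever is left and is among the first to be evicted under pressure.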
2. Set Requests Based on Actual Usage
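Look at your pod's P95 resource usage over 7-14 days and set requests to that value. For a quick point-in-time check of current usage (this requires metrics-server and keeps no history):
kubectl top pods -n <namespace>
3. Set Limits at 1.5-2x Requests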
This gives headroom for spikes without massive over-provisioning.
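Practices 2 and 3 can be combined into a short calculation. The sketch below uses a nearest-rank percentile and made-up memory samples in Mi. In practice the samples would come from whatever metrics history you keep, and the helper names are illustrative.

```python
import math


def percentile(samples, pct: int):
    """Nearest-rank percentile: smallest sample with pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100   # integer ceiling of pct% of n
    return ordered[max(rank, 1) - 1]


def suggest(samples_mib, headroom: float = 1.5):
    """Return (request, limit) in Mi: P95 usage and P95 times the headroom factor."""
    request = percentile(samples_mib, 95)
    limit = math.ceil(request * headroom)
    return request, limit


# Hypothetical memory usage samples (Mi), including one spike to 400.
samples = [180, 190, 200, 205, 210, 215, 220, 225, 230, 235,
           240, 240, 245, 250, 250, 255, 256, 260, 270, 400]

print(suggest(samples))        # (270, 405)
print(suggest(samples, 2.0))   # (270, 540)
```

With these samples the 400Mi spike sits above the request but below both suggested limits, which is the headroom the 1.5-2x rule is meant to provide.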
4. Be Careful with CPU Limits
CPU throttling can cause latency spikes. Many teams remove CPU limits entirely and only set requests.
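A rough model shows where the latency spikes come from. CPU limits are enforced as a quota of CPU time per scheduling period. The sketch assumes the default 100 ms period, a single-threaded burst that starts at the beginning of a period, and no other contention. The function name is illustrative.

```python
def wall_time_ms(cpu_needed_ms: float, limit_millicores: int,
                 period_ms: float = 100.0) -> float:
    """Wall-clock time for a single-threaded CPU burst under a CPU limit."""
    quota_ms = period_ms * limit_millicores / 1000   # CPU time allowed per period
    if cpu_needed_ms <= quota_ms:
        return cpu_needed_ms                         # fits in one period, no throttling
    full_periods, remainder = divmod(cpu_needed_ms, quota_ms)
    if remainder == 0:
        # The final period ends as soon as its quota is used up.
        return (full_periods - 1) * period_ms + quota_ms
    # Each full period is: run for the quota, then sit throttled until the period ends.
    return full_periods * period_ms + remainder


print(wall_time_ms(50, 200))   # 210.0: 50 ms of work takes 210 ms at a 200m limit
print(wall_time_ms(50, 500))   # 50.0: the same work is not throttled at 500m
```

Under these assumptions a request handler that needs 50 ms of CPU takes more than four times as long at a 200m limit, even when the node is otherwise idle.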
5. Memory Limits Are Important
Unlike CPU, memory can't be throttled—only killed. Always set memory limits to prevent runaway containers from affecting other pods.
Common Mistakes
Setting Limits Too Low
Results in constant OOMKills and throttling. Your app appears slow or crashes randomly.
Setting Limits Too High
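Results in wasted money when requests are raised to match, because nodes are sized to fit requests. You're paying for resources you'll never use.
The Waste Problem
A pod with an 8Gi request and limit that uses 500Mi wastes about 7.5Gi.
At $7.20/GB/month, that is about $54/month per pod.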
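The same arithmetic as a small helper, using this article's price and its rounding (500Mi treated as 0.5Gi, and Gi treated as GB). The function name and the 20-replica figure are hypothetical.

```python
def monthly_waste(reserved_gi: float, used_gi: float,
                  price_per_gb_month: float = 7.20):
    """Return (wasted Gi, wasted dollars per month) for one pod."""
    wasted = reserved_gi - used_gi
    return wasted, round(wasted * price_per_gb_month, 2)


wasted_gi, per_pod = monthly_waste(8, 0.5)
print(wasted_gi, per_pod)        # 7.5 54.0

replicas = 20                    # hypothetical deployment size
print(round(per_pod * replicas, 2))   # 1080.0
```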
Copying Defaults from Stack Overflow
Every app is different. Profile your actual workload instead of using generic values.
Find Your Over-Provisioned Pods
Run a free audit to see which pods are wasting resources:
curl -sL wozz.io/audit.sh | bash
Shows you exactly where requests exceed usage and how much you could save.
Summary
- Requests = Guaranteed resources (affects scheduling)
- Limits = Maximum resources (triggers throttle/kill)
- Set requests based on P95 actual usage
- Set limits at 1.5-2x requests for headroom
- Always set requests; memory limits are critical