Technical · 8 min read · December 11, 2024

Kubernetes Resource Requests vs Limits Explained

The complete guide to setting CPU and memory resources in Kubernetes. Includes YAML examples and common mistakes to avoid.

Quick Answer

  • Requests = What your container is guaranteed to get
  • Limits = The maximum your container can use

Example resource spec:
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

What Are Resource Requests?

Resource requests tell Kubernetes how much CPU and memory your container needs to run. The scheduler uses this to decide which node to place your pod on.

Key points:

  • Requests are reserved for your container: the scheduler counts them against the node's allocatable capacity, so other pods cannot claim that share
  • If a node can't satisfy the request, your pod won't be scheduled there
  • Requests affect bin packing — how efficiently pods fit on nodes

CPU Requests

CPU is measured in "millicores" (m). 1 CPU = 1000m.

  • 100m = 0.1 CPU core
  • 500m = 0.5 CPU core
  • 1000m or 1 = 1 full CPU core
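
The same quantity can be written in millicores or as a decimal number of cores. As a small sketch (the container names below are hypothetical), these requests show the equivalent notations:

```yaml
# Fragment of a pod spec; container names are hypothetical and images are omitted.
containers:
- name: quarter-core
  resources:
    requests:
      cpu: "250m"    # 250 millicores = 0.25 core
- name: half-core
  resources:
    requests:
      cpu: "0.5"     # decimal form, equivalent to 500m
- name: two-cores
  resources:
    requests:
      cpu: "2"       # whole cores, equivalent to 2000m
```

Kubernetes does not accept CPU values finer than 1m, so the millicore form is usually the safer one for small fractions.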

Memory Requests

Memory is measured in bytes, with common suffixes:

  • 128Mi = 128 mebibytes (134 MB)
  • 1Gi = 1 gibibyte (1.07 GB)
  • 256M = 256 megabytes (avoid mixing Mi and M)
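
The binary (Mi, Gi) and decimal (M, G) suffixes differ by a few percent, which is easy to miss in a manifest. The fragment below is an illustrative sketch of that gap, with the byte counts worked out in the comments:

```yaml
resources:
  requests:
    memory: "256Mi"   # 256 * 1024 * 1024 = 268,435,456 bytes
  limits:
    memory: "512M"    # 512 * 1000 * 1000 = 512,000,000 bytes, about 488Mi rather than 512Mi
```

Suffixes are also case-sensitive: a lowercase m means milli, so memory: "400m" asks for 0.4 bytes, not 400 megabytes.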

What Are Resource Limits?

Limits define the maximum resources your container can use. On Linux nodes they are enforced through cgroups, and what happens when a container hits a limit depends on the resource.

What Happens When You Exceed Limits?

Resource | What Happens
CPU      | Container is throttled (slowed down)
Memory   | Container is OOMKilled (terminated)

This is why memory limits are dangerous if set too low: the container is killed and your app crashes. A CPU limit that is too low only makes your app slower.
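
A memory kill shows up in the pod's status. As a rough illustration (the container name and restart count are made up), the abridged output of kubectl get pod <pod-name> -o yaml for a container that hit its memory limit looks something like this:

```yaml
# Abridged; only the fields relevant to an OOM kill are shown.
status:
  containerStatuses:
  - name: my-app               # hypothetical container name
    restartCount: 3            # illustrative value
    lastState:
      terminated:
        reason: OOMKilled
        exitCode: 137          # 128 + 9, i.e. the process received SIGKILL
```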

How Requests and Limits Work Together

resources:
  requests:
    memory: "256Mi"    # Guaranteed: 256Mi
    cpu: "100m"        # Guaranteed: 0.1 CPU
  limits:
    memory: "512Mi"    # Maximum: 512Mi (OOMKill if exceeded)
    cpu: "200m"        # Maximum: 0.2 CPU (throttled if exceeded)

In this example:

  • Your container always has 256Mi memory and 0.1 CPU available
  • It can burst up to 512Mi memory and 0.2 CPU if the node has capacity
  • If it tries to use more than 512Mi memory, it gets killed
  • If it tries to use more than 0.2 CPU, it gets throttled
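
For context, the resources block belongs to each container entry in the pod spec. A minimal complete manifest, reusing the values above with a hypothetical name and placeholder image, might look like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                 # hypothetical pod name
spec:
  containers:
  - name: my-app
    image: my-app:latest       # placeholder image
    resources:
      requests:
        memory: "256Mi"
        cpu: "100m"
      limits:
        memory: "512Mi"
        cpu: "200m"
```

Requests and limits are set per container, and the scheduler places the pod based on the containers' combined requests.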

Quality of Service (QoS) Classes

Kubernetes assigns a QoS class based on how you set requests and limits:

Guaranteed (Best)

Every container in the pod has requests equal to limits for both CPU and memory. These pods are the last to be evicted.

resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"
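
Two details are worth a sketch here. If a container sets only limits, Kubernetes copies them into the requests (unless an admission-time default such as a LimitRange applies), so the pod still qualifies as Guaranteed. The assigned class is then visible in the pod's status. Both fragments below are illustrative:

```yaml
# Container spec: limits only. Requests are omitted and default to the limits.
resources:
  limits:
    memory: "512Mi"
    cpu: "500m"
---
# Abridged output of: kubectl get pod <pod-name> -o yaml
status:
  qosClass: Guaranteed
```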

Burstable (Middle)

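At least one container has a CPU or memory request or limit, but the pod does not meet the Guaranteed criteria. Typical cases are requests set lower than limits, or only some resources specified.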

resources:
  requests:
    memory: "256Mi"
    cpu: "100m"
  limits:
    memory: "512Mi"
    cpu: "500m"
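
Setting requests without any limits also lands in this class. A minimal sketch, assuming no LimitRange in the namespace injects default limits:

```yaml
# Requests only: the pod is Burstable, and usage above the requests is
# bounded by what the node has free rather than by a per-container cap.
resources:
  requests:
    memory: "256Mi"
    cpu: "100m"
```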

BestEffort (Worst)

No container in the pod sets any requests or limits. These pods are the first to be evicted when the node runs low on resources.

# No resources specified - DON'T DO THIS
spec:
  containers:
  - name: my-app
    image: my-app:latest

Best Practices

1. Always Set Requests

Without requests, the scheduler has no idea how much room a pod needs, so it can't place it efficiently and may put it on an already busy node. Your pods become unpredictable.
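
One way to make sure nothing runs without requests is a namespace-level LimitRange, which fills in defaults for containers that omit them. The sketch below uses a hypothetical name and namespace and purely illustrative values:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources      # hypothetical name
  namespace: my-namespace      # hypothetical namespace
spec:
  limits:
  - type: Container
    defaultRequest:            # applied when a container sets no request
      cpu: "100m"
      memory: "128Mi"
    default:                   # applied when a container sets no limit
      memory: "256Mi"
```

Defaults like these are a safety net, not a substitute for sizing each workload.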

2. Set Requests Based on Actual Usage

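Look at your pod's P95 resource usage over 7-14 days in your monitoring system and set requests to that value. kubectl top shows only a point-in-time snapshot, so treat it as a quick sanity check rather than the source of the P95.

Check current usage
kubectl top pods -n <namespace>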
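
As one possible sketch of that percentile, assuming Prometheus is scraping the kubelet's cAdvisor metrics (an assumption; the article does not name a monitoring stack), recording rules like these compute a 7-day P95 per container. The group and rule names are hypothetical:

```yaml
groups:
- name: rightsizing                          # hypothetical group name
  rules:
  # P95 of the memory working set over the last 7 days, per container
  - record: container:memory_working_set_bytes:p95_7d
    expr: |
      quantile_over_time(0.95,
        container_memory_working_set_bytes{container!=""}[7d])
  # P95 of CPU usage in cores: 5-minute rate, sampled every 5 minutes over 7 days
  - record: container:cpu_usage_cores:p95_7d
    expr: |
      quantile_over_time(0.95,
        rate(container_cpu_usage_seconds_total{container!=""}[5m])[7d:5m])
```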

3. Set Limits at 1.5-2x Requests

This gives headroom for spikes without massive over-provisioning.
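
As a worked example with made-up numbers: if a container's P95 usage is 200Mi of memory and 150m of CPU, sizing requests at P95 and limits at 2x gives something like:

```yaml
# Hypothetical measurements: P95 memory = 200Mi, P95 CPU = 150m
resources:
  requests:
    memory: "200Mi"   # P95 usage
    cpu: "150m"       # P95 usage
  limits:
    memory: "400Mi"   # 2 x request (1.5x would be 300Mi)
    cpu: "300m"       # 2 x request (1.5x would be 225m)
```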

4. Be Careful with CPU Limits

CPU throttling can cause latency spikes. Many teams remove CPU limits entirely and only set requests.

5. Memory Limits Are Important

Unlike CPU, memory can't be throttled. A container that exceeds its memory limit is killed. Always set memory limits to prevent runaway containers from affecting other pods.
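
Taken together, practices 4 and 5 suggest a spec with a CPU request, no CPU limit, and a memory limit. A sketch with illustrative values:

```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"   # hard cap: exceeding it gets the container OOMKilled
    # no cpu limit: the container can use idle CPU on the node without being throttled
```

A pod configured this way is Burstable, since its requests and limits differ.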

Common Mistakes

Setting Limits Too Low

Results in constant OOMKills and throttling. Your app appears slow or crashes randomly.

Setting Limits Too High

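Results in wasted money when requests are raised along with them. Requests, not limits, are what reserve node capacity, so an over-sized request means you're paying for resources you'll never use.

The Waste Problem

A pod that requests 8Gi but uses 500Mi leaves about 7.5Gi reserved and idle.
At $7.20/GB/month, that is roughly $54/month per pod.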

Copying Defaults from Stack Overflow

Every app is different. Profile your actual workload instead of using generic values.

Find Your Over-Provisioned Pods

Run a free audit to see which pods are wasting resources:

curl -sL wozz.io/audit.sh | bash

Shows you exactly where requests exceed usage and how much you could save.

Summary

  • Requests = Guaranteed resources (affects scheduling)
  • Limits = Maximum resources (triggers throttle/kill)
  • Set requests based on P95 actual usage
  • Set limits at 1.5-2x requests for headroom
  • Always set requests; memory limits are critical