Author: Wozz

Quick Answer

Requests = What your container is guaranteed to get
Limits = The maximum your container can use

Example resource spec

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"

What Are Resource Requests?

Resource requests tell Kubernetes how much CPU and memory your container needs to run. The scheduler uses this to decide which node to place your pod on.

Key points:

Requests are guaranteed: the scheduler reserves this much on the node, so it is always available to your container
If a node can't satisfy the request, your pod won't be scheduled there
Requests affect bin packing, that is, how efficiently pods fit on nodes
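
The scheduling rule in these points can be sketched in a few lines. This is a simplified model, not the real kube-scheduler: the function name `fits` and the dict layout are hypothetical, and real scheduling also applies affinity, taints, and other filters.

```python
def fits(node_allocatable, scheduled_requests, pod_request):
    """Return True if the pod's requests fit in the node's unreserved capacity.

    All arguments use dicts with 'cpu_m' (millicores) and 'mem_mi' (MiB).
    Only requests count here; actual usage and limits are ignored.
    """
    for resource in ("cpu_m", "mem_mi"):
        already_reserved = sum(r[resource] for r in scheduled_requests)
        free = node_allocatable[resource] - already_reserved
        if pod_request[resource] > free:
            return False
    return True


# Hypothetical node with 2 CPUs and 4Gi allocatable, already running two pods.
node = {"cpu_m": 2000, "mem_mi": 4096}
running = [{"cpu_m": 1000, "mem_mi": 1024}, {"cpu_m": 800, "mem_mi": 1024}]

small_pod_fits = fits(node, running, {"cpu_m": 200, "mem_mi": 256})
large_pod_fits = fits(node, running, {"cpu_m": 250, "mem_mi": 256})
print(small_pod_fits, large_pod_fits)  # True False
```

The second pod is rejected even if the running pods are idle, because only 200m of CPU is left unreserved.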

CPU Requests

CPU is measured in CPU units, usually written in "millicores" (m). 1 CPU = 1000m, where 1 CPU is one physical core or one vCPU.

100m = 0.1 CPU core
500m = 0.5 CPU core
1000m or 1 = 1 full CPU core
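
As a quick check of the conversions above, here is a minimal sketch of a converter. The helper name `cpu_to_millicores` is made up for illustration, and it only handles the two notations shown in this article (an `m` suffix or a plain number).

```python
def cpu_to_millicores(quantity):
    """Convert a CPU quantity such as '250m', '1', or '0.5' to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Plain numbers are whole or fractional cores.
    return round(float(quantity) * 1000)


for q in ("100m", "500m", "1000m", "1", "0.5"):
    print(q, "=", cpu_to_millicores(q), "millicores")
```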

Memory Requests

Memory is measured in bytes, with common suffixes:

128Mi = 128 mebibytes (134 MB)
1Gi = 1 gibibyte (1.07 GB)
256M = 256 megabytes (about 12 MB less than 256Mi, so avoid mixing Mi and M)
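
The difference between binary and decimal suffixes is easy to verify. The sketch below uses a hypothetical helper, `memory_to_bytes`, and assumes whole-number quantities with one of the common suffixes. It is not a full parser for Kubernetes quantities.

```python
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def memory_to_bytes(quantity):
    """Convert a memory quantity such as '128Mi' or '256M' to bytes."""
    quantity = quantity.strip()
    # Check two-letter binary suffixes first, then one-letter decimal ones.
    for table in (BINARY, DECIMAL):
        for suffix, factor in table.items():
            if quantity.endswith(suffix):
                return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


print(memory_to_bytes("128Mi"))  # 134217728
print(memory_to_bytes("1Gi"))    # 1073741824
print(memory_to_bytes("256Mi") - memory_to_bytes("256M"))  # 12435456
```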

What Are Resource Limits?

Limits define the maximum resources your container can use. If your container tries to exceed them, what happens depends on the resource.

What Happens When You Exceed Limits?

Resource	What Happens
CPU	Container is throttled (slowed down)
Memory	Container is OOMKilled (terminated)

This is why memory limits are dangerous if set too low: your app is killed and has to restart. CPU limits only make your app slower.

How Requests and Limits Work Together

resources:
  requests:
    memory: "256Mi"    # Guaranteed: 256Mi
    cpu: "100m"        # Guaranteed: 0.1 CPU
  limits:
    memory: "512Mi"    # Maximum: 512Mi (OOMKill if exceeded)
    cpu: "200m"        # Maximum: 0.2 CPU (throttled if exceeded)

In this example:

Your container always has 256Mi memory and 0.1 CPU available
It can burst up to 512Mi memory and 0.2 CPU if the node has capacity
If it tries to use more than 512Mi memory, it gets killed
If it tries to use more than 0.2 CPU, it gets throttled
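
To make the throttling case concrete: on Linux nodes a CPU limit is typically enforced as a CFS quota, which is CPU time allowed per scheduling period. The sketch below assumes the default 100 ms period. The function names are hypothetical, and it models a single period only.

```python
CFS_PERIOD_US = 100_000  # default CFS period: 100 ms, in microseconds


def cfs_quota_us(cpu_limit_millicores, period_us=CFS_PERIOD_US):
    """CPU time (in microseconds) the container may use per period."""
    return cpu_limit_millicores * period_us // 1000


def throttled_time_us(demand_us, cpu_limit_millicores, period_us=CFS_PERIOD_US):
    """CPU time the container wanted in one period but had to wait for."""
    return max(0, demand_us - cfs_quota_us(cpu_limit_millicores, period_us))


quota = cfs_quota_us(200)                 # the 200m limit from the example
waiting = throttled_time_us(35_000, 200)  # container wants 35 ms of CPU
print(quota, waiting)  # 20000 15000
```

With a 200m limit the container gets 20 ms of CPU per 100 ms window. A request handler that needs 35 ms of CPU has to wait for the next window to finish, which is where the latency comes from.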

Quality of Service (QoS) Classes

Kubernetes assigns a QoS class based on how you set requests and limits:

Guaranteed (Best)

Requests = Limits for both CPU and memory, in every container of the pod. These pods are the last to be evicted.

resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"

Burstable (Middle)

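At least one container sets a request or limit, but the pod does not meet the Guaranteed criteria: requests are lower than limits, or only some resources are specified.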

resources:
  requests:
    memory: "256Mi"
    cpu: "100m"
  limits:
    memory: "512Mi"
    cpu: "500m"

BestEffort (Worst)

No requests or limits set. First to be evicted when the node runs low on resources.

# No resources specified - DON'T DO THIS

spec:
  containers:
  - name: my-app
    image: my-app:latest
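
The three classes can be summarized as a small decision function. This is a sketch of the rules described above, not Kubernetes source code: `qos_class` is a hypothetical name, each container is represented by its `resources` dict, and quantities are compared as strings, so "500m" and "0.5" would not be treated as equal.

```python
def qos_class(containers):
    """Classify a pod from a list of per-container 'resources' dicts."""
    resources = ("cpu", "memory")

    any_set = any(
        c.get(kind, {}).get(res)
        for c in containers
        for kind in ("requests", "limits")
        for res in resources
    )
    if not any_set:
        return "BestEffort"

    def is_guaranteed(c):
        limits = c.get("limits", {})
        # A request that is omitted defaults to the limit.
        requests = {**limits, **c.get("requests", {})}
        return all(
            limits.get(res) is not None and requests.get(res) == limits[res]
            for res in resources
        )

    if all(is_guaranteed(c) for c in containers):
        return "Guaranteed"
    return "Burstable"


guaranteed = {"requests": {"memory": "512Mi", "cpu": "500m"},
              "limits": {"memory": "512Mi", "cpu": "500m"}}
burstable = {"requests": {"memory": "256Mi", "cpu": "100m"},
             "limits": {"memory": "512Mi", "cpu": "500m"}}

print(qos_class([guaranteed]))             # Guaranteed
print(qos_class([burstable]))              # Burstable
print(qos_class([{}]))                     # BestEffort
print(qos_class([guaranteed, burstable]))  # Burstable
```

The last case shows that one Burstable container is enough to make the whole pod Burstable.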

Best Practices

1. Always Set Requests

Without requests, the scheduler cannot account for what your pod needs, so it may land on an already crowded node. There it competes for CPU and is an early candidate for eviction.

2. Set Requests Based on Actual Usage

Look at your pod's P95 resource usage over 7-14 days. Set requests to that value.

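# Check current usage (a point-in-time snapshot; requires metrics-server)
kubectl top pods -n <namespace>

kubectl top only shows current usage, so a P95 over 7-14 days has to come from a monitoring system that stores history, such as Prometheus.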
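
Once you have a series of usage samples, the P95 itself is simple to compute. The sketch below uses the nearest-rank method on made-up memory samples in Mi. The function name and the data are illustrative only.

```python
def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile (integer pct, 1-100) by nearest rank."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, to avoid floating-point surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical memory usage samples in Mi, including one short spike.
memory_mi = [180, 190, 185, 200, 210, 195, 205, 188, 192, 198,
             202, 207, 191, 186, 199, 204, 193, 189, 197, 420]

p95 = percentile_nearest_rank(memory_mi, 95)
print(p95, max(memory_mi))  # 210 420
```

Using P95 rather than the maximum keeps a single 420Mi spike from doubling the request. The limit is what absorbs spikes like that.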

3. Set Limits at 1.5-2x Requests

This gives headroom for spikes without massive over-provisioning.
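
As a rough illustration of this rule, the sketch below turns P95 figures into a resources block. The helper `suggest_resources` is hypothetical, and the headroom factor is the 1.5-2x range suggested here, not a Kubernetes setting.

```python
import math


def suggest_resources(p95_cpu_m, p95_memory_mi, headroom=1.5):
    """Build a resources dict: requests at P95, limits at headroom x requests."""
    return {
        "requests": {"cpu": f"{p95_cpu_m}m", "memory": f"{p95_memory_mi}Mi"},
        "limits": {
            "cpu": f"{math.ceil(p95_cpu_m * headroom)}m",
            "memory": f"{math.ceil(p95_memory_mi * headroom)}Mi",
        },
    }


spec = suggest_resources(p95_cpu_m=100, p95_memory_mi=256, headroom=2.0)
print(spec["requests"])  # {'cpu': '100m', 'memory': '256Mi'}
print(spec["limits"])    # {'cpu': '200m', 'memory': '512Mi'}
```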

4. Be Careful with CPU Limits

CPU throttling can cause latency spikes. Many teams remove CPU limits entirely and only set requests.

5. Memory Limits Are Important

Unlike CPU, memory can't be throttled: a container that exceeds its memory limit is killed. Always set memory limits to prevent runaway containers from affecting other pods.

Common Mistakes

Setting Limits Too Low

Results in constant OOMKills and throttling. Your app appears slow or crashes randomly.

Setting Limits Too High

Results in wasted money when requests are sized to match. Nodes are provisioned to fit requests, so you're paying for resources you'll never use.

The Waste Problem

A pod with an 8Gi request and limit using 500Mi = 7.5Gi wasted.
At $7.20/GB/month = $54/month per pod.
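
The arithmetic behind these figures, as a small sketch. The $7.20 price is the figure used in this article, the function name is made up, and Gi and GB are treated as interchangeable for a rough estimate.

```python
MIB_PER_GIB = 1024


def monthly_waste(reserved_gi, used_mi, price_per_gb_month):
    """Return (unused Gi, monthly cost of that unused memory)."""
    wasted_gi = reserved_gi - used_mi / MIB_PER_GIB
    return wasted_gi, wasted_gi * price_per_gb_month


wasted_gi, cost = monthly_waste(reserved_gi=8, used_mi=500, price_per_gb_month=7.20)
print(f"{wasted_gi:.2f} Gi wasted, ${cost:.2f}/month")  # 7.51 Gi wasted, $54.08/month
```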

Copying Defaults from Stack Overflow

Every app is different. Profile your actual workload instead of using generic values.

Find Your Over-Provisioned Pods

Run a free audit to see which pods are wasting resources:

curl -sL wozz.io/audit.sh | bash

Shows you exactly where requests exceed usage and how much you could save.

Summary

Requests = Guaranteed resources (affects scheduling)
Limits = Maximum resources (triggers throttle/kill)
Set requests based on P95 actual usage
Set limits at 1.5-2x requests for headroom
Always set requests; memory limits are critical