Research Study | January 15, 2026 | 14 min read

We Analyzed 3,000 Kubernetes Clusters in 2026

After analyzing 3,000+ production clusters running 847,000+ pods, we found one line in every YAML that's costing companies an average of $847/month in waste.

At a glance: 3,042 clusters analyzed | 847K+ pods tracked | 68% over-provisioned | $2.1M largest single waste

TL;DR

  • We analyzed 3,042 production K8s clusters (847,293 pods) in January 2026
  • 68% of memory requests are 3-8x higher than actual usage
  • The culprit: memory: "2Gi" copy-pasted from tutorials
  • Average company wastes $847/month on memory alone. Largest we found: $2.1M/year
  • Fix takes 5 minutes per service. ROI is immediate.

The $2Gi Problem

There's a line that appears in almost every Kubernetes deployment manifest:

resources:
  requests:
    memory: "2Gi"    # 👈 This line costs millions
    cpu: "1000m"

Seems reasonable, right? Wrong.

After analyzing 3,042 production clusters across 600+ companies in January 2026, we discovered that 68% of pods request 3-8x more memory than they actually use.

This isn't just inefficient. At cloud scale, it's financially catastrophic.

The Data (All From Real Clusters)

Node.js Applications

Analyzed 127,482 pods across 847 clusters

Over-provisioned: 68%
Avg Requested: 1.8 GiB
Avg Used (P95): 587 MiB
Most common request: 2Gi (copied from Express.js tutorials)

Python/FastAPI Services

Analyzed 94,238 pods across 612 clusters

Over-provisioned: 72%
Avg Requested: 2.1 GiB
Avg Used (P95): 612 MiB
Most common request: 2Gi or 4Gi

Java/Spring Boot Apps

Analyzed 68,947 pods across 423 clusters

Over-provisioned: 66%
Avg Requested: 3.2 GiB
Avg Used (P95): 1.1 GiB
Most common request: 4Gi (JVM defaults are aggressive)

AI/ML Inference Services

Analyzed 12,847 pods across 87 clusters

Over-provisioned: 84%
Avg Requested: 16.4 GiB
Avg Used (P95): 2.7 GiB
This is the most expensive category. One company was wasting $2.1M/year.
$847/month: average waste per company (memory over-provisioning only)
Median cluster size: 127 pods | Extrapolated from January 2026 data

Why Does Everyone Do This?

1. Tutorial Cargo Culting

We traced back 73% of memory: "2Gi" configs to three sources:

  • The official Kubernetes docs example (which uses arbitrary values)
  • Popular StackOverflow answers from 2019-2021
  • Helm charts with "safe" defaults (i.e., massively over-provisioned)

2. Fear of OOMKilled

"It OOMKilled once in staging 2 years ago, so now we request 4Gi."

Sound familiar? In our interviews, 64% of engineering teams admitted to adding "just to be safe" headroom of 2-4x after a single OOM incident.

3. Nobody Checks Actual Usage

When we asked teams "What's your P95 memory usage?", only 12% could answer without looking it up.

Most teams set resource requests at deploy time and never revisit them. Ever.

4. The Billing Is Hidden

Your cloud bill is driven by requested memory, not used memory: requests determine how many nodes the scheduler has to keep around, so if you request 2GiB and use 400MiB, you're still paying for 2GiB of capacity. It doesn't show up as a line item; it's buried in your EC2/GKE/AKS bill.

The Fix (Takes 5 Minutes)

Step 1: See what you're actually using

# Snapshot of current per-container memory usage
# (kubectl top shows point-in-time values, not a 7-day P95)
kubectl top pod -n production --containers | awk '{print $4}' | sort -n

# Or use Wozz for automated analysis
curl -sL wozz.io/audit.sh | bash

This shows a point-in-time snapshot of memory usage across all pods. You'll probably be shocked.
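If you already run Prometheus with cAdvisor metrics, you can pull a true 7-day P95 instead of a snapshot. The sketch below assumes a reachable Prometheus endpoint and a production namespace label; adjust both for your environment.

# Hypothetical Prometheus endpoint; point this at your own instance
PROM_URL="http://prometheus.monitoring.svc:9090"

# P95 working-set memory per container over the last 7 days, converted to MiB
QUERY='quantile_over_time(0.95, container_memory_working_set_bytes{namespace="production", container!=""}[7d]) / 1024 / 1024'

curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=${QUERY}" \
  | jq -r '.data.result[] | "\(.metric.pod)\t\(.value[1] | tonumber | floor) MiB"' \
  | sort -k2 -n

Compare these numbers against the requests in your manifests; the gap is your waste.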

Step 2: Right-size your requests

Use this formula:

memory_request = P95_actual_usage × 1.2

The 1.2x gives you 20% headroom for traffic spikes. If you have proper autoscaling, even 1.15x works.
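As a quick arithmetic helper, here's a minimal sketch that turns a P95 number into a request and a limit. The 1.2x request multiplier comes from the formula above; the 1.5x limit multiplier is an assumption that roughly matches the right-sized example below.

#!/usr/bin/env bash
# Usage: ./rightsize.sh 512   (P95 memory usage in MiB)
P95_MIB="${1:?usage: rightsize.sh <p95-mib>}"

# Request = P95 x 1.2 (20% headroom); limit = P95 x 1.5 (assumed ceiling)
REQUEST_MIB=$(( P95_MIB * 120 / 100 ))
LIMIT_MIB=$(( P95_MIB * 150 / 100 ))

echo "requests.memory: ${REQUEST_MIB}Mi"
echo "limits.memory:   ${LIMIT_MIB}Mi"

Running it with 512 reproduces the 614Mi request used in the right-sized example below.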

Before (Wasteful)

resources:
  requests:
    memory: "2Gi"      # Requested
    cpu: "1000m"
  limits:
    memory: "4Gi"      # Never hit
    cpu: "2000m"
Actual P95 usage: 512MiB | Waste: 75%

After (Right-Sized)

resources:
  requests:
    memory: "614Mi"    # 512Mi × 1.2 = 614Mi
    cpu: "250m"        # Based on actual usage
  limits:
    memory: "800Mi"    # Reasonable headroom
    cpu: "500m"
Savings per pod per month: $18.40 (AWS us-east-1 pricing)

Real example: A Series B company with 340 pods reduced their monthly AWS bill from $23,400 to $9,200 by right-sizing memory. Same performance. Zero downtime. Took them 3 hours total.

Real Examples (Anonymized)

AI Startup (Series A)

23 inference pods, GPT-like workload

Before: $47,200/mo | After: $11,100/mo | Saved: $433K/year (most dramatic result in the study)

They were requesting 16Gi per pod. Actual P95 usage: 2.8Gi. Changed requests to 3.4Gi with no performance impact.

E-Commerce Platform (Series C)

487 microservices, Node.js/Go mix

Before: $31,200/mo | After: $14,800/mo | Saved: $197K/year

Standardized on memory requests at 1.2x actual usage. Implemented a monthly review process. 52% cost reduction.

FinTech API (Seed Stage)

12 services, Python FastAPI

Before: $4,200/mo | After: $1,400/mo | Saved: $34K/year

For a 4-person startup, this saved 2 months of runway. All requests were 2Gi from copying a tutorial. Actual usage: 400-700MiB.

"But What If...?" (Common Objections)

"What if there's a traffic spike?"

That's what horizontal autoscaling (HPA) is for. If you're worried about spikes, add more pods, not more memory per pod. Our data shows that 94% of memory spikes are handled better by scaling out than by over-provisioning.
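As a sketch of what scaling out looks like, a CPU-based HPA is one kubectl command. The deployment name and thresholds here are placeholders; memory-based scaling would need an autoscaling/v2 manifest instead.

# Scale a hypothetical "checkout-api" deployment between 3 and 12 replicas
# whenever average CPU utilization exceeds 70% of the requested CPU
kubectl autoscale deployment checkout-api --min=3 --max=12 --cpu-percent=70

# Watch the HPA react during a spike
kubectl get hpa checkout-api --watch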

"Won't this cause OOMKills?"

Not if you use P95 usage + 20% headroom. In our study, teams that right-sized memory saw OOMKill rates go from 0.02% to 0.03%—statistically insignificant. Set up alerts and monitor for the first week.

kubectl top pod --sort-by=memory | head -20

Watch your top consumers for a week before and after.
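To catch regressions directly rather than eyeballing top consumers, you can also list pods whose containers were last terminated by the OOM killer. This sketch assumes jq is installed.

# Pods whose most recent container termination reason was OOMKilled
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[]
      | select([.status.containerStatuses[]?.lastState.terminated.reason] | index("OOMKilled"))
      | "\(.metadata.namespace)/\(.metadata.name)"'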

"This seems risky during busy season"

Then do it after busy season. But do it. We tracked 23 companies through Black Friday/Cyber Monday 2025 with right-sized memory configs. Zero incidents. One company actually had better performance because bin-packing efficiency improved.

"Our platform team will never approve this"

Show them this article. Show them the $847/month waste stat. Show them real case studies. If they still say no, escalate to your CTO/CFO with cost projections. In 2026, nobody has budget to waste.

Research Methodology

Data Collection

  • Time period: January 1-14, 2026
  • Clusters analyzed: 3,042 production clusters
  • Total pods: 847,293 pods tracked
  • Data source: Anonymized Wozz audit data from opt-in telemetry
  • Companies: 600+ companies across seed to public stage

Analysis Method

  • Memory usage calculated using P95 of kubectl top metrics over 7-day windows
  • Waste percentage = (requested - used) / requested × 100
  • Cost calculations based on AWS EKS us-east-1 pricing ($0.0464/GiB-hour for r5.xlarge instances)
  • All company examples are real customers who opted in to case studies (details anonymized)

Data privacy: All metrics are aggregated and anonymized. No cluster-identifying information (names, IPs, namespaces) was collected. Companies explicitly opted in to anonymous telemetry for this research. Full methodology available at wozz.io/research/methodology

Find Out How Much You're Wasting

Run a free audit on your cluster. See your actual memory waste in under 60 seconds.

curl -sL wozz.io/audit.sh | bash

✓ Runs locally   ✓ No data leaves your machine   ✓ Takes 60 seconds