Kubernetes Resource Management: Requests, Limits, ResourceQuota, and LimitRange

If you run Kubernetes long enough, you will eventually hit one of these two problems. Either a single misbehaving pod eats all the memory on a node and takes down everything scheduled next to it, or the scheduler refuses to place your pods even though kubectl top nodes says there is plenty of room. Both problems come from the same root cause: Kubernetes does not know how much CPU and memory your containers actually need unless you tell it.

That is what resource requests and limits are for. They are the contract between your workload and the scheduler. Get them right and the cluster packs pods efficiently, protects healthy workloads from noisy neighbors, and keeps nodes stable under pressure. Get them wrong, or skip them entirely, and you invite random OOMKilled events, mysterious CPU throttling, and unschedulable pods.

This tutorial is for developers, sysadmins, and DevOps engineers who already have a working cluster and want their workloads to behave predictably. If you do not have a cluster yet, see my earlier guide on installing a single-node Kubernetes cluster. It also pairs naturally with my guide on Kubernetes health checks, because probes and resource limits are the two halves of keeping a cluster stable.

Conceptual Overview

Every container in Kubernetes can declare two numbers for both CPU and memory: a request and a limit.

A request is the amount of a resource that Kubernetes guarantees to reserve for your container. The scheduler uses requests, and only requests, to decide which node a pod lands on. If a node has 4 CPUs and the pods already running on it request 3.5 CPUs total, a new pod requesting 1 CPU will not fit there, even if the node is mostly idle right now. Requests are about reservation, not current usage.
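The fit check described above can be sketched in a few lines of Python. This is a simplified model of the idea, not the scheduler's actual code; the function name fits and the use of integer millicores are assumptions made for illustration.

```python
def fits(allocatable_m: int, reserved_m: int, request_m: int) -> bool:
    """Return True if a new request fits in the node's unreserved CPU.

    All values are millicores. Current usage is deliberately not an input:
    only the requests already reserved on the node matter.
    """
    return reserved_m + request_m <= allocatable_m


# The example from the text: a 4-CPU node with 3.5 CPUs already requested.
print(fits(allocatable_m=4000, reserved_m=3500, request_m=1000))  # False
print(fits(allocatable_m=4000, reserved_m=3500, request_m=500))   # True
```

Note that nothing in the function looks at how busy the node is right now, which is why an idle node can still refuse a pod.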

A limit is the hard ceiling a container is allowed to consume. If a container tries to use more memory than its limit, the kernel's OOM killer terminates it and Kubernetes reports the container as OOMKilled (out of memory). If it tries to use more CPU than its limit, it is not killed. It is throttled, meaning the kernel slows it down to stay within the cap.

This difference matters a lot:

Memory is incompressible. You cannot give a process “a little less” memory on demand, so exceeding the limit means death.
CPU is compressible. A process can always run slower, so exceeding the CPU limit just means throttling, not termination.

Units you need to know

CPU is measured in cores, and you can use fractional values. 1 means one full core. 500m means 500 millicores, which is half a core. 100m is one tenth of a core.

Memory is measured in bytes, but you almost always use suffixes. Mi is mebibytes (1024 based) and Gi is gibibytes. So 256Mi is 256 mebibytes and 1Gi is one gibibyte. There are also the M and G (1000 based) variants, but Mi and Gi are the convention in most manifests.
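To make the units concrete, here is a small sketch that converts the quantities above into millicores and bytes. It only handles the suffixes mentioned in this section and is not a full parser for Kubernetes quantities; the function names are hypothetical.

```python
from decimal import Decimal

# Multipliers for the memory suffixes discussed above (binary and decimal).
MEMORY_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
    "k": 1000, "M": 1000**2, "G": 1000**3,
}


def cpu_to_millicores(value: str) -> int:
    """Convert a CPU quantity such as '500m' or '1.5' to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(Decimal(value) * 1000)


def memory_to_bytes(value: str) -> int:
    """Convert a memory quantity such as '256Mi' or '1G' to bytes."""
    # Check two-letter suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(MEMORY_SUFFIXES, key=len, reverse=True):
        if value.endswith(suffix):
            return int(Decimal(value[: -len(suffix)]) * MEMORY_SUFFIXES[suffix])
    return int(value)  # no suffix means plain bytes


print(cpu_to_millicores("500m"), cpu_to_millicores("1"))   # 500 1000
print(memory_to_bytes("256Mi"), memory_to_bytes("256M"))   # 268435456 256000000
```

The last line shows why the suffix matters: 256Mi is about 5 percent larger than 256M.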

Quality of Service classes

Based on how you set requests and limits, Kubernetes assigns each pod a QoS class that influences which pods are killed or evicted first when a node runs out of memory:

Guaranteed: every container has requests equal to limits for both CPU and memory. These pods are the last to be evicted.
Burstable: the pod does not meet the Guaranteed criteria, but at least one container sets a CPU or memory request or limit (for example, a request lower than its limit, or only a request). These pods can use spare capacity but are evicted before Guaranteed pods.
BestEffort: no requests or limits at all. These pods are the first to be killed when a node is under memory pressure.

The takeaway: setting nothing is the most dangerous option, because BestEffort pods are first on the chopping block.
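The three rules above can be written as a short classifier. This is a sketch of the logic as described here, not Kubernetes source code. It compares quantities as plain strings, so it assumes requests and limits are written in the same notation ("500m" and "0.5" would not compare equal).

```python
def qos_class(containers: list) -> str:
    """Classify a pod from its containers' resources blocks.

    Each container is a dict shaped like the `resources` field in a manifest,
    for example {"requests": {"cpu": "100m"}, "limits": {"cpu": "500m"}}.
    """
    any_set = False
    guaranteed = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request falls back to the limit when a limit is given.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{}]))  # BestEffort
print(qos_class([{"requests": {"cpu": "100m", "memory": "128Mi"},
                  "limits": {"cpu": "500m", "memory": "256Mi"}}]))  # Burstable
print(qos_class([{"requests": {"cpu": "500m", "memory": "256Mi"},
                  "limits": {"cpu": "500m", "memory": "256Mi"}}]))  # Guaranteed
```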

Prerequisites

Before you start, make sure you have:

A running Kubernetes cluster (single-node is fine for this tutorial).
kubectl installed and configured to talk to your cluster.
The metrics-server add-on installed, so kubectl top works. Many distributions like k3s ship it by default.
Basic familiarity with YAML and the Linux command line.

Verify your cluster is reachable and metrics are available:

kubectl get nodes
kubectl top nodes

You should see node usage like this:

NAME              CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-single-node   210m         5%     1430Mi          37%

If kubectl top nodes returns an error about metrics not being available, install metrics-server first:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Give it a minute, then try kubectl top nodes again.

Step 1: Deploy a Pod With No Limits

Let us start with the bad case so the improvements are obvious. Create a file called app.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
  labels:
    app: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: demo
          image: polinux/stress
          command: ["sleep", "3600"]

Apply it and check the QoS class:

kubectl apply -f app.yaml
kubectl get pod -l app=demo -o jsonpath='{.items[0].status.qosClass}{"\n"}'

BestEffort

This pod is BestEffort, meaning it has no reserved resources and the least protection. If the node runs short on memory, BestEffort pods are first in line to be killed, regardless of whether they are the actual culprit. The scheduler also treats the pod as requesting nothing, so it will happily pile many such pods onto a single node until that node falls over.

Step 2: Add Requests and Limits

Now let us give the container a proper resource block. Update the container section of app.yaml:

      containers:
        - name: demo
          image: polinux/stress
          command: ["sleep", "3600"]
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"

Here is what this declares, and why each line matters:

requests.cpu: 100m tells the scheduler to reserve one tenth of a core for this pod. The pod will only be placed on a node that has that much CPU still unreserved.
requests.memory: 128Mi reserves 128 mebibytes. This is the floor the pod is guaranteed.
limits.cpu: 500m lets the pod burst up to half a core when spare CPU is available, but no further. Beyond this it gets throttled.
limits.memory: 256Mi is the hard memory ceiling. Cross it and the container is OOMKilled.

Apply and re-check the QoS class:

kubectl apply -f app.yaml
kubectl get pod -l app=demo -o jsonpath='{.items[0].status.qosClass}{"\n"}'

Burstable

The pod is now Burstable. It has a guaranteed floor (the request) and can use extra capacity up to the limit when the node has room. This is the most common and usually the most sensible class for normal web services.

You can confirm the values landed correctly:

kubectl describe pod -l app=demo | grep -A6 Limits

    Limits:
      cpu:     500m
      memory:  256Mi
    Requests:
      cpu:     100m
      memory:  128Mi

Step 3: Watch a Memory Limit Enforce Itself

Let us prove the memory limit is real. The polinux/stress image includes the stress tool, which can deliberately allocate memory. We will tell it to allocate 300M (about 300 megabytes), comfortably above our 256Mi limit, and watch Kubernetes react.

Update the container command and args:

      containers:
        - name: demo
          image: polinux/stress
          command: ["stress"]
          args: ["--vm", "1", "--vm-bytes", "300M", "--vm-hang", "1"]
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"

Apply it and watch the pod:

kubectl apply -f app.yaml
kubectl get pods -l app=demo -w

You will see the pod cycle through restarts as it repeatedly blows past the memory limit:

NAME                    READY   STATUS             RESTARTS     AGE
demo-7c9d4f8b6c-h2xkl   0/1     OOMKilled          1 (4s ago)   12s
demo-7c9d4f8b6c-h2xkl   0/1     CrashLoopBackOff   2            28s

There it is. The container tried to use more memory than its limit allowed, and the kernel terminated it. Press Ctrl+C to stop watching. You can confirm the reason with:

kubectl describe pod -l app=demo | grep -A3 "Last State"

    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137

Exit code 137 (128 plus signal 9, SIGKILL) is the classic fingerprint of an out-of-memory kill, and the OOMKilled reason confirms it. This is exactly why you size memory limits carefully: too low and your app dies under normal load, too high and the limit stops protecting the node.
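The 128-plus-signal arithmetic is easy to check for yourself. The helper below is a hypothetical illustration that decodes an exit code with Python's standard signal module. It assumes a Linux or other POSIX system, where signal 9 is SIGKILL.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code using the 128 + signal convention."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"  # number not known on this platform
        return f"terminated by {name}"
    return f"exited on its own with status {code}"


print(describe_exit_code(137))  # terminated by SIGKILL
```

Keep in mind that 137 only tells you the process received SIGKILL. It is the Reason field in kubectl describe that identifies the kill as an out-of-memory event.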

Set the command and args back to a harmless sleep and reapply to stabilize the pod before moving on:

          command: ["sleep"]
          args: ["3600"]

kubectl apply -f app.yaml

Step 4: Enforce Defaults With a LimitRange

Asking every developer to remember a resources block on every container does not scale. A LimitRange fixes this at the namespace level: it can inject default requests and limits into any container that does not specify its own, and it can reject pods whose values are unreasonable.

First create a dedicated namespace to experiment in:

kubectl create namespace team-a

Create limitrange.yaml:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      max:
        cpu: "2"
        memory: "1Gi"
      min:
        cpu: "50m"
        memory: "64Mi"

What each block does:

default is the limit applied to a container that sets no limit of its own.
defaultRequest is the request applied when none is specified.
max and min are guardrails: a container whose limit is above max or whose request is below min is rejected outright.
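The defaulting and validation behavior can be modeled roughly as follows. This is a simplified sketch, not the admission controller's real code: values are plain integers (millicores and MiB) rather than Kubernetes quantities, and the function name apply_limit_range is made up for this example. It also assumes that a container which sets only a limit gets a request equal to that limit.

```python
# The values from limitrange.yaml, as millicores and MiB.
LIMIT_RANGE = {
    "default": {"cpu": 500, "memory": 256},
    "defaultRequest": {"cpu": 100, "memory": 128},
    "max": {"cpu": 2000, "memory": 1024},
    "min": {"cpu": 50, "memory": 64},
}


def apply_limit_range(resources: dict, lr: dict = LIMIT_RANGE) -> dict:
    """Fill in missing values, then enforce min and max. Raises ValueError."""
    requests = dict(resources.get("requests", {}))
    limits = dict(resources.get("limits", {}))
    for res in ("cpu", "memory"):
        if res not in requests:
            # Assumption: a container with only a limit gets request = limit.
            requests[res] = limits[res] if res in limits else lr["defaultRequest"][res]
        if res not in limits:
            limits[res] = lr["default"][res]
        if requests[res] < lr["min"][res]:
            raise ValueError(f"{res} request {requests[res]} is below min {lr['min'][res]}")
        if limits[res] > lr["max"][res]:
            raise ValueError(f"{res} limit {limits[res]} is above max {lr['max'][res]}")
        if requests[res] > limits[res]:
            raise ValueError(f"{res} request {requests[res]} exceeds limit {limits[res]}")
    return {"requests": requests, "limits": limits}


# A container with no resources block receives all four defaults.
print(apply_limit_range({}))
```

One consequence worth noticing: a container that sets only a request larger than the default limit (say 800 millicores here) ends up with a request above its injected limit and is rejected, so set both values when you go above the defaults.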

Apply it:

kubectl apply -f limitrange.yaml

Now deploy a bare pod with no resources block into that namespace:

kubectl run nolimits --image=nginx:1.27 -n team-a
kubectl get pod nolimits -n team-a -o jsonpath='{.spec.containers[0].resources}{"\n"}'

{"limits":{"cpu":"500m","memory":"256Mi"},"requests":{"cpu":"100m","memory":"128Mi"}}

Even though we never specified any resources, the LimitRange injected sensible defaults automatically. This is one of the most effective ways to stop BestEffort pods from sneaking into a shared cluster.

Step 5: Cap a Whole Namespace With a ResourceQuota

A LimitRange controls individual containers. A ResourceQuota controls the total consumption of an entire namespace, which is how you stop one team from eating an entire cluster.

Create quota.yaml:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 2Gi
    limits.cpu: "4"
    limits.memory: 4Gi
    pods: "10"

This says the team-a namespace may, in total, request up to 2 CPUs and 2Gi of memory, set limits up to 4 CPUs and 4Gi, and run no more than 10 pods. Apply it:

kubectl apply -f quota.yaml
kubectl describe resourcequota team-a-quota -n team-a

Name:            team-a-quota
Namespace:       team-a
Resource         Used   Hard
--------         ----   ----
limits.cpu       500m   4
limits.memory    256Mi  4Gi
pods             1      10
requests.cpu     100m   2
requests.memory  128Mi  2Gi

The Used column already reflects the nolimits pod from the previous step. As you add workloads, this fills up. When the namespace hits the cap, new pods are rejected with a clear error.
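To see which cap you would hit first, you can model the quota check as simple addition. The sketch below is a rough approximation of the admission logic using integer millicores and MiB. The names HARD, POD, and admit are hypothetical, and POD assumes every pod carries the LimitRange defaults from Step 4.

```python
# Hard caps from quota.yaml, in millicores, MiB, and a pod count.
HARD = {"requests.cpu": 2000, "requests.memory": 2048,
        "limits.cpu": 4000, "limits.memory": 4096, "pods": 10}

# What one pod with the LimitRange defaults from Step 4 consumes.
POD = {"requests.cpu": 100, "requests.memory": 128,
       "limits.cpu": 500, "limits.memory": 256, "pods": 1}


def admit(used: dict, pod: dict, hard: dict = HARD) -> list:
    """Return the quota entries the pod would exceed (empty list = admitted)."""
    return [key for key in hard if used.get(key, 0) + pod.get(key, 0) > hard[key]]


# Keep adding identical pods to an empty namespace until one is refused.
used = {key: 0 for key in HARD}
count = 0
while not admit(used, POD):
    used = {key: used[key] + POD[key] for key in HARD}
    count += 1

print(count, admit(used, POD))  # 8 ['limits.cpu']
```

With these numbers the namespace holds 8 default-sized pods, and it is the limits.cpu cap that stops the ninth, not the pods count of 10.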

There is one important rule to remember: once a ResourceQuota constrains a compute resource in a namespace, every new pod in that namespace must declare a value for each constrained entry. With the quota above, that means requests and limits for both CPU and memory. A pod with no resources block will be rejected. This is exactly why a LimitRange pairs so well with a ResourceQuota: the LimitRange fills in the defaults so the quota does not break naive deployments.

Clean up the experiment when you are done:

kubectl delete namespace team-a

Common Mistakes and Troubleshooting

Pod stuck in Pending with “Insufficient cpu” or “Insufficient memory”. The scheduler cannot find a node with enough unreserved capacity to satisfy the request. Check the reason:

kubectl describe pod <pod-name> | grep -A5 Events

A line like 0/1 nodes are available: 1 Insufficient memory means your requests are larger than what any node has left unreserved. Either lower the request, add nodes, or look at what is already reserved with kubectl describe node.

Frequent OOMKilled events. Your memory limit is too low for what the app actually uses under load. Sample real usage repeatedly while the app is under load with:

kubectl top pod <pod-name>

Then set the limit comfortably above the observed peak, not right at it.

App is slow even though CPU usage looks low. This is often CPU throttling. The CPU limit is enforced over short periods (100 ms by default), so a bursty container can use up its quota early in a period and be paused until the next one, even though the average that kubectl top reports sits well below the cap. Raise the CPU limit, or remove it entirely for latency-sensitive workloads while keeping the request.
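A back-of-the-envelope model shows how throttling can hurt latency while average usage stays low. The sketch assumes the default 100 ms enforcement period, a single-threaded burst that starts at the beginning of a period with a full quota, and no competition for CPU. The function name is hypothetical.

```python
PERIOD_MS = 100  # default enforcement period for CPU limits


def burst_wall_time_ms(cpu_needed_ms: int, limit_millicores: int) -> int:
    """Wall-clock time for one CPU burst to finish under a CPU limit.

    In each period the container may run for its quota, then it is paused
    until the next period begins.
    """
    if cpu_needed_ms <= 0:
        return 0
    quota_ms = max(1, limit_millicores * PERIOD_MS // 1000)
    full_periods, remainder = divmod(cpu_needed_ms, quota_ms)
    if remainder == 0:
        # The burst ends exactly when the final period's quota runs out.
        return (full_periods - 1) * PERIOD_MS + quota_ms
    return full_periods * PERIOD_MS + remainder


print(burst_wall_time_ms(80, 500))   # 130
print(burst_wall_time_ms(80, 1000))  # 80
```

A request handler that needs 80 ms of CPU once per second averages only 80 millicores, far below a 500m limit, yet under that limit each request takes 130 ms instead of 80 ms because the container sits paused for half of the first period.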

Requests set far higher than real usage. This wastes capacity. The scheduler reserves the request whether or not the pod uses it, so oversized requests mean a node looks full while sitting mostly idle. Right-size requests based on actual measured usage.

Confusing requests with limits. Remember: requests are what the scheduler reserves and what determines placement, limits are the runtime ceiling. A common bug is setting a tiny request and a huge limit, which lets the scheduler overcommit a node badly and trigger evictions later.
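The overcommit risk is simple arithmetic. The sketch below uses a hypothetical node with 4 GiB of allocatable memory and ignores everything except memory requests, so treat the numbers as an illustration rather than a prediction for a real cluster.

```python
def overcommit(node_allocatable_mi: int, request_mi: int, limit_mi: int) -> tuple:
    """Return (pods the scheduler will place, sum of limits / allocatable)."""
    pods = node_allocatable_mi // request_mi   # placement looks only at requests
    total_limits = pods * limit_mi             # what those pods are allowed to use
    return pods, total_limits / node_allocatable_mi


# Pods that request 64Mi but are allowed to grow to 2Gi each.
print(overcommit(4096, request_mi=64, limit_mi=2048))  # (64, 32.0)
```

Here the scheduler is willing to place 64 pods whose combined limits are 32 times the memory the node actually has, so only a few of them need to grow toward their limit before the node comes under pressure.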

Best Practices

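Set memory requests equal to memory limits for important workloads. Memory cannot be reclaimed gracefully, and when the request equals the limit the scheduler reserves everything the container is allowed to use, which guards against surprise OOM kills caused by memory overcommit on the node. Note that the pod is only classed as Guaranteed if its CPU requests equal its CPU limits as well.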

Be more relaxed with CPU. Setting a CPU request without a tight limit is often the right call for latency-sensitive services, because it guarantees a baseline while still letting the app burst into spare capacity without being throttled.

Base your numbers on measurement, not guesses. Run the workload, watch kubectl top pod under realistic traffic, and size requests near the steady-state usage and limits above the peak. Copying values from a tutorial, including this one, is only a starting point.
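One way to turn measurements into numbers is sketched below. The rule used here (request at the median reading, limit 20 percent above the peak) is an assumption chosen for illustration, not an official formula, and the sample values are invented.

```python
import math
import statistics


def suggest_memory(samples_mi: list, headroom_pct: int = 20) -> dict:
    """Suggest a request near typical usage and a limit above the peak."""
    request = math.ceil(statistics.median(samples_mi))
    peak = max(samples_mi)
    # Integer arithmetic, rounded up, to avoid floating-point surprises.
    limit = (peak * (100 + headroom_pct) + 99) // 100
    return {"request": f"{request}Mi", "limit": f"{limit}Mi"}


# Hypothetical MiB readings collected from `kubectl top pod` during a load test.
samples = [140, 150, 145, 155, 210, 148, 152]
print(suggest_memory(samples))  # {'request': '150Mi', 'limit': '252Mi'}
```

Adjust the headroom to match how spiky your workload is, and re-run the exercise whenever traffic patterns change.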

Use a LimitRange in every shared namespace so that no pod accidentally lands as BestEffort. Pair it with a ResourceQuota when multiple teams or environments share a cluster, so a single namespace can never starve the others.

Combine resource management with health checks. Limits decide how much a pod can consume, and probes decide whether it is healthy, and together they keep nodes stable. If you have not configured probes yet, see my guide on liveness, readiness, and startup probes. Resource requests are also the foundation that horizontal autoscaling relies on, which I covered in Kubernetes autoscaling, since the HPA measures usage as a percentage of the request.
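The link between requests and autoscaling is easiest to see with numbers. The sketch below applies the scaling rule given in the Horizontal Pod Autoscaler documentation, desired = ceil(current replicas x current value / target value). It leaves out the tolerance band and stabilization behavior of the real controller, and the function names are hypothetical.

```python
import math


def utilization_pct(usage_m: int, request_m: int) -> float:
    """CPU usage as a percentage of the request, the figure the HPA compares."""
    return usage_m / request_m * 100


def desired_replicas(current: int, current_pct: float, target_pct: float) -> int:
    """Basic HPA scaling rule: ceil(current * actual / target)."""
    return math.ceil(current * current_pct / target_pct)


# A pod requesting 100m and using 150m is at 150 percent utilization.
pct = utilization_pct(usage_m=150, request_m=100)
print(pct, desired_replicas(current=2, current_pct=pct, target_pct=50))  # 150.0 6
```

Because the percentage is relative to the request, an undersized request makes the workload look overloaded and triggers scale-ups too early, while an oversized request hides real load.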

Conclusion

You have now set CPU and memory requests and limits on a real workload, watched a memory limit enforce itself with an OOMKilled event, and seen how QoS classes decide who survives node pressure. You also enforced sane defaults across a namespace with a LimitRange and capped total consumption with a ResourceQuota.

The single habit worth taking away is this: never run production pods as BestEffort. Give every container at least a request, set memory limits to protect your nodes, and measure real usage before you lock in numbers. That one discipline prevents the majority of “the cluster mysteriously fell over” incidents.

For next steps, try adding a ResourceQuota that also limits object counts like Services and ConfigMaps, explore the kubectl top output during a load test to right-size your values, and look into the Vertical Pod Autoscaler if you want Kubernetes to recommend requests for you automatically. Once resource management becomes routine, your cluster will pack workloads tightly and stay stable even when something goes wrong.