Kubernetes Health Checks: Liveness, Readiness, and Startup Probes

One of the biggest promises of Kubernetes is that it keeps your applications running without you babysitting them. But Kubernetes is not a mind reader. Out of the box, it only knows whether your container’s main process is alive. It has no idea whether your app is actually working: maybe it is stuck in a deadlock, maybe it is still loading a 2 GB model into memory, or maybe a downstream database connection has gone stale and every request now returns a 500.

This is exactly the gap that health checks, called probes in Kubernetes, are designed to close. Probes let you teach Kubernetes how to tell the difference between “the process exists” and “the app is healthy and ready to serve traffic.” Once you set them up correctly, Kubernetes will restart frozen containers, stop sending requests to pods that are not ready, and avoid killing apps that are simply slow to start.

This tutorial is for developers, sysadmins, and DevOps engineers who already have a working cluster and want their deployments to behave reliably under real conditions. If you do not have a cluster yet, see my earlier guide on installing a single-node Kubernetes cluster. A basic understanding of pods and deployments helps too, which I covered in understanding Kubernetes objects.

Conceptual Overview

Kubernetes has three kinds of probes, and the most common mistake people make is confusing them. Each one answers a different question.

A liveness probe answers: “Is this container still healthy, or has it gotten stuck?” If a liveness probe fails several times in a row (three by default), Kubernetes kills the container and restarts it. This is your recovery mechanism for deadlocks, memory leaks, or hung event loops that the process itself cannot detect.

A readiness probe answers: “Is this container ready to receive traffic right now?” If a readiness probe fails, Kubernetes does not restart the pod. Instead, it removes the pod from the Service endpoints, so no new requests are routed to it. Once the probe passes again, traffic resumes. This is how you avoid sending users to a pod that is still warming up or temporarily overloaded.

A startup probe answers: “Has this container finished booting yet?” It exists for slow-starting applications. While a startup probe is running, the liveness and readiness probes are disabled. This prevents Kubernetes from killing an app that simply needs 60 seconds to initialize, which would otherwise look like a liveness failure.

The simplest way to remember it: liveness restarts, readiness reroutes, and startup protects slow boots.

Probe mechanisms

Each probe checks health using one of these methods:

httpGet: Kubernetes sends an HTTP GET request to a path and port. Any status code from 200 to 399 counts as success.
tcpSocket: Kubernetes tries to open a TCP connection to a port. If the connection succeeds, the probe passes. Useful for databases or non-HTTP services.
exec: Kubernetes runs a command inside the container. Exit code 0 means success, anything else means failure.
grpc: For services that implement the gRPC health checking protocol.
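
As a rough sketch, the three non-HTTP mechanisms look like this in a pod spec. The container names, images, ports, and the /tmp/healthy file are illustrative assumptions, not part of this tutorial's deployment:

```yaml
# Illustrative only: names, images, ports, and file paths are assumptions.
containers:
  - name: db
    image: postgres:16
    livenessProbe:
      tcpSocket:          # passes if a TCP connection to the port opens
        port: 5432
  - name: worker
    image: registry.example.com/worker:1.0    # hypothetical image
    livenessProbe:
      exec:               # passes if the command exits with code 0
        command: ["cat", "/tmp/healthy"]
  - name: api
    image: registry.example.com/grpc-api:1.0  # hypothetical image
    livenessProbe:
      grpc:               # server must implement the gRPC health checking protocol
        port: 50051
```

The timing fields covered in Step 2 apply to every mechanism in the same way.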

Prerequisites

Before you start, make sure you have:

A running Kubernetes cluster (single-node is fine for this tutorial).
kubectl installed and configured to talk to your cluster.
Basic familiarity with YAML and the Linux command line.
Ubuntu as your working environment, though the kubectl commands are identical everywhere.

Verify your cluster is reachable:

kubectl get nodes

You should see at least one node in Ready state:

NAME              STATUS   ROLES           AGE   VERSION
k8s-single-node   Ready    control-plane   42d   v1.30.2

Step 1: Deploy an App Without Probes

Let us start with a plain deployment so you can see the default behavior, then improve it step by step. We will use the nginx image because it serves HTTP on port 80 out of the box.

Create a file called web.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    app: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27
          ports:
            - containerPort: 80

Apply it:

kubectl apply -f web.yaml

Check the pod:

kubectl get pods -l app=web

NAME                   READY   STATUS    RESTARTS   AGE
web-6c9f8b7d54-xq2kt   1/1     Running   0          15s

Right now Kubernetes considers this pod healthy purely because the nginx process is running. If nginx were to hang while keeping its process alive, Kubernetes would never notice. Let us fix that.

Step 2: Add a Liveness Probe

We will add an HTTP liveness probe that checks the root path. If nginx stops responding to HTTP, Kubernetes will restart the container.

Update the container section of web.yaml:

      containers:
        - name: web
          image: nginx:1.27
          ports:
            - containerPort: 80
          livenessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 2
            failureThreshold: 3

Here is what each field means, and why it matters:

initialDelaySeconds: 5 waits 5 seconds after the container starts before the first check, giving nginx time to bind to port 80.
periodSeconds: 10 runs the check every 10 seconds.
timeoutSeconds: 2 marks the probe as failed if there is no response within 2 seconds.
failureThreshold: 3 requires 3 consecutive failures before Kubernetes restarts the container. This avoids restarting on a single transient hiccup.
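
For reference, here is the same probe again with the Kubernetes default for each field noted in comments, plus successThreshold, the one field the example leaves unset. The detection-time figure in the last comment is an approximation, not a guarantee:

```yaml
livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 5   # default 0
  periodSeconds: 10        # default 10
  timeoutSeconds: 2        # default 1
  failureThreshold: 3      # default 3
  successThreshold: 1      # default 1; must be 1 for liveness and startup probes
# Rough time to detect a hang: periodSeconds x failureThreshold = 10 x 3 = about 30 seconds.
```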

Apply the change:

kubectl apply -f web.yaml

Now describe the pod to confirm the probe is registered:

kubectl describe pod -l app=web | grep -A1 Liveness

    Liveness:  http-get http://:80/ delay=5s timeout=2s period=10s #success=1 #failure=3

Watching a liveness probe in action

Let us deliberately break nginx so you can watch Kubernetes recover. First, delete the default index page inside the running container so that requests to / no longer return 200:

kubectl exec -it deploy/web -- rm /usr/share/nginx/html/index.html

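With index.html gone, nginx answers / with a 403, which falls outside the 200 to 399 success range, so the liveness probe starts failing. The effect does not last, though: a restarted container is created fresh from the image, so index.html comes back and the probe passes again after a single restart. To get a clean, repeatable failure, point the probe at a path that does not exist. Change the probe path to /healthz: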

          livenessProbe:
            httpGet:
              path: /healthz
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3

Apply it and watch:

kubectl apply -f web.yaml
kubectl get pods -l app=web -w

After roughly 30 seconds (3 consecutive failures at 10-second intervals, following the 5-second initial delay) you will see the restart count climb:

NAME                   READY   STATUS    RESTARTS      AGE
web-7d8c5f9b6c-k4m2p   1/1     Running   1 (5s ago)    45s
web-7d8c5f9b6c-k4m2p   1/1     Running   2 (4s ago)    1m25s

This is the liveness probe doing its job. Because /healthz does not exist on plain nginx, every probe gets a 404, so the check keeps failing and Kubernetes keeps restarting the container. Press Ctrl+C to stop watching, then set the path back to / and reapply so the pod stabilizes:

kubectl apply -f web.yaml

This little experiment teaches the most important lesson about liveness probes: point them at an endpoint that truly reflects health. A bad path turns a healthy app into a restart loop.

Step 3: Add a Readiness Probe

A liveness probe keeps a container alive. A readiness probe controls traffic. They are often configured to hit the same endpoint, but their effects are completely different.

Add a readiness probe alongside the liveness probe:

          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 3
            periodSeconds: 5
            failureThreshold: 2

Apply it:

kubectl apply -f web.yaml

To see readiness in action we need a Service in front of the pods. Create web-svc.yaml:

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 80

Apply it and inspect the endpoints:

kubectl apply -f web-svc.yaml
kubectl get endpoints web

NAME   ENDPOINTS         AGE
web    10.42.0.23:80     20s

The pod IP appears in the endpoint list only because the readiness probe is passing. If you scale up and a new pod is still starting, its IP will be missing from this list until it becomes ready. That is the whole point: the Service never routes a request to a pod that says “not ready yet.”

You can prove this by scaling the deployment and watching readiness gate the traffic:

kubectl scale deployment web --replicas=3
kubectl get pods -l app=web

NAME                   READY   STATUS    RESTARTS   AGE
web-7d8c5f9b6c-k4m2p   1/1     Running   0          6m
web-7d8c5f9b6c-2nq8r   0/1     Running   0          2s
web-7d8c5f9b6c-9xvbf   0/1     Running   0          2s

The two new pods show 0/1 in the READY column for a moment. During that window they receive no traffic, even though they are Running. This is what makes rolling updates safe and is closely related to how scaling decisions work, which I covered in Kubernetes autoscaling.

Step 4: Add a Startup Probe for Slow Apps

Some applications take a long time to start. Think of a Java service that loads a huge classpath, or an AI inference server that loads a model into memory. If you give such an app a liveness probe with a short delay, Kubernetes will kill it before it ever finishes booting, then kill it again, forever.

The naive fix is a huge initialDelaySeconds on the liveness probe, but that also delays recovery once the app is running. The correct fix is a startup probe. While it is active, liveness and readiness are suspended. Only after the startup probe passes do the other probes begin.

Here is a startup probe that allows up to 5 minutes for boot (30 attempts times 10 seconds):

          startupProbe:
            httpGet:
              path: /
              port: 80
            failureThreshold: 30
            periodSeconds: 10

With this in place, the liveness probe can stay aggressive (fast detection of real hangs) without risking a slow-start death spiral. The math to remember is simple: failureThreshold multiplied by periodSeconds is the maximum time your app is allowed to start.
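
To make the arithmetic concrete, here is a sketch for a hypothetical service measured to boot in about 90 seconds. The 90-second figure, the /healthz path, and port 8080 are assumptions for illustration; substitute your own measurements:

```yaml
# Assumption: the app was measured to start in about 90 seconds.
startupProbe:
  httpGet:
    path: /healthz         # hypothetical health endpoint
    port: 8080             # hypothetical port
  periodSeconds: 5
  failureThreshold: 30     # 30 x 5 s = 150 s budget, comfortably above 90 s
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3      # takes over only after the startup probe succeeds
```

A shorter periodSeconds on the startup probe means the app is noticed sooner once it finishes booting, since the probe only runs once per period.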

The Complete Manifest

Putting it all together, here is the full web.yaml with all three probes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    app: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27
          ports:
            - containerPort: 80
          startupProbe:
            httpGet:
              path: /
              port: 80
            failureThreshold: 30
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /
              port: 80
            periodSeconds: 10
            timeoutSeconds: 2
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /
              port: 80
            periodSeconds: 5
            failureThreshold: 2

Notice that once a startup probe is present, the liveness probe no longer needs initialDelaySeconds, because the startup probe already covers the boot window.

Common Mistakes and Troubleshooting

Using the same heavy endpoint for liveness and readiness. If your liveness endpoint checks the database connection, a brief database outage will cause Kubernetes to restart every pod, turning a small problem into a full outage. Keep liveness checks lightweight and local, checking only “is this process responsive.” Put dependency checks in the readiness probe, where failure only pauses traffic instead of triggering restarts.

initialDelaySeconds too short. If you see pods stuck in a CrashLoopBackOff right after deploy, your liveness probe may be firing before the app is ready. Check events:

kubectl describe pod -l app=web | grep -A5 Events

A line like Liveness probe failed: Get "http://10.42.0.23:80/": dial tcp ... connection refused confirms the probe ran before the app bound its port. Add a startup probe or increase the initial delay.

Probe timeout too aggressive. The default timeoutSeconds is 1. For an app under load, a single slow response can flip the probe to failed. Raise timeoutSeconds to 2 or 3 for endpoints that occasionally take longer.

Pointing at a path that does not exist. As we saw in Step 2, a wrong path guarantees an endless restart loop. Always confirm the endpoint with a manual request first:

kubectl exec -it deploy/web -- curl -s -o /dev/null -w "%{http_code}\n" http://localhost:80/

A printed 200 confirms the path is valid before you wire it into a probe.

Forgetting readiness on rolling updates. Without a readiness probe, Kubernetes treats a pod as ready as soon as its containers have started, so a rolling update may send traffic to pods that are still warming up, causing brief errors (refused connections, or 502 responses from a proxy in front of the Service) during every deploy.
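
One way to make this concrete is to pair the readiness probe with a conservative rollout strategy. This is a sketch rather than a recommendation; the maxSurge, maxUnavailable, and minReadySeconds values are assumptions to adjust for your workload:

```yaml
# Fragment of a Deployment spec; values are illustrative assumptions.
spec:
  minReadySeconds: 5       # a new pod must stay ready this long before it counts as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # start at most one extra pod at a time
      maxUnavailable: 0    # keep every old pod until a replacement is ready
  template:
    spec:
      containers:
        - name: web
          image: nginx:1.27
          readinessProbe:
            httpGet:
              path: /
              port: 80
            periodSeconds: 5
            failureThreshold: 2
```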

Best Practices

Build a dedicated, lightweight health endpoint in your application, conventionally /healthz or /livez, that returns 200 quickly without touching databases or external services. This keeps liveness checks honest and cheap.

For readiness, expose a separate endpoint like /readyz that does verify critical dependencies such as the database and cache. This way a pod stops receiving traffic when it genuinely cannot serve requests, but is never restarted for a transient dependency blip.
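
Wired into a manifest, the split might look like the following. This is a minimal sketch that assumes your application serves /livez and /readyz on port 8080; the image name and port are hypothetical:

```yaml
# Assumes the application exposes /livez and /readyz on port 8080 (hypothetical).
containers:
  - name: app
    image: registry.example.com/app:1.0   # hypothetical image
    ports:
      - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /livez       # cheap, local check: is the process responsive?
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /readyz      # also verifies the database, cache, and so on
        port: 8080
      periodSeconds: 5
      failureThreshold: 2
```

With this layout, a database outage fails only /readyz, so the pods drop out of the Service endpoints but are not restarted.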

Always tune the timing to your real boot time. Measure how long your app takes to start, then set the startup probe’s failureThreshold times periodSeconds to comfortably exceed it. Do not copy numbers blindly from tutorials, including this one.

Keep probes cheap. A probe that runs every few seconds across hundreds of pods can generate real load, so avoid expensive queries inside probe handlers.

Combine probes with sensible resource requests and limits so Kubernetes can schedule and protect your pods correctly. Health checks and resource management work hand in hand to keep a cluster stable under pressure.
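
As a rough illustration, requests and limits sit next to the probes in the same container spec. The numbers below are placeholder assumptions, not tuned values; size them from your own measurements:

```yaml
# Illustrative values only; size them from your own measurements.
containers:
  - name: web
    image: nginx:1.27
    resources:
      requests:
        cpu: 100m          # used by the scheduler to place the pod
        memory: 128Mi
      limits:
        memory: 256Mi      # the container is killed if it exceeds this
    livenessProbe:
      httpGet:
        path: /
        port: 80
      timeoutSeconds: 2
```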

Conclusion

You have now configured all three Kubernetes probe types and seen exactly how each one behaves. The liveness probe restarts containers that get stuck, the readiness probe keeps traffic away from pods that are not ready, and the startup probe gives slow applications room to boot without being killed. Together they turn Kubernetes from something that merely runs your process into something that actively keeps your service healthy.

The single most valuable habit to take away is this: separate “am I alive” from “am I ready” in your application code, and wire each to the matching probe. That one distinction prevents the majority of self-inflicted outages in production clusters.

For next steps, try adding probes to a real application of your own, experiment with exec and tcpSocket probes for non-HTTP services, and look into how readiness gates integrate with rolling update strategies. Once probes become second nature, your deployments will recover from failure on their own while you sleep.