Kubernetes Autoscaling: HPA & VPA

Written by: Bagus Facsi Aginsa
Published at: 24 Nov 2024


Did you know that Kubernetes can automatically scale your pods up, down, in, and out? This feature is called autoscaling.

Autoscaling is a critical feature in Kubernetes that helps your applications handle varying loads by automatically adjusting the number of running instances or the resources allocated to them. This means your applications can respond to traffic spikes without manual intervention and conserve resources during low-traffic periods.

In this tutorial, we’ll delve into the two primary types of autoscaling in Kubernetes: Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA). We’ll also provide a step-by-step guide on how to install and use them.

Understanding Horizontal Pod Autoscaler (HPA)

Horizontal Pod Autoscaler is a feature in Kubernetes that automatically scales the number of pod replicas based on observed CPU utilization (or other select metrics). It is particularly useful for scaling stateless applications that can run multiple instances independently.

How HPA Works:

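  1. Metrics Server: HPA relies on the Metrics Server, which collects CPU and memory usage data from the nodes and exposes it through the Metrics API.
  2. Autoscaling Controller: The HPA controller checks the metrics periodically (every 15 seconds by default) and adjusts the number of pods to bring the observed value back toward the specified target.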
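
To make the controller's adjustment concrete, here is a minimal Python sketch of the scaling rule described in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to the target, rounded up. The function name, its parameters, and the sample numbers are ours for illustration. The 10% tolerance mirrors the documented default, and the real controller also handles details such as pods that are not ready and stabilization windows, which this sketch ignores.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count derived from the ratio of observed to target metric value."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the replica count is left alone to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(2, 90, 50))  # 2 replicas at 90% CPU against a 50% target
print(desired_replicas(4, 10, 50))  # 4 mostly idle replicas
print(desired_replicas(3, 52, 50))  # within tolerance, no change
```

With these inputs the sketch scales 2 replicas up to 4, scales 4 idle replicas down to 1, and leaves the 3 replicas at 52% untouched.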

Installing the Metrics Server:

Before using HPA, ensure that the Metrics Server is installed in your cluster.

You can run this command to install the Metrics Server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

To verify installation, use this command:

kubectl get deployment metrics-server -n kube-system

You should see output like this:

NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           3m42s

With the Metrics Server available, you can check the CPU and memory usage of your Kubernetes nodes, for example:

kubectl top node

This prints the node’s usage:

NAME              CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
k8s-single-node   79m          3%     973Mi           49%

You can also check CPU and memory usage at the Pod level:

kubectl top pod --all-namespaces

This shows the CPU and memory usage of every pod:

NAMESPACE     NAME                                      CPU(cores)   MEMORY(bytes)
kube-system   coredns-7b98449c4-rmgjp                   2m           12Mi
kube-system   local-path-provisioner-595dcfc56f-qbkg5   1m           6Mi
kube-system   metrics-server-cdcc87586-k7zjm            6m           17Mi
kube-system   svclb-traefik-415e0e61-b9gmk              0m           0Mi
kube-system   traefik-d7c9c5778-46l28                   1m           33Mi
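
If you want to total these numbers, the text output is easy to post-process. The following Python sketch sums the CPU and memory columns of the sample above. The helper names are ours, and it assumes CPU is always reported in millicores (m) and memory in Mi, as in this sample; real output may use other units.

```python
SAMPLE = """\
NAMESPACE     NAME                                      CPU(cores)   MEMORY(bytes)
kube-system   coredns-7b98449c4-rmgjp                   2m           12Mi
kube-system   local-path-provisioner-595dcfc56f-qbkg5   1m           6Mi
kube-system   metrics-server-cdcc87586-k7zjm            6m           17Mi
kube-system   svclb-traefik-415e0e61-b9gmk              0m           0Mi
kube-system   traefik-d7c9c5778-46l28                   1m           33Mi
"""


def strip_unit(value, unit):
    """Return the integer part of a value such as '12Mi' for the given unit."""
    if not value.endswith(unit):
        raise ValueError(f"expected {value!r} to end with {unit!r}")
    return int(value[:-len(unit)])


def total_usage(top_output):
    """Sum CPU (millicores) and memory (Mi) over all pods in the listing."""
    cpu_m = mem_mi = 0
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        _namespace, _name, cpu, mem = line.split()
        cpu_m += strip_unit(cpu, "m")
        mem_mi += strip_unit(mem, "Mi")
    return cpu_m, mem_mi


cpu, mem = total_usage(SAMPLE)
print(f"total: {cpu}m CPU, {mem}Mi memory")
```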

Creating an HPA:

Here’s an example to create an HPA for a deployment named my-deployment. Note that the containers must define CPU resource requests, because HPA calculates utilization as a percentage of the requested CPU.
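
The reason the resources matter is that CPU utilization is measured relative to the request. Here is a small Python sketch of that arithmetic. The function names are ours, and rounding down to a whole percent is an assumption made to keep the example simple.

```python
def parse_cpu_millicores(quantity):
    """Convert a CPU quantity such as '125m' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def cpu_utilization_percent(usage, request):
    """CPU usage as a whole-number percentage of the requested CPU (rounded down)."""
    return parse_cpu_millicores(usage) * 100 // parse_cpu_millicores(request)


# A nearly idle container using 2m of CPU with a 125m request
print(cpu_utilization_percent("2m", "125m"))
# The same container under load, using 100m
print(cpu_utilization_percent("100m", "125m"))
```

Without a CPU request there is nothing to divide by, so the HPA cannot compute a utilization percentage for the pod.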

Set up your deployment.yaml with resources defined

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-container
        image: nginx
        resources: 
          requests: 
            memory: "64Mi" 
            cpu: "125m" 
          limits: 
            memory: "128Mi" 
            cpu: "250m"

Apply this deployment

kubectl apply -f deployment.yaml

Create an hpa.yaml for the deployment

apiVersion: autoscaling/v1 
kind: HorizontalPodAutoscaler 
metadata: 
  name: my-deployment-hpa 
spec: 
  scaleTargetRef: 
    apiVersion: apps/v1 
    kind: Deployment 
    name: my-deployment
  minReplicas: 1 
  maxReplicas: 10 
  targetCPUUtilizationPercentage: 50

Apply the HPA

kubectl apply -f hpa.yaml

Verify HPA

kubectl get hpa

The TARGETS column shows the current CPU utilization of the pods against the target:

NAME                REFERENCE                  TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
my-deployment-hpa   Deployment/my-deployment   cpu: 1%/50%   1         10        1          69s

In this example, HPA will scale the my-deployment deployment’s pod replicas between 1 and 10 based on CPU utilization, targeting 50% average CPU usage.
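
The autoscaling/v1 API used above only supports a CPU target. The newer autoscaling/v2 API expresses the same 50% CPU target as an entry in a metrics list. As a sketch, the Python snippet below builds an equivalent v2 manifest and prints it as JSON, which kubectl apply -f accepts as well as YAML. The helper function is ours; the field names follow the autoscaling/v2 API.

```python
import json


def hpa_v2_manifest(name, deployment, min_replicas, max_replicas, cpu_percent):
    """Build an autoscaling/v2 HPA that targets average CPU utilization."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # v2 replaces targetCPUUtilizationPercentage with a list of metrics.
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": cpu_percent},
                },
            }],
        },
    }


manifest = hpa_v2_manifest("my-deployment-hpa", "my-deployment", 1, 10, 50)
print(json.dumps(manifest, indent=2))
```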

Understanding Vertical Pod Autoscaler (VPA)

Vertical Pod Autoscaler adjusts the CPU and memory requests (and, proportionally, the limits) of your pods based on their observed usage. It is beneficial for stateful applications or applications where horizontal scaling is not feasible.

How VPA Works:

  1. Recommender: Monitors resource usage and computes recommended resource requests.
  2. Updater: Evicts running pods whose requests differ significantly from the recommendation, so that they are recreated with updated values.
  3. Admission Controller: Sets the recommended resource requests and limits on new pods as they are created.
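
To give a feel for the recommender and updater roles, here is a deliberately simplified Python sketch. It is not the actual VPA algorithm: the real recommender works on decaying usage histograms and produces lower and upper bounds as well as a target. The 90th percentile, the 10% threshold, and both function names are assumptions chosen for illustration.

```python
def recommend_cpu(samples_millicores, percentile=90):
    """Pick a CPU request that covers `percentile` percent of the observed samples."""
    ordered = sorted(samples_millicores)
    # Nearest-rank percentile using integer arithmetic (ceiling division).
    rank = (percentile * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def needs_update(current_request, recommended, threshold=0.1):
    """True when the request is further than `threshold` from the recommendation."""
    return abs(current_request - recommended) / recommended > threshold


samples = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # hypothetical usage in millicores
target = recommend_cpu(samples)
print(target, needs_update(125, target), needs_update(95, target))
```

In this toy run the recommendation is 90m, so a pod requesting 125m would be a candidate for an update while a pod requesting 95m would be left alone.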

Installing VPA:

First, make sure the Metrics Server is installed. If it is not, see the HPA section above, where we install it.

After that, check the VPA and Kubernetes compatibility table to find the VPA version that matches your cluster.

Since we use Kubernetes version 1.30.1, we will use VPA version 1.2. To install VPA, run these commands:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/vpa-release-1.2/vertical-pod-autoscaler/deploy/vpa-v1-crd-gen.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes/autoscaler/vpa-release-1.2/vertical-pod-autoscaler/deploy/vpa-rbac.yaml

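You can change the version in the URLs to 1.2, 1.1, or 1.0 depending on your Kubernetes version. Note that these two manifests create the VPA custom resource definitions and RBAC rules; the recommender, updater, and admission controller themselves are deployed from the same repository, for example with its ./hack/vpa-up.sh script.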

Creating a VPA

Before applying the VPA, set up your deployment.yaml with resources defined

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment-2
spec:
  selector:
    matchLabels:
      app: my-app-2
  template:
    metadata:
      labels:
        app: my-app-2
    spec:
      containers:
      - name: my-container
        image: nginx
        resources: 
          requests: 
            memory: "64Mi" 
            cpu: "125m" 
          limits: 
            memory: "128Mi" 
            cpu: "250m"

Apply this deployment

kubectl apply -f deployment.yaml

Here’s an example to create a vpa.yaml for a deployment named my-deployment-2.

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-deployment-2
  updatePolicy:
    updateMode: "Auto"

Notice that spec.updatePolicy.updateMode is set to Auto. VPA has four update modes:

  1. “Auto”: VPA assigns resource requests on pod creation and updates them on existing pods using the preferred update mechanism. Currently, this is equivalent to “Recreate” (see below). Once restart-free (“in-place”) updates of pod requests are available, they may be used as the preferred update mechanism by the “Auto” mode.
  2. “Recreate”: VPA assigns resource requests on pod creation and updates them on existing pods by evicting them when the requested resources differ significantly from the new recommendation (respecting the Pod Disruption Budget, if defined). This mode should be used rarely, only if you need to ensure that the pods are restarted whenever the resource request changes. Otherwise, prefer the “Auto” mode, which may take advantage of restart-free updates once they are available.
  3. “Initial”: VPA only assigns resource requests on pod creation and never changes them later.
  4. “Off”: VPA does not automatically change the resource requirements of the pods. The recommendations are calculated and can be inspected in the VPA object.
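
The four modes differ along two questions: does VPA set requests when a pod is created, and does it change pods that are already running? The Python sketch below simply encodes the descriptions above as a lookup table. The class and function names are ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateModeBehavior:
    sets_requests_on_creation: bool
    updates_running_pods: bool


# Encodes the mode descriptions; "Auto" currently behaves like "Recreate".
UPDATE_MODES = {
    "Auto": UpdateModeBehavior(True, True),
    "Recreate": UpdateModeBehavior(True, True),
    "Initial": UpdateModeBehavior(True, False),
    "Off": UpdateModeBehavior(False, False),
}


def may_evict(mode):
    """Whether running pods can be evicted to apply a new recommendation."""
    return UPDATE_MODES[mode].updates_running_pods


for mode in UPDATE_MODES:
    print(mode, "may evict running pods:", may_evict(mode))
```

If pod restarts are a concern, "Initial" or "Off" lets you see or apply recommendations without evictions.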

Apply this VPA:

kubectl apply -f vpa.yaml

Check Recommendations

kubectl describe vpa my-vpa

In this example, VPA will automatically adjust the resource requests and limits of the pods in the my-deployment-2 deployment.
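
When VPA changes a request, it keeps the ratio between the limit and the request that was defined in the pod template. The Python sketch below shows that proportional scaling for the deployment above, where each limit is twice its request. The function name, the recommended values, and the rounding down are our assumptions for illustration.

```python
def scaled_limit(original_request, original_limit, recommended_request):
    """Scale the limit so the original limit-to-request ratio is preserved."""
    return recommended_request * original_limit // original_request


# CPU in millicores: request 125m, limit 250m, hypothetical recommendation 200m
print(scaled_limit(125, 250, 200))
# Memory in Mi: request 64Mi, limit 128Mi, hypothetical recommendation 100Mi
print(scaled_limit(64, 128, 100))
```

Here a recommended CPU request of 200m would come with a 400m limit, and a recommended memory request of 100Mi with a 200Mi limit.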

Conclusion

Autoscaling is a powerful feature that can greatly enhance the performance and efficiency of your applications in a Kubernetes environment. By using Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA), you can keep your applications running with resource levels that track their actual load.

Horizontal Pod Autoscaler (HPA) helps you manage the number of pod replicas based on resource usage metrics, providing a simple and effective way to scale stateless applications.

Vertical Pod Autoscaler (VPA) adjusts the resource requests and limits of your pods, making it ideal for stateful applications or those where horizontal scaling isn’t feasible.

By following the steps outlined in this guide, you can install and configure both HPA and VPA in your Kubernetes cluster. This will enable your applications to handle varying loads efficiently, ensuring high availability and performance.

Remember, autoscaling is not a one-size-fits-all solution. Always test and monitor your autoscaling configurations to ensure they meet the specific needs of your applications.