MTVLab: Pioneering DevOps Cloud Training

Revolutionize Your Application Scalability with Kubernetes HPA: Tips and Best Practices

By Rajesh Gheware

In today’s digital age, application scalability is not just a feature but a necessity for surviving and thriving in a competitive landscape. Businesses must ensure their applications can handle varying loads efficiently without manual intervention. Here, the Kubernetes Horizontal Pod Autoscaler (HPA) plays a pivotal role by automatically scaling the number of pods in a Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization or other selected metrics. As a seasoned Chief Architect with extensive experience in cloud computing and containerization, I’m here to guide you through revolutionizing your application scalability with Kubernetes HPA, offering practical insights and best practices.

Understanding Kubernetes HPA

Kubernetes HPA optimizes your application’s performance and resource utilization by automatically adjusting the number of replicas of your pods to meet your target metrics, such as CPU and memory usage. This dynamism ensures your application can handle sudden spikes in traffic or workloads, maintaining smooth operations and an optimal user experience.
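Under the hood, the HPA controller uses a simple proportional formula documented in the Kubernetes docs: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with the result clamped between minReplicas and maxReplicas. A minimal Python sketch of that calculation:

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Proportional scaling rule used by the HPA controller:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric).
    (minReplicas/maxReplicas clamping omitted for brevity.)
    """
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 pods averaging 80% CPU against a 50% target -> scale out to 7 pods
print(desired_replicas(4, 80, 50))  # 7
# 4 pods averaging 25% CPU against a 50% target -> scale in to 2 pods
print(desired_replicas(4, 25, 50))  # 2
```

This is why utilization targets are expressed relative to the pod resource requests you declare in the Deployment: the controller compares measured usage against those requests.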


Prerequisites

Before diving into HPA, ensure you have:

  • A Kubernetes cluster running.
  • kubectl installed and configured to communicate with your cluster.

Step 1: Install Metrics Server

Metrics Server collects resource metrics from Kubelets and exposes them via the Kubernetes API for use by HPA. To install Metrics Server, follow these steps:

  1. Install the Metrics Server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
  2. Update the Metrics Server deployment:
kubectl edit deploy metrics-server -n kube-system

Add the following flag to the metrics-server container args (it tells Metrics Server to skip verifying kubelet TLS certificates, which is fine for test clusters but not recommended in production):

- --kubelet-insecure-tls

Save and exit (in vi: press ESC, then type :wq).

Verify that the Metrics Server deployment is up and available using the following command:

kubectl get deploy metrics-server -n kube-system
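Once the deployment reports ready, you can also confirm the metrics pipeline works end to end; if kubectl top returns resource figures instead of an error, the HPA will be able to read them:

```shell
# Should list CPU/memory usage per node once Metrics Server is up
kubectl top nodes

# Pod-level metrics (the numbers the HPA actually consumes)
kubectl top pods -n kube-system
```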

Step 2: Deploy Your Application

First, create a Deployment manifest for your application. This example specifies both CPU and memory requests and limits for the container.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello-container
        image: brainupgrade/hello:1.0
        resources:
          requests:
            cpu: "100m"
            memory: "100Mi"
          limits:
            cpu: "200m"
            memory: "200Mi"

Deploy this application to your cluster using kubectl:

kubectl apply -f deployment.yaml

Step 3: Create an HPA Resource

For autoscaling based on both CPU and memory, the autoscaling/v1 API version won’t do: it only supports CPU. Use autoscaling/v2 (stable since Kubernetes 1.23; older clusters may still need autoscaling/v2beta2, which was removed in 1.26), which allows you to specify multiple metrics.

Create an HPA manifest that targets your deployment and specifies both CPU and memory metrics for scaling:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hello-application-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-application
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 50

In this configuration, the HPA scales the hello-application Deployment based on both CPU and memory utilization, measured relative to the pods’ resource requests. The controller computes a desired replica count for each metric and acts on the largest, so if either average CPU or average memory utilization exceeds 50%, the HPA scales out.

Apply this HPA to your cluster:

kubectl apply -f hpa.yaml

Step 4: Generate Load to Test Autoscaling

To see the HPA in action, you may need to generate load on your application that increases its CPU or memory usage beyond the specified thresholds. How you generate this load will depend on the nature of your application.
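As one sketch, assuming the Deployment is exposed through a Service named hello-application serving HTTP on port 80 (this article doesn’t create one, so substitute your own Service name and port), a throwaway busybox pod can hammer the endpoint in a loop:

```shell
# Temporary pod sending a request roughly every 10 ms; Ctrl+C to stop,
# --rm cleans the pod up afterwards. Service name and port are assumptions.
kubectl run load-generator --rm -it --image=busybox:1.28 --restart=Never -- \
  /bin/sh -c "while sleep 0.01; do wget -q -O- http://hello-application; done"
```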

Step 5: Monitor HPA

Monitor the HPA’s behavior with kubectl to see how it responds to the load:

kubectl get hpa hello-application-hpa --watch

You’ll see the number of replicas adjust based on the load, demonstrating how Kubernetes HPA can dynamically scale your application in response to real-world conditions.

Best Practices and Tips

  1. Define Clear Metrics: Besides CPU, consider other metrics for scaling, such as memory usage or custom metrics that closely reflect your application’s performance and user experience.
  2. Test Under Load: Ensure your HPA settings are tested under various load scenarios to find the optimal configuration that balances performance and resource usage.
  3. Monitor and Adjust: Use Kubernetes monitoring tools to track your application’s performance and adjust HPA settings as necessary to adapt to changing usage patterns or application updates.
  4. Use Cluster Autoscaler: In conjunction with HPA, use Cluster Autoscaler to adjust the size of your cluster based on the workload. This ensures your cluster has enough nodes to accommodate the scaled-out pods.
  5. Consider VPA and HPA Together: For comprehensive scalability, consider using Vertical Pod Autoscaler (VPA) alongside HPA to adjust pod resources as needed, though careful planning is required to avoid conflicts.
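Building on tip 3, scaling behavior itself is tunable. As an illustrative (not prescriptive) example, the autoscaling/v2 behavior field can slow scale-in so brief dips in load don’t cause replica churn; this fragment would sit under spec: in the HPA manifest from Step 3:

```
# Wait 5 minutes of sustained low usage before removing replicas,
# and remove at most 50% of current replicas per minute.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 50
      periodSeconds: 60
```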


Conclusion

Kubernetes HPA is a powerful tool for ensuring your applications can dynamically adapt to workload changes, maintaining efficiency and performance. By following the steps and best practices outlined in this article, you can set up HPA in your Kubernetes cluster, ensuring your applications are ready to meet demand without manual scaling intervention.

Remember, the journey to optimal application scalability is ongoing. Continuously monitor, evaluate, and adjust your configurations to keep pace with your application’s needs and the evolving technology landscape. With Kubernetes HPA, you’re well-equipped to make application scalability a cornerstone of your operational excellence.

