Horizontal Pod Autoscaling (HPA) is a Kubernetes feature that automatically scales the number of pods in a Deployment, ReplicaSet, or StatefulSet based on observed CPU utilization (or other selected metrics). This lets your application handle varying load efficiently without manual intervention.
Key Concepts
- Metrics Server: A cluster-wide aggregator of resource usage data. It collects CPU and memory metrics from the kubelet on each node and exposes them through the Kubernetes Metrics API (metrics.k8s.io).
- Target Resource: The Deployment, ReplicaSet, or StatefulSet that you want to scale.
- Scaling Policy: Defines the conditions under which the HPA scales the number of pods. This typically means setting a target CPU utilization percentage, along with minimum and maximum replica counts.
How HPA Works
- Metrics Collection: The Metrics Server collects resource usage data from the nodes.
- Evaluation: The HPA controller evaluates the collected metrics against the defined scaling policy.
- Scaling Decision: If observed usage is above the target, the HPA controller increases the replica count (up to the configured maximum). Conversely, if usage is below the target, it decreases the replica count (down to the configured minimum).
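The scaling decision boils down to a proportional rule: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). The sketch below implements just this core formula; the real controller additionally applies a tolerance band, the min/max bounds, and stabilization logic:

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Core HPA rule: scale in proportion to the ratio of the
    observed metric value to the target, rounding up."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 3 replicas averaging 90% CPU against a 50% target -> scale up to 6
print(desired_replicas(3, 90, 50))  # 6
# 4 replicas averaging 20% CPU against a 50% target -> scale down to 2
print(desired_replicas(4, 20, 50))  # 2
```

Rounding up is deliberate: it is always safer to run one pod too many than one too few while under load.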
Setting Up Horizontal Pod Autoscaling
Prerequisites
- A running Kubernetes cluster.
- Metrics Server installed and configured.
Step-by-Step Guide
1. Install Metrics Server (if not already installed):
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
2. Create a Deployment:
Save the following as nginx-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
          limits:
            cpu: 200m
Apply the deployment:
kubectl apply -f nginx-deployment.yaml
3. Create the Horizontal Pod Autoscaler:
kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=1 --max=10
This command sets up an HPA for the nginx-deployment with the following parameters:
- Target CPU utilization: 50%
- Minimum number of pods: 1
- Maximum number of pods: 10
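The same autoscaler can also be declared as a manifest, which is easier to version-control than an imperative command. A sketch using the autoscaling/v2 API, equivalent to the kubectl autoscale command above:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

Apply it with kubectl apply -f, like any other manifest.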
4. Verify the HPA:
kubectl get hpa
You should see output similar to:
NAME               REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx-deployment   Deployment/nginx-deployment   10%/50%   1         10        1          1m
Practical Example
Simulating Load
To see the HPA in action, you can simulate a load on the nginx deployment:
1. Run a Load Generator:
kubectl run -i --tty load-generator --image=busybox --restart=Never -- /bin/sh
2. Generate Load:
The wget target resolves through a Service, so first expose the deployment if you have not already:
kubectl expose deployment nginx-deployment --port=80
Then, inside the load generator pod, run:
while true; do wget -q -O- http://nginx-deployment; done
3. Observe Scaling:
Monitor the HPA status:
kubectl get hpa -w
You should see the number of replicas increase as the load increases.
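When the load stops, expect scale-down to lag: by default the controller waits out a stabilization window (300 seconds) before removing replicas, to avoid thrashing on noisy metrics. With the autoscaling/v2 API this is tunable; a sketch with illustrative values:

```yaml
# Fragment: goes under .spec of an autoscaling/v2 HorizontalPodAutoscaler
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # default: wait 5 min of low load first
    policies:
    - type: Percent
      value: 50                       # remove at most 50% of replicas per period
      periodSeconds: 60
```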
Common Mistakes and Tips
- Metrics Server Not Installed: Ensure the Metrics Server is installed and running correctly; a quick check is whether kubectl top nodes returns data.
- Incorrect Resource Requests/Limits: The target CPU percentage is calculated relative to each container's CPU request, so pods without CPU requests cannot be autoscaled on CPU utilization. Make sure your pods have appropriate requests (and limits) set.
- Insufficient Permissions: Ensure the HPA controller has the necessary permissions to scale the target resource.
Exercise
Task
- Create a deployment for an application of your choice.
- Set up an HPA to scale the deployment based on CPU utilization.
- Simulate a load and observe the scaling behavior.
Solution
1. Create Deployment:
Save the following as myapp-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m
          limits:
            cpu: 200m
Apply the deployment:
kubectl apply -f myapp-deployment.yaml
2. Create HPA:
kubectl autoscale deployment myapp-deployment --cpu-percent=50 --min=1 --max=10
3. Simulate Load:
Expose the deployment as a Service so its hostname resolves:
kubectl expose deployment myapp-deployment --port=8080
Run a load generator:
kubectl run -i --tty load-generator --image=busybox --restart=Never -- /bin/sh
Inside the load generator pod:
while true; do wget -q -O- http://myapp-deployment:8080; done
4. Observe Scaling:
kubectl get hpa -w
Conclusion
Horizontal Pod Autoscaling is a powerful feature in Kubernetes that helps maintain optimal performance and resource utilization for your applications. By automatically adjusting the number of pods based on real-time metrics, HPA ensures that your application can handle varying loads efficiently. Understanding and implementing HPA is crucial for managing scalable and resilient applications in a Kubernetes environment.