Lesson 12
Scaling applications
~5 min read
[Figure: Scaling adjusts the replica count. Setting replicas: 3 in the deployment spec, or running kubectl scale deployment nginx-deployment --replicas=3, results in three pods: P1, P2, P3.]
Kubernetes makes it easy to scale your applications up or down as the load on them increases or decreases. There are two ways of scaling: manual and automatic.
Scaling manually means that you yourself have to identify when scaling is needed and perform the scaling operation.
To start with, we are going to cover the manual approach. There are two ways of scaling a deployment manually. One approach is to edit the deployment directly by running:
kubectl edit deployment nginx-deployment
then locating the replicas field and modifying its value, for example changing it from 1 to 3.
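Inside the editor, the relevant part of the manifest looks roughly like this (abbreviated sketch; only the replicas field needs to change):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3   # changed from 1
  # ...rest of the spec unchanged
```

Saving and closing the editor applies the change immediately.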
Another approach, which is simpler in my opinion, would be to run the scale command:
kubectl scale deployment nginx-deployment --replicas=3
If you have done either of these steps, you can now verify that your application has more pods to handle incoming traffic by running:
kubectl get pods
nginx-deployment-557f78c779-bts9t 1/1 Running 0 6m42s
nginx-deployment-557f78c779-s9hqs 1/1 Running 0 5m37s
nginx-deployment-557f78c779-sprdg 1/1 Running 0 5m37s

Kubernetes also provides support for automatic scaling, or autoscaling. Autoscaling is when a resource called a HorizontalPodAutoscaler (HPA) is configured to monitor your application and dynamically adjust the number of running replicas according to some real-time metric; memory usage and CPU utilization are common ones. This allows deployed applications to scale on demand, helps achieve efficient utilization of resources, and gives us the ability to handle sudden, unexpected increases in traffic.
There are also two ways of creating a HorizontalPodAutoscaler for your applications. One approach is to run a command that creates the resource for a target deployment:
kubectl autoscale deployment nginx-deployment --cpu-percent=50 --min=1 --max=10
By running this command you have created a HorizontalPodAutoscaler that maintains between 1 and 10 replicas of the pods controlled by the nginx-deployment Deployment. Notice that we set --cpu-percent to 50. This means that the HorizontalPodAutoscaler will aim to maintain an average CPU utilization of 50% across all pods in the deployment.
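The other approach is declarative: define the autoscaler in a manifest and create it with kubectl apply. A sketch equivalent to the command above, assuming the autoscaling/v2 API (stable since Kubernetes 1.23):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-deployment
spec:
  # The workload this autoscaler manages
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # equivalent to --cpu-percent=50
```

The declarative form is what you would typically check into version control alongside the deployment manifest.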
If you ran the autoscale command listed above you should be able to query the HPA status by running:
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx-deployment Deployment/nginx-deployment <unknown>/50% 1 10 3 3m47s

But wait, why doesn't Kubernetes know how much CPU our deployment is using? To debug this we should run:
kubectl describe hpa
Name: nginx-deployment
Namespace: default
Reference: Deployment/nginx-deployment
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): <unknown> / 50%
Min replicas: 1
Max replicas: 10
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count: failed to get cpu utilization: missing request for cpu in container nginx

Notice this part:
ScalingActive False FailedGetResourceMetric the HPA was unable to compute the replica count:
failed to get cpu utilization: missing request for cpu in container nginx
This tells us that we need to add a CPU request as part of the resource management for the deployment.
Pod resource management
Resource requests and limits are Kubernetes' way of letting you specify the computing needs of your pods. If a resource request is set, the kube-scheduler uses this information to make sure the pod is scheduled on a node with enough of that resource available. Limits, as the name indicates, cap how much of a resource will be available to the pod.
This is a good safeguard against pods going rogue and eating up all of our resources. When a container exceeds its memory limit, Kubernetes terminates it (an OOM kill). When a container exceeds its CPU limit, it is throttled rather than killed.
Let's revisit our deployment manifest and add a resource specification to it:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app.kubernetes.io/name: my-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: my-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: my-nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.26.2
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        ports:
        - containerPort: 80

The sandbox has this updated manifest stored as nginx-deployment.yaml. Apply it:
kubectl apply -f ./nginx-deployment.yaml

Now try getting the status of the HorizontalPodAutoscaler:
kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx-deployment Deployment/nginx-deployment 0%/50% 1 10 1 46h

A very interesting thing to notice is that before we fixed the autoscaler, the replica count was 3 pods. After we fixed it, the HPA realized that we were underutilizing the requested resources, so it scaled the deployment down to a single replica. If CPU load were to exceed 50%, the HPA would bring up additional pods so that the average utilization of the target resource falls back under the specified threshold.
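To watch the autoscaler scale up, you can generate artificial load against the pods. A minimal sketch, assuming a Service named nginx-deployment exposes the deployment on port 80 inside the cluster (the Service name here is an assumption; we did not create one above):

```shell
# Run a temporary busybox pod that continuously requests the (assumed)
# nginx-deployment Service, driving up CPU usage on the nginx pods.
kubectl run load-generator --image=busybox:1.36 --restart=Never -- /bin/sh -c \
  "while true; do wget -q -O- http://nginx-deployment; done"

# In another terminal, watch the HPA react as utilization climbs:
kubectl get hpa nginx-deployment --watch

# Clean up the load generator when you are done:
kubectl delete pod load-generator
```

Once utilization crosses the 50% target, the REPLICAS column should climb, and after the load stops the HPA will gradually scale back down.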