Lesson 10
Debugging & Troubleshooting
~5 min read
Things break in Kubernetes. Images get misspelled, applications crash, configurations go wrong. The good news is that Kubernetes provides excellent diagnostic tools. Let's learn how to use them.
Our cluster has three deployments — but not all of them are healthy. Let's investigate.
Survey the cluster
Start with a high-level view:
kubectl get deployments

You should see that nginx-deployment is ready, but api-server and worker-service show 0 ready replicas. Let's dig deeper:
kubectl get pods

Notice the STATUS column. Instead of Running, you'll see:
- ImagePullBackOff — Kubernetes can't pull the container image
- CrashLoopBackOff — the container starts but keeps crashing
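Rather than re-running get pods by hand, you can stream status changes as they happen. A small sketch (note that a pod in CrashLoopBackOff usually still reports phase Running, so the phase filter mainly catches image-pull problems):

```shell
# Stream pod status changes instead of polling manually
kubectl get pods -w

# Filter to pods that never got past scheduling/image pull
# (ImagePullBackOff pods sit in phase Pending)
kubectl get pods --field-selector=status.phase=Pending
```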
Diagnosing ImagePullBackOff
The api-server pods are stuck in ImagePullBackOff. This usually means the image name is wrong, the registry doesn't exist, or authentication is missing.
Use describe to see what happened:
kubectl describe pod <api-server-pod-name>

Look at the Events section at the bottom. You'll see Failed and BackOff events with messages about the image pull failure. The image nonexistent-registry.io/api-server:2.0 doesn't exist.
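When describe's output gets long, you can also pull just the events for a single pod with a field selector (substitute your actual pod name):

```shell
# Only the events whose subject is this pod
kubectl get events --field-selector involvedObject.name=<api-server-pod-name>
```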
Fix it by updating the image to something that works:
kubectl set image deployment/api-server api-server=nginx:1.26.2

Watch the pods recover:
kubectl get pods

The old pods will terminate and new ones will start with the corrected image.
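Instead of polling get pods, kubectl can block until the rollout either completes or gives up:

```shell
# Waits until the deployment's new pods are fully available,
# exiting non-zero if the rollout fails
kubectl rollout status deployment/api-server
```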
Diagnosing CrashLoopBackOff
The worker-service pod is in CrashLoopBackOff. This means the container starts but the process inside it keeps failing.
First, check the pod details:
kubectl describe pod <worker-service-pod-name>

The Events section shows BackOff warnings. But to understand why it's crashing, we need the logs:
kubectl logs <worker-service-pod-name>

The logs show a Node.js error — the application can't find its entry point. The image node:20-alpine-broken is faulty.
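If the container has already been restarted, the current log stream may be empty; the --previous flag fetches output from the last crashed instance instead:

```shell
# Logs from the previous (crashed) run of the container
kubectl logs <worker-service-pod-name> --previous

# Follow logs live as new lines arrive
kubectl logs -f <worker-service-pod-name>
```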
Fix it with a working image:
kubectl set image deployment/worker-service worker-service=node:20-alpine

Using kubectl exec
Now that you've fixed the api-server deployment, its pods should be running. For running containers, kubectl exec lets you inspect the container from inside. Try it on one of the api-server pods you just fixed:
kubectl exec <api-server-pod-name> -- hostname
kubectl exec <api-server-pod-name> -- cat /etc/hostname

This is invaluable for checking configuration files, environment variables, network connectivity, and file system state. Note that exec only works on running pods — if you try it on a CrashLoopBackOff pod, you'll get an error.
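For open-ended poking around, exec can also give you an interactive shell — assuming the image ships one (nginx images include sh):

```shell
# -i keeps stdin open, -t allocates a terminal
kubectl exec -it <api-server-pod-name> -- sh

# One-off checks without an interactive session
kubectl exec <api-server-pod-name> -- env
kubectl exec <api-server-pod-name> -- ls /etc/nginx
```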
Understanding Events
Kubernetes records events for significant occurrences. View all cluster events:
kubectl get events

Events include scheduling decisions, image pulls, container starts, failures, and more. They're timestamped and categorized as Normal or Warning.
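The event list isn't sorted by default, which makes busy clusters hard to read. Two useful variations:

```shell
# Sort by creation time so the most recent events appear last
kubectl get events --sort-by=.metadata.creationTimestamp

# Show only Warning events, hiding routine Normal activity
kubectl get events --field-selector type=Warning
```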
When debugging:
- Events tell you what happened
- Logs tell you why it happened
- Describe gives you the full picture
Debugging checklist
When something isn't working:
- kubectl get pods — check pod status and ready counts
- kubectl describe pod <name> — look at Events and container State
- kubectl logs <name> — read application output
- kubectl get events — see cluster-wide activity
- kubectl exec <name> -- <command> — inspect a running container
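The first steps of the checklist can be bundled into a small helper script — a sketch, with the pod name passed as the first argument and debug-pod.sh as a hypothetical filename:

```shell
#!/bin/sh
# Usage: ./debug-pod.sh <pod-name>
pod="$1"

# Status and ready count for this pod
kubectl get pod "$pod"

# Just the Events section from describe
kubectl describe pod "$pod" | sed -n '/^Events:/,$p'

# Last 20 lines of application output
kubectl logs "$pod" --tail=20
```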