Lesson 10
Debugging & Troubleshooting
~5 min read
Things break in Kubernetes. Images get misspelled, applications crash, configurations go wrong. The good news is that Kubernetes provides excellent diagnostic tools. Let's learn how to use them.
Our cluster has three deployments — but not all of them are healthy. Let's investigate.
Survey the cluster
Start with a high-level view:
kubectl get deployments

You should see that nginx-deployment is ready, but api-server and worker-service show 0 ready replicas. Let's dig deeper:
kubectl get pods

Notice the STATUS column. Instead of Running, you'll see:
- ImagePullBackOff — Kubernetes can't pull the container image
- CrashLoopBackOff — the container starts but keeps crashing
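Rather than re-running get pods by hand, you can stream status changes as they happen. A small sketch (note that a pod in CrashLoopBackOff usually still reports phase Running, so the phase filter mainly catches image-pull problems):

```shell
# Stream pod status changes instead of polling manually
kubectl get pods -w

# Filter to pods that never got past scheduling/image pull
# (ImagePullBackOff pods sit in phase Pending)
kubectl get pods --field-selector=status.phase=Pending
```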
Diagnosing ImagePullBackOff
The api-server pods are stuck in ImagePullBackOff. This usually means the image name is wrong, the registry doesn't exist, or authentication is missing.
Use describe to see what happened:
kubectl describe pod <api-server-pod-name>

Look at the Events section at the bottom. You'll see Failed and BackOff events with messages about the image pull failure. The image nonexistent-registry.io/api-server:2.0 doesn't exist.
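When describe's output gets long, you can also pull just the events for a single pod with a field selector (substitute your actual pod name):

```shell
# Only the events whose subject is this pod
kubectl get events --field-selector involvedObject.name=<api-server-pod-name>
```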
Fix it by updating the image to something that works:
kubectl set image deployment/api-server api-server=nginx:1.26.2

Watch the pods recover:
kubectl get pods

The old pods will terminate and new ones will start with the corrected image.
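Instead of polling get pods, kubectl can block until the rollout either completes or gives up:

```shell
# Waits until the deployment's new pods are fully available,
# exiting non-zero if the rollout fails
kubectl rollout status deployment/api-server
```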
Diagnosing CrashLoopBackOff
The worker-service pod is in CrashLoopBackOff. This means the container starts but the process inside it keeps failing.
First, check the pod details:
kubectl describe pod <worker-service-pod-name>

The Events section shows BackOff warnings. But to understand why it's crashing, we need the logs:
kubectl logs <worker-service-pod-name>

The logs show a Node.js error — the application can't find its entry point. The image node:20-alpine-broken is faulty.
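If the container has already been restarted, the current log stream may be empty; the --previous flag fetches output from the last crashed instance instead:

```shell
# Logs from the previous (crashed) run of the container
kubectl logs <worker-service-pod-name> --previous

# Follow logs live as new lines arrive
kubectl logs -f <worker-service-pod-name>
```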
Fix it with a working image:
kubectl set image deployment/worker-service worker-service=node:20-alpine

Using kubectl exec
Now that you've fixed the api-server deployment, its pods should be running. For running containers, kubectl exec lets you inspect the container from inside. Try it on one of the api-server pods you just fixed:
kubectl exec <api-server-pod-name> -- hostname
kubectl exec <api-server-pod-name> -- cat /etc/hostname

This is invaluable for checking configuration files, environment variables, network connectivity, and file system state. Note that exec only works on running pods — if you try it on a CrashLoopBackOff pod, you'll get an error.
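For open-ended poking around, exec can also give you an interactive shell — assuming the image ships one (nginx images include sh):

```shell
# -i keeps stdin open, -t allocates a terminal
kubectl exec -it <api-server-pod-name> -- sh

# One-off checks without an interactive session
kubectl exec <api-server-pod-name> -- env
kubectl exec <api-server-pod-name> -- ls /etc/nginx
```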
Understanding Events
Kubernetes records events for significant occurrences. View all cluster events:
kubectl get events

Events include scheduling decisions, image pulls, container starts, failures, and more. They're timestamped and categorized as Normal or Warning.
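The event list isn't sorted by default, which makes busy clusters hard to read. Two useful variations:

```shell
# Sort by creation time so the most recent events appear last
kubectl get events --sort-by=.metadata.creationTimestamp

# Show only Warning events, hiding routine Normal activity
kubectl get events --field-selector type=Warning
```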
When debugging:
- Events tell you what happened
- Logs tell you why it happened
- Describe gives you the full picture
Debugging checklist
When something isn't working:
- kubectl get pods — check pod status and ready counts
- kubectl describe pod <name> — look at Events and container State
- kubectl logs <name> — read application output
- kubectl get events — see cluster-wide activity
- kubectl exec <name> -- <command> — inspect a running container
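The first steps of the checklist can be bundled into a small helper script — a sketch, with the pod name passed as the first argument and debug-pod.sh as a hypothetical filename:

```shell
#!/bin/sh
# Usage: ./debug-pod.sh <pod-name>
pod="$1"

# Status and ready count for this pod
kubectl get pod "$pod"

# Just the Events section from describe
kubectl describe pod "$pod" | sed -n '/^Events:/,$p'

# Last 20 lines of application output
kubectl logs "$pod" --tail=20
```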