Rethink platform engineering and developer impact in the age of AI. Tune in to our webinar on Thursday, May 22.

Mastering Kubernetes Pods Troubleshooting: Advanced Strategies and Solutions

Kay James
Kay James

Kubernetes (K8s) deployments often pose challenges from various angles, including pods, services, ingress, non-responsive clusters, control planes, and high-availability setups. Kubernetes pods are the smallest deployable units in the Kubernetes ecosystem, encapsulating one or more containers that share resources and a network. Pods are designed to run a single instance of an app or process and are created and disposed of as needed. Pods are crucial for scaling, updating, and maintaining apps in a K8s environment.

This article explores the challenges faced with Kubernetes pods and the troubleshooting steps to take. Some of the error messages encountered when running Kubernetes pods include the following:

  • ImagePullBackoff
  • ErrImagePull
  • InvalidImageName
  • CrashLoopBackOff

Sometimes, you do not even encounter the listed errors but still observe that your pods fail. First, it is essential to note that you should understand the API reference when debugging any Kubernetes resources. It explains how the various Kubernetes APIs are defined and how the multiple objects in your pods/ deployments work. The documentation is well-defined under API reference on the Kubernetes website. In this case, when debugging pods, select the pods object from the API reference to get a detailed explanation of how pods work. It defines the fields that go into pods, i.e., version, kind, metadata, spec, and status. Kubernetes also provides a cheat sheet that contains a guide to the commands needed.

Prerequisites

This article assumes the reader has the following:

  • Kind installed for scenario demonstrations
  • Intermediate understanding of Kubernetes architecture
  • Kubectl command line tool

Kubernetes pods error - ImagePullBackoff

The error is shown for three different reasons:

  • Invalid Image
  • Invalid Tag
  • Invalid Permissions

These scenarios arise when you don't have the correct information about your image. You might also not have permission to pull the image from its repository (private repositories). To demonstrate this in the example below, we create an nginx deployment:

➜ ~ kubectl create deploy nginx --image=nginxdeployment.apps/nginx created

Once the pod is running, get the pod name:

 ~ kubectl get pods

NAME                    READY   STATUS    RESTARTS   AGE

nginx-8f458dc5b-hcrsh   1/1     Running   0          100s

Copy the name of the running pod and get further information about it:

➜  ~ kubectl describe pod nginx-8f458dc5b-hcrsh

Name:             nginx-8f458dc5b-hcrsh

hable:NoExecute op=Exists for 300s

Events:

Type    Reason     Age    From               Message

----    ------     ----   ----               -------

Normal  Scheduled  2m43s  default-scheduler  Successfully assigned default/nginx-8f458dc5b-hcrsh to k8s-troubleshooting-control-plane

Normal  Pulling    2m43s  kubelet            Pulling image "nginx"

Normal  Pulled     100s   kubelet            Successfully pulled image "nginx" in 1m2.220189835s

Normal  Created    100s   kubelet            Created container nginx

Normal  Started    100s   kubelet            Started container nginx

The image was pulled successfully. Your Kubernetes pod is running without errors.

To demonstrate ImagePullBackoff, edit the deployment YAML file and specify an image that does not exist:

kubectl edit deploy nginx

containers:

-image: nginxdoestexist

 imagePullPolicy: Always

 name: nginx

The new pod is not successfully deployed

➜  ~ kubectl get pods

NAME                     READY   STATUS             RESTARTS   AGE

nginx-5b847fdb95-mx4pq   0/1     ErrImagePull  0          3m40s

nginx-8f458dc5b-hcrsh    1/1     Running        0          38m

ImagePullBackoff error is shown

➜  ~ kubectl describe pod nginx-6f46cbfbcb-c92bl

Events:

Type     Reason     Age                From               Message

----     ------     ----               ----               -------

Normal   Scheduled  88s                default-scheduler  Successfully assigned default/nginx-6f46cbfbcb-c92bl to k8s-troubleshooting-control-plane

Normal   Pulling    40s (x3 over 88s)  kubelet            Pulling image "nginxdoesntexist"

Warning  Failed     37s (x3 over 85s)  kubelet            Failed to pull image "nginxdoesntexist": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginxdoesntexist:latest": failed to resolve reference "docker.io/library/nginxdoesntexist:latest": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed

Warning  Failed     37s (x3 over 85s)  kubelet            Error: ErrImagePull

Normal   BackOff    11s (x4 over 85s)  kubelet            Back-off pulling image "nginxdoesntexist"

Warning  Failed     11s (x4 over 85s)  kubelet            Error: ImagePullBackOff

Kubernetes Pods Error - Image pulled but the pod is pending.

Whenever you run K8s in a production environment, the K8s administrators allocate ResourceQuotas for each namespace according to the requirements of the namespaces running within a cluster. Namespaces are used for logical separation within the cluster.

When the specifications in the ResourceQuota do not meet the minimal requirement of the application in a pod, the 'Image pulled, but the pod is still pending' error is thrown. In the example below, create a namespace called payments:

➜ ~ kubectl create ns payments

namespace/payments created

Create a ResourceQuota with relevant specifications

➜  ~ cat resourcequota.yaml

apiVersion: v1

kind: ResourceQuota

metadata:

name: compute-resources

spec:

hard:

  requests.cpu: "1"

  requests.memory: 1Gi

  limits.cpu: "2"

  limits.memory: 4Gi

Assign a resource quota to the namespace payments

➜ ~ kubectl apply -f resourcequota.yaml -n paymentsresourcequota/compute-resources created

Create a new deployment within the namespace with the resource quota restrictions:

Despite the deployment being successfully created, no pods exist:

➜ ~ kubectl get pods -n payments

No resources found in payments namespace.

The deployment is created, but there is no pod in the ready status, none up-to-date, and none available:

➜  ~ kubectl get deploy -n payments

NAME    READY   UP-TO-DATE   AVAILABLE   AGE

nginx   0/1     0            0           7m4s

To further debug, describe the nginx deployment. The pods failed to create:

➜  ~ kubectl describe deploy nginx -n payments

Name:                   nginx

Namespace:              payments

CreationTimestamp:      Wed, 24 May 2023 21:37:55 +0300

Labels:                 app=nginx

Annotations:            deployment.kubernetes.io/revision: 1

Selector:               app=nginx

Replicas:               1 desired | 0 updated | 0 total | 0 available | 1 unavailable

StrategyType:           RollingUpdate

MinReadySeconds:        0

RollingUpdateStrategy:  25% max unavailable, 25% max surge

Pod Template:

Labels:  app=nginx

Containers:

 nginx:

  Image:        nginx

  Port:         <none>

  Host Port:    <none>

  Environment:  <none>

  Mounts:       <none>

Volumes:        <none>

Conditions:

Type             Status  Reason

----             ------  ------

Available        False   MinimumReplicasUnavailable

ReplicaFailure   True    FailedCreate

Progressing      False   ProgressDeadlineExceeded

OldReplicaSets:    <none>

NewReplicaSet:     nginx-8f458dc5b (0/1 replicas created)

Events:

Type    Reason             Age   From                   Message

----    ------             ----  ----                   -------

Normal  ScalingReplicaSet  10m   deployment-controller  Scaled up replica set nginx-8f458dc5b to 1

Further analysis from Kubernetes events reveals insufficient memory for the pod to create.

➜ ~ kubectl get events --sort-by=/metadata.creationTimestamp

Kubernetes Pods Error - CrashLoopBackOff

This error occurs when your image is pulled successfully, and your container is created, but your runtime configuration fails. For example, if you have a working Python application that is trying to write to a folder that does not exist or does not have permission to write to that folder. Initially, the application gets executed, then runs into an error. The container is stopped if there is a panic in your application logic. The container will go into a CrashLoopBackOff. Eventually, you observe that the deployment has zero pods, i.e., one pod exists, but it is not running and throws a CrashLoopBackoff error.

Liveness & Readiness Probe Failure

A liveness probe detects if your pod has entered a broken state and can no longer serve traffic. Kubernetes will restart the pod for you. A readiness probe checks if your application is ready to handle the traffic. The readiness probe ensures that your application pulls all the necessary configurations from the configuration map and starts its threads. Only after this process is your application ready to receive traffic. If your application runs into an error during this process, it also goes into CrashLoopBackoff.

Get to Troubleshooting!

This article provides an overview of troubleshooting techniques for Kubernetes pods. It addresses common errors encountered while deploying pods and practical solutions to resolve them. It also provides insight into the reference pages and cheat sheets vital in understanding how Kubernetes works and techniques to identify and resolve issues effectively. By following the guidance presented in this article, readers can enhance their troubleshooting skills and streamline the deployment and management of their Kubernetes pods.

Telepresence, Now in Blackbird

Debug Kubernetes faster with Telepresence—test and troubleshoot services locally while seamlessly integrating with your remote cluster.