Practice Kubernetes troubleshooting with realistic error scenarios.
Each scenario is started with a kubectl apply command. To clean up, run kubectl delete -f with the same URL.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/broken.yaml
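To verify the failure, list the pods and inspect the crashing one (pod names vary; substitute the name shown by kubectl get pods):
kubectl get pods
kubectl logs --previous <crashing-pod-name>
kubectl describe pod <crashing-pod-name>
The pod status should show CrashLoopBackOff, and the previous container's logs should reveal why it exited.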
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/oomkill/oomkill_job.yaml
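To confirm the OOM kill (substitute the name of the pod created by the job):
kubectl get pods
kubectl describe pod <oomkilled-pod-name>
In the describe output, the container's last state should show Terminated with reason OOMKilled.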
Apply the following YAML and wait 15 minutes. (CPU throttling is only an issue if it occurs for a meaningful period of time. Less than 15 minutes of throttling typically does not trigger an alert.)
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/cpu_throttling/throttling.yaml
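While waiting, you can watch CPU usage approach the container's limit (requires metrics-server; the pod name is whatever the YAML above creates):
kubectl top pods
kubectl get pod <throttled-pod-name> -o jsonpath='{.spec.containers[*].resources.limits.cpu}'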
Apply the following YAML and wait 15 minutes. (By default, most systems only alert after pods have been pending for 15 minutes. This prevents false alarms on autoscaled clusters, where it's OK for pods to be temporarily pending.)
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/pending_pods/pending_pod_node_selector.yaml
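To see why the pod cannot be scheduled (substitute the pending pod's name):
kubectl get pods
kubectl describe pod <pending-pod-name>
The Events section should contain a FailedScheduling message explaining that no nodes match the pod's node selector.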
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/image_pull_backoff/no_such_image.yaml
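To verify (substitute the pod's name):
kubectl get pods
kubectl describe pod <image-pull-pod-name>
The pod should show an ErrImagePull or ImagePullBackOff status, with events explaining that the image could not be pulled.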
Apply the following YAML to simulate a liveness probe failure.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/liveness_probe_fail/failing_liveness_probe.yaml
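To watch the probe failures and restarts accumulate (substitute the pod's name):
kubectl get pods -w
kubectl describe pod <failing-liveness-pod-name>
The Events section should show repeated "Liveness probe failed" messages followed by container restarts.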
Deploy a failing job. The job will fail after 60 seconds, then attempt to run again. After two attempts, it will fail for good.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/job_failure/job_crash.yaml
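To follow the retries (the job name is whatever the YAML above creates; find it with kubectl get jobs):
kubectl get jobs
kubectl describe job <failing-job-name>
kubectl logs job/<failing-job-name>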
Add Robusta's Helm chart repository:
helm repo add robusta https://robusta-charts.storage.googleapis.com && helm repo update
Deploy a failing release:
helm install kubewatch robusta/kubewatch --set='rbac.create=true,updateStrategy.type=Error' --namespace demo-namespace --create-namespace
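To confirm that the release is in a failed state once the install above has finished, check its status:
helm status kubewatch --namespace demo-namespace
helm list --namespace demo-namespace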
Deploy a successful release:
helm upgrade kubewatch robusta/kubewatch --set='rbac.create=true' --namespace demo-namespace --create-namespace
Uninstall kubewatch:
helm del kubewatch --namespace demo-namespace
Delete the test namespace:
kubectl delete namespace demo-namespace
Deploy a healthy pod. Then break it.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/healthy.yaml
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/broken.yaml
Now audit your cluster. If someone else made this change, would you be able to pinpoint the change that broke the application?
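One way to answer that manually is kubectl diff, which compares a manifest against the live object. Diffing the healthy manifest against the now-broken deployment highlights exactly what changed:
kubectl diff -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/healthy.yaml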
Create an nginx deployment. Then change the image tag to simulate an unexpected image tag change.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/deployment_image_change/before_image_change.yaml
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/deployment_image_change/after_image_change.yaml
Did you immediately get notified about a change in the image tag? Note: You will need to configure a playbook for this to work. Instructions coming soon!
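A quick manual check of the image tags that are actually running:
kubectl get deployments -o wide
The IMAGES column should show the new tag after the second apply.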
Create an ingress. Then change its port and path to simulate an unexpected ingress modification.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/ingress_port_path_change/before_port_path_change.yaml
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/ingress_port_path_change/after_port_path_change.yaml
Did you immediately get notified about a change in the port number and path? Note: You will need to configure a playbook for this to work. Instructions coming soon!
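A quick manual check of the live ingress definition (substitute the name shown by kubectl get ingress):
kubectl get ingress
kubectl describe ingress <ingress-name>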
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/namespace_drift/example.yaml
Can you quickly tell the difference between the compare1 and compare2 namespaces? What is the drift between them?
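A quick manual comparison, assuming the example creates similar workloads in both namespaces:
kubectl get deployments --namespace compare1 -o wide
kubectl get deployments --namespace compare2 -o wide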
On GKE, nodes can reserve more than 50% of CPU for themselves. Users pay for CPU that is unavailable to applications.
Reproduction:
- Create a default GKE cluster with autopilot disabled. Don't change any other settings.
- Deploy the following pod:
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/gke_node_allocatable/gke_issue.yaml
- Run:
kubectl get pods -o wide gke-node-allocatable-issue
The pod will be stuck in Pending: a pod requesting 1 CPU cannot be scheduled on an empty node with 2 CPUs, because the node reserves so much CPU for itself.
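To see how much CPU the node reserves, compare the Capacity and Allocatable sections in the node description (substitute a node name from kubectl get nodes):
kubectl get nodes
kubectl describe node <node-name>
Allocatable CPU should be noticeably lower than capacity, which is why the 1-CPU request cannot be satisfied.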