Practice Kubernetes troubleshooting with realistic error scenarios.
Each scenario is started with a kubectl apply command. To clean up, run kubectl delete -f with the same URL.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/broken.yaml
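To verify the failure, list the pods and inspect the crashing one (pod names vary; substitute the name shown by kubectl get pods):
kubectl get pods
kubectl logs --previous <crashing-pod-name>
kubectl describe pod <crashing-pod-name>
The pod status should show CrashLoopBackOff, and the previous container's logs should reveal why it exited.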
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/oomkill/oomkill_job.yaml
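To confirm the OOM kill (substitute the name of the pod created by the job):
kubectl get pods
kubectl describe pod <oomkilled-pod-name>
In the describe output, the container's last state should show Terminated with reason OOMKilled.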
Apply the following YAML and wait 15 minutes. (CPU throttling is only an issue if it occurs for a meaningful period of time. Less than 15 minutes of throttling typically does not trigger an alert.)
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/cpu_throttling/throttling.yaml
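While waiting, you can watch CPU usage approach the container's limit (requires metrics-server; the pod name is whatever the YAML above creates):
kubectl top pods
kubectl get pod <throttled-pod-name> -o jsonpath='{.spec.containers[*].resources.limits.cpu}'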
Apply the following YAML and wait 15 minutes. (By default, most systems only alert after pods have been pending for 15 minutes. This prevents false alarms on autoscaled clusters, where it's OK for pods to be temporarily pending.)
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/pending_pods/pending_pod_node_selector.yaml
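To see why the pod cannot be scheduled (substitute the pending pod's name):
kubectl get pods
kubectl describe pod <pending-pod-name>
The Events section should contain a FailedScheduling message explaining that no nodes match the pod's node selector.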
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/image_pull_backoff/no_such_image.yaml
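To verify (substitute the pod's name):
kubectl get pods
kubectl describe pod <image-pull-pod-name>
The pod should show an ErrImagePull or ImagePullBackOff status, with events explaining that the image could not be pulled.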
Apply the following YAML to simulate a liveness probe failure.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/liveness_probe_fail/failing_liveness_probe.yaml
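To watch the probe failures and restarts accumulate (substitute the pod's name):
kubectl get pods -w
kubectl describe pod <failing-liveness-pod-name>
The Events section should show repeated "Liveness probe failed" messages followed by container restarts.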
Deploy a failing job. The job will fail after 60 seconds, then attempt to run again. After two attempts, it will fail for good.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/job_failure/job_crash.yaml
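To follow the retries (the job name is whatever the YAML above creates; find it with kubectl get jobs):
kubectl get jobs
kubectl describe job <failing-job-name>
kubectl logs job/<failing-job-name>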
Add Robusta's Helm chart repository:
helm repo add robusta https://robusta-charts.storage.googleapis.com && helm repo update
Deploy a failing release:
helm install kubewatch robusta/kubewatch --set='rbac.create=true,updateStrategy.type=Error' --namespace demo-namespace --create-namespace
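To confirm that the release is in a failed state once the install above has finished, check its status:
helm status kubewatch --namespace demo-namespace
helm list --namespace demo-namespace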
Deploy a successful release:
helm upgrade kubewatch robusta/kubewatch --set='rbac.create=true' --namespace demo-namespace --create-namespace
Uninstall kubewatch:
helm del kubewatch --namespace demo-namespace
Delete the test namespace:
kubectl delete namespace demo-namespace
Deploy a healthy pod. Then break it.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/healthy.yaml
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/broken.yaml
Now audit your cluster. If someone else made this change, would you be able to pinpoint the change that broke the application?
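One way to answer that manually is kubectl diff, which compares a manifest against the live object. Diffing the healthy manifest against the now-broken deployment highlights exactly what changed:
kubectl diff -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashpod/healthy.yaml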
Create an nginx deployment. Then change the image tag to simulate an unexpected image tag change.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/deployment_image_change/before_image_change.yaml
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/deployment_image_change/after_image_change.yaml
Did you immediately get notified about a change in the image tag? Note: You will need to configure a playbook for this to work. Instructions coming soon!
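A quick manual check of the image tags that are actually running:
kubectl get deployments -o wide
The IMAGES column should show the new tag after the second apply.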
Create an ingress. Then change its port and path to simulate an unexpected ingress modification.
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/ingress_port_path_change/before_port_path_change.yaml
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/ingress_port_path_change/after_port_path_change.yaml
Did you immediately get notified about a change in the port number and path? Note: You will need to configure a playbook for this to work. Instructions coming soon!
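A quick manual check of the live ingress definition (substitute the name shown by kubectl get ingress):
kubectl get ingress
kubectl describe ingress <ingress-name>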
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/namespace_drift/example.yaml
Can you quickly tell the difference between the compare1 and compare2 namespaces? What is the drift between them?
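A quick manual comparison, assuming the example creates similar workloads in both namespaces:
kubectl get deployments --namespace compare1 -o wide
kubectl get deployments --namespace compare2 -o wide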
On GKE, nodes can reserve more than 50% of CPU for themselves. Users pay for CPU that is unavailable to applications.
Reproduction:
- Create a default GKE cluster with autopilot disabled. Don't change any other settings.
- Deploy the following pod:
kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/gke_node_allocatable/gke_issue.yaml
- Run:
kubectl get pods -o wide gke-node-allocatable-issue
The pod will be stuck in Pending: a pod requesting 1 CPU cannot be scheduled on an empty node with 2 CPUs, because the node reserves so much CPU for itself.
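To see how much CPU the node reserves, compare the Capacity and Allocatable sections in the node description (substitute a node name from kubectl get nodes):
kubectl get nodes
kubectl describe node <node-name>
Allocatable CPU should be noticeably lower than capacity, which is why the 1-CPU request cannot be satisfied.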