Cause Crash Loop
Attack
Kubernetes pods
Cause Crash Loop
Attack
Kubernetes pods
Datadog alerts when a Kubernetes pod is in crash loop
Verify that a Datadog monitor alerts you when pods are not ready to accept traffic for a certain time.
Motivation
Kubernetes features a readiness probe to determine whether your pod is ready to accept traffic. If it isn't becoming ready, Kubernetes tries to solve it by restarting the underlying container and hoping to achieve its readiness eventually. If this isn't working, Kubernetes will eventually back off to restart the container, and the Kubernetes resource remains non-functional.
Structure
First, check that the Datadog monitor responsible for tracking non-ready containers is in an 'okay' state. As soon as one of the containers is crash looping, caused by the crash loop attack, the Datadog monitor should alert and escalate it to your on-call team.
Solution Sketch
- Kubernetes liveness, readiness, and startup probes
Datadog monitors
Kubernetes cluster
Kubernetes pods
Dynatrace should detect a crash looping as problem
Verify that Dynatrace alerts you on pods not being ready to accept traffic for a certain amount of time.
Motivation
Kubernetes features a readiness probe to determine whether your pod is ready to accept traffic. If it isn't becoming ready, Kubernetes tries to solve it by restarting the underlying container and hoping to achieve its readiness eventually. If this isn't working, Kubernetes will eventually back off to restart the container, and the Kubernetes resource remains non-functional.
Structure
First, check that Dynatrace has no problems for an entity and doesn't alert already on non-ready containers. As soon as one of the containers is crash looping, caused by the Steadybit attack crash loop, Dynatrace should detect the problem and alert to ensure your on-call team is taking action.
Solution Sketch
- Kubernetes liveness, readiness, and startup probes
Kubernetes cluster
Kubernetes pods
Grafana alert rule fires when a Kubernetes pod is in crash loop
Verify that a Grafana alert rule alerts you when pods are not ready to accept traffic for a certain time.
Motivation
Kubernetes features a readiness probe to determine whether your pod is ready to accept traffic. If it isn't becoming ready, Kubernetes tries to solve it by restarting the underlying container and hoping to achieve its readiness eventually. If this isn't working, Kubernetes will eventually back off to restart the container, and the Kubernetes resource remains non-functional.
Structure
First, check that the Grafana alert rule responsible for tracking non-ready containers is in an 'okay' state. As soon as one of the containers is crash looping, caused by the crash loop attack, the Grafana alert rule should fire and escalate it to your on-call team.
Solution Sketch
- Kubernetes liveness, readiness, and startup probes
Grafana alert rules
Kubernetes cluster
Kubernetes pods
Instana should detect a crash looping as incident
Intent
Verify that Instana alerts you that pods are not ready to accept traffic for some time.
Motivation
Kubernetes features a readiness probe to determine whether your pod is ready to accept traffic. If it isn't becoming ready, Kubernetes tries to solve it by restarting the underlying container and hoping to achieve its readiness eventually. If this isn't working, Kubernetes will eventually back off to restart the container, and the Kubernetes resource remains non-functional.
Structure
First, check that Instana has no critical events for an application perspective. As soon as one of the containers is crash looping, caused by the Steadybit attack crash loop, Instana should detect this via a critical event to ensure your on-call team is taking action.
Solution Sketch
- Kubernetes liveness, readiness, and startup probes
Instana application perspectives
Kubernetes cluster
Kubernetes pods