DataDog Alerts when a Kubernetes Pod Is in Crash Loop
DataDog Alerts when a Kubernetes Pod Is in Crash Loop
When one of your containers has problems starting, it may result in a crash loop and, eventually, Kubernetes backing off to restart this container. Verify that Datadog notices a crash loop and will alert you to take action.DataDog Alerts when a Kubernetes Pod Is in Crash Loop
DataDog Alerts when a Kubernetes Pod Is in Crash Loop
When one of your containers has problems starting, it may result in a crash loop and, eventually, Kubernetes backing off to restart this container. Verify that Datadog notices a crash loop and will alert you to take action.Intent
Verify that Datadog monitor alerts you on pods not being ready to accept traffic for a certain amount of time.
Motivation
Kubernetes features a readiness probe to determine whether your pod is ready to accept traffic. If it isn't becoming ready, Kubernetes tries to solve it by restarting the underlying container and hoping to achieve its readiness eventually. If this isn't working, Kubernetes will eventually back off to restart the container, and the Kubernetes resource remains non-functional.
Structure
First, check that Datadog monitor responsible for tracking non-ready container is in an 'okay' state. As soon as one of the containers is crash looping, caused by the Steadybit attack crash loop, the Datadog monitor should alert and escalate it to your on-call team.
Environment Example
The Kubernetes deployment gateway
consists of two pods.
We are attacking one of the two pods by causing a crash loop and waiting on a specific monitor in Datadog to detect this crashing pod.
Solution Sketch
Download now
.json (4 kB)
It's quick and easy
- Download .json file
1.
- Upload it inside Steadybit
2.
- Start your experiment!
3.