Trigger Shutdown Host
Triggers a reboot or shutdown of the host.
Linux Host reboot is alerted by Datadog
When a Linux host suddenly goes missing from your system, Datadog should alert you. If the host was only rebooted, everything should eventually recover.
Motivation
When you're working in a less volatile system environment, where you expect hosts to always be running, you should validate that you are notified whenever a host reboots.
Structure
Before restarting a host, we verify that the Datadog monitor is in an OK state. Afterward, we trigger the shutdown of a host and expect Datadog to alert about the missing host. Eventually, the host should come back and the Datadog monitor should return to an OK state. While experimenting, we create a downtime for the monitor so that it does not escalate due to the ongoing alert.
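The steps above can be sketched as a small polling loop. This is only an illustration under stated assumptions, not how the experiment is actually implemented: the callables get_monitor_state, create_downtime, and reboot_host are hypothetical placeholders you would back with your own Datadog and host tooling, and the state strings "OK" and "Alert" are assumed values for the monitor status.

```python
import time
from typing import Callable, List


def wait_for_state(get_state: Callable[[], str], expected: str,
                   timeout_s: float = 300.0, interval_s: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_state() until it returns `expected` or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_state() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


def run_host_reboot_experiment(get_monitor_state: Callable[[], str],
                               create_downtime: Callable[[], None],
                               reboot_host: Callable[[], None],
                               alert_state: str = "Alert",  # assumed state name
                               timeout_s: float = 300.0,
                               interval_s: float = 5.0,
                               sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Run the experiment steps in order; raise on the first failed expectation."""
    steps: List[str] = []

    # Precondition: the monitor must be healthy before we touch the host.
    if get_monitor_state() != "OK":
        raise RuntimeError("monitor is not OK before the experiment")
    steps.append("precondition_ok")

    # Mute notifications so the expected alert does not escalate.
    create_downtime()
    steps.append("downtime_created")

    # Trigger the reboot or shutdown of the host.
    reboot_host()
    steps.append("reboot_triggered")

    # Expectation 1: the monitor notices the missing host.
    if not wait_for_state(get_monitor_state, alert_state, timeout_s, interval_s, sleep):
        raise RuntimeError("monitor never alerted about the missing host")
    steps.append("alert_observed")

    # Expectation 2: the host comes back and the monitor recovers.
    if not wait_for_state(get_monitor_state, "OK", timeout_s, interval_s, sleep):
        raise RuntimeError("monitor did not return to OK")
    steps.append("recovered")
    return steps
```

Passing the dependencies in as callables keeps the sketch independent of any particular client library and makes it easy to dry-run with fake state sequences before pointing it at a real host.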
Kubernetes node shutdown results in new node startup
A resilient Kubernetes cluster can cope with a crashing node and simply starts a new one.
Motivation
The number of nodes in your Kubernetes cluster is expected to change, as you may update your nodes from time to time or scale the cluster in response to traffic peaks. This is especially true when using spot instances in a cloud environment. Deployments therefore need to be node-independent and properly configured so that they can be rescheduled on a newly started node or on a node that still has free resources.
Structure
Before restarting a node, we verify that the cluster is healthy and that the deployments are ready. Afterward, we trigger the shutdown of the node that hosts a specific Kubernetes deployment and expect the deployment to be rescheduled on any other node and a new node to start up within a reasonable amount of time.
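The two expectations above can be expressed as simple checks over a snapshot of the cluster taken after the shutdown. The following is a minimal sketch under assumptions: the Pod record and both helper functions are hypothetical, the snapshots would have to come from your own cluster tooling, and it assumes a replacement node appears under a new name rather than reusing the old one.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Pod:
    """Hypothetical, simplified view of a pod belonging to the deployment."""
    name: str
    node: str
    ready: bool


def rescheduled_off_node(pods_after: List[Pod], stopped_node: str,
                         desired_replicas: int) -> bool:
    """True if enough ready pods run on nodes other than the stopped one."""
    healthy_elsewhere = [p for p in pods_after
                         if p.ready and p.node != stopped_node]
    return len(healthy_elsewhere) >= desired_replicas


def new_nodes(nodes_before: Set[str], nodes_after: Set[str]) -> Set[str]:
    """Names of nodes that exist after the experiment but not before it."""
    return nodes_after - nodes_before


# Example snapshot: node-1 was shut down, node-3 replaced it.
pods = [Pod("web-a", "node-2", True), Pod("web-b", "node-3", True)]
deployment_ok = rescheduled_off_node(pods, "node-1", desired_replicas=2)
replacements = new_nodes({"node-1", "node-2"}, {"node-2", "node-3"})
```

In a real run, these checks would be evaluated repeatedly until they pass or a timeout representing the "reasonable amount of time" expires.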
Solution Sketch
- Kubernetes liveness, readiness, and startup probes
Warning
Please be aware that this experiment shuts down a node. Make sure this is acceptable and that your node is either virtual or can be started up again afterward.