Steadybit Resilience Hub

Drained Nodes Result in Quickly Rescheduled Pods

When draining a node, Kubernetes should reschedule the pods on other nodes to achieve elasticity.
Targets:

Kubernetes cluster

Kubernetes deployments

Kubernetes nodes

Download now

The experiment editor showing the visual structure of the experiment.

Intent

When draining a node, Kubernetes should reschedule the running pods onto other nodes without hiccups, easing tasks such as node maintenance.

Motivation

Draining a node may be necessary, for example, for node maintenance. When that happens, Kubernetes should be able to reschedule the pods running on that node within the expected time and without user-noticeable failures.

Structure

For the entire duration of the experiment, a user-facing endpoint should work within expected success rates. At the beginning of the experiment, all pods should be ready to accept traffic. As soon as the node is drained, Kubernetes will evict the pods, but we still expect the deployment's remaining replicas to keep serving the user-facing endpoint. Eventually, after at most 120 seconds, all pods should be rescheduled and ready again, so the cluster recovers after the maintenance.
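The steady-state check described above can be sketched as a simple probe loop. This is an illustrative sketch, not Steadybit's implementation: `probe` is a hypothetical stand-in for a real HTTP call against the user-facing endpoint, and the threshold is an assumed example value.

```python
import time

def success_rate(results):
    """Fraction of successful probes (True = successful response)."""
    return sum(results) / len(results) if results else 0.0

def run_check(probe, duration_s=10, interval_s=1, threshold=0.95):
    """Probe the endpoint for duration_s seconds; fail as soon as the
    rolling success rate drops below threshold."""
    results = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        results.append(probe())
        if success_rate(results) < threshold:
            return False
        time.sleep(interval_s)
    return True
```

In the experiment, a check like this would run for the whole duration, spanning the drain, the evictions, and the rescheduling.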

Environment Example

In our example, we check a user-visible endpoint of the gateway deployment while draining the node. We also limit the experiment's scope to one gateway pod and, unavoidably, all other pods running on the same host. Starting a new gateway pod and getting it ready may take longer than after a single pod failure (since many pods are potentially rescheduled at once), but we still don't expect it to take longer than 120 seconds.
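The 120-second recovery expectation amounts to polling the deployment's ready-replica count until it matches the desired count or the time budget runs out. The sketch below is a hypothetical illustration: `get_ready_replicas` stands in for a real Kubernetes API call (e.g. reading the gateway deployment's status).

```python
import time

def wait_until_ready(get_ready_replicas, desired, timeout_s=120, poll_s=5):
    """Return True once all `desired` replicas report ready within
    timeout_s seconds; False if the budget is exhausted first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_ready_replicas() >= desired:
            return True
        time.sleep(poll_s)
    return False
```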


Download now

.json (4 kB)

It's quick and easy

  1. Download the .json file
  2. Upload it inside Steadybit
  3. Start your experiment!
Screenshot showing the Steadybit UI elements to import the experiment.json file into the Steadybit platform.
Tags: Kubernetes, Elasticity
GitHub: steadybit/reliability-hub-db/tree/main/recipes/kubernetes-node.drain-node
License: MIT
Maintainer: Steadybit

Used Actions

Drain node (Attack) — drains a node. Targets: Kubernetes nodes.

Start Using Steadybit Today

Get started with Steadybit, and you’ll get access to all of our features to discover the full power of Steadybit. Available for SaaS and on-prem!

Are you unsure where to begin?

No worries, our reliability experts are here to help: book a demo with them!

© 2024 Steadybit GmbH. All rights reserved.