Stop Container

Load balancing hides a single container failure for end users

If a pod becomes temporarily unavailable, Kubernetes should react by excluding that pod from the Service and restarting it.

Motivation

If configured properly, Kubernetes can detect an unresponsive pod and try to fix it by restarting the affected container. Even so, the exact configuration requires careful consideration: probes that are too aggressive kill your pods too early or flood your cluster with liveness probe traffic.
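The trade-off between reacting quickly and probing too often can be estimated with some simple arithmetic. The following is a minimal sketch, not a Steadybit or Kubernetes API: the field names mirror the Kubernetes probe settings (`initialDelaySeconds`, `periodSeconds`, `timeoutSeconds`, `failureThreshold`), while the class, both helper functions, and the figure of 200 containers are assumptions made for illustration. The detection time is a rough upper bound, not an exact model of kubelet timing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LivenessProbe:
    """Subset of Kubernetes probe fields; defaults match the Kubernetes defaults."""
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    timeout_seconds: int = 1
    failure_threshold: int = 3


def approx_detection_seconds(probe: LivenessProbe) -> int:
    """Rough upper bound on the time between a container hanging and the
    restart decision: failure_threshold consecutive failed probes, one per
    period, with the last one allowed to run into its timeout."""
    return probe.failure_threshold * probe.period_seconds + probe.timeout_seconds


def probes_per_minute(probe: LivenessProbe, containers: int) -> float:
    """Approximate number of probe requests per minute across all containers."""
    return containers * 60 / probe.period_seconds


# Two hypothetical configurations for comparison.
aggressive = LivenessProbe(period_seconds=1, failure_threshold=1)
relaxed = LivenessProbe(period_seconds=10, failure_threshold=3)

for name, probe in (("aggressive", aggressive), ("relaxed", relaxed)):
    print(name,
          approx_detection_seconds(probe),
          probes_per_minute(probe, containers=200))
```

Under these assumptions, the aggressive probe reacts within about 2 seconds but issues roughly 12,000 probe requests per minute and restarts a container after a single slow response. The relaxed probe takes about 31 seconds to react and issues a tenth of the requests.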

Structure

Before killing a container of a Kubernetes pod, we verify that the load-balanced, user-facing endpoint works properly and that all pods of the Kubernetes deployment are marked as ready. As soon as one container crashes, Kubernetes should detect the crash via a failing liveness probe and mark the related pod as not ready. Kubernetes is then expected to restart the container so that the pod becomes ready again within a certain time. Under load, the user-facing HTTP endpoint may show degraded performance (e.g., a lower success rate or higher response times), but this is expected to stay within the SLA boundaries.
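The SLA check on the user-facing endpoint can be sketched as a small evaluation over recorded requests. This is an illustrative assumption rather than the way the experiment is implemented: the `Sample` type, the `within_sla` function, and both thresholds (90% success rate, 500 ms mean response time) are hypothetical and would be replaced by your own SLA values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    status: int        # HTTP status code of one request
    latency_ms: float  # response time of that request in milliseconds


def within_sla(samples: Sequence[Sample],
               min_success_rate: float = 0.9,
               max_mean_latency_ms: float = 500.0) -> bool:
    """Return True if the recorded requests stay within the assumed SLA."""
    if not samples:
        # Without any traffic there is no evidence that the endpoint works.
        return False
    successes = sum(1 for s in samples if 200 <= s.status < 300)
    success_rate = successes / len(samples)
    mean_latency = sum(s.latency_ms for s in samples) / len(samples)
    return success_rate >= min_success_rate and mean_latency <= max_mean_latency_ms


# Hypothetical traffic recorded while one container is down:
# 19 successful requests and one slow failure.
during_outage = [Sample(200, 120.0)] * 19 + [Sample(503, 900.0)]
print(within_sla(during_outage))
```

With these sample values, the success rate is 95% and the mean response time is 159 ms, so the degraded performance still counts as within the assumed boundaries.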

Solution Sketch

Kubernetes liveness, readiness, and startup probes

Redundancy

Kubernetes

Containers

Kubernetes cluster

Kubernetes deployments

Reasonable recovery time in case of container failures

Quick startup times are favorable in cloud environments because they enable fast recovery and improve scaling.

Motivation

In cloud environments, it is accepted that a pod or container may crash; the more important principle is that it recovers quickly. A faster startup time is beneficial in that case, as it results in a smaller mean time to recover (MTTR) and reduces user-facing downtime. Also, during request peaks, a reasonably short startup time allows the deployment to scale properly.

Structure

We stop a container of one of the pods and measure the time until the pod is marked as ready again. Before stopping the container, we ensure that the deployment is ready. We then stop the container and expect the number of ready pods to drop. Within a reasonable time (e.g., 60 seconds), the container should start up again, and all desired pods should be marked as ready.
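The final step, waiting until all desired pods are ready again, can be sketched as a polling loop with a timeout. The function below is a hypothetical helper, not part of Steadybit or Kubernetes. It assumes the caller supplies `ready_pods`, a callable that returns the current number of ready pods (for example by reading the deployment status), and that polling starts once the drop in ready pods has been observed. The clock and sleep functions are injectable so that the example runs against simulated readings without a cluster.

```python
import time
from typing import Callable, Optional


def seconds_until_ready(ready_pods: Callable[[], int],
                        desired_pods: int,
                        timeout_s: float = 60.0,
                        poll_interval_s: float = 1.0,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Poll until `desired_pods` are ready.

    Returns the elapsed seconds on success, or None if the timeout is
    reached first."""
    start = clock()
    while True:
        elapsed = clock() - start
        if ready_pods() >= desired_pods:
            return elapsed
        if elapsed >= timeout_s:
            return None
        sleep(poll_interval_s)


# Simulated run: the deployment wants 3 pods, one container was stopped,
# and the pod becomes ready again on the fourth poll.
readings = iter([2, 2, 2, 3])
fake_now = [0.0]

recovery = seconds_until_ready(
    ready_pods=lambda: next(readings),
    desired_pods=3,
    clock=lambda: fake_now[0],
    sleep=lambda s: fake_now.__setitem__(0, fake_now[0] + s),
)
print(recovery)  # 3.0 simulated seconds
```

A `None` result means the deployment did not recover within the allowed time (60 seconds in this sketch), which would fail the experiment.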

Solution Sketch

Kubernetes liveness, readiness, and startup probes

Scalability

Recoverability

Kubernetes

Starter

Containers

Kubernetes cluster

Kubernetes deployments

