Load Balancing Hides a Single Container Failure for End Users

If a pod becomes temporarily unavailable, you want to ensure that Kubernetes is properly reacting, excluding that pod from the Service and restarting it.

Motivation

If configured properly, Kubernetes can detect a non-responding pod and try to fix it by simply restarting the unresponsive pod. Even so, the exact configuration requires careful consideration to avoid killing your pods too early or flooding your cluster's traffic with liveness probes.

Structure

Before killing a container of a Kubernetes pod, we verify that a load-balanced user-facing endpoint is working properly and that all Kubernetes deployment's pods are marked as ready. As soon as one container crashes, Kubernetes should detect the crashed container via a failing liveness probe and mark the related pod as not ready. Now, Kubernetes is expected to restart the container so the pod becomes ready within a certain time. The user-facing HTTP endpoint may suffer from degraded performance when being under load (e.g., lower success rate or higher response time). Even so, this is expected to be within the SLA boundaries.

Solution Sketch

Kubernetes liveness, readiness, and startup probes

How to use this template?

Import via Hub Connection

Steadybit’s Reliability Hub is already connected to your platform. If you are an admin, you can just easily import templates with just one click.

Import template

Are you on-prem?

This is how you import Templates

Import as Experiment

Simply download the template and upload it as an experiment to use it once. Perfect if you are no administrator in the platform and just want to use the template once.