Kubernetes

Extension

A Steadybit extension to check the state of the Kubernetes cluster and inject faults.

Introduction to the Kubernetes Extension

The Steadybit Kubernetes Extension adds support for Chaos Engineering attacks and checks in your Kubernetes clusters. The extension works for local minikube environments and managed Kubernetes clusters, such as AWS EKS.

Integration and Functionality

The extension uses kubectl commands and Kubernetes API calls to communicate with your cluster. The extension-container and extension-host extensions complement it by supporting further attacks and checks in your Kubernetes cluster.

Installation of the Extension

If you've installed the Steadybit Agent in a Kubernetes cluster using our provided Helm chart, the Kubernetes extension is already installed by default.

Otherwise, you can use the Helm chart of the Kubernetes extension to deploy the extension in your cluster.

Provided Target Discovery

Kubernetes Cluster

Kubernetes Daemonsets

Kubernetes Deployments

Kubernetes HAProxy Ingresses

Kubernetes NGINX Ingresses

Kubernetes Nodes

Kubernetes Pods

Kubernetes ReplicaSets

Kubernetes Statefulsets

Provided Pieces of Advice

Image Pull Policy Set To Always

Ensure that your containers always run the same, latest container image.

Kubernetes

Image Pull Policy

Kubernetes Daemonsets

Kubernetes Deployments

Kubernetes Statefulsets
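
As an illustration of what such a check looks at, here is a minimal Python sketch that inspects a workload manifest loaded as a plain dictionary. The function name and the sample manifest are hypothetical and not part of the extension. The sketch only looks at the explicit `imagePullPolicy` field and ignores any defaulting that Kubernetes applies.

```python
def containers_without_always_pull(workload: dict) -> list[str]:
    """Return the names of containers whose imagePullPolicy is not 'Always'."""
    pod_spec = workload.get("spec", {}).get("template", {}).get("spec", {})
    return [
        c.get("name", "<unnamed>")
        for c in pod_spec.get("containers", [])
        if c.get("imagePullPolicy") != "Always"
    ]


# Hypothetical Deployment manifest, reduced to the fields the check reads.
deployment = {
    "kind": "Deployment",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "web", "image": "shop/web:1.4.2", "imagePullPolicy": "Always"},
                    {"name": "sidecar", "image": "shop/proxy:0.9.0", "imagePullPolicy": "IfNotPresent"},
                ]
            }
        }
    },
}

print(containers_without_always_pull(deployment))
```

An empty result means every container in the pod template sets the policy explicitly.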

Image Version Explicitly Configured

Validates that container images use explicit version tags instead of the latest tag.

Kubernetes

Explicit Versioning

Kubernetes Deployments
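
The rule can be sketched in a few lines of Python. The helper below is hypothetical and simplified: it treats an image as explicitly versioned when it carries a tag other than `latest`, and it also accepts digest references (`@sha256:...`), which is an assumption about what counts as explicit.

```python
def has_explicit_version(image: str) -> bool:
    """Return True if the image reference pins a version other than 'latest'."""
    if "@" in image:
        # Assumption: a digest reference counts as an explicit version.
        return True
    # Only inspect the last path segment, so a registry port such as
    # 'registry.example.com:5000' is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    _name, separator, tag = last_segment.partition(":")
    return bool(separator) and tag not in ("", "latest")


for image in ("nginx", "nginx:latest", "nginx:1.25.3",
              "registry.example.com:5000/team/app"):
    print(image, has_explicit_version(image))
```

An image without any tag is reported as not explicit, because Kubernetes resolves it to `latest`.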

Limit CPU Resources

Validates that your Kubernetes resources have a CPU limit configured.

Kubernetes

Limits

CPU

Kubernetes Daemonsets

Kubernetes Deployments

Kubernetes Statefulsets

Limit Ephemeral Storage Resources

Validates that your Kubernetes resources have an ephemeral storage limit configured.

Kubernetes

Limits

Ephemeral Storage

Kubernetes Daemonsets

Kubernetes Deployments

Kubernetes Statefulsets

Limit Memory Resources

Validates that your Kubernetes resources have a memory limit configured.

Kubernetes

Limits

Memory

Kubernetes Daemonsets

Kubernetes Deployments

Kubernetes Statefulsets
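
The three limit checks above (CPU, ephemeral storage, memory) all read the same part of a manifest: `resources.limits` of each container. A minimal sketch, assuming the manifest is available as a Python dictionary; the function name and the sample StatefulSet are hypothetical.

```python
LIMIT_KEYS = ("cpu", "memory", "ephemeral-storage")


def missing_limits(workload: dict) -> dict[str, list[str]]:
    """Map each container name to the resource limits it does not configure."""
    pod_spec = workload.get("spec", {}).get("template", {}).get("spec", {})
    findings = {}
    for container in pod_spec.get("containers", []):
        limits = (container.get("resources") or {}).get("limits") or {}
        missing = [key for key in LIMIT_KEYS if key not in limits]
        if missing:
            findings[container.get("name", "<unnamed>")] = missing
    return findings


# Hypothetical StatefulSet with CPU and memory limits, but no storage limit.
statefulset = {
    "kind": "StatefulSet",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "db",
                        "resources": {"limits": {"cpu": "1", "memory": "2Gi"}},
                    }
                ]
            }
        }
    },
}

print(missing_limits(statefulset))
```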

PodAntiAffinity Ensures Scheduling Pods on Different Nodes

Ensure that multiple pods of a workload aren't scheduled on the same cluster node, to increase availability in case one node becomes unavailable.

Kubernetes

Pod Anti Affinity

Redundancy

Kubernetes Deployments

Kubernetes Statefulsets
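
For illustration, a simplified check for a configured pod anti-affinity could look like the sketch below. It only tests whether any required or preferred rule is present in the pod template and does not evaluate selectors or topology keys. The function name and the sample manifest are hypothetical.

```python
def has_pod_anti_affinity(workload: dict) -> bool:
    """Return True if the pod template defines any podAntiAffinity rule."""
    pod_spec = workload.get("spec", {}).get("template", {}).get("spec", {})
    anti_affinity = (pod_spec.get("affinity") or {}).get("podAntiAffinity") or {}
    return bool(
        anti_affinity.get("requiredDuringSchedulingIgnoredDuringExecution")
        or anti_affinity.get("preferredDuringSchedulingIgnoredDuringExecution")
    )


# Hypothetical Deployment that keeps pods labeled app=shop on different nodes.
deployment = {
    "kind": "Deployment",
    "spec": {
        "template": {
            "spec": {
                "affinity": {
                    "podAntiAffinity": {
                        "requiredDuringSchedulingIgnoredDuringExecution": [
                            {
                                "labelSelector": {"matchLabels": {"app": "shop"}},
                                "topologyKey": "kubernetes.io/hostname",
                            }
                        ]
                    }
                },
                "containers": [{"name": "web", "image": "shop/web:1.4.2"}],
            }
        }
    },
}

print(has_pod_anti_affinity(deployment))
```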

Probes Configured

Validates configuration of liveness and readiness probes.

Kubernetes Daemonsets

Kubernetes Deployments

Kubernetes Statefulsets
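
A probe check of this kind can be sketched as follows, assuming the manifest is loaded as a Python dictionary. The function and the sample manifest are hypothetical; the sketch only tests that the `livenessProbe` and `readinessProbe` fields exist, not whether they are configured sensibly.

```python
REQUIRED_PROBES = ("livenessProbe", "readinessProbe")


def containers_missing_probes(workload: dict) -> dict[str, list[str]]:
    """Map each container name to the probes it does not define."""
    pod_spec = workload.get("spec", {}).get("template", {}).get("spec", {})
    findings = {}
    for container in pod_spec.get("containers", []):
        missing = [probe for probe in REQUIRED_PROBES if probe not in container]
        if missing:
            findings[container.get("name", "<unnamed>")] = missing
    return findings


# Hypothetical Deployment whose container defines only a readiness probe.
deployment = {
    "kind": "Deployment",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "web",
                        "readinessProbe": {"httpGet": {"path": "/ready", "port": 8080}},
                    }
                ]
            }
        }
    },
}

print(containers_missing_probes(deployment))
```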

Redundant Pod Deployment

Validates pod redundancy: each deployment should have at least two pods.

Kubernetes

Redundancy

Kubernetes Deployments
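
As a rough sketch, the redundancy rule boils down to reading `spec.replicas` from the Deployment. The function name is hypothetical; the fallback to 1 reflects the Kubernetes default when `replicas` is omitted.

```python
def is_redundant(deployment: dict, minimum: int = 2) -> bool:
    """Return True if the Deployment asks for at least `minimum` replicas."""
    replicas = (deployment.get("spec") or {}).get("replicas")
    if replicas is None:
        # Kubernetes defaults a Deployment to a single replica.
        replicas = 1
    return replicas >= minimum


print(is_redundant({"kind": "Deployment", "spec": {"replicas": 3}}))
print(is_redundant({"kind": "Deployment", "spec": {}}))
```

Note that this looks at the desired replica count only, not at how many pods are actually running.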

Requesting Reasonable CPU Resources

Validates that your Kubernetes resources request reasonable CPU.

Kubernetes

Requests

CPU

Kubernetes Daemonsets

Kubernetes Deployments

Kubernetes Statefulsets

Requesting Reasonable Ephemeral Storage Resources

Validates that your Kubernetes resources request reasonable ephemeral storage.

Kubernetes

Requests

Ephemeral Storage

Kubernetes Daemonsets

Kubernetes Deployments

Kubernetes Statefulsets

Requesting Reasonable Memory Resources

Validates that your Kubernetes resources request reasonable memory.

Kubernetes

Requests

Memory

Kubernetes Daemonsets

Kubernetes Deployments

Kubernetes Statefulsets
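
What counts as "reasonable" is not defined on this page. The following sketch therefore uses a simple, assumed interpretation for CPU: a request must be set, and it must not exceed the configured limit. The function names are hypothetical; the quantity parsing covers only plain cores (`"2"`, `"0.5"`) and millicores (`"500m"`).

```python
def cpu_to_cores(quantity) -> float:
    """Convert a Kubernetes CPU quantity such as '500m' or '2' to cores."""
    text = str(quantity)
    if text.endswith("m"):
        return int(text[:-1]) / 1000
    return float(text)


def cpu_request_findings(container: dict) -> list[str]:
    """Return human-readable findings about a container's CPU request."""
    resources = container.get("resources") or {}
    requests = resources.get("requests") or {}
    limits = resources.get("limits") or {}
    findings = []
    if "cpu" not in requests:
        findings.append("no CPU request set")
    elif "cpu" in limits and cpu_to_cores(requests["cpu"]) > cpu_to_cores(limits["cpu"]):
        findings.append("CPU request exceeds CPU limit")
    return findings


# Hypothetical container specs, reduced to the resources block.
ok = {"name": "web", "resources": {"requests": {"cpu": "250m"}, "limits": {"cpu": "1"}}}
too_high = {"name": "job", "resources": {"requests": {"cpu": "2"}, "limits": {"cpu": "500m"}}}

print(cpu_request_findings(ok))
print(cpu_request_findings(too_high))
```

The same pattern could be applied to memory and ephemeral storage, which would need a parser for byte suffixes such as `Mi` and `Gi`.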

Rolling Update Deployments

Validates that you deploy Kubernetes resources via rolling updates.

Kubernetes

Deployment Strategy

Rolling Update

Kubernetes Deployments
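
A minimal sketch of this validation reads `spec.strategy.type` from the Deployment. The function name is hypothetical; an omitted strategy is treated as `RollingUpdate`, which is the Kubernetes default for Deployments.

```python
def uses_rolling_update(deployment: dict) -> bool:
    """Return True if the Deployment is updated via the RollingUpdate strategy."""
    strategy = (deployment.get("spec") or {}).get("strategy") or {}
    # Deployments default to RollingUpdate when no strategy type is given.
    return strategy.get("type", "RollingUpdate") == "RollingUpdate"


print(uses_rolling_update({"spec": {"strategy": {"type": "Recreate"}}}))
print(uses_rolling_update({"spec": {}}))
```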

Schedule Pods Across Zones

Validates whether multiple pods of the same workload resource are currently deployed in the same Availability Zone.

Kubernetes

Availability Zones

Redundancy

Kubernetes Deployments

Kubernetes Statefulsets
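
To illustrate the idea, the sketch below counts the pods of one workload per zone, using the well-known node label `topology.kubernetes.io/zone`. The input shapes (a pod-to-node mapping and a node-to-labels mapping), the function names, and the sample data are assumptions made for this example.

```python
from collections import Counter

ZONE_LABEL = "topology.kubernetes.io/zone"


def pods_per_zone(pod_to_node: dict[str, str],
                  node_labels: dict[str, dict[str, str]]) -> Counter:
    """Count how many pods of a workload run in each Availability Zone."""
    zones = Counter()
    for node in pod_to_node.values():
        zone = node_labels.get(node, {}).get(ZONE_LABEL, "<unknown>")
        zones[zone] += 1
    return zones


def all_pods_in_one_zone(pod_to_node: dict[str, str],
                         node_labels: dict[str, dict[str, str]]) -> bool:
    """Return True if the workload has several pods but uses a single zone."""
    return len(pod_to_node) > 1 and len(pods_per_zone(pod_to_node, node_labels)) == 1


# Hypothetical cluster state: three pods on three nodes in two zones.
node_labels = {
    "node-a": {ZONE_LABEL: "eu-central-1a"},
    "node-b": {ZONE_LABEL: "eu-central-1a"},
    "node-c": {ZONE_LABEL: "eu-central-1b"},
}
pod_to_node = {"shop-1": "node-a", "shop-2": "node-b", "shop-3": "node-c"}

print(dict(pods_per_zone(pod_to_node, node_labels)))
print(all_pods_in_one_zone(pod_to_node, node_labels))
```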

Useful Templates (4 of 42)

AppDynamics alerts when a Kubernetes pod is in crash loop

Verify that an AppDynamics health violation alerts you when pods are not ready to accept traffic for a certain time.

Motivation

Kubernetes features a readiness probe to determine whether your pod is ready to accept traffic. If a container keeps failing and doesn't become ready, Kubernetes tries to resolve this by restarting it, hoping it eventually becomes ready. If that doesn't work, Kubernetes backs off and waits increasingly longer between restarts, and the Kubernetes resource remains non-functional.

Structure

First, check that the AppDynamics health violation responsible for tracking non-ready containers is in a non-violating state. As soon as the crash loop attack causes one of the containers to crash loop, the AppDynamics health violation should notify your on-call team and escalate to it.

Solution Sketch

Kubernetes liveness, readiness, and startup probes
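
The back-off behavior mentioned in the motivation can be made concrete with a small calculation. The sketch below assumes the delays described in the Kubernetes documentation (10s, 20s, 40s, and so on, capped at five minutes); the function name is hypothetical and the values may differ in your cluster.

```python
def restart_backoff_delays(restarts: int, base: int = 10, cap: int = 300) -> list[int]:
    """Return the wait time in seconds before each of the first `restarts` restarts."""
    # The delay doubles with every restart until it reaches the cap.
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]


print(restart_backoff_delays(7))
```

This helps when choosing how long the experiment and the alerting window need to be: after a handful of restarts, a crash-looping container spends most of its time waiting.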

Kubernetes deployment survives Redis latency

Verify that your application handles an increased latency in a Redis cache properly, allowing for increased processing time while maintaining throughput.

Motivation

Latency issues in Redis can lead to degraded system performance, longer response times, and potentially lost or delayed data. By testing your system's resilience to Redis latency, you can ensure that it can handle increased processing time and maintain its throughput during increased latency. Additionally, you can identify any potential bottlenecks or inefficiencies in your system and take appropriate measures to optimize its performance and reliability.

Structure

We first verify that a load-balanced, user-facing endpoint works fully while all pods are ready. We then introduce delays in Redis operations to simulate latency, and expect the system to maintain its throughput and indicate unavailability appropriately. Performance should return to normal once the latency has ended.

Redis

Recoverability

Datadog

Kubernetes deployment survives Redis downtime

Check that your application gracefully handles a Redis cache downtime and continues to deliver its intended functionality. The cache downtime may be caused by an unavailable Redis instance or a complete cluster.

Motivation

Redis downtime can lead to degraded system performance, lost data, and potentially long system recovery times. By testing your system's resilience to Redis downtime, you can ensure that it can handle the outage gracefully and continue to deliver its intended functionality. Additionally, you can identify any potential weaknesses in your system and take appropriate measures to improve its performance and resilience.

Structure

We first verify that a load-balanced, user-facing endpoint works fully while all pods are ready. We then block the traffic to the Redis instance to simulate downtime, and expect the system to indicate unavailability appropriately and maintain its throughput. Performance should return to normal once the Redis instance is available again.

Redis

Recoverability

Datadog

Certificate TLS/SSL expiry for Kubernetes deployment

Turn time forward and check whether your TLS/SSL certificates are still valid.

Motivation

Noticing a TLS/SSL certificate's expiry too late is a problem you can easily avoid by frequently checking your expiry dates. While observability tools already handle this job nicely, you can't be sure they are working in your environment. With this experiment, you can turn the time forward to check whether your HTTPS endpoint works at a given date in the future. Additionally, you can configure one of the observability integrations to validate your observability tool's alerting.

Structure

First, we validate that the given HTTPS endpoint is working today. Next, we move the host's clock forward to validate that the HTTPS endpoint continues to work on a given date. If the TLS/SSL certificate has already expired by that date, the HTTP check will fail.

Warning

Please be aware that we will manipulate the time on the given Kubernetes node. Containers running on that host may struggle to handle the clock change correctly, and you may experience other side effects.

Certificate Expiry
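
The question the experiment answers, "is the certificate still valid on a given future date?", can also be sketched offline in Python. This is not how the template works (it changes the node's clock); it is a minimal illustration using the standard library, with a hypothetical function name. The `not_after` string uses the format returned by `ssl.SSLSocket.getpeercert()`.

```python
import ssl
from datetime import datetime, timezone


def certificate_valid_on(not_after: str, check_date: datetime) -> bool:
    """Return True if a certificate expiring at `not_after` is still valid on `check_date`.

    `check_date` must be timezone-aware.
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return check_date <= expires


# Example expiry string in the getpeercert() 'notAfter' format.
not_after = "Jan  5 09:34:43 2018 GMT"

print(certificate_valid_on(not_after, datetime(2017, 12, 31, tzinfo=timezone.utc)))
print(certificate_valid_on(not_after, datetime(2018, 2, 1, tzinfo=timezone.utc)))
```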

Provided Actions

Block Traffic to Backend

Block Traffic to Backend

Cause Crash Loop

DaemonSet Pod Count

Delay Traffic to Backend

Delay Traffic to Backend

Delete Pod

Deployment Pod Count

Deployment Rollout Status

Drain Node

Kubernetes Event Logs

Node Count

Pod Count Metrics

ReplicaSet Pod Count

Rollout Restart Deployment

Scale Deployment

Scale ReplicaSet

Scale StatefulSet

Set Image

StatefulSet Pod Count

Taint Node
