Splunk
A Steadybit check implementation for data exposed through Splunk Observability Cloud.
Introduction to the Splunk Extension
The Steadybit Splunk Extension bridges the worlds of Steadybit and Splunk. The extension adds checks to your Chaos Engineering experiments to validate the state of Splunk detectors and SLOs, and reports your experiment executions to Splunk as custom events to ease correlation.
Integration and Functionality
Integration of Splunk into Steadybit works via the Splunk Observability Cloud API. All you need is a Splunk access token with API and ingest permissions.
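You can verify such a token yourself against the Splunk Observability Cloud API before configuring the extension. A minimal sketch in Python, assuming the SignalFx v2 API conventions (realm-scoped base URL, `X-SF-TOKEN` header); the realm, token, and detector ID below are placeholders:

```python
import json
import urllib.request


def build_detector_request(realm: str, token: str, detector_id: str):
    """Build URL and headers for fetching a detector's definition
    via the Splunk Observability Cloud (SignalFx) v2 API."""
    url = f"https://api.{realm}.signalfx.com/v2/detector/{detector_id}"
    headers = {"X-SF-TOKEN": token, "Content-Type": "application/json"}
    return url, headers


def fetch_detector(realm: str, token: str, detector_id: str) -> dict:
    """Perform the actual HTTP call; requires network access and a valid token."""
    url, headers = build_detector_request(realm, token, detector_id)
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A 200 response confirms the token carries API read permissions; ingest permissions are exercised separately by the event-reporting endpoint.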
Integration of Splunk in Steadybit
With the Detector Check and SLO Check, you can integrate your Splunk detectors and SLOs into your experiments. Verify that a Splunk detector or SLO notices a fault injected by Steadybit to confirm that your observability strategy works as expected.
Integration of Steadybit in Splunk
The extension automatically reports experiment executions to Splunk Observability Cloud, which helps you to correlate experiments with your dashboards.
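The events reported this way are ordinary custom events on the Splunk Observability Cloud ingest API. A sketch of what such a payload could look like in Python; the `eventType` name and dimension keys are illustrative assumptions, not the extension's actual field names:

```python
import json
import time


def build_experiment_event(experiment_key: str, state: str) -> dict:
    """Assemble a USER_DEFINED custom event in the shape accepted by
    the Splunk Observability Cloud ingest API (POST /v2/event)."""
    return {
        "category": "USER_DEFINED",
        "eventType": "steadybit.experiment",  # illustrative name
        "dimensions": {"experiment": experiment_key, "state": state},
        "timestamp": int(time.time() * 1000),  # milliseconds since epoch
    }


# Events are POSTed as a JSON array to
# https://ingest.{realm}.signalfx.com/v2/event with the X-SF-TOKEN header.
payload = json.dumps([build_experiment_event("SHOP-1", "started")])
```

Because the events carry dimensions, you can overlay experiment executions on any chart or dashboard that plots the affected metrics.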
Installation and Setup
To integrate the Splunk extension with your environment, follow our setup guide.
Splunk Detector fires when a Kubernetes pod is in crash loop
Verify that a Splunk detector alerts you when pods are not ready to accept traffic for a certain time.
Motivation
Kubernetes features a readiness probe to determine whether your pod is ready to accept traffic. When a container keeps crashing, Kubernetes restarts it in the hope that it eventually becomes ready. If the crashes persist, Kubernetes backs off between restarts (CrashLoopBackOff), and the Kubernetes resource remains non-functional.
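For reference, a readiness probe on a container is declared like this (a minimal sketch; image, path, port, and timings are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: app
      image: example/app:1.0   # illustrative image
      readinessProbe:
        httpGet:
          path: /health        # illustrative endpoint
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
        failureThreshold: 3
```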
Structure
First, check that the Splunk detector responsible for tracking non-ready containers is in an 'okay' state. As soon as one of the containers crash loops, triggered by the crash loop attack, the Splunk alert rule should fire and escalate to your on-call team.
Solution Sketch
- Kubernetes liveness, readiness, and startup probes