New Relic

Extension

Integration of Steadybit and New Relic.

Introduction to the New Relic Extension

The Steadybit New Relic Extension bridges the world of Steadybit and New Relic. The extension adds checks to your Chaos Engineering experiments that validate whether New Relic detects incidents and that verify the state of workloads. Furthermore, it reports your experiment events to New Relic to ease correlation.

Integration and Functionality

Integration of New Relic in Steadybit

Integration of New Relic into Steadybit works via the New Relic GraphQL API. All you need is an API Key and the base URL of your New Relic API.
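As a rough illustration of what "an API Key and the base URL" amount to, the sketch below prepares a GraphQL request without sending it. It is not the extension's actual code. The `/graphql` path and the `API-Key` header follow New Relic's public NerdGraph conventions as we understand them; `PreparedRequest`, `build_nerdgraph_request`, and the key value are hypothetical names chosen for this example.

```python
import json
from typing import Dict, NamedTuple


class PreparedRequest(NamedTuple):
    """Hypothetical container for an HTTP request that has not been sent."""
    url: str
    headers: Dict[str, str]
    body: bytes


def build_nerdgraph_request(base_url: str, api_key: str, query: str) -> PreparedRequest:
    """Prepare a POST to the GraphQL endpoint under the given base URL."""
    # Assumption: the base URL does not already include the /graphql path.
    url = base_url.rstrip("/") + "/graphql"
    headers = {
        "Content-Type": "application/json",
        "API-Key": api_key,  # assumption: user API key sent in this header
    }
    body = json.dumps({"query": query}).encode("utf-8")
    return PreparedRequest(url, headers, body)


# Placeholder key and a minimal query that only asks for the current user.
request = build_nerdgraph_request(
    "https://api.newrelic.com",
    "NRAK-EXAMPLE",
    "{ actor { user { name } } }",
)
```

The request is only built, not sent, so the sketch can be inspected without a real account. Sending it would be a plain HTTPS POST with any HTTP client.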

With the Incident Check, you can integrate your New Relic incidents into your experiments. Check that your observability strategy works as expected by verifying that New Relic notices a problem injected by Steadybit.

With the Workload Check you can check the state of a workload.

With the Create Muting Rule action, you can mute your alerting during an experiment to avoid false alarms and unnecessary incident processes.

Integration of Steadybit in New Relic

This type of integration uses the Insight Collector API of New Relic. You need an API key of type "LICENSE" and the base URL of your New Relic API.

The extension automatically reports experiment executions to New Relic, which helps you correlate them with incidents detected in New Relic.
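To make the reporting direction more concrete, here is a minimal sketch of how a custom event could be prepared for New Relic's event ingestion endpoint. It is an assumption-laden illustration, not the extension's implementation: the `/v1/accounts/{account_id}/events` path and the `Api-Key` header follow New Relic's public Event API conventions as we understand them, while the event type `SteadybitExperimentExample`, its attributes, and the function name are invented for this example.

```python
import json
from typing import Dict, List, Tuple


def build_event_request(
    base_url: str, account_id: int, license_key: str, events: List[Dict[str, object]]
) -> Tuple[str, Dict[str, str], bytes]:
    """Prepare (url, headers, body) for posting custom events; nothing is sent."""
    for event in events:
        # Custom events need an eventType attribute to be queryable later.
        if "eventType" not in event:
            raise ValueError("every event needs an 'eventType' attribute")
    url = f"{base_url.rstrip('/')}/v1/accounts/{account_id}/events"
    headers = {
        "Content-Type": "application/json",
        "Api-Key": license_key,  # assumption: key of type LICENSE goes here
    }
    return url, headers, json.dumps(events).encode("utf-8")


# Hypothetical event describing the start of an experiment execution.
url, headers, body = build_event_request(
    "https://insights-collector.newrelic.com",
    1234567,
    "EXAMPLE-LICENSE-KEY",
    [{"eventType": "SteadybitExperimentExample", "state": "started", "experimentKey": "ADM-1"}],
)
```

Events stored this way carry the experiment state as attributes, which is what makes it possible to line them up against incidents on a timeline or count them on a dashboard.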

Furthermore, you can get a dashboard that shows the number of experiment executions in your New Relic environment.

Installation and Setup

To integrate the New Relic extension with your environment, follow our setup guide.

Provided Target Discovery

New Relic Accounts

New Relic Workloads

Provided Actions

Useful Templates

New Relic detects an incident for CPU spikes in an ECS task

Validate your observability to detect a CPU spike in your AWS ECS cluster

Motivation

When you have New Relic configured to detect CPU spikes in your AWS ECS cluster, you can easily validate your observability strategy with this experiment template.

Structure

First, we validate that New Relic has no ongoing incident. After that, we inject the CPU spike for an ECS service and expect New Relic to detect it as an incident within the given time frame of 3 minutes.
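The logic of this structure can be sketched as a precondition check followed by a bounded polling loop. This is only an illustration of the idea, not how Steadybit executes the template: `fetch_open_incidents` and `inject_cpu_spike` are hypothetical callables standing in for the Incident Check and the attack, and the 5-second poll interval is an arbitrary choice.

```python
import time
from typing import Callable, Sequence


def incident_detected_within(
    fetch_open_incidents: Callable[[], Sequence[str]],
    timeout_s: float = 180.0,  # the 3-minute time frame from the template
    poll_interval_s: float = 5.0,  # arbitrary choice for this sketch
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until an incident shows up or the time frame has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if fetch_open_incidents():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


def run_template(
    fetch_open_incidents: Callable[[], Sequence[str]],
    inject_cpu_spike: Callable[[], None],
    **poll_options,
) -> str:
    """Precondition check, then attack, then bounded wait for detection."""
    if fetch_open_incidents():
        # An incident that is already open would make the result ambiguous.
        return "aborted: incident already open"
    inject_cpu_spike()
    if incident_detected_within(fetch_open_incidents, **poll_options):
        return "passed"
    return "failed: no incident within time frame"
```

Checking for open incidents before the attack matters because an incident that already exists cannot be attributed to the injected CPU spike.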

New Relic

AWS ECS

CPU

ECS Tasks

New Relic Accounts

New Relic should detect crash looping as a problem

Verify that New Relic alerts you that pods are not ready to accept traffic for some time.

Motivation

Kubernetes features a readiness probe to determine whether your pod is ready to accept traffic. If the pod doesn't become ready, Kubernetes tries to resolve this by restarting the underlying container, hoping it eventually becomes ready. If that doesn't work, Kubernetes eventually backs off from restarting the container, and the Kubernetes resource remains non-functional.

Structure

First, check that New Relic has no critical events for related entities. As soon as one of the containers is crash looping, caused by the Steadybit crash loop attack, New Relic should detect this and raise an incident so that your on-call team takes action.

Solution Sketch

Kubernetes liveness, readiness, and startup probes

Crash loop

New Relic

Harden Observability

Kubernetes

Kubernetes cluster

Kubernetes pods

New Relic Accounts

New Relic should detect a disrupted workflow when a workload is unavailable

Verify that New Relic alerts you to disruptions in your workflow, such as a critical deployment without pods ready to serve traffic.

Motivation

Kubernetes features a liveness probe to determine whether your pod is healthy and can accept traffic. If Kubernetes cannot probe a pod, it restarts it in the hope that it will eventually be ready. For a critical deployment, the New Relic workflow should alert on this disruption.

Structure

First, check that the New Relic workflow is marked as operational. As soon as all pods of a workload are unreachable, caused by the block traffic attack, New Relic should detect this by marking the workflow as disrupted, so that your on-call team takes action.