Steadybit Resilience Hub

Graceful Degradation and Datadog Alerts when Postgres Database Can Not Be Reached

Targets:
Containers
Datadog monitors
Kubernetes cluster
Kubernetes deployments

Your application should handle an unavailable database gracefully and signal the outage appropriately. Specifically, we want to ensure that at least one monitor in Datadog alerts us to the outage. You can reduce the impact on your system by implementing, for example, a failover or caching mechanism.
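As a rough illustration of the caching idea, the sketch below serves the last known value when the database cannot be reached and signals unavailability only when nothing is cached. All names here (`read_with_cache_fallback`, `ServiceUnavailable`, the `query_db` callable, and the dict-based cache) are hypothetical and not part of the template. It also assumes that the database client raises `ConnectionError` when the connection is blocked.

```python
from typing import Callable, Dict


class ServiceUnavailable(Exception):
    """Raised when neither the database nor the cache can serve a request."""


def read_with_cache_fallback(
    key: str,
    query_db: Callable[[str], str],
    cache: Dict[str, str],
) -> str:
    """Read from the database, falling back to the last cached value."""
    try:
        value = query_db(key)
    except ConnectionError as exc:
        # Database unreachable: degrade gracefully if a cached copy exists.
        if key in cache:
            return cache[key]
        # Nothing cached: report unavailability instead of failing obscurely.
        raise ServiceUnavailable(f"no data available for {key!r}") from exc
    # Database reachable: refresh the cache for future outages.
    cache[key] = value
    return value
```

An HTTP handler built on this could translate `ServiceUnavailable` into a "Service unavailable" response, which is the behavior the experiment checks for.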

Motivation

Database outages can occur for various reasons, including hardware failures, software bugs, network connectivity issues, or even intentional attacks. Such outages can severely affect your business, causing lost revenue, dissatisfied customers, and reputational damage. By testing your application's resilience to a database outage, you can identify areas for improvement and implement measures to minimize the impact of such outages on your system.

Structure

To conduct this experiment, we first ensure that all pods are ready and that the load-balanced user-facing endpoint is fully functional. We then simulate an unavailable PostgreSQL database by blocking the PostgreSQL client connection to a given hostname. During the outage, we monitor the system and ensure that the user-facing endpoint indicates unavailability by responding with a "Service unavailable" status. We also verify that at least one monitor in Datadog alerts us to the database outage.

Once the database becomes available again, we verify that the endpoint recovers automatically and resumes normal operation. Finally, we analyze the monitoring data to identify weaknesses in the system's resilience to database outages and take appropriate measures to minimize their impact.
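The checks described above can be sketched as two small predicates. This is a minimal illustration, not the template's implementation: the function names are hypothetical, a healthy endpoint is assumed to return HTTP 200, and "Service unavailable" is taken to mean HTTP 503. The monitor check assumes each monitor is a dictionary with an `overall_state` field whose value is `"Alert"` while the monitor is firing, as in the payloads returned by Datadog's monitor API.

```python
from typing import Iterable, Mapping, Sequence

HEALTHY = 200      # assumed steady-state status code
UNAVAILABLE = 503  # HTTP "Service Unavailable"


def endpoint_degrades_gracefully(
    before: Sequence[int],
    during: Sequence[int],
    after: Sequence[int],
) -> bool:
    """Check status codes sampled before, during, and after the outage."""
    if not (before and during and after):
        return False  # every phase needs at least one sample
    return (
        all(code == HEALTHY for code in before)          # steady state
        and all(code == UNAVAILABLE for code in during)  # clear outage signal
        and after[-1] == HEALTHY                         # recovered at the end
    )


def any_monitor_alerting(monitors: Iterable[Mapping[str, object]]) -> bool:
    """Return True if at least one monitor reports the assumed "Alert" state."""
    return any(m.get("overall_state") == "Alert" for m in monitors)
```

Only the last sample after the outage has to be healthy, which leaves the endpoint some time to recover once the database is reachable again.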


Tags:
RDS
Postgres
Recoverability
Datadog
Database

GitHub: steadybit/reliability-hub-db/tree/main/templates/db-postgresql.postgresql-unavailable-datadog-check
License: MIT
Maintainer: Antoine Choimet (SRE)

How to use this template?

Import via Hub Connection

Steadybit’s Reliability Hub is already connected to your platform. If you are an admin, you can import templates with one click.

Import as Experiment

Simply download the template and upload it as an experiment to use it once. This is ideal if you are not an administrator on the platform and only want to use the template once.

.json (4KB)

Block Traffic
Blocks network traffic (incoming and outgoing).
Attack
Containers

© 2025 Steadybit GmbH. All rights reserved.