Graceful Degradation and Datadog Alerts when Postgres Suffers Latency

Your application should keep functioning and signal any unavailability appropriately when connection latency to PostgreSQL increases. The experiment can also reveal request types whose timeouts need tuning so that requests are not dropped.
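A minimal Python sketch of the behaviour this experiment checks for, assuming a request handler that enforces a client-side deadline on its database call. `fake_query` is a hypothetical stand-in for a real PostgreSQL query. A real application would normally rely on its driver's or connection pool's timeout settings rather than a thread pool. The 250 ms deadline and the 503 response body are illustrative choices, not values taken from this template.

```python
import concurrent.futures
import time

# Shared worker pool; a timed-out query keeps running in its thread,
# so the pool is created once instead of per request.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def fake_query(delay_s: float) -> list[str]:
    """Hypothetical stand-in for a PostgreSQL query whose round trip takes delay_s."""
    time.sleep(delay_s)
    return ["row-1", "row-2"]


def handle_request(delay_s: float, timeout_s: float = 0.25) -> tuple[int, dict]:
    """Return (HTTP status, body), degrading to 503 when the query misses its deadline."""
    future = _POOL.submit(fake_query, delay_s)
    try:
        rows = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Signal unavailability explicitly instead of hanging or failing with a 500.
        return 503, {"error": "database temporarily unavailable", "retry_after_s": 5}
    return 200, {"rows": rows}


if __name__ == "__main__":
    print(handle_request(delay_s=0.05))  # query finishes within the deadline
    print(handle_request(delay_s=0.40))  # query misses the deadline and degrades
```

Returning 503 Service Unavailable rather than a generic error lets load balancers and clients treat the condition as temporary and retry later.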

Motivation

Latency spikes in shared or overloaded databases are common and can significantly degrade your application's performance. This experiment shows how robust your application is against them and where it can be improved.

Structure

To conduct this experiment, we first ensure that all pods are ready and that the load-balanced, user-facing endpoint is fully functional. We then simulate a latency attack on the PostgreSQL database by adding a delay of 100 milliseconds to all traffic to the database hostname. During the attack, we verify that the service remains operational and continues to serve its purpose, and we analyze performance metrics to find the request types most affected by the latency so they can be optimized. Finally, we end the attack and measure how long the system takes to return to its normal state. The results show how resilient the application is to database latency and support informed decisions about optimizing its performance under stress.
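The injected delay does not hit all request types equally: a request that issues several queries one after another pays the delay on each round trip. The following is a rough Python sketch for ranking which request types are likely to breach their timeouts during the attack. It assumes each sequential round trip picks up the 100 ms delay roughly once; the `RequestType` fields, endpoint names and numbers are hypothetical and should be replaced with values from your own metrics.

```python
from dataclasses import dataclass

INJECTED_DELAY_MS = 100  # delay added by the latency attack in this template


@dataclass
class RequestType:
    name: str
    baseline_p95_ms: float   # latency measured before the attack (hypothetical)
    sequential_queries: int  # database round trips made one after another
    timeout_ms: float        # client or gateway timeout for this request type


def expected_latency_ms(r: RequestType, delay_ms: float = INJECTED_DELAY_MS) -> float:
    # Assumption: each sequential round trip picks up the injected delay once.
    return r.baseline_p95_ms + r.sequential_queries * delay_ms


def at_risk(requests: list[RequestType], delay_ms: float = INJECTED_DELAY_MS) -> list[tuple[str, float]]:
    """Request types whose expected latency under attack exceeds their timeout, worst first."""
    risky = [
        (r.name, expected_latency_ms(r, delay_ms) - r.timeout_ms)
        for r in requests
        if expected_latency_ms(r, delay_ms) > r.timeout_ms
    ]
    return sorted(risky, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical catalogue of request types.
    catalogue = [
        RequestType("GET /products/{id}", 40, 1, 1000),
        RequestType("GET /orders", 120, 6, 500),
        RequestType("POST /checkout", 300, 12, 1000),
    ]
    for name, overshoot in at_risk(catalogue):
        print(f"{name}: exceeds timeout by ~{overshoot:.0f} ms")
```

Request types at the top of this list are the first candidates for fewer round trips, batching, or adjusted timeouts; the estimate should then be checked against the metrics observed during the actual attack.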

How to use this template

Import via Hub Connection

Steadybit’s Reliability Hub is already connected to your platform. If you are an admin, you can import templates with one click.


Are you on-prem? See the documentation on how to import templates.

Import as Experiment

Download the template and upload it as an experiment to use it once. This is useful if you are not an administrator on the platform and only want to use the template once.