Delay Traffic
Attack
Containers
Delay Traffic
Attack
Containers
Kubernetes deployment survives Redis latency
Verify that your application handles an increased latency in a Redis cache properly, allowing for increased processing time while maintaining throughput.
Motivation
Latency issues in Redis can lead to degraded system performance, longer response times, and potentially lost or delayed data. By testing your system's resilience to Redis latency, you can ensure that it can handle increased processing time and maintain its throughput during increased latency. Additionally, you can identify any potential bottlenecks or inefficiencies in your system and take appropriate measures to optimize its performance and reliability.
Structure
We will verify that a load-balanced user-facing endpoint fully works while having all pods ready. As soon as we simulate Redis latency, we expect the system to maintain its throughput and indicate unavailability appropriately. We can introduce delays in Redis operations to simulate latency. The experiment aims to ensure that your system can handle increased processing time and maintain its throughput during increased latency. The performance should return to normal after the latency has ended.
Containers
Datadog monitors
Kubernetes cluster
Kubernetes deployments
Graceful degradation and Datadog alerts when Postgres suffers latency
Your application should continue functioning properly and indicate unavailability appropriately in case of increased connection latency to PostgreSQL. Additionally, this experiment can highlight requests that need optimization of timeouts to prevent dropped requests.
Motivation
Latencies in shared or overloaded databases are common and can significantly impact the performance of your application. By conducting this experiment, you can gain insights into the robustness of your application and identify areas for improvement.
Structure
To conduct this experiment, we will ensure that all pods are ready and that the load-balanced user-facing endpoint is fully functional. We will then simulate a latency attack on the PostgreSQL database by adding a delay of 100 milliseconds to all traffic to the database hostname. During the attack, we will monitor the system's behavior to ensure the service remains operational and can deliver its purpose. We will also analyze the performance metrics to identify any request types most affected by the latency and optimize them accordingly. Finally, we will end the attack and monitor the system's recovery time to ensure it returns to its normal state promptly. By conducting this experiment, we can gain valuable insights into our application's resilience to database latencies and make informed decisions to optimize its performance under stress.
Containers
Datadog monitors
Kubernetes cluster
Kubernetes deployments
Third-party service suffers high latency for a Kubernetes Deployment
Identify the effect of high latency of a third-party service on your deployment's service's success metrics.
Motivation
When you provide a synchronous service via HTTP that requires the availability of other downstream third-party services, you absolutely should check how your service behaves in case the third-party service suffers high latency. Also, you want to validate whether your service is working again as soon as the third-party service is working again.
Structure
We ensure that a load-balanced user-facing endpoint fully works while having all pods ready. When we simulate the third-party service's high latency, we expect the user-facing endpoint to work within specified HTTP success rates.. To simulate the high latency, we can delay the traffic to the third-party service on the client side using its hostname. The endpoint should recover automatically once the third-party service is reachable again.
Containers
Kubernetes cluster
Kubernetes deployments
Graceful degradation of Kubernetes deployment while Kafka suffers a high latency
Verify that your application handles an increased latency in your Kafka message delivery properly, allowing for increased processing time while maintaining the throughput.
Motivation
Latency in Kafka can occur for various reasons, such as network congestion, increased load, or insufficient resources. Such latency can impact your application's performance, causing delays in processing messages and affecting overall throughput. By testing your system's resilience to Kafka latency, you can identify any potential weaknesses in your system and take appropriate measures to improve its performance.
Structure
To conduct this experiment, we will ensure that all Kafka topics and producers are ready and that the consumer receives and processes messages correctly. We will then induce latency on Kafka by introducing a delay on all incoming and outgoing messages. During the experiment, we will monitor the system to ensure it continues delivering its intended functionality and maintaining its throughput despite the increased processing time. We will also analyze the monitoring data to identify any potential bottlenecks or inefficiencies in the system and take appropriate measures to address them. Once the experiment is complete, we will remove the latency and monitor the system's recovery time to ensure it returns to its normal state promptly. By conducting this experiment, we can identify any potential weaknesses in our system's resilience to Kafka latency and take appropriate measures to improve its performance and reliability.
Containers
Datadog monitors
Kubernetes cluster
Kubernetes deployments