If the goal is Scalability, where we want to scale from 1 → 1,000,000 users, we need to make sure our system can handle that load, and we need to prove it analytically and quantitatively.
- We need to do load testing for:
- Workload increases
- Spikes in workload
- Uptime requirements (here we need endurance tests)
This is not the same as stress testing
To pass a load test, we basically just need a yes or no answer; finding an exact number would be a stress test.
Test Planning
- What will be tested
- How will we test
- How will we know the test is passed
Remember, though, that testing adds cost. Therefore, we want to focus on critical workflows. What counts as critical is determined by product requirements; good things to load test:
- Computationally intensive work (takes a lot of effort)
- User experience
- External timing requirements
Note
We need to mimic multiple users repeatedly performing the same tasks for hours or even days.
Designing a Load Test
- Test in an environment that is as close to production as possible (don't typically worry about hardware)
- Also use a real workload (close to live customer data)
- Lighter workloads are okay for regression testing, but you need to actually put your system under pressure to see how it performs when stressed
- You can simulate pressure by limiting RAM or by running something CPU-intensive concurrently
- Results that are achieved also need to be reproducible
- There is likely some variation though
There are different types of loads you can apply:
- Aggregate Workloads break down load by share
- Use-Case-Based Workloads are derived from UML, etc
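An aggregate workload can be sketched as a weighted draw over operation types. The operation names and shares below are made-up illustration values:

```python
import random

# Hypothetical operation mix: each operation's share of the total load.
WORKLOAD_MIX = {
    "browse": 0.70,    # 70% of requests
    "search": 0.20,
    "checkout": 0.10,
}

def next_operation(rng: random.Random) -> str:
    """Pick the next simulated operation according to its share of the load."""
    ops = list(WORKLOAD_MIX)
    weights = [WORKLOAD_MIX[op] for op in ops]
    return rng.choices(ops, weights=weights, k=1)[0]

rng = random.Random(42)  # seeded for reproducible test runs
sample = [next_operation(rng) for _ in range(10_000)]
print(sample.count("browse") / len(sample))  # roughly 0.70
```

Seeding the generator matters here: it is one of the things that makes a load test's results reproducible.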
Designing Fault Inducing Loads
You need to analyze either source code or system models.
Source Code:
- Identify potential load sensitive modules and regions for load sensitive faults. The Load Sensitivity Index (LSI) indicates the net increase / decrease of heap space used for each iteration.
- We want to write test cases which exercise the code regions with high LSI values.
System Models:
- Software systems can be modelled into Queues, where each service is a queue.
- We want to conduct a simulation on the queue model to know which service is likely the bottleneck and design tests to exercise that bottleneck.
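As a minimal sketch of the queue-model idea: if we model each service as an M/M/1 queue, its utilisation is ρ = λ/μ, and the service with the highest ρ is the likely bottleneck. The service names and rates below are illustrative assumptions, not real measurements:

```python
# Hypothetical service rates (requests/sec each service can process) and the
# arrival rate we want to test at; all numbers are illustrative.
SERVICE_RATES = {"auth": 500.0, "catalog": 300.0, "payment": 120.0}
ARRIVAL_RATE = 100.0  # requests/sec offered to every service in the chain

def utilisation(arrival: float, rates: dict) -> dict:
    """M/M/1 utilisation rho = lambda / mu for each service in the model."""
    return {svc: arrival / mu for svc, mu in rates.items()}

rho = utilisation(ARRIVAL_RATE, SERVICE_RATES)
bottleneck = max(rho, key=rho.get)
print(bottleneck, rho[bottleneck])  # payment is closest to saturation
```

A full simulation would model queueing delay too, but even this analytic pass tells us which service our fault-inducing load should hammer.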
Tip
- If users have similar behaviours we can group them together
- If two sequences of actions are similar, they can also be grouped
We want to measure the frequency that users do things, group similar users and then choose a representative in each cluster
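The grouping step can be sketched as bucketing users by their action-frequency vectors and picking one representative per bucket. The users, frequencies, and bucket width are hypothetical:

```python
from collections import defaultdict

# Hypothetical per-user action frequencies (fraction of a session spent on
# each of three actions); names and numbers are illustrative.
users = {
    "alice": (0.80, 0.10, 0.10),
    "bob":   (0.82, 0.09, 0.09),
    "carol": (0.20, 0.70, 0.10),
}

def cluster_key(freqs, step=0.25):
    """Bucket each frequency so users with similar behaviour share a key."""
    return tuple(round(f / step) for f in freqs)

clusters = defaultdict(list)
for name, freqs in users.items():
    clusters[cluster_key(freqs)].append(name)

# One representative per cluster drives the simulated load for that group.
representatives = {key: members[0] for key, members in clusters.items()}
print(representatives)
```

A real test design would use a proper clustering algorithm (e.g. k-means) over recorded behaviour, but the idea is the same: similar users collapse into one simulated profile.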
Running Load Tests
We want to use driver-based test execution since:
- Easy to automate
- Scales to large numbers of requests
However:
- We need to load driver configs
- Hard to track some system behaviour
There are specialized benchmarking tools to do this. Here we want to:
- Setup our test
- Generate load
- Monitor and collect data from our test
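The generate-and-collect steps can be sketched as a minimal driver: a thread pool issues requests and records per-request latency. `target_request` is a hypothetical stand-in for a real call to the system under test:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def target_request() -> None:
    """Stand-in for a request to the system under test; in a real driver
    this would be an HTTP call or RPC (assumption for illustration)."""
    time.sleep(0.001)

def run_load(workers: int, requests: int) -> list:
    """Generate load with a thread pool and collect per-request latencies."""
    latencies = []  # list.append is thread-safe in CPython
    def timed():
        start = time.perf_counter()
        target_request()
        latencies.append(time.perf_counter() - start)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(requests):
            pool.submit(timed)
    return latencies

lats = run_load(workers=8, requests=100)
print(len(lats))
```

Specialized tools (JMeter, Locust, k6, etc.) wrap exactly this loop with richer configuration and reporting.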
System Deployment for Live and Driver-Based Executions
- Field load testing is realistic but costly
- Hardware selection:
- Dedicated hardware
- Cloud based testing
- Creating realistic databases
- Import realistic raw data
- Sanitize a prod database
- Mimic realistic network traffic
We need to recruit users for live executions, but for driver-based ones we can configure our workload. Recording user behaviour and replaying it to simulate load is also a good idea.
Generating Load
- Static configuration
- Time-driven: we change the workload when a performance counter hits a value
- Dynamic configuration
- Dynamically steer the testing loads based on system feedback
- We can do this dynamically via instrumentation, as in Profiler Guided Optimization
We can accomplish both with a driver.
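Dynamic steering can be sketched as a simple feedback loop: push the rate up while an observed performance counter is healthy, back off once it degrades. The target, gains, and latency samples below are illustrative assumptions:

```python
# Hypothetical feedback-driven load steering: adjust the offered request
# rate based on the last observed p90 latency from instrumentation.
TARGET_P90_MS = 200.0

def steer(rate: float, observed_p90_ms: float) -> float:
    """Raise the load while latency is healthy, back off once it degrades."""
    if observed_p90_ms < TARGET_P90_MS:
        return rate * 1.2   # push harder
    return rate * 0.8       # back off

rate = 50.0
for observed in [80.0, 120.0, 250.0, 190.0]:  # fake p90 samples (ms)
    rate = steer(rate, observed)
print(round(rate, 1))
```

Real tools use smoother controllers, but the principle is the same: the system's own feedback decides the next load level instead of a static schedule.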
Endurance Tests
- You get a degradation of performance due to the accumulation of things like swapped memory, logs growing too long, the disk filling up, etc
- We also need to determine how long to run a test for, this depends on product requirements
Evaluating Success
There are 2 main answers we get from load testing:
- Can the system handle a load of X?
- What is the maximum load Y the current system can handle (the failure point)?
Sometimes the first question is all we need; finding Y can be expensive.
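One way to bound the cost of finding Y is to binary-search over load levels, where each probe is a single yes/no load-test run. The oracle below (a system that fails above 730 req/s) is a made-up illustration:

```python
def can_handle(load: int) -> bool:
    """Hypothetical oracle: one load-test run at the given load, returning
    pass/fail. Here we pretend the system fails above 730 req/s."""
    return load <= 730

def max_load(lo: int, hi: int) -> int:
    """Binary-search the failure point. Each probe is one (expensive) load
    test, so this finds Y in O(log(hi - lo)) runs instead of hi - lo."""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if can_handle(mid):
            lo = mid    # system coped: the failure point is higher
        else:
            hi = mid - 1  # system failed: the failure point is lower
    return lo

print(max_load(0, 10_000))  # 730
```

This assumes pass/fail is monotone in load, which holds for most systems but should be sanity-checked with a few repeated runs near the boundary.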
We can compare our data against threshold values (max, median, average, p90), we can also compare against derived data.
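The threshold comparison can be sketched with the standard library; the latency samples and threshold values are illustrative:

```python
import statistics

# Hypothetical latency samples (ms) collected during one run.
samples = [12, 15, 11, 14, 90, 13, 16, 12, 14, 13]

def p90(values):
    """90th percentile via the 'inclusive' method (fine for small samples)."""
    return statistics.quantiles(values, n=10, method="inclusive")[-1]

report = {
    "max": max(samples),
    "median": statistics.median(samples),
    "average": statistics.fmean(samples),
    "p90": p90(samples),
}
# Pass/fail against (illustrative) thresholds from the test plan.
passed = report["p90"] <= 50 and report["max"] <= 100
print(report, passed)
```

Note how one 90 ms outlier drags the average well above the median; this is why plans usually set thresholds on percentiles rather than the mean alone.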
We can also look for patterns in our data to find problems (memory leaks, patterns in logs, deadlocks due to throttling).
By looking at our data, we can establish a baseline for normalcy, and autonomously flag behaviours that violate our rules for normalcy.
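A minimal normalcy rule can be sketched as flagging measurements more than k standard deviations from a baseline built on known-good runs; the baseline numbers are illustrative:

```python
import statistics

# Baseline from earlier "known-good" runs (illustrative latency values, ms).
baseline = [101, 98, 103, 97, 100, 102, 99, 100]
mean = statistics.fmean(baseline)
std = statistics.stdev(baseline)

def is_anomalous(value: float, k: float = 3.0) -> bool:
    """Flag values more than k standard deviations from the baseline mean."""
    return abs(value - mean) > k * std

print([x for x in [100, 104, 160] if is_anomalous(x)])  # [160]
```

Production anomaly detectors are far more sophisticated, but the shape is the same: learn what normal looks like, then flag violations automatically.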
We also may need to add observability to be able to decide if a test passes. The raw results also might need post processing. Once again, we know that due to Flaky Tests, testing is not binary, it is a distribution. We need to be able to consistently handle load.
In terms of failing a load test, we can improve our programs, but there is a point where we cannot do better due to fundamental limitations. That may mean the software we have is not the right one (redesign), or that we need to change our expectations and rethink the constraints and product requirements.
Summary
- Load testing gives a picture at a given moment, we need to re-test to make sure we catch slowdowns
- Software tends to become more complex over time (slower), it takes effort to stay on top of it
- Load testing techniques are similar to black box system level testing
- Load testing depends more on human expertise and domain knowledge than functional testing does
Measurement Bias
Measurement bias is hard to avoid and unpredictable, which makes results seem non-deterministic.