If the goal is Scalability, where we want to scale from 1 → 1,000,000 users, we need to make sure our system can handle that load, and we need to prove it analytically and quantitatively.
- We need to do load testing for:
- Workload increases
- Spikes in workload
- Uptime requirements (here we need endurance tests)
This is not the same as stress testing
To pass a load test, we basically just need a yes or no answer; finding an exact number would be a stress test.
Test Planning
- What will be tested
- How will we test
- How will we know the test is passed
Remember, though, that testing adds cost. Therefore, we want to focus on critical workflows. What counts as critical is determined by product requirements; good things to load test:
- Computationally intensive work (takes a lot of effort)
- User experience
- External timing requirements
Note
We need to mimic multiple users repeatedly performing the same tasks for hours or even days.
Designing a Load Test
- Test in an environment that is as close to production as possible (don't typically worry about hardware)
- Also use a real workload (close to live customer data)
- Lighter workloads are okay for regression testing, but you need to actually put your system under pressure to see how it performs when stressed
- You can simulate pressure by limiting RAM or by running something CPU-intensive concurrently
- Results that are achieved also need to be reproducible
- There is likely some variation though
There are different types of loads you can apply:
- Aggregate Workloads break down load by share
- Use-Case-Based Workloads are derived from UML, etc
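An aggregate workload can be sketched as a weighted draw over operation types. The operation names and shares below are made-up illustration values:

```python
import random

# Hypothetical operation mix: each operation's share of the total load.
WORKLOAD_MIX = {
    "browse": 0.70,    # 70% of requests
    "search": 0.20,
    "checkout": 0.10,
}

def next_operation(rng: random.Random) -> str:
    """Pick the next simulated operation according to its share of the load."""
    ops = list(WORKLOAD_MIX)
    weights = [WORKLOAD_MIX[op] for op in ops]
    return rng.choices(ops, weights=weights, k=1)[0]

rng = random.Random(42)  # seeded for reproducible test runs
sample = [next_operation(rng) for _ in range(10_000)]
print(sample.count("browse") / len(sample))  # roughly 0.70
```

Seeding the generator matters here: it is one of the things that makes a load test's results reproducible.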
Designing Fault Inducing Loads
You need to analyze either source code or system models.
Source Code:
- Identify potential load sensitive modules and regions for load sensitive faults. The Load Sensitivity Index (LSI) indicates the net increase / decrease of heap space used for each iteration.
- We want to write test cases which exercise the code regions with high LSI values.
System Models:
- Software systems can be modelled into Queues, where each service is a queue.
- We want to conduct a simulation on the queue model to know which service is likely the bottleneck and design tests to exercise that bottleneck.
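As a minimal sketch of the queue-model idea: if we model each service as an M/M/1 queue, its utilisation is ρ = λ/μ, and the service with the highest ρ is the likely bottleneck. The service names and rates below are illustrative assumptions, not real measurements:

```python
# Hypothetical service rates (requests/sec each service can process) and the
# arrival rate we want to test at; all numbers are illustrative.
SERVICE_RATES = {"auth": 500.0, "catalog": 300.0, "payment": 120.0}
ARRIVAL_RATE = 100.0  # requests/sec offered to every service in the chain

def utilisation(arrival: float, rates: dict) -> dict:
    """M/M/1 utilisation rho = lambda / mu for each service in the model."""
    return {svc: arrival / mu for svc, mu in rates.items()}

rho = utilisation(ARRIVAL_RATE, SERVICE_RATES)
bottleneck = max(rho, key=rho.get)
print(bottleneck, rho[bottleneck])  # payment is closest to saturation
```

A full simulation would model queueing delay too, but even this analytic pass tells us which service our fault-inducing load should hammer.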
Tip
- If users have similar behaviours we can group them together
- If two sequences of actions are similar, they can also be grouped
We want to measure the frequency that users do things, group similar users and then choose a representative in each cluster
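The grouping step can be sketched as bucketing users by their action-frequency vectors and picking one representative per bucket. The users, frequencies, and bucket width are hypothetical:

```python
from collections import defaultdict

# Hypothetical per-user action frequencies (fraction of a session spent on
# each of three actions); names and numbers are illustrative.
users = {
    "alice": (0.80, 0.10, 0.10),
    "bob":   (0.82, 0.09, 0.09),
    "carol": (0.20, 0.70, 0.10),
}

def cluster_key(freqs, step=0.25):
    """Bucket each frequency so users with similar behaviour share a key."""
    return tuple(round(f / step) for f in freqs)

clusters = defaultdict(list)
for name, freqs in users.items():
    clusters[cluster_key(freqs)].append(name)

# One representative per cluster drives the simulated load for that group.
representatives = {key: members[0] for key, members in clusters.items()}
print(representatives)
```

A real test design would use a proper clustering algorithm (e.g. k-means) over recorded behaviour, but the idea is the same: similar users collapse into one simulated profile.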
Running Load Tests
We want to use driver-based test execution since:
- Easy to automate
- Scales to large numbers of requests
However:
- We need to load driver configs
- Hard to track some system behaviour
There are specialized benchmarking tools to do this. Here we want to:
- Setup our test
- Generate load
- Monitor and collect data from our test
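The generate-and-collect steps can be sketched as a minimal driver: a thread pool issues requests and records per-request latency. `target_request` is a hypothetical stand-in for a real call to the system under test:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def target_request() -> None:
    """Stand-in for a request to the system under test; in a real driver
    this would be an HTTP call or RPC (assumption for illustration)."""
    time.sleep(0.001)

def run_load(workers: int, requests: int) -> list:
    """Generate load with a thread pool and collect per-request latencies."""
    latencies = []  # list.append is thread-safe in CPython
    def timed():
        start = time.perf_counter()
        target_request()
        latencies.append(time.perf_counter() - start)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(requests):
            pool.submit(timed)
    return latencies

lats = run_load(workers=8, requests=100)
print(len(lats))
```

Specialized tools (JMeter, Locust, k6, etc.) wrap exactly this loop with richer configuration and reporting.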
System Deployment for Live and Driver-Based Executions
- Field load testing is realistic but costly
- Hardware selection:
- Dedicated hardware
- Cloud based testing
- Creating realistic databases
- Import realistic raw data
- Sanitize a prod database
- Mimic realistic network traffic
We need to recruit users for live executions, but for driver-based ones we can configure our workload. Recording user behaviour and replaying it to simulate load is also a good idea.
Generating Load
- Static configuration
- Time-driven: we change the workload when a performance counter hits a value
- Dynamic configuration
- Dynamically steer the testing loads based on system feedback
- We can do this dynamically via instrumentation, as in Profiler Guided Optimization
We can accomplish both with a driver.
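Dynamic steering can be sketched as a simple feedback loop: push the rate up while an observed performance counter is healthy, back off once it degrades. The target, gains, and latency samples below are illustrative assumptions:

```python
# Hypothetical feedback-driven load steering: adjust the offered request
# rate based on the last observed p90 latency from instrumentation.
TARGET_P90_MS = 200.0

def steer(rate: float, observed_p90_ms: float) -> float:
    """Raise the load while latency is healthy, back off once it degrades."""
    if observed_p90_ms < TARGET_P90_MS:
        return rate * 1.2   # push harder
    return rate * 0.8       # back off

rate = 50.0
for observed in [80.0, 120.0, 250.0, 190.0]:  # fake p90 samples (ms)
    rate = steer(rate, observed)
print(round(rate, 1))
```

Real tools use smoother controllers, but the principle is the same: the system's own feedback decides the next load level instead of a static schedule.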
Endurance Tests
- You get a degradation of performance due to the accumulation of things like swapped memory, logs growing too long, the disk filling up, etc
- We also need to determine how long to run a test for, this depends on product requirements
Evaluating Success
There are 2 main answers we get from load testing:
- Can the system handle a load of X?
- What is the maximum load Y the current system can handle (the failure point)?
Sometimes the first question is all we need; finding Y can be expensive.
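One way to bound the cost of finding Y is to binary-search over load levels, where each probe is a single yes/no load-test run. The oracle below (a system that fails above 730 req/s) is a made-up illustration:

```python
def can_handle(load: int) -> bool:
    """Hypothetical oracle: one load-test run at the given load, returning
    pass/fail. Here we pretend the system fails above 730 req/s."""
    return load <= 730

def max_load(lo: int, hi: int) -> int:
    """Binary-search the failure point. Each probe is one (expensive) load
    test, so this finds Y in O(log(hi - lo)) runs instead of hi - lo."""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if can_handle(mid):
            lo = mid    # system coped: the failure point is higher
        else:
            hi = mid - 1  # system failed: the failure point is lower
    return lo

print(max_load(0, 10_000))  # 730
```

This assumes pass/fail is monotone in load, which holds for most systems but should be sanity-checked with a few repeated runs near the boundary.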
We can compare our data against threshold values (max, median, average, p90), we can also compare against derived data.
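The threshold comparison can be sketched with the standard library; the latency samples and threshold values are illustrative:

```python
import statistics

# Hypothetical latency samples (ms) collected during one run.
samples = [12, 15, 11, 14, 90, 13, 16, 12, 14, 13]

def p90(values):
    """90th percentile via the 'inclusive' method (fine for small samples)."""
    return statistics.quantiles(values, n=10, method="inclusive")[-1]

report = {
    "max": max(samples),
    "median": statistics.median(samples),
    "average": statistics.fmean(samples),
    "p90": p90(samples),
}
# Pass/fail against (illustrative) thresholds from the test plan.
passed = report["p90"] <= 50 and report["max"] <= 100
print(report, passed)
```

Note how one 90 ms outlier drags the average well above the median; this is why plans usually set thresholds on percentiles rather than the mean alone.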
We can also look for patterns in our data to find problems (memory leaks, patterns in logs, deadlocks due to throttling).
By looking at our data, we can establish a baseline for normalcy, and autonomously flag behaviours that violate our rules for normalcy.
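A minimal normalcy rule can be sketched as flagging measurements more than k standard deviations from a baseline built on known-good runs; the baseline numbers are illustrative:

```python
import statistics

# Baseline from earlier "known-good" runs (illustrative latency values, ms).
baseline = [101, 98, 103, 97, 100, 102, 99, 100]
mean = statistics.fmean(baseline)
std = statistics.stdev(baseline)

def is_anomalous(value: float, k: float = 3.0) -> bool:
    """Flag values more than k standard deviations from the baseline mean."""
    return abs(value - mean) > k * std

print([x for x in [100, 104, 160] if is_anomalous(x)])  # [160]
```

Production anomaly detectors are far more sophisticated, but the shape is the same: learn what normal looks like, then flag violations automatically.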
We also may need to add observability to be able to decide if a test passes. The raw results also might need post processing. Once again, we know that due to Flaky Tests, testing is not binary, it is a distribution. We need to be able to consistently handle load.
In terms of failing a load test, we can improve our programs, but there is a point where we cannot do better due to fundamental limitations. That may mean the software we have is not the right one (redesign), or that we need to change our expectations and rethink the constraints and product requirements.
Summary
- Load testing gives a picture at a given moment, we need to re-test to make sure we catch slowdowns
- Software tends to become more complex over time (slower), it takes effort to stay on top of it
- Load testing techniques are similar to black box system level testing
- Load testing depends more on human expertise and domain knowledge than functional testing does
Measurement Bias
Measurement bias is hard to avoid and unpredictable, which makes results seem non-deterministic.