From http://www.performancetesting.co.za/Baseline%20Testing.htm
Introduction
Baseline tests form a very important part of the performance test methodology I follow. If done properly, 85% of performance problems can be identified and solved when proper baseline tests are done. Recent experiences on three different projects show this figure to be closer to 90%. This really has me very excited about the whole idea and implementation of baseline testing.
What is the baseline testing referred to here? Why and how are baseline tests executed? What is in place during these tests? How do baseline tests differ from other performance tests? In this article I’ll answer these and some other questions about baseline testing.
Why Baseline Tests?
The main advantage of running proper baseline tests is the time that is saved by this. Performance testing is a complex process and most often enough time is not available for proper test cycles. The time problem often leads to less baseline tests being done, but over time I’ve learnt that this is not the place to try and save time. In fact, baseline testing is where time can be saved.
A performance test project should at least include baseline and load tests. If possible a stress test and some volume and soak (endurance) testing should also be included. There should be a clear understanding of each test type. The table below describes each type of performance testing with the terminology I use in my methodology.
Performance Testing | |
Type of testing | Description |
Baseline | Done for each script with 1, 2, 5, 10, 20 and 50 users to determine baselines for mainly response times. |
Load | Test system with multiple users to determine performance under load. The number of users is usually specified by the performance test requirements. |
Stress | Load the system to its breakpoint. This is to determine the system break point or threshold. How the system breaks and recovers should also be monitored. |
Soak (Endurance) | Testing a system under load for an extended period of time to establish stability and behavior under sustained use. |
Volume | Testing a system with a certain amount of data. Usually includes high data and throughput volumes. |
When I started performance testing 9 years ago I learnt about and was trained with the term “load testing”. Sometimes people referred to or used the term “stress testing” and I was told not to worry, it’s the same thing. Today people still use mainly those two terms. The focus is definitely still on load testing and there’s nothing wrong with that. The end goal should always be a system that performs well under load. The way we get there is what matters.
Preparing for and executing a load test can be a tedious task. A lot of project time is spent on the preparation for load test execution. Data setup and management is one of the main areas that take a lot of the tester’s time. Scripting, data preparation, linking data to scripts, scenario planning and configuration, test environment preparation and monitoring all form part of the preparation before a load test can be executed. Once everything is in place the test is started, but it seldom happens that the first load test is a success. Most of the time numerous errors occur and some users will pass while others fail until all users eventually fail. The result is usually a big mess within a very short time and data that took hours to prepare will have to be prepared again. Monitoring also becomes a nightmare with different errors experienced in different places. The result of this is very busy graphs that are difficult to interpret as can be seen on the example below.
The problem here is that a lot of time is needed before the next test can be run and it can take a lot of time before something useful can be extracted from the test results. More useful results are needed in a shorter space of time and that’s what baseline testing delivers.
Useful Results Quickly
Having useful results to show quickly is a bonus on any project. As mentioned before the preparation for a load test can take a long time. During this time people will be waiting impatiently for the first test results. If the results aren’t very useful or take a long time to analyze, the situation can become difficult to manage. On the other hand, if the first results are available after a short time and easy to read and interpret, you will find a lot of happy people around you that will support you for the rest of the project. This, plus the fact that you don’t have to spend hours and hours to get ready for the next test run, is the real benefit of baseline tests. At the same time you also get the actual baseline results for the tests and can report on this immediately.
Preparation for baseline tests doesn’t take as long as preparation for load tests. Test data is not needed in large volumes and the hardware and software requirements to execute the tests are easily achieved. The results are easy to read and reports can be done very quickly. If a problem shows up during the baseline testing this is identified easily and can be isolated at the same time. Isolating problems that show on busy graphs usually takes a long time, but with baseline tests a lot of time is saved with this as well.
Executing Baseline Tests
Ideally baseline tests should be done with 1, 2, 5, 10, 20 and 50 users. This should be done for each script that will be part of the load test individually and immediately isolates any problems that might occur. The baseline tests should run for 20 to 30 minutes to get useful averages. Fight for time to do this and make it part of your script and load test preparation. The baseline tests give everyone a clear view of system performance and save a lot of time with identifying and isolating the causes of problems. Large volumes of test data is not lost with every failed test run and preparation for the next test can be completed quickly. Use these facts to motivate the need for proper baseline tests. When done properly, you’ll need a lot less time for the load testing.
The goal to work for is to start the first load test only after all the problems identified during baseline testing have been resolved. This opens the opportunity to have an almost perfect run when executing the first load test. Errors encountered during the load test will nearly all be load related with other problems already fixed during the baseline tests.
Monitoring
As with all performance testing, proper monitoring should be done during the baseline testing. Without proper monitoring you often have to repeat a test if a problem is encountered to be able to identify what is wrong where. I always keep a close eye on monitoring throughout the performance testing. This gives me a very good feel for the system and I can pick up very quickly if something is not right at any point during any test. Include all the responsible people right from the start. Meaningful results can’t be extracted without proper monitoring.
Examples – Baseline Test Results
Let’s look at some examples to put everything into perspective.
The first series of graphs show average transaction response times for a script that captures customer details. The graphs are for 1, 5, 20 and 50 users. In this example a problem was identified with a specific step, Search Postal Code, which had to be fixed before the actual load test with 500 users could be done. The data preparation for this script was horrendous with a lot of unique data that could only be used once. Starting with a 500 user load test would have caused many headaches.
The Search Postal Code transaction is highlighted in red on each graph.
Capture Details – 1 User
Capture Details – 5 Users
Capture Details – 20 Users
Capture Details – 50 Users
The degradation in performance from 1 to 50 users can clearly be seen. With 50 users the system eventually crashed. The advantage of the baseline tests are graphs with limited measurements where the problem transaction can be easily identified. If this was run with other scripts and numerous other measurements, isolating the problem could have been really difficult.
These results were available early in the project and justified the importance of the performance testing immediately. Looking good early on helps to gain respect and trust from the right people for the rest of the project.
The next example shows the importance of having monitoring in place during the baseline testing. In this example a search function caused the system to crash with 30 users. The requirement was to have a successful 1000 user test at the end of the performance testing. The time saved by running baseline tests for each individual script on this specific project was enormous. Problems as the one in the example were identified in 3 out of 7 scripts. If all the scripts were run together and with load, finding and isolating the problems responsible for the system crash would have been very difficult and time consuming.
The 30 user test for the search function failed when the users received time-outs as can be seen on the graph below. The graph shows the transaction response times with the running users.
Running Users vs. Average Response Time (users left axis, response times right axis)
The system resources showed very high CPU usage on the application server when users performed the search function. The graph below shows the CPU utilization of the application server as well as the database server during the 30 user test that failed. Note the low CPU usage on the database compared to the application server
Running Users vs. System Resources (users left axis, CPU percentage right axis)
Resource monitoring helped to identify the problem quickly. As in the first example, these results could be reported on early and meaningful results were distributed soon after the start of the testing effort. The importance of having monitoring in place at all times is highlighted once again.
Summary
With numerous successes as shown in the examples above, baseline testing has proven itself as an essential part of the performance testing process. The time saved by finding and isolating problems early on is enormous. Showing good, meaningful results early on a project has many advantages. It highlights the importance of performance testing and showing the improvements when problems are fixed make the rest of the project a very positive experience.
Include baseline tests in your planning and leave enough time for this. Do it properly, don’t take any shortcuts. The benefits are great and when the actual load testing begins, the system will already perform well. You will also know the system well by then and will be able to see changes in trends or performance immediately. This makes the testing just that much better and you a better performance tester.