Analyzing Performance-Testing Results to Correlate Performance Plateaus and Stress Areas
A year or so ago I had the pleasure of attending a conference at which Scott Barber gave two presentations on performance testing. The first presentation was on the effective presentation of performance test data; the second was on the modeling of application user communities. After watching both presentations and talking with Scott, I was able to draw a couple of insights. First, we often focus too much on the problems that are easy to identify, rather than taking the time to determine where the real problems may be hiding. Second, we tend to performance test for the sake of performance testing, rather than taking the time to understand how the application is used and the business drivers within which it operates. As a result, we don't know what to test, and we don't understand how much performance testing is enough for the application.
In this article, I'll build on some of Scott's work to show how we can combine performance-degradation curves and complex performance scenarios to help determine "good enough" quality for an application in terms of performance. Throughout the article, I'll refer to Scott's work by providing a quick summary and stealing an example for illustration, and then move on to the next topic. I leave it to you to do the research necessary to fully understand the summarized content. This article is intended for the experienced performance tester or test manager.
Before we jump into the guts of this article, it might be good to establish some working definitions and concepts. Let's start with performance-degradation curves. In his article on creating a performance degradation curve, Scott Barber outlines a basic response-time degradation curve. If you're not familiar with this work, take a minute to read that article first; it sets the stage for what we're about to cover.
Figure 1 is an example of a response-time degradation curve. Degradation curves are common among performance testers; they go by various names, so forgive me if you know this curve by another name. A response-time degradation curve plots the response time experienced by the user against the user load. It's worth pointing out that the various user loads represented on one of these plots all use the same user-community model (explained in more detail later in this article). Later on, I'll discuss how to compare loads based on different models. This example shows the response times for two web pages (the home page and page 1) under differing loads (from 1 to 200 users). Curves like the one in Figure 1 are good for comparing specific page-response times across multiple tests using the same model, graphically displaying where performance starts to decline and where performance becomes unacceptable.
Figure 1 A basic response-time degradation curve.
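To make the construction of such a curve concrete, here is a minimal Python sketch that turns raw timing samples into one curve per page. The function name `degradation_curve`, the tuple layout of the samples, and the numbers themselves are assumptions made up for illustration; they are not taken from Figure 1 or from any particular load-testing tool.

```python
from collections import defaultdict
from statistics import mean


def degradation_curve(samples):
    """Group (user_load, page, response_seconds) samples into per-page curves.

    Returns {page: [(user_load, mean_response_seconds), ...]}, with each
    page's points sorted by user load.
    """
    grouped = defaultdict(list)
    for user_load, page, seconds in samples:
        grouped[(page, user_load)].append(seconds)

    curves = defaultdict(list)
    for (page, user_load), values in grouped.items():
        curves[page].append((user_load, mean(values)))

    return {page: sorted(points) for page, points in curves.items()}


# Hypothetical measurements, invented for illustration only.
samples = [
    (1, "home", 1.0), (1, "home", 1.5),
    (100, "home", 2.0), (100, "home", 2.5),
    (200, "home", 7.5), (200, "home", 8.5),
    (1, "page1", 0.5), (100, "page1", 1.5), (200, "page1", 4.0),
]
curves = degradation_curve(samples)
for page, points in curves.items():
    print(page, points)
```

The sketch uses the mean of the samples at each load; a percentile may be a better choice for a real report. All samples fed into one curve should come from tests that share the same user-community model.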
The shape of a typical response-time degradation curve can be broken down into four regions (see Figure 2):
- The single-user region is just that—the response time for a single user on the system. This is useful for establishing a point of reference.
- The performance plateau shows the best performance you can expect under the specific conditions of that particular test without further performance tuning. Measurements taken in this region are good candidates for baselines and/or benchmarks.
- The stress region is where the application "degrades gracefully." Typically, the max recommended user load is the beginning of the stress region.
- The knee in performance is the point where performance "degrades ungracefully."
Figure 2 Four regions of the response-time degradation curve.
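The four regions can be approximated in code. The Python sketch below labels each point on a curve by comparing it with the response time at the lowest measured load. This threshold heuristic, the names `label_regions` and `max_recommended_load`, and the default values `plateau_tolerance=0.25` and `knee_factor=3.0` are all assumptions chosen for illustration; the article does not prescribe numeric boundaries, and in practice the regions are usually read off the chart by eye.

```python
REGIONS = ("plateau", "stress", "knee")


def label_regions(curve, plateau_tolerance=0.25, knee_factor=3.0):
    """Label each (user_load, response_seconds) point with a region name.

    Assumed heuristic (not from the article): the first point is treated as
    the single-user reference, and every later point is classified by the
    ratio of its response time to that reference. Once a curve has entered
    a later region it never moves back to an earlier one.
    """
    points = sorted(curve)
    baseline = points[0][1]
    labels = [(points[0][0], "single user")]
    stage = 0
    for user_load, seconds in points[1:]:
        ratio = seconds / baseline
        if ratio >= knee_factor:
            current = 2      # at or past the knee
        elif ratio > 1 + plateau_tolerance:
            current = 1      # degrading gracefully
        else:
            current = 0      # still on the plateau
        stage = max(stage, current)
        labels.append((user_load, REGIONS[stage]))
    return labels


def max_recommended_load(labels):
    """Return the first load labeled as stress, or None if there is none."""
    for user_load, region in labels:
        if region == "stress":
            return user_load
    return None


# Hypothetical curve, invented for illustration only.
home_page = [(1, 2.0), (50, 2.2), (100, 2.4), (150, 4.0), (200, 9.0)]
labels = label_regions(home_page)
print(labels)
print(max_recommended_load(labels))
```

With measurements this sparse, the reported boundaries are only as precise as the spacing between tested loads, so additional test runs near each transition would be needed to pin them down.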
These regions are typically used by testers to help them determine where performance starts to degrade for any given portion of the application. It has been my experience that these charts are used primarily for two purposes:
- The effective display of performance information, in an effort to show "good enough" performance or poor performance in relation to some stated requirement. For example, if I had a requirement stating that the home page must load in under six seconds with 100 concurrent users, I could confirm that requirement using the chart in Figure 2. If the requirement were for 200 users, I could use the same graph to show that more work needs to be done to meet the requirement.
- The location of the knee in performance during performance tuning. The load at which the knee occurs is the absolute maximum load you ever want your application or system to encounter. Data collected after the knee is the load data that exploits your critical bottleneck; this data is then used to research and correct performance bottlenecks. This is often an iterative process that aims to push the knee in performance further to the right (that is, to a higher load).
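The requirement check described in the first bullet can also be expressed in code. The Python sketch below estimates the response time at a given load by linear interpolation between measured points and compares it with a stated limit. The function names, the use of linear interpolation, and the data are assumptions for illustration; the numbers are not read from Figure 2.

```python
def response_time_at(curve, user_load):
    """Estimate the response time at user_load by linear interpolation.

    curve is a list of (user_load, response_seconds) points with distinct
    loads. Loads outside the measured range are rejected rather than
    extrapolated.
    """
    points = sorted(curve)
    if not points[0][0] <= user_load <= points[-1][0]:
        raise ValueError("user_load is outside the measured range")
    for (low_load, low_time), (high_load, high_time) in zip(points, points[1:]):
        if low_load <= user_load <= high_load:
            fraction = (user_load - low_load) / (high_load - low_load)
            return low_time + fraction * (high_time - low_time)
    return points[0][1]  # only one measured point, and it matches


def meets_requirement(curve, user_load, max_seconds):
    """True if the estimated response time at user_load is under max_seconds."""
    return response_time_at(curve, user_load) < max_seconds


# Hypothetical home-page curve, invented for illustration only.
home_page = [(1, 2.0), (50, 2.2), (100, 2.4), (150, 4.0), (200, 9.0)]
print(meets_requirement(home_page, 100, 6.0))
print(meets_requirement(home_page, 200, 6.0))
```

Refusing to extrapolate is a deliberate choice in this sketch: past the knee, response time can climb sharply, so a load that was never tested should be measured rather than guessed.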
While many testers are interested in the stress region and the knee in performance, in this article we'll take a slightly different view of this data.