Basic Java Performance Lingo
Let's begin our discussion of performance with a quick review of key performance terminology.
The terminology used in performance analysis may be new to you, but you already know the underlying concepts from everyday life. In this chapter, we apply performance terminology to familiar experiences. We use analogies based on "brick and mortar" stores in our neighborhoods to describe performance concepts found in the virtual world of web site software. These analogies also demonstrate how performance terminology really describes everyday reality.
Load: Customers Using Your Web Site
Let's consider a traditional brick and mortar bookstore. A traditional bookstore serves customers and contains a certain number of customers at any point in time. If we use the store's security camera to take a snapshot of the sales floor at some point during the day, we get a picture similar to Figure 1.1.
Figure 1.1 Customers in a traditional bookstore
In this picture, we see some of the customers browsing the shelves, while others interact with the store clerks. The customers ready to make purchases go to the clerks operating the cash registers. Other customers needing assistance with a book selection go to the clerk at the information desk.
The bookstore frequently contains more customers than clerks. Specifically, the bookstore contains more customers than cash registers. Intuitively, we know it usually takes some browsing time on the customer's part to pick out a selection for purchase, so we don't need a clerk for every customer.
An on-line store also serves customers, though these customers are represented by requests to the web site instead of a physical presence in a store. The on-line store uses computing resources to handle customer requests (the electronic equivalent of our bookstore clerks).
Surprisingly, the customers using our on-line bookstore behave much like the customers visiting a brick and mortar bookstore. The on-line customers request pages from our web site (like asking the clerk at the info desk for help), and then spend some amount of time looking at a given page in their web browsers before making their next request. As they do in our brick and mortar store, the on-line customers may browse for a while before they make a purchase and often they make no purchase at all.
Figure 1.2 shows an on-line bookstore with a total of 11 customers: 2 with requests being handled by computing resources and 9 browsing recently created web pages. As this figure shows, at any point in time, an on-line store typically has more customers than the number of requests being handled. In fact, for some web sites, the customers reading pages far exceed those actively making requests to the web site.
Figure 1.2 Customers using an on-line bookstore
So how does this all relate to load? Load is all of the customers using your web site at a point in time. Load includes customers making requests to your web site as well as those reading pages from previous requests. For example, the customer load in our bookstore snapshot in Figure 1.1 was 11 customers. (A performance expert might tell you the store is under an "11-customer load" at this point.) In performance testing, we often "put the system under load." This means we plan to generate customer traffic against the system, often by using special test software to generate customers.
Obviously, a term such as a lightly loaded system means the web site has few customers. Likewise, a heavily loaded system means the web site has many customers. Also, we often drop the term customer in the web space, preferring the terms user, client, or visitor instead. (Not all web sites actually sell things, so "customer" is not always appropriate.) However, we do need some way of differentiating between all clients and those clients actually making requests to our web site. We use the terms concurrent load and active load to make this distinction. Let's discuss these terms in more detail.
Concurrent Load: Users Currently Using the Web Site
As we noticed earlier, inside our traditional bookstore some customers look for books, while others interact with the clerks. For example, look at a security camera snapshot in Figure 1.3. At this point in time, the store contains 11 customers: 5 customers browsing, 1 requesting help from the info desk clerk, and 5 interacting with clerks at the cash registers to make purchases. Concurrent load refers to all of the customers in the store at a point in time, regardless of their activity.
Figure 1.3 Concurrent users in a traditional bookstore
Web site visitors reading a previously requested web page resemble customers browsing in our traditional bookstore: they're using the web site, but not actively engaging it to satisfy a request. The concurrent load for your web site includes the customers browsing previously requested pages in addition to the active clients. Figure 1.4 shows an on-line bookstore with 6 active client requests and 5 users browsing previously created pages, for a total of 11 concurrent clients.
Figure 1.4 Concurrent web site load
You may wonder why we care about the total users rather than just those making requests. Look at the traditional store example. All of the customers in the store use resources, even if they're not interacting with a clerk. For instance, the store owner provides floor space where browsing customers can stand as well as parking places out front for their cars. Likewise, web site visitors often require web site resources even when they're just reading a web page on their browsers. The web site often uses memory and other resources to keep information about users during their visit. (We expand this topic in Chapter 2's discussion of HTTP sessions.) Since any user potentially consumes resources, regardless of her current activity, we sometimes need to consider all of the users visiting our web site. This is concurrent load.
Active Load: Customers Making Requests of the Web Site
Active load refers to customers currently making requests. In a brick and mortar store, the active customers are interacting with the sales staff. The bookstore may contain many customers, but only a subset of these are actually at the cash registers or at the information desk interacting with the sales clerks. We call these customers "active" because they actively want some service from the store to buy or find a book. Figure 1.5 shows a traditional bookstore with six active customers.
Figure 1.5 Active customers in a traditional bookstore
Much as they do in the traditional bookstore, on-line customers also make requests of the web site. These requests require web resources, particularly processing resources, to complete. For example, the on-line customer might request a search for books by a particular author, or request to purchase the items in an on-line shopping cart. See Figure 1.6 for an example. By definition, the term active applies to the user from the moment the request arrives at our web site until the requested information returns to the user.
Figure 1.6 Active web site client
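The distinction between concurrent load and active load can be sketched in a few lines of Java. This is only an illustration, not a monitoring tool: the class name LoadSnapshot, the Activity enum, and the helper methods are hypothetical names introduced here, and the snapshot reuses the customer counts given for Figure 1.3 (5 browsing, 1 at the information desk, 5 at the cash registers).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a point-in-time snapshot of store (or web site) visitors.
final class LoadSnapshot {
    // What a visitor is doing at the instant of the snapshot.
    enum Activity { BROWSING, INFO_DESK, CHECKOUT }

    // Concurrent load: every visitor present, regardless of activity.
    static int concurrentLoad(List<Activity> visitors) {
        return visitors.size();
    }

    // Active load: only the visitors currently being served.
    static int activeLoad(List<Activity> visitors) {
        int active = 0;
        for (Activity activity : visitors) {
            if (activity != Activity.BROWSING) {
                active++;
            }
        }
        return active;
    }

    // Builds the snapshot described for Figure 1.3.
    static List<Activity> figure13Snapshot() {
        List<Activity> visitors = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            visitors.add(Activity.BROWSING);
        }
        visitors.add(Activity.INFO_DESK);
        for (int i = 0; i < 5; i++) {
            visitors.add(Activity.CHECKOUT);
        }
        return visitors;
    }

    public static void main(String[] args) {
        List<Activity> snapshot = figure13Snapshot();
        System.out.println("Concurrent load: " + concurrentLoad(snapshot)); // 11
        System.out.println("Active load: " + activeLoad(snapshot));         // 6
    }
}
```

On a web site, the browsing visitors correspond to users reading a previously returned page, and the active visitors correspond to requests that have arrived but not yet been answered.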
Peak Load: Maximum Concurrent Web Site Customers
After the bookstore opens, customer traffic rarely remains constant throughout the day. The store usually receives more customers during lunchtime or after school than in the morning or late at night. Over the course of a week, the store probably receives most of its traffic on Saturday. Over the course of a year, the Christmas holiday represents the store's overall busiest period, with the day after Thanksgiving being the busiest day of the year.
Peak load refers to the maximum concurrent customers in the store within some time period. For example, Figure 1.7 shows a graph of hourly customer visits during the course of a day. Notice that the store experiences two activity "spikes" during lunchtime and between 4 PM and 5 PM each day. The peak period, however, is the hour between 4 PM and 5 PM each day. The store receives its peak load (50 customers) during this period.
Figure 1.7 Traditional store customers per hour
Peak load is not the average load. The store shown in Figure 1.7 averages 25 customers an hour during the day. If we only consider averages, the store might not schedule enough sales clerks to handle the 50 customers arriving between 4 PM and 5 PM. Note that our concerns about averaging apply to the time scale as well. Perhaps Figure 1.7 represents an average day at the bookstore. While this information proves useful for scheduling most of the year, it doesn't tell us how many people arrive on our peak day (the day after Thanksgiving) or their arrival distribution during that day.
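To make the gap between peak and average concrete, here is a small Java sketch. The hourly counts are invented for illustration; they were chosen only so that the average (25) and the peak (50, in the 4 PM to 5 PM hour) match the numbers quoted for Figure 1.7. The class and method names are likewise hypothetical.

```java
// Hypothetical hourly customer counts for one day, 10 AM through 9 PM.
// Only the average (25) and the peak (50) come from the text; the
// individual values are made up for illustration.
final class PeakLoad {
    static final int[] HOURLY_CUSTOMERS = {10, 15, 40, 30, 20, 20, 50, 35, 25, 20, 10};

    // Peak load: the largest count seen in any hour (counts must be non-empty).
    static int peak(int[] counts) {
        int max = counts[0];
        for (int count : counts) {
            max = Math.max(max, count);
        }
        return max;
    }

    // Average load across all hours (counts must be non-empty).
    static double average(int[] counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return (double) sum / counts.length;
    }

    public static void main(String[] args) {
        System.out.println("Average per hour: " + average(HOURLY_CUSTOMERS)); // 25.0
        System.out.println("Peak per hour: " + peak(HOURLY_CUSTOMERS));       // 50
    }
}
```

Staffing for the average here would cover only half of the customers arriving in the busiest hour, which is why the peak is the planning number.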
Like a traditional bookstore, web sites experience an uneven distribution of users throughout the day. The peak user arrival times vary from web site to web site. For example, brokerage web sites receive intense load every day at market opening, while an on-line bookstore receives most of its traffic at lunchtime or after the workday ends. Also, web sites, like their brick and mortar counterparts, experience unusually high traffic during peak times such as the Christmas holidays or a stock market rush. However, web site traffic patterns differ significantly from those of traditional stores. An on-line bookstore's lunchtime spike often lasts three hours as lunchtime rolls across time zones from the East Coast to the West Coast.
Again, the peak load is our planning focus. A web site unable to support its peak traffic is an unsuccessful web site. When planning your performance test, find the peak loading goals for your web site, and build the test to exercise this load. (See Chapter 6 for details on developing an accurate performance test.)
Throughput: Customers Served over Time
A clerk sells a customer a book at the cash register. This transaction requires one minute. Let's assume it always takes one minute to complete a customer sale, regardless of how many books the customer buys. In this case, if the store contains five cashiers, the store serves five customers per minute. (Figure 1.8 illustrates this scenario.) In performance terms, five customers per minute is the store's throughput. Throughput measures the customers served relative to some unit of time. It is a unique metric because it has an upper bound. No matter how many customers come to the store, the maximum number handled during a specific time interval remains unchanged. (We know from experience the clerk cannot check out two customers simultaneously.)
Figure 1.8 Throughput dynamics without waiting
So with five clerks operating five cash registers, our store serves a maximum of five customers a minute. This represents our maximum throughput, regardless of how many customers actually want to check out during this time. For example, Figure 1.9 shows the same store, only now with ten customers; however, the clerks still sell at a rate of one customer per minute. So, with five clerks, we still only serve five customers per minute (our store's maximum throughput remains unchanged). Thus five of the customers must wait or queue. After reaching the throughput upper bound, adding more users to the store does not increase throughput. This is, as you might guess, an important concept to grasp.
Figure 1.9 Throughput dynamics with waiting
An on-line site exhibits the same throughput dynamics as the traditional store. An on-line store handles requests, typically initiated by a customer using a web browser. For example, a customer may search for particular books or purchase the contents of a shopping cart. A web site handles a specific number of requests in parallel; for example, a web site may handle 20 customer requests simultaneously. If each request takes one second to process, the on-line store throughput is 20 requests per second.
Just as in our traditional store, adding more requests does not increase throughput after we reach our throughput upper bound. If the web site receives 30 requests per second, but the maximum throughput is 20 requests per second, some of the requests must wait or queue.
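The arithmetic behind the throughput upper bound can be written down directly. The sketch below assumes a deliberately simple model in which every request takes the same service time and the site serves a fixed number of requests in parallel; the class ThroughputBound and its methods are hypothetical names, not part of any server API.

```java
// Simplified throughput model: a fixed number of parallel "servers"
// (clerks, threads) and an identical service time for every request.
final class ThroughputBound {
    // Upper bound on throughput, in requests per second.
    static double maxThroughput(int parallelRequests, double serviceTimeSeconds) {
        return parallelRequests / serviceTimeSeconds;
    }

    // Achieved throughput never exceeds the upper bound.
    static double throughput(double arrivalsPerSecond, double maxThroughput) {
        return Math.min(arrivalsPerSecond, maxThroughput);
    }

    // Arrivals per second that cannot be served immediately and must queue.
    static double excessArrivals(double arrivalsPerSecond, double maxThroughput) {
        return Math.max(0.0, arrivalsPerSecond - maxThroughput);
    }

    public static void main(String[] args) {
        double bound = maxThroughput(20, 1.0);         // 20 parallel requests, 1 second each
        System.out.println(bound);                      // 20.0
        System.out.println(throughput(30.0, bound));    // 20.0
        System.out.println(excessArrivals(30.0, bound)); // 10.0
    }
}
```

Under this model, 30 arriving requests per second against a bound of 20 leaves 10 requests per second that must wait, which is the queuing described above.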
Throughput Curve: Finding the Throughput Upper Bound
A performance test uses a series of test runs to understand the relationship between load and throughput. A graph of the data from these runs establishes the throughput curve for your system. As the load on your system increases, the throughput usually increases as well until it reaches the throughput upper bound (maximum throughput). On the graphs shown in this section, we plot the load on the x-axis and the throughput (requests per second or customers per second) on the y-axis.
Figure 1.10 shows the throughput curve for the bookstore. With one customer, the throughput is one customer per minute; at two customers, it is two customers per minute. This pattern continues through five customers, with a throughput of five customers per minute. The graph shows that as the load increases between one and five customers, the throughput increases to a maximum of five customers per minute. Once the load reaches five customers, all five clerks are busy at the cash registers. Throughput then remains a constant five customers per minute, even as the number of customers increases. After we reach maximum throughput, we have reached the throughput plateau. Beyond maximum throughput, adding load, or users, results in a consistent, flat throughput curve (a plateau). For the bookstore, the throughput plateau is five customers per minute.
Figure 1.10 Throughput curve: brick and mortar store
Web sites produce a similar throughput curve with a throughput plateau. Figure 1.11 shows a typical throughput curve for a web site. (Obviously, this graph contains more data points than the bookstore throughput graph.) The "Light load zone" in the figure shows that, as the number of user requests increases, the throughput increases almost linearly. At light loads, requests face very little congestion for resources. After some point, congestion starts to build up, and throughput increases at a much lower rate until it reaches a saturation point. This is the throughput upper-bound value.
Figure 1.11 Typical web site throughput curve. From Gennaro Cuomo, "A Methodology for Production Performance Tuning," an IBM WebSphere Standard/Advanced White Paper. Copyright 2000 by IBM Corp. Reprinted by permission of IBM Corp.
The throughput maximum typically represents some bottleneck in the web site, usually a saturated resource (the on-line equivalent of all sales clerks being busy). The CPU often becomes the constraining resource on your web site. After your CPU(s) reach 100% utilization, the web site lacks processing capacity to handle additional requests.
As client load increases in the "Heavy load zone" in Figure 1.11, throughput remains relatively constant; however, the response time increases proportionally to the user load (see the next section on response time for more details). At some point, represented by the "buckle zone," one of the system components becomes exhausted, and throughput starts to degrade. For example, the system might enter the buckle zone if the network connections at your web server exhaust the limits of the network adapter, or if you exceed the operating system limits for file handles.
So far, we've discussed throughput generically as "requests per second." We often hear throughput discussed in terms of hits, transactions, pages, or users in some unit of time (usually a second, but sometimes in terms of a day or week). Not surprisingly, how you measure your throughput makes a big difference in how you set up your tests, and also affects your hardware plan. We briefly discuss the differences between hits, transactions, pages, and users in the next sections. A more thorough discussion, with an example, appears in Chapter 6.
A hit may mean one of several different things. For an HTTP server specialist, a hit means a request to the HTTP server. Because HTML (Hyper Text Markup Language) pages usually contain embedded elements, such as gifs or jpegs, one HTML page might require multiple HTTP "hits" as the browser retrieves all of the elements from the server to build a page.
Regrettably, the rest of the world uses the term hit in very ambiguous ways. Sometimes hit refers to an entire page, including embedded elements. Also, many companies routinely use hit to mean an entire site visit by a given user. (A site visit usually encompasses many pages, not to mention the embedded elements and frames included in those pages.) Therefore, you must first make sure everyone discussing a "hit" is discussing the same thing. Any misunderstanding on this point drastically impacts the success of the performance test and the production web site.
Transaction rate is the most common measurement of throughput. (Web sites often measure their throughput in transactions per second.) Usually, web sites define a transaction as a single HTTP request and response pair. However, the definition for transaction, like that for hit, often means different things to different people. Sometimes transactions involve more complex behavior than dealing with a simple request/response pair. Within your team, establish your definition of a transaction, and use it consistently.
Since it is important to look at your web site as a whole, understanding throughput in terms of pages per second makes sense. Your web site may handle a high request volume in terms of HTTP requests during the day. However, if each of your web pages contains many embedded elements, this request volume may not translate into a high page volume. (It takes many requests, in this case, to build a single page.) From a user's perspective, the throughput of your web site could be much lower than your transaction volume.
The user rate measures the users visiting the web site during a period of time. The web site receives many visitors over the course of the day. They interact with the web site to perform one or more tasks, which may involve navigating through several pages.
The most common definition of user refers to one of potentially many visits a user might make to the web site over the course of a day. The user visit includes the set of web pages navigated while using the web site. Because a user visit encompasses multiple pages, the user rate is usually lower than the page rate or transaction rate for a web site.
In practice, the definition of user varies from team to team. For example, web masters frequently interchange user with hit (a single request/response pair, not a multipage web site visit). See Chapter 6 for an expanded discussion. Again, you and your team need to pick a consistent definition and stick with it.
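Because the same traffic produces very different numbers depending on the unit, it can help to convert between units explicitly. The following Java sketch uses invented figures (600 HTTP requests per second, 12 requests per page, 5 pages per visit) purely for illustration, and it assumes every page and every visit has the same size, which real traffic does not. The class and method names are hypothetical.

```java
// Converts one throughput unit into another under the simplifying
// assumption that every page and every visit is the same size.
final class ThroughputUnits {
    // Pages per second, given HTTP requests per second and the number of
    // requests (the HTML page plus its embedded elements) needed per page.
    static double pagesPerSecond(double requestsPerSecond, int requestsPerPage) {
        return requestsPerSecond / requestsPerPage;
    }

    // User visits per second, given the page rate and pages viewed per visit.
    static double visitsPerSecond(double pagesPerSecond, int pagesPerVisit) {
        return pagesPerSecond / pagesPerVisit;
    }

    public static void main(String[] args) {
        double pages = pagesPerSecond(600.0, 12);  // illustrative values only
        double visits = visitsPerSecond(pages, 5);
        System.out.println("Pages per second: " + pages);   // 50.0
        System.out.println("Visits per second: " + visits); // 10.0
    }
}
```

With these example values, a site reporting 600 "hits" per second is serving 50 pages or 10 visits per second, so a team that confuses the units could be off by a factor of 12 or 60.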
Response Time: Time to Serve the Customer
In the traditional store, each customer sale takes a certain amount of time. For example, it may take a sales clerk one minute to check out a customer and complete a sale. Prior to actually purchasing a book, you may wait in line for an available clerk. This wait time adds to the length of time required to complete a purchase. Figure 1.12 shows the last customer in line waiting four minutes to reach the sales clerk. After the customer reaches the clerk, the sale requires the standard one minute of processing time to complete. Therefore, this customer's total checkout time includes the four minutes of waiting, plus the one minute of actually checking out, resulting in a total checkout time of five minutes.
Figure 1.12 Total checkout time in a traditional store
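The checkout arithmetic is simple enough to express as code. The sketch below assumes a single clerk who serves customers one at a time with a fixed service time, so a four-minute wait corresponds to four customers ahead in line; that count is an assumption made here, since the text gives only the wait. The names CheckoutTime, waitMinutes, and responseMinutes are hypothetical.

```java
// Response time for a customer in a single first-come, first-served line
// with a fixed service time per customer.
final class CheckoutTime {
    // Time spent in line before reaching the clerk.
    static int waitMinutes(int customersAhead, int serviceMinutes) {
        return customersAhead * serviceMinutes;
    }

    // Total checkout (response) time: wait time plus the customer's own service time.
    static int responseMinutes(int customersAhead, int serviceMinutes) {
        return waitMinutes(customersAhead, serviceMinutes) + serviceMinutes;
    }

    public static void main(String[] args) {
        System.out.println("Wait: " + waitMinutes(4, 1) + " min");      // 4 min
        System.out.println("Total: " + responseMinutes(4, 1) + " min"); // 5 min
    }
}
```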
A customer who initiates a request from a browser waits a certain amount of time before the web page resulting from the request appears in the browser. For example, if the on-line customer issues a request to purchase a book, the browser submits the request to the web site, and the customer waits until the web site returns an order confirmation page to the browser. Note that this works much like our traditional bookstore: Our purchase takes some quantity of time, regardless of whether we wait in line or not prior to making the purchase.
Web site response time refers to the time from when the customer initiates a request from his browser until the resulting HTML page returns to the browser. (Technically, response time refers to the time between the request and the display of all the page's content. However, the user's perceived response time spans the request to the first appearance of returning page data in the browser.)
Remember that the customer's request shares the web site with potentially many other simultaneous requests. If the request finds the processing capacity of your web site fully engaged, the request "waits." (Actually, the web site usually queues the request until processing capacity becomes available to satisfy it.) Just as in our traditional bookstore, the time spent waiting adds to the actual service time of the request. The total response time in this case is the wait time plus the actual service time.
Again, response time is the time it takes to serve the customer. In both the brick and mortar example and the on-line store, the response time consists of the total time it takes to purchase the book. For example, if a customer submits a request to purchase a book, we measure the response time from when the user submits the request via a browser until the browser displays the confirmation page. This time includes any "wait time" for busy resources. Figure 1.13 shows the response time for a typical web request.
Figure 1.13 Response time in an on-line store
Also, in either a traditional or on-line environment, customers come with limited patience. If response time grows too long, the customer stops waiting and leaves, maybe never to return to your store or web site. Therefore, response time is the critical measurement for most web sites. Performance testing strives to minimize response time and ensure it does not exceed your web site objectives, even during peak loading.
Think time is the time a user takes between submitting requests. Think time and response time represent two different concepts. Response time measures the time the user waits for a response to return from the web site (including any wait time for server resources). Think time, however, measures the interval between a user's requests. During the think time, the server performs no work for the user because the user has not made a request.
For example, the user's first request to search for a book may require five seconds to process. After the server returns a list of books matching the search criteria, the user reads through the list and decides what action to take next. Maybe the user chooses to submit another search, or maybe he requests more details about a particular book on the list.
During this think time, the user may read material previously obtained from the server, choose further server activity, or even go for a snack. Regardless, from your web site's perspective, the user is not active during this time. For example, Figure 1.14 shows a user making two separate requests, one minute apart. The response time for each request is only five seconds, but the total time the user spends shopping is one minute and ten seconds. Chapter 7 contains a more detailed discussion of think time and the important role it plays in performance testing and capacity planning.
Figure 1.14 Think time example
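The numbers in the think time example add up as follows. This Java sketch treats the one minute between the two requests as think time (60 seconds between the first response and the second request), which is the reading that yields the stated total of one minute and ten seconds. The class SessionTime and its methods are hypothetical names used only for this illustration.

```java
// Splits a user's visit into response time (server is working for the user)
// and think time (server is idle for that user).
final class SessionTime {
    private static int sum(int[] values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    // Total visit length: all response times plus all think times.
    static int totalSeconds(int[] responseSeconds, int[] thinkSeconds) {
        return sum(responseSeconds) + sum(thinkSeconds);
    }

    // Fraction of the visit during which the user counts as active.
    static double activeFraction(int[] responseSeconds, int[] thinkSeconds) {
        return (double) sum(responseSeconds) / totalSeconds(responseSeconds, thinkSeconds);
    }

    public static void main(String[] args) {
        int[] responses = {5, 5}; // two requests, five seconds each
        int[] thinks = {60};      // one minute of think time between them
        System.out.println("Total seconds: " + totalSeconds(responses, thinks)); // 70
        System.out.println("Active fraction: " + activeFraction(responses, thinks));
    }
}
```

In this example the user is active for only 10 of 70 seconds, about one seventh of the visit, which illustrates why concurrent load can be much larger than active load.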
While think time is beyond the scope of our control, we can (and must) carefully control and monitor our web site's response time. Make no mistake: Response time is the critical performance measurement for any web site because it represents how long users wait for a given request. Like their traditional store counterparts, on-line customers often refuse to wait patiently for an overworked server. If your site cannot deliver pages quickly, even under heavy load, you lose customers. Discouraged customers often never return to an underperforming web site. The primary objective of your performance test should be to measure and minimize your web site's response time. Even at peak load, keep your web site's response time under its responsiveness objectives.
Response Time Graph
Understanding your web site's response time requires an understanding of the relationship between load, throughput, and response time. As discussed earlier, your web site does not possess infinite throughput potential, but achieves its maximum throughput at a certain user load. Beyond this point (known as the throughput saturation point), throughput remains constant. However, the response time begins to increase. Response time increases after throughput saturation because of resource constraints. Additional load waits for these limited resources before actually doing useful work.
Let's look at the load, throughput, and response time dynamics in a brick and mortar store. If only one customer wants to check out, she receives immediate service from the cashier. If we require one minute to complete her sale, the store's throughput becomes one customer per minute. Let's extend our example by assuming that the store contains five cashiers. If five customers arrive at the same time for checkout, they all receive simultaneous service from the individual cashiers (see Figure 1.15). Since these five customers did not wait, they experienced one-minute response time (the time required to check out). Likewise, since our store served multiple customers in parallel, our throughput increased to five customers per minute.
Figure 1.15 Response time queue for a brick and mortar store
This throughput (five customers per minute) represents our store's throughput upper bound. A cashier cannot check out multiple customers at once, so our store experiences a resource constraint (cashiers) that prevents it from exceeding this boundary. If a sixth person arrives while all five cashiers are busy with other customers, the sixth person waits, or queues, until a resource (in this case, a cashier) becomes available. Time spent in the queue waiting for a resource is called wait time. In our store, the maximum wait time for the sixth person in line is one minute (the time required to check out the person currently being served by the cashier). Figure 1.15 shows an example of customers experiencing wait time.
Wait time also influences the response time for the sixth customer. We must add the wait time to his overall service time. In the case of our sixth person, the response time increases to two minutes: one minute waiting for a cash register to clear plus one minute actually spent checking out (yes, 1 + 1 = 2). However, because of limited resources (again, the cashiers), the store's throughput does not increase as more customers arrive. This demonstrates our previous claim about throughput: Once you hit the throughput upper bound, response time begins to increase as additional load enters the system. As Figure 1.15 shows, the response time doubles as the number of customers doubles past the throughput saturation point.
The on-line store exhibits the same dynamics in the relationship between throughput, response time, and load. If our on-line store handles a maximum of 5 requests per second with a one-second response time, then 10 simultaneous requests experience a maximum response time of two seconds, while 20 simultaneous requests experience a maximum response time of four seconds, and so forth. Figure 1.16 shows the linear relationship between response time and load after reaching the throughput upper bound. (This graph does not show the throughput buckle zone, as displayed in Figure 1.11, but it still exists. Once the load goes beyond a certain point, average response time growth is more than linear.)
Figure 1.16 Throughput curve and response time graph
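The relationship between load, throughput, and maximum response time in this example can be tabulated with a short Java sketch. It assumes the idealized model used in the text: a fixed capacity of 5 requests served in parallel, a fixed one-second service time, and requests served in batches, so the last request in line waits for every batch ahead of it. It does not model the buckle zone. The class ResponseCurve and its methods are hypothetical names.

```java
// Idealized load/throughput/response-time model: requests are served in
// parallel batches of at most `capacity`, each batch taking `serviceTime`.
final class ResponseCurve {
    // Requests completed per service interval: rises with load, then plateaus.
    static int throughput(int load, int capacity) {
        return Math.min(load, capacity);
    }

    // Worst-case response time: the last request waits for all earlier batches.
    static int maxResponseTime(int load, int capacity, int serviceTime) {
        int batches = (load + capacity - 1) / capacity; // ceiling of load / capacity
        return batches * serviceTime;
    }

    public static void main(String[] args) {
        int capacity = 5;    // requests served in parallel
        int serviceTime = 1; // seconds per request
        int[] loads = {1, 5, 6, 10, 20};
        for (int load : loads) {
            System.out.println("load=" + load
                    + " throughput=" + throughput(load, capacity)
                    + " maxResponse=" + maxResponseTime(load, capacity, serviceTime));
        }
    }
}
```

Throughput stays at 5 once the load reaches 5, while the worst-case response time climbs to 2 seconds at a load of 10 and 4 seconds at a load of 20, matching the figures quoted above.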
Understanding how many customers your web site may handle simultaneously and how long your waiting lines and response times become at your busiest times is the primary motivation behind performance load testing. This understanding, in our opinion, forms the basis of "due diligence" for any commercial web site. Clearly, we need more than a throughput curve like the one shown in Figure 1.11. We also need a response time graph to show the relationship between load, response times, and the throughput upper bound, like that shown in Figure 1.16.
This graph resembles the throughput graph in Figure 1.11 in several ways. The left y-axis plots our throughput, while the x-axis plots our load. However, we add another y-axis on the far right to plot our response times. As you see, with one to five customers, the response time averages one second. However, the web site reaches throughput saturation at five customers. Beyond this point, the graph shows unchanging throughput despite increased load (a sure sign of throughput saturation). The response times for the increased load, however, grow linearly beyond the response time at the throughput saturation point. (Again, watch out for the buckle zone, where response times often grow exponentially.)
Throughout the rest of the book, we discuss what to do if your throughput and response time numbers do not meet your expectations. In fact, few web sites start with ideal throughput and response times. Usually, the first performance tests show throughput much lower and response times much higher than you actually require. The process of performance tuning focuses on improving these numbers and giving your web site the performance your customers demand. Even if your results fall well below expectations by a factor of 10, or even 100, don't despair. Keep reading. Sometimes very small performance adjustments make a tremendous difference in overall performance.
Most bookstores open for limited periods during each day, for example, 10 AM to 9 PM. Although the store opens at 10 AM, the manager might panic if a large number of people arrived at 10:01 AM to buy books. Why? Because the store needs time to fully prepare. Before the bookstore works at optimum efficiency, the staff must complete some opening activities. The bookshelves need dusting and straightening. Returned books need reshelving. The cash tills in the registers might be low on change, so the manager makes a quick trip to the bank for coins. Maybe a few employees get stuck in traffic and arrive a few minutes late. In any case, the store works most efficiently after the store staff arrives and finishes these start-up activities. Also, the traffic levels at off-peak times tend to be low. The store rarely experiences peak traffic at 10 AM or 9 PM simply because customers do not come in large numbers during those hours.
When we measure the response time for our store, we want to measure at a time when the store operates efficiently and when it experiences significant (even peak) loading. Otherwise, our data may not be valid. For example, if we measure response time at 10:01 AM, just after opening, we may get mixed results. A customer might experience terrific response time because she's the only customer in the store. However, on some days, a customer might wait an unusually long time if the store is out of change and the manager must go to the bank for coins. If we take measurements at 10:01 AM, our measurements are outside the store's steady state. The store needs more load, or more preparation, to demonstrate its typical behavior.
Surprisingly, our performance tests must also consider steady-state issues to obtain valid data. The web site under test also experiences "opening the store" activities, albeit quite different from dusting shelves. "Opening activities" for web sites include loading servlets into memory, compiling JavaServer Pages (JSPs), priming caches, and other activities with a one-time cost. Since the vast majority of your web site users never experience the cost associated with these activities, try factoring them out of your tests by selecting your measurements carefully.
Just as the web site requires preparation time, your test usually requires time to reach full loading. Most tests start with a few virtual users and increase the load over time until achieving the maximum load for the test. We call this the ramp-up period; the load ramps up to maximum users. Likewise, all virtual users in your load may not stop at the same time, but finish the test scenario over some time period. We call this the ramp-down period as you wait for the test to complete.
We want to capture data only after all of the users start. We also want to delay the measurement slightly after starting all of the users to give the system time to adjust to the load. Figure 1.17 demonstrates this concept. The ramp-up occurs as the user load increases up to 100 users. During this time, our web site goes through its "opening activities," such as bringing servlets into memory and priming caches. Again, we do not include this time in our measurements because the data tends to be ambiguous. Likewise, we do not want to take measurements while the tests finish during the ramp-down period. Instead, note the time period during full loading when we actually take measurements. This gives us an accurate picture of how the web site operates under this load. In this case, we take measurements from our full load period, giving ourselves two time periods of buffer after ramp-up and before ramp-down.
Figure 1.17 Steady-state measurement interval
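One way to apply this when analyzing test results is to discard every sample recorded outside the steady-state window. The Java sketch below is a minimal illustration, not the behavior of any particular load-testing tool: the Sample class, the method names, and the timings (ramp-up ending at 300 seconds, ramp-down starting at 900 seconds, a 120-second buffer on each side) are all assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Keeps only the response-time samples taken during steady state.
final class SteadyStateWindow {
    // One measurement: when it was taken and how long the request took.
    static final class Sample {
        final long timeSeconds;
        final double responseSeconds;

        Sample(long timeSeconds, double responseSeconds) {
            this.timeSeconds = timeSeconds;
            this.responseSeconds = responseSeconds;
        }
    }

    // Returns samples taken after ramp-up plus a buffer and before
    // ramp-down minus the same buffer.
    static List<Sample> steadyState(List<Sample> all, long rampUpEnd,
                                    long rampDownStart, long bufferSeconds) {
        long start = rampUpEnd + bufferSeconds;
        long end = rampDownStart - bufferSeconds;
        List<Sample> kept = new ArrayList<>();
        for (Sample sample : all) {
            if (sample.timeSeconds >= start && sample.timeSeconds < end) {
                kept.add(sample);
            }
        }
        return kept;
    }

    // Average response time of the given samples.
    static double average(List<Sample> samples) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("no samples in the measurement window");
        }
        double sum = 0.0;
        for (Sample sample : samples) {
            sum += sample.responseSeconds;
        }
        return sum / samples.size();
    }

    public static void main(String[] args) {
        List<Sample> all = new ArrayList<>();
        all.add(new Sample(100, 9.0)); // ramp-up: servlets loading, caches cold
        all.add(new Sample(350, 5.0)); // inside the buffer after ramp-up
        all.add(new Sample(500, 2.0)); // steady state
        all.add(new Sample(700, 4.0)); // steady state
        all.add(new Sample(850, 6.0)); // inside the buffer before ramp-down
        all.add(new Sample(950, 8.0)); // ramp-down
        List<Sample> kept = steadyState(all, 300, 900, 120);
        System.out.println("Samples kept: " + kept.size());            // 2
        System.out.println("Steady-state average: " + average(kept));  // 3.0
    }
}
```

Averaging all six made-up samples would give a noticeably higher figure than the steady-state average, which shows how ramp-up and ramp-down data can distort a result.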
Remember, you want measurements best representing your site under load. Failing to account for the ramp-up and ramp-down periods of your performance test may invalidate the entire test.