Now that we've covered some of the basic performance measurement terms, let's discuss some of the terminology associated with optimizing your web site. After all, we want not only to know about our web site's performance, but also to improve it. So how do you improve your web site's performance? For example, recall our earlier discussion of performance issues such as the throughput upper bound. What do you do after you find the upper bound, and it is lower than you hoped? At this point, you need to optimize your web site or apply scaling techniques to push the site beyond its current performance.
Optimization usually focuses on the web application. You may reduce the number of steps the application performs (known as "shortening the application's path length") to save time. You may also eliminate key resource bottlenecks by pooling resources or making other adjustments to improve resource availability. Optimizing a web site requires knowledge of how it behaves for any given user, as well as how it behaves under load from many users. For example, we might discover that our web site's "checkout" function requires 20 seconds to complete for a single user. We must optimize this function to improve the response time. Likewise, under a light load of ten users, we find our search function's response time increases from 3 seconds to 50 seconds. In this case, we need to remove a resource contention present only under load.
Scaling techniques, on the other hand, focus on adding resources to increase the web site's capacity. For example, we might double our throughput capacity by doubling the number of available servers. However, not all web sites scale well, so we must test the web site before assuming additional hardware provides any capacity benefit. This section covers the basic terminology of optimization. Later chapters give more detail on how to apply one or all of these techniques to your web site.
Path Length: The Steps to Service a Request
In the previous section, we measured key activities in our brick and mortar store. In this section, we discuss some optimizations we might make now that we better understand the operation of the store. Let's look first at the checkout process, which takes one minute per customer to complete. So what happens during the checkout process? During this one minute, the clerk performs multiple actions, such as greeting the customer, entering her name and address, entering the price of the book into the cash register, taking money from the customer, providing change, and putting the book into a bag. Figure 1.18 outlines this list of activities, or path. Keep in mind that if we reduce checkout time from 1 minute to 30 seconds, we not only reduce the store's response time but also increase its overall throughput. For example, assuming five cashiers, throughput increases from five to ten customers per minute. (One customer every 30 seconds results in two customers per minute. Multiply by five cashiers, and you get an overall throughput of ten customers per minute.)
Figure 1.18 Reducing the checkout process path length
We decide to reduce the checkout time by reducing the path length of the checkout activity. The path length is the number of steps required to complete an activity, as well as the time each step takes. We reduce the path length either by (1) speeding up a step or (2) removing a step entirely. For example, as shown in Figure 1.18, the clerk types in the customer's name and address as part of the checkout process. The store never uses this information, so we remove this step, which reduces the number of steps required and removes ten seconds from every checkout. While entering the customer's name proved optional, entering the price of the customer's purchases remains mandatory. Entering the price takes 30 seconds, a significant part of our processing time, because each clerk manually enters the price from a tag on the item. Purchasing a bar-code scanner and automating this step speeds up the checkout process by 20 seconds. Together, these adjustments cut the checkout time in half and double the store's throughput. However, such long-term performance gains require an up-front investment: a detailed understanding of the checkout process.
Much like the cashier, your web application code executes "steps" to complete each request. These steps form the path through your code for each request. Likewise, making the path through your code shorter or more efficient improves both response time and throughput. Of course, before improving web application code, you must first understand it. Code analysis requires both programming skill and time, particularly if you did not originally develop the software under analysis. While it does require an investment, code path optimization is usually the most effective technique for improving web site performance. Just as with the brick and mortar store, reducing a web application's code path involves two tactics: (1) Removing unnecessary code and (2) improving the performance of the remaining code. Again, the path is the execution path your code takes. A code path might include the following steps: initialization, reading data, comparing values, and writing updates.
In order to optimize the code, we want to remove any unnecessary code from the path a given request takes. For example, the code path for our checkout function may contain a loop. At each pass through the loop, several statements execute. (If we loop ten times, we execute the statements ten times each.) Sometimes we find a statement, such as a constant declaration or the like, that does not need to be repeated inside the loop. Moving this statement outside the loop reduces the steps we execute on our code path (in ten iterations, we reduce the path length by nine statements).
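The loop example above can be sketched in a few lines of Java. This is a hypothetical checkout-total calculation (the class, method names, and tax rate are illustrative, not from any real application); the point is that the invariant work moves out of the loop, shortening the path by one statement per iteration while producing the same answer.

```java
// Hypothetical example: the sales-tax multiplier is loop-invariant,
// so hoisting it out of the loop removes one step from every
// iteration of the code path.
public class LoopHoisting {

    static final double TAX_RATE = 0.05;

    // Before: the multiplier is recomputed on every pass through the loop.
    static long totalBefore(int[] pricesInCents) {
        long total = 0;
        for (int price : pricesInCents) {
            double multiplier = 1.0 + TAX_RATE;  // invariant work inside the loop
            total += Math.round(price * multiplier);
        }
        return total;
    }

    // After: the multiplier is computed once, outside the loop.
    // In ten iterations, this saves nine statement executions.
    static long totalAfter(int[] pricesInCents) {
        double multiplier = 1.0 + TAX_RATE;      // hoisted outside the loop
        long total = 0;
        for (int price : pricesInCents) {
            total += Math.round(price * multiplier);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] prices = {1000, 2000, 500};
        // Both versions produce the same total; only the path length differs.
        System.out.println(totalBefore(prices) == totalAfter(prices)); // prints "true"
    }
}
```

Modern JIT compilers often hoist simple invariants like this automatically, but the technique matters most for expensive invariant work (object construction, lookups) that the compiler cannot safely move for you.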
However, just as with our brick and mortar store, we cannot always change the number of steps we execute. In these cases, we consider making the steps themselves faster. For example, we might pool database connections to make obtaining a connection in our code faster. If we frequently manipulate String objects, we might build our own lightweight parser rather than using StringTokenizer throughout our code. (See Chapter 4 for specific Java tuning advice.)
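As a sketch of the "make the remaining steps faster" tactic, the following compares StringTokenizer with a hand-rolled indexOf() scan for the common single-character-delimiter case. The class and the sample data are assumptions for illustration; measure before adopting either approach in your own code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch: replacing general-purpose StringTokenizer with
// a lightweight indexOf() scan that does only the work this code path
// actually needs.
public class LightweightSplit {

    // General-purpose tokenizing via the standard class.
    static List<String> withTokenizer(String s) {
        List<String> parts = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, ",");
        while (st.hasMoreTokens()) {
            parts.add(st.nextToken());
        }
        return parts;
    }

    // Lightweight alternative: a direct scan over the string,
    // avoiding the tokenizer object and its generality.
    static List<String> withIndexOf(String s) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        int comma;
        while ((comma = s.indexOf(',', start)) >= 0) {
            parts.add(s.substring(start, comma));
            start = comma + 1;
        }
        parts.add(s.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        String row = "title,author,price";
        System.out.println(withTokenizer(row)); // [title, author, price]
        System.out.println(withIndexOf(row));   // [title, author, price]
    }
}
```

Note the two are not identical in every case: StringTokenizer skips empty tokens ("a,,b" yields two tokens, the scan yields three), so verify the behavior your code path actually depends on before swapping.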
Bottleneck: Resource Contention under Load
A bottleneck is a point of resource contention inside your web site. As the name implies, a bottleneck chokes the flow of traffic through your site as requests contend for one or more limited resources.
In order to uncover bottlenecks, we need to look at our store as a whole rather than focusing on individual areas. Our store's response time and throughput rely on more than just the number of customers going through the checkout line.
For example, during Christmas, the manager receives many complaints from customers about long lines. However, the cashiers say they often wait for customers and never see more than one or two people waiting to check out, even during the busy part of the day. The manager decides to explore other areas of the store, such as the information desk, for long lines. As Figure 1.19 shows, he finds the line in the gift-wrap department. While customers make it through checkout quickly, they experience long delays if they want their purchases wrapped. The gift-wrap desk acts as a bottleneck on the throughput and response time of the entire store. Sufficient resources in one part of your store do not guarantee acceptable overall throughput and response time. The store operates as a system: A bottleneck in one resource often impacts the throughput and response time of the whole.
Figure 1.19 Example bottleneck: gift-wrap department
Removing bottlenecks is both an iterative and ongoing process. It is iterative because we may find that removing one bottleneck introduces a new one. For example, if the information desk became a bottleneck, improving that situation may put more pressure on the cashiers (people move more quickly to the cash registers rather than stalling at the information desk). After removing a bottleneck, be sure to reevaluate the system for new sources of contention. The process is ongoing because usage patterns change. Our gift-wrap department probably doesn't get a lot of traffic in August, but at Christmas it becomes a bottleneck because our customers' use of the store changes. Instead of purchasing books for themselves, they buy gifts for others. Constant monitoring allows you to identify usage shifts, and reallocate (or add) resources as needed to meet the new traffic patterns.
On-line stores frequently experience bottlenecks. Earlier, we discussed tuning the code paths through your web application. However, code path tuning focuses on single-user performance; bottlenecks emerge only when we put the system under load. While code path tuning gives us more efficient code, we must also study the system under load to look for these resource sharing issues.
Technically, we find a bottleneck wherever the code processing a request must wait for a resource. For example, when your web site receives a request to purchase a book, the code path to handle this request probably includes a database update. If the database update locks rows needed by a request from a second user who wants to purchase a different book, the processing of the second user's request waits until the desired rows become available. This locking design works great in a single-user test, giving us problems only when we add load (additional users). This example again underscores the importance of concurrency and load testing prior to releasing your web site.
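The locking problem described above comes down to lock granularity. The sketch below is a simplified in-memory stand-in for the database (class and method names are invented for illustration): one coarse lock serializes every purchase, while per-row locks let two users buy different books concurrently, which is the behavior we want under load.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of lock granularity: a single coarse lock makes
// every purchase wait in line, while per-row locks only make purchases
// of the *same* title contend.
public class InventoryLocking {

    private final Map<String, Integer> stock = new ConcurrentHashMap<>();
    // One lock object per book "row", mimicking row-level database locks.
    private final Map<String, Object> rowLocks = new ConcurrentHashMap<>();

    public InventoryLocking(Map<String, Integer> initialStock) {
        stock.putAll(initialStock);
    }

    // Coarse-grained: every purchase synchronizes on the same object,
    // so a slow update for one title blocks purchases of every title.
    public synchronized boolean purchaseCoarse(String isbn) {
        return decrement(isbn);
    }

    // Finer-grained: two users buying different books proceed in parallel.
    public boolean purchasePerRow(String isbn) {
        Object lock = rowLocks.computeIfAbsent(isbn, k -> new Object());
        synchronized (lock) {
            return decrement(isbn);
        }
    }

    private boolean decrement(String isbn) {
        Integer onHand = stock.get(isbn);
        if (onHand == null || onHand == 0) return false;  // out of stock
        stock.put(isbn, onHand - 1);
        return true;
    }

    public int stockOf(String isbn) {
        return stock.getOrDefault(isbn, 0);
    }
}
```

Both versions pass a single-user test equally well; only a concurrency test with many threads buying different titles reveals the throughput difference, which is exactly why load testing matters.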
While poor database sharing strategies often develop into serious bottlenecks, you may also uncover bottlenecks associated with sharing hardware resources. Each request requires CPU and memory resources from your web site. If you handle many simultaneous requests, these hardware resources may be quickly exhausted, forcing request processing to wait. CPU and memory often become bottlenecks under load. For more information on identifying and resolving bottlenecks, see Chapter 13. Bottlenecks often produce unusual symptoms, so check this chapter for more details on how to spot resource sharing problems.
Scaling: Adding Resources to Improve Performance
Scaling a web site simply means adding hardware to improve its performance. Of course, this only works if your hardware resources, such as CPU and memory, present the largest bottleneck in the web site's system. If your code spends most of its time waiting for CPU or memory, adding more hardware improves performance. Regrettably, web site teams often upgrade or add hardware only to find little or no performance benefit after they finish the upgrade. We recommend tuning the web site and performing at least fundamental bottleneck analysis before adding hardware. Dedicate some time to making your web applications more efficient. Most important, find and eliminate any major bottlenecks, such as database locking issues. If you perform these optimization steps well, your hardware resources eventually become your only remaining bottlenecks. At this point, scaling provides real performance benefits.
In a perfect world, scaling does not affect your application. You simply add new machines, or increase memory, or upgrade your CPUs. However, adding resources often changes the dynamics of your application. For example, if your application used to run on a single server, adding a new server may introduce remote communications between the application and a naming service or a database. Because of these unknowns, we recommend performing a scalability test prior to scaling your web site. This test tells you how well the web site performs after you add new hardware and whether the new configuration meets your performance expectations. Also, before purchasing new equipment, decide which form of scaling works best for your web application. Most scaling focuses on two basic approaches: vertical scaling and horizontal scaling.
In scaling, we increase throughput by handling more customers in parallel. Inside a bookstore, this means additional cash registers and cashiers. Figure 1.20 shows five additional cash registers, giving the store a total of ten registers. This raises the throughput of the store to a maximum of 20 customers per minute (two customers per minute per cashier multiplied by ten cashiers, assuming, here and in the rest of this chapter's examples, our improved response time of 30 seconds per customer). We call adding resources to our existing store vertical scaling. We relieve resource contention within our physical store by increasing the constrained resource (in this case, cashiers and cash registers).
Figure 1.20 Traditional store handling more customers in parallel. Each cashier handles two customers per minute.
On-line stores also use vertical scaling to process more requests in parallel. Vertical scaling at a web site helps us get the most out of each available server machine; it occurs at both the hardware and software levels. From a hardware perspective, we might increase the processors (CPUs) for each server. For example, we might upgrade a two CPU machine (also known as a two-way) to a four CPU machine (a four-way). In theory, this gives the machine twice the processing capabilities. In reality, we rarely obtain a true performance doubling with this technique. Our ability to use extra CPUs inside an existing server depends on several factors:
The existing servers must be upgradeable (they permit us to add more CPUs).
The operating system and the Java Virtual Machine (JVM) must support symmetric multiprocessing (SMP). Unless your OS and JVM support SMP, they cannot distribute work to multiple processors, making it difficult to take advantage of additional CPUs. (Happily, most newer operating systems and JVMs support SMP, at least to a point.)
SMP is an important factor in vertical scaling. (In fact, you may hear the term SMP scaling as another name for vertical scaling.) As we mentioned, most modern operating systems and JVMs support SMP scaling, but only to a point. On extremely large servers (12 processors or more), a single JVM may not fully exercise all of the processors available. With such servers, we turn to vertical software scaling. Rather than one JVM, we execute two or more until all CPUs on the machine become fully engaged. Running multiple JVM processes allows us to better utilize all the resources of these large servers. (Chapters 2 and 11 discuss vertical scaling in more detail.)
At some point, the bookstore may physically exhaust the available floor space and be unable to add more cashiers. When this happens, we cannot continue to increase the capacity of the store, even though the store does not meet the demands of our peak customer loads.
In the brick and mortar world, this situation typically leads to one of three outcomes:
The store relocates to a larger facility.
A competitor opens a store nearby and takes customers from your store.
Your company opens a second store some distance away and customers shop at the most convenient location.
In any case, we expand our physical resources to meet customer demand. Either we provide more facilities, or our competition does. By opening the second store, we anticipate increasing our throughput, or at least reducing our response time, by distributing the customers between multiple stores.
For example, Figure 1.21 shows our throughput if a second store opens with ten checkout lines and a throughput of 20 customers per minute. If the first store continues to handle ten customers per minute, the overall throughput increases to 30 customers per minute. We call this horizontal scaling. We cannot continue to grow inside a single store (vertical scaling), so we add more stores.
Figure 1.21 Scaling with multiple bookstores. Each cashier handles two customers per minute.
When it comes to horizontal scaling, on-line stores beat traditional stores hands down. Web sites grow by adding servers, not by building new bookstores. It's much easier and cheaper for a web site to grow to meet explosive customer demand (and fend off the competition) than for a traditional store.
Horizontal scaling in the web world means adding duplicate resources (servers, networks, and the like) to handle increasing load. If you employ the clever technique of load balancing, your users never know how many servers make up your web site. Your URL points to the load balancer, which picks a server from your web site's server pool to respond to the user's request. Load balancing algorithms vary from the simplistic (round-robin) to the highly sophisticated (monitoring server activity, and picking the least busy server to handle the next request). Because the load balancer interacts with a pool of servers, known as a cluster, we sometimes call horizontal scaling cluster scaling. Figure 1.22 shows a small cluster of two servers. The load balancer sends incoming requests to Server A or Server B for resolution. This cluster also allows the web site to support double the throughput of either server acting alone. (See Chapter 3 for more details on horizontal scaling.)
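The simplest algorithm mentioned above, round-robin, fits in a few lines. This is a minimal sketch, not a production load balancer; the server names are invented, and real balancers add health checks, session affinity, and failover.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of round-robin load balancing: requests cycle through
// the server pool in order, so clients never need to know how many
// servers sit behind the balancer.
public class RoundRobinBalancer {

    private final List<String> servers;
    // AtomicInteger keeps the counter safe when many request
    // threads call pickServer() concurrently.
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinBalancer(List<String> servers) {
        this.servers = List.copyOf(servers);
    }

    // Return the next server in the rotation.
    public String pickServer() {
        // floorMod keeps the index valid even after the counter wraps.
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb =
            new RoundRobinBalancer(List.of("ServerA", "ServerB"));
        System.out.println(lb.pickServer()); // ServerA
        System.out.println(lb.pickServer()); // ServerB
        System.out.println(lb.pickServer()); // ServerA
    }
}
```

The more sophisticated least-busy strategy replaces the counter with per-server load metrics, but the interface, pick one server from a pool per request, stays the same.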
Figure 1.22 Horizontal scaling example
In an ideal world, we experience linear scaling when we increase our web site's capacity. Linear scaling means that as our resources double, so does our throughput. In the real world, perfect linear scaling seldom occurs. However, horizontal scaling techniques come closest to ideal linear scaling. Vertical scaling, while often improving overall throughput, rarely achieves true linear scaling.
In Figure 1.20, doubling the number of cash registers and cashiers doubled the throughput in our store. This is an example of linear scaling. In reality, linear scaling often depends on complex interactions among various resources. For example, in order to increase our throughput, we need more customers in the store. This requires also increasing a range of other resources from the information desk personnel to parking spaces. If we fail to grow any of these resources, we probably won't experience true linear throughput growth.
Scaling a web site also requires taking into consideration all required resources. You may want to double your throughput by upgrading your two-processor server to a four-processor machine. However, as you double your processors, you must also appropriately scale other resources such as memory and disk space.
Frequently, adding capacity also creates more complex concurrency issues within the server. More load often exposes new resource bottlenecks within your application. Again, consider testing your web site with its new capacity to determine the actual benefit of hardware upgrades.
Chapter 11 discusses in more detail the considerations for scaling your web site and for setting appropriate expectations with respect to linear scaling.