Measuring Scalability

Scalability is almost as easy to measure as performance is. We know that scalability refers to an application's ability to accommodate rising resource demand gracefully, without a noticeable loss in quality of service (QoS). To measure scalability, then, it would seem that we need to calculate how well increasing demand is handled. But how exactly do we do this?

Let's consider a simple example. Suppose that we deploy an online banking application. One type of request that clients can make is to view recent bank transactions. Suppose that when a single client connects to the system, it takes a speedy 10 ms of server-side time to process this request. Note that network latency and other client or network issues affecting the delivery of the response will increase the end-to-end response time; for example, maybe end-to-end response time will be 1,000 ms for a single client. But, to keep our example simple, let's consider just server-side time.

Next, suppose that 50 users simultaneously want to view their recent transactions, and that it takes an average of 500 ms of server-side time to process each of these 50 concurrent requests. Obviously, our server-side response time has slowed because of the concurrency of demands. That is to be expected.

Our next question might be: How well does our application scale? To answer this, we need some scalability metrics, such as the following:

  • Throughput—the rate at which transactions are processed by the system

  • Resource usage—the usage levels for the various resources involved (CPU, memory, disk, bandwidth)

  • Cost—the price per transaction

A more detailed discussion of these and other metrics can be found in Scaling for E-Business: Technologies, Models, Performance, and Capacity Planning (Menascé and Almeida, 2000). Measuring resource use is fairly easy; measuring throughput and cost requires a bit more explanation.

What is the throughput in both of the cases described, with one user and with 50 users? To calculate this, we can take advantage of something called Little's law, a simple but very useful measure that can be applied very broadly. Consider the simple black box shown in Figure 1–3. Little's law says that if this box contains an average of N users, and the average user spends R seconds in that box, then the throughput X of that box is roughly

X = N/R.

Little's law can be applied to almost any device: a server, a disk, a system, or a Web application. Indeed, any system that employs a notion of input and output and that can be considered a black box is a candidate for this kind of analysis.
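To make the arithmetic concrete, here is a minimal Python sketch of Little's law applied to the two cases described earlier (the function name `throughput` is my own, not from the text):

```python
def throughput(n_users, avg_response_s):
    """Little's law: X = N / R, with R in seconds, giving transactions/second."""
    return n_users / avg_response_s

# One client, 10 ms of server-side time:
print(throughput(1, 0.010))    # 100.0 tps
# Fifty concurrent clients averaging 500 ms each:
print(throughput(50, 0.500))   # 100.0 tps
```

Note that both cases yield the same throughput even though the average response time grew 50-fold; that observation is exactly what the linear-scalability comparison later in this section builds on.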

Figure 1–3 Little's law

Armed with this knowledge, we can now apply it to our example. Specifically, we can calculate application throughput for different numbers of concurrent users. Our N will be transactions, and since R is in seconds, we will measure throughput in terms of transactions per second (tps). At the same time, let's add some data to our banking example. Table 1–3 summarizes what we might observe, along with throughputs calculated using Little's law. Again, keep in mind that this is just an example; I pulled these response times from thin air. Even so, they are not unreasonable.

Based on these numbers, how well does our application scale? It's still hard to say. We can quote numbers, but do they mean anything? Not really. The problem here is that we need a comparison—something to hold up against our mythical application so we can judge how well or how poorly our example scales.

Table 1–3: Sample Application Response and Throughput Times

Concurrent Users    Average Response Time (ms)    Throughput (tps)
  1                        10                         100
 50                       500                         100
100                     1,200                          83.333
150                     2,200                          68.182
200                     4,000                          50

One good comparison is against a "linearly scalable" version of our application, by which I mean an application that continues to do exactly the same amount of work per second no matter how many clients use it. This is not to say the average response time will remain constant; far from it. It will increase, but in a perfectly predictable manner. Our throughput, however, will remain constant. Linearly scalable applications are perfectly scalable in the sense that their response time degrades at a constant rate, in direct proportion to demand.

If our application is indeed linearly scalable, we'll see the numbers shown in Table 1–4. Notice that our performance degrades in a constant manner: The average response time is ten times the number of concurrent users. However, our throughput is constant at 100 tps.

To understand this data better, and how we can use it in a comparison with our original mythical application results, let's view their trends in graph form. Figure 1–4 illustrates average response time as a function of the number of concurrent users; Figure 1–5 shows throughput as a function of the number of users. These graphs also compare our results with results for an idealized system whose response time increases linearly with the number of concurrent users.

Figure 1–4 Scalability from the client's point of view

Figure 1–5 Scalability from the server's point of view

Figure 1–4 shows that our application starts to deviate from linear scalability after about 50 concurrent users. With higher numbers of concurrent sessions, the line migrates toward an exponential trend. Notice that I'm drawing attention to the nature of the line, not the numbers to which the line corresponds. As we discussed earlier, scalability analysis is not the same as performance analysis (that is, a slow application is not necessarily unable to scale). From a performance standpoint, we are interested in the average time per request; from a scalability standpoint, we are more interested in how that time trends under higher concurrent demand, that is, in how well the application deals with increased load.

Figure 1–5 shows that a theoretical application should maintain a constant number of transactions per second. This makes sense: Even though our average response time may increase, the amount of work done per unit time remains the same. (Think of a kitchen faucet: It is reasonable that even though it takes longer to wash 100 dishes than to wash one, the number of dishes per second should remain constant.) Notice that our mythical application becomes less productive after 50 concurrent users. In this sense, it would be better to replicate our application and limit the number of concurrent users to 50 if we want to achieve maximum throughput.
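One way to quantify the drop-off that Figure 1–5 depicts is to express observed throughput as a fraction of the linear ideal of 100 tps. Here is a small sketch, with throughputs taken from Table 1–3 (the efficiency ratio itself is my addition, not a metric from the text):

```python
# Observed throughput (tps) by concurrent users, from Table 1-3
observed = {1: 100.0, 50: 100.0, 100: 83.333, 150: 68.182, 200: 50.0}
IDEAL_TPS = 100.0   # the linearly scalable baseline of Table 1-4

for n, x in observed.items():
    print(f"{n:3d} users: {x / IDEAL_TPS:.1%} of linear throughput")
```

Efficiency stays at 100% through 50 users and then falls steadily, which is why capping each replica at roughly 50 concurrent users would maximize aggregate throughput.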

Table 1–4: Linearly Scalable Application Response and Throughput Times

Concurrent Users    Average Response Time (ms)    Throughput (tps)
  1                        10                         100
 50                       500                         100
100                     1,000                         100
150                     1,500                         100
200                     2,000                         100

Analyzing response time and throughput trends, as we have done here, is important for gauging the scalability of your system. Figures 1–4 and 1–5 show how to compare an application with its theoretical potential. Figure 1–4 illustrates efficiency from the client's point of view, where the focus is on latency; Figure 1–5 shows application efficiency from the server's point of view, where the focus is on productivity (work done per unit time).
