3.4 Identifying Bottlenecks
With all these factors in mind, the goal of data interpretation and analysis is to determine whether a particular bottleneck exists. Once a bottleneck is identified, tuning to improve performance can begin. Characteristics of bottlenecks are:
A particular resource is saturated.
The queue for the resource grows over time.
Other resources may be starved as a result.
Response time is not satisfactory.
3.4.1 Resource Saturation
Resource saturation is often thought of as 100% utilization: the entire resource is consumed, and additional requests for the resource must wait. However, 100% utilization alone is not sufficient proof of a bottleneck. Two examples reinforce this point.
Disk utilization is determined by periodically monitoring whether there are any requests in the queue for each disk drive. The total number of requests in the queue is not factored into the disk utilization metric. Although 100% disk utilization is an indicator of a busy disk, it does not mean that the disk cannot support more I/Os.
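The sampling behavior described above can be sketched in a few lines. This is a minimal illustration, not the kernel's actual implementation: the function name sampled_utilization and the sample lists are hypothetical, and each sample is assumed to be the queue depth seen at one monitoring interval.

```python
def sampled_utilization(queue_depth_samples):
    """Percent of samples in which at least one request was queued.

    Hypothetical sketch: only "queue empty or not" is counted, so the
    depth of the queue has no effect on the result.
    """
    if not queue_depth_samples:
        raise ValueError("need at least one sample")
    busy = sum(1 for depth in queue_depth_samples if depth > 0)
    return 100.0 * busy / len(queue_depth_samples)


# Two made-up disks: one with a single request outstanding at every
# sample, one with a deep queue at every sample.
light_load = [1, 1, 1, 1, 1, 1]
heavy_load = [8, 12, 9, 15, 11, 14]

light_util = sampled_utilization(light_load)
heavy_util = sampled_utilization(heavy_load)
```

Both disks report the same utilization even though their queues differ greatly, which is why the utilization figure by itself cannot show whether the disk could support more I/Os.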
The idle loop in the kernel is used to compute CPU utilization. On a workstation executing a compute-intensive application, CPU utilization is probably 100% for a long period of time. However, if no other processes are waiting for the CPU, and response time is satisfactory, there is no bottleneck.
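One way to picture an idle-based CPU metric is as the share of elapsed time not spent idle. The sketch below assumes two snapshots of cumulative idle and total tick counters; the function name cpu_utilization and its parameters are hypothetical and do not correspond to a specific kernel interface.

```python
def cpu_utilization(idle_before, total_before, idle_after, total_after):
    """Percent of the interval during which the CPU was not idle.

    Hypothetical sketch: the arguments are cumulative tick counts taken
    at the start and end of a measurement interval.
    """
    elapsed = total_after - total_before
    if elapsed <= 0:
        raise ValueError("the second snapshot must be later than the first")
    idle = idle_after - idle_before
    return 100.0 * (elapsed - idle) / elapsed


# A compute-intensive job that never lets the CPU go idle.
busy_workstation = cpu_utilization(100, 1000, 100, 2000)

# A system that was idle for half of the interval.
half_idle = cpu_utilization(100, 1000, 600, 2000)
```

A result of 100% here says only that the idle loop never ran. It says nothing about whether other processes were waiting, which is the point of the workstation example.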
There are utilization metrics for the CPU, memory, network, and disk resources.
3.4.2 Growing Resource Queue
Resource queue growth over time is a strong indicator of a bottleneck, in conjunction with the utilization metric. The queue for a resource tends to grow when demand increases and when there is not enough resource available to keep up with the requests. It is easier to develop rules of thumb for queue metrics than for utilization metrics.
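Growth over time can be estimated by fitting a straight line to periodic queue-length samples. The following is a minimal sketch using an ordinary least-squares slope; the function name queue_growth_rate, the sampling interval, and the sample values are illustrative assumptions, and a real rule of thumb would also need a threshold suited to the resource.

```python
def queue_growth_rate(samples, interval_seconds):
    """Least-squares slope of queue length, in requests per second.

    Hypothetical sketch: samples are queue lengths taken at a fixed
    interval. A clearly positive slope suggests a growing queue.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    times = [i * interval_seconds for i in range(n)]
    mean_t = sum(times) / n
    mean_q = sum(samples) / n
    numerator = sum((t - mean_t) * (q - mean_q) for t, q in zip(times, samples))
    denominator = sum((t - mean_t) ** 2 for t in times)
    return numerator / denominator


steady_rate = queue_growth_rate([2, 2, 2, 2], interval_seconds=10)
growing_rate = queue_growth_rate([1, 2, 3, 4, 5], interval_seconds=10)
```

A slope near zero indicates a stable queue, while a sustained positive slope, taken together with high utilization, points toward a bottleneck.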
3.4.3 Resource Starvation
Resource starvation can occur when one resource is saturated and another resource depends upon it. For instance, in a memory-bound environment, CPU cycles are needed to handle page faults. This leaves less of the CPU resource for application use.
3.4.4 Unsatisfactory Response Time
Unsatisfactory response time is sometimes the final arbiter of whether or not a bottleneck exists. The CPU example given above demonstrates this point. If no other processes are waiting for the CPU, and the application produces the results in a satisfactory time period, then there is no bottleneck. However, if the CPU is saturated and response or throughput expectations are not being met, then a CPU bottleneck exists.
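The reasoning in the CPU example can be written as a small decision rule. This is only a sketch: the names CpuObservation and assess_cpu, the run queue field used to represent waiting processes, and the "investigate further" outcome for cases the text does not settle are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CpuObservation:
    utilization_pct: float    # CPU utilization over the interval
    run_queue_length: float   # processes waiting for the CPU (assumed metric)
    response_time_s: float    # measured response time of the workload


def assess_cpu(obs, target_response_s, saturation_pct=100.0):
    """Classify an observation using the rules stated in the text."""
    saturated = obs.utilization_pct >= saturation_pct
    waiting = obs.run_queue_length > 0
    too_slow = obs.response_time_s > target_response_s
    if saturated and too_slow:
        # Saturated and expectations not met: a CPU bottleneck exists.
        return "bottleneck"
    if not waiting and not too_slow:
        # Nothing is waiting and results arrive in time: no bottleneck,
        # even if utilization is 100%.
        return "no bottleneck"
    # Cases the text leaves open, such as slow response without saturation.
    return "investigate further"


workstation = assess_cpu(CpuObservation(100.0, 0, 2.0), target_response_s=5.0)
overloaded = assess_cpu(CpuObservation(100.0, 4, 9.0), target_response_s=5.0)
```

The target response time is supplied by the caller because what counts as satisfactory depends on the application and its users.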
3.4.5 Bottleneck Summary
Multiple metrics should always be reviewed to validate that a bottleneck exists. For example, the CPU utilization metric is a measure of saturation, and the run queue metric is a measure of queue growth. Both metrics are needed to establish that a CPU bottleneck is present. Multiple tools should also be used, to guard against a single tool yielding misleading data. Consider the following analogy.
A three-lane highway is built to accommodate a certain maximum traffic flow, for example, twenty vehicles per minute distributed across the three lanes. Traffic at this rate would be considered 100% utilization. Additional traffic entering the highway would be forced to wait at the entrance ramp, producing a queue. Suppose that a tractor-trailer overturns and blocks two lanes of the highway. Now, the same amount of traffic must funnel through the one remaining open lane. With the same number of cars on the road, the highway is now more than saturated. The queue builds at the entrance ramps. Resource starvation occurs, since two lanes are closed. The time it takes to travel a given distance on the highway now becomes unacceptably long.
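The arithmetic behind the analogy can be sketched as follows. The twenty vehicles per minute and the three lanes come from the example; the assumptions that each lane carries an equal share of the flow, that arrivals continue at the original rate, and that the closure lasts 30 minutes are added purely for illustration, as is the helper name queue_after.

```python
def queue_after(minutes, arrival_rate, service_rate, initial_queue=0.0):
    """Vehicles waiting after the given time in a simple fluid model.

    Hypothetical sketch: rates are in vehicles per minute, and the queue
    changes linearly and never drops below zero.
    """
    growth = arrival_rate - service_rate
    return max(0.0, initial_queue + growth * minutes)


FULL_CAPACITY = 20.0   # vehicles per minute with all lanes open (from the text)
LANES = 3
open_lanes = 1         # two lanes blocked by the accident

# Assumption: each lane carries an equal share of the total flow.
reduced_capacity = FULL_CAPACITY * open_lanes / LANES

# Demand relative to what the open lane can carry.
demand_ratio = FULL_CAPACITY / reduced_capacity

# Backlog if traffic keeps arriving at the original rate for 30 minutes.
backlog = queue_after(30, FULL_CAPACITY, reduced_capacity)
```

Under these assumptions demand is about three times the remaining capacity and roughly 400 vehicles are waiting after half an hour, which shows saturation, queue growth, and unacceptable travel time appearing together.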
Once a bottleneck is identified, it may be possible to alleviate it. However, alleviating one bottleneck may cause a new one to emerge, which must then be investigated.