- Performance and Disease
- Business Requirements
- Medical Analogues
- Lab Tests and Record Keeping
- Traps and Pitfalls
- Where Does the Time Go?
- Diagnostic Strategies
- Selected Tools and Techniques
- Third-Party URLs
- About the Author
- Ordering Sun Documents
- Accessing Sun Documentation Online
Complex systems can produce a veritable flood of performance-related data. Systems administrators are naturally compelled to pore over these reams of data, looking for something wrong or for opportunities for tuning or process improvement. All too often, the actual business performance metrics of the system are formally included in neither the data nor the process.
While subsystem tuning often leads to improved business throughput, it can occasionally backfire. For example, if improving the efficiency of one part of a system increases the burden on some other part of the system to the point where it becomes unstable, chaotic behavior can occur, possibly cascading through the entire system.
The best measure of any system change is its impact on business requirements; therefore, the principal objective of all tuning should be viewed in terms of business requirements. Business requirements may be expressed in terms of service level agreements (SLAs) or critical-to-quality (CTQ) measures [3]. SLAs and CTQs may comprise the substance of contractual agreements that feature penalties for non-conformance.
For the performance analyst, it is adequate to view business requirements in three principal categories:
- Performance: Performance involves the primary metrics of how well a system is doing its job (for example, transactions per unit time, time per transaction, or time required to complete a specific set of tasks).
- Predictability: It is not enough for an average response time to be sub-second if there are too many outliers or if outliers are long enough to result in complaints of user-perceived outages. Long-running tasks are often required to complete predictably within their prescribed operational "window".
- Price: Given an unlimited budget for time, equipment, and people, most business goals can be met [4]. Computer systems make good business sense only when they deliver the expected business value within budgets.
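The point about averages and outliers can be illustrated with a minimal Python sketch. The sample data, the function names `summarize` and `meets_sla`, and the limits used are all assumptions made for illustration; they are not taken from any particular SLA or tool.

```python
import statistics

def summarize(response_times_s):
    """Return mean, 95th/99th percentile, and maximum response time (seconds)."""
    ordered = sorted(response_times_s)
    # quantiles(n=100) returns 99 cut points: index 94 is p95, index 98 is p99.
    cuts = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(ordered),
        "p95": cuts[94],
        "p99": cuts[98],
        "max": ordered[-1],
    }

def meets_sla(summary, mean_limit_s, p99_limit_s):
    """Hypothetical SLA check: both the mean and the 99th percentile must conform."""
    return summary["mean"] <= mean_limit_s and summary["p99"] <= p99_limit_s

# Hypothetical sample: 97 fast transactions and 3 very slow outliers.
sample = [0.2] * 97 + [12.0] * 3
summary = summarize(sample)

print(summary)
print("SLA met:", meets_sla(summary, mean_limit_s=1.0, p99_limit_s=2.0))
```

In this made-up sample the mean stays below one second, yet the 99th percentile is twelve seconds, so a check based on the average alone would miss exactly the outliers that users are likely to report as outages.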
These three factors correlate directly with the famous engineering axiom: "Good, Reliable, Cheap: pick any two." Additional business metrics of headroom and availability are often cited, but these are actually only variants of these three principal business metrics.
Headroom is the concept of having some extra performance capacity to provide confidence that a system's performance will remain adequate throughout its intended design life. Business management must have confidence not only that a system will do what it was designed to do, but also that it will be able to accommodate forecasted business growth. In addition, systems must not grossly malfunction when driven beyond their nominal design point. Encountering the need for unbudgeted upgrades is bad. Headroom is rarely indicated with any accuracy by simple metrics such as "percent idle CPU". The most effective means of assessing headroom is by test-to-scale and test-to-fail techniques.
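A test-to-scale run might be organized along the lines of the following Python sketch. Everything here is hypothetical: `toy_system` is only a stand-in for a real load test, its capacity and service time are invented numbers, and the response-time limit is an assumed requirement.

```python
def find_max_sustainable_load(measure, loads, limit_s):
    """Step through increasing load levels and return the highest load whose
    measured response time stays within limit_s (None if no level conforms)."""
    best = None
    for load in sorted(loads):
        if measure(load) > limit_s:
            break  # first failing level: stop the ramp here
        best = load
    return best

def toy_system(load_tps, capacity_tps=1000.0, service_time_s=0.05):
    """Assumed stand-in for a real measurement: response time rises sharply
    as the offered load approaches an invented capacity."""
    if load_tps >= capacity_tps:
        return float("inf")
    return service_time_s / (1.0 - load_tps / capacity_tps)

# Hypothetical ramp from 100 to 1000 transactions per second.
loads = range(100, 1100, 100)
max_load = find_max_sustainable_load(toy_system, loads, limit_s=0.4)

current_load = 400  # assumed present-day production load
headroom = max_load / current_load - 1.0

print("max sustainable load:", max_load)
print("headroom over current load: {:.0%}".format(headroom))
```

In practice the `measure` argument would drive a real workload generator against the system under test. The sketch only shows the shape of the procedure: ramp the load, find the point where the business requirement is no longer met, and express headroom relative to the current load rather than relying on a single utilization figure.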
Availability issues can be viewed as either performance or predictability issues, where performance drops to zero. Availability can also encompass performance issues such as cluster failover or reboot time.
In practice, making the right tradeoffs in these categories is the key to business success. Optimizing non-business metrics is not necessarily folly, but it can be wasteful to optimize factors that produce no gains in business terms [5].