BluePrint for Benchmarking Success
This article provides guidelines for anyone who is assembling a benchmark and expects the results to help decide which computer to buy. Benchmarking is expensive in both money and staff time. If the benchmark is not well defined, it wastes the time of both the customer and the computer vendor.
When defining a benchmark, the person assembling it must consider many things to complete the benchmark successfully. This article groups these items into four categories: goals, practicality, commitment, and rules.
This article contains little specific technical information. The information presented is not Sun-specific. It discusses how the customer can be successful at benchmarking any vendor's hardware and software.
The audience for this article is project managers who are charged with benchmarking their compute environment on one or more computer vendors' platforms, and Sun™ system engineers who are working with a customer or prospect to define a benchmark.
This article covers the following topics:
Define the Rules
Types of Runs
The Situation
You have heard the sales pitches. You've read the marketing brochures. You've explored the vendor's website. You've checked the industry standard benchmarks. You've talked with the references. Now you have decided that a benchmark is the next step to help determine the best vendor for your computing needs. How are you going to do it?
The Solution
This is a complex question that must be answered specifically for your compute environment. While you are planning your benchmark, you must determine the goals, practicality, commitment, and the rules. This article provides you with many items to consider to make the benchmark a success. Most of these items are non-technical, but some technical issues are mentioned. The differences between HPTC (High Performance and Technical Computing) and Commercial (database and related applications) benchmarking are stated where needed.
The results of the benchmark are not the only criterion for selecting a computer; they are just one of several. What new information do you expect to learn from this exercise? How much weight does this benchmark carry compared to the other criteria in the decision?
Why are you looking for new compute resources? Are your users complaining about response time? Are engineers idle waiting for analysis to complete? Are you expecting new projects to start up? What do you expect the workload to look like one month after installation? Six months? One year? Can the benchmark be used to simulate the workload you expect to see 18-24 months from now?
The benchmark must reflect the key usage of the computing resource that this purchase is meant to provide. Typically, you will not benchmark word processing or email applications on a workstation or Sun Ray™ appliance server. That information is available in various published white papers. But you may want to measure the response time of your particular application as it accesses your database on a remote server. Or you may want to determine if the typical engineering design for your organization can be analyzed overnight. If you multiply this by the number of users, will it represent your production environment?
Pretend that you are at the end of the benchmark. Are the results measurable? Can they be readily compared between vendors? A set of basic metrics can include, but is certainly not limited to:
Raw CPU speed (Megaflops)
Single CPU performance of code (time)
Parallel performance (scalability, elapsed time)
Elapsed time per job in a throughput run
Number of jobs per hour per CPU in a throughput run
Elapsed time to update the database
Transactions per second
Response time for 10 simultaneous DSS queries
Elapsed time of a large batch run
Parallel performance of a batch query (scalability, elapsed time)
Now review your current compute environment and compose your benchmark with that goal in mind: a comparable measure. The most important issue is to reduce the number of measurements while still providing meaningful information. Too many measurements make any comparison difficult and inconclusive.
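As an illustration only, a few of the metrics listed above can be reduced to simple arithmetic on measured elapsed times. The following Python sketch is not part of any vendor tool; the function names and all of the timing values are hypothetical, chosen purely to show how parallel scalability and throughput might be computed in the same way for every vendor.

```python
# Minimal sketch: derive comparable metrics from measured elapsed times.
# All names and numbers here are hypothetical illustrations, not real results.

def speedup(serial_seconds, parallel_seconds):
    """Parallel speedup: how many times faster the parallel run is."""
    if serial_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("elapsed times must be positive")
    return serial_seconds / parallel_seconds

def parallel_efficiency(serial_seconds, parallel_seconds, cpus):
    """Scalability: fraction of ideal (linear) speedup achieved on `cpus` CPUs."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return speedup(serial_seconds, parallel_seconds) / cpus

def jobs_per_hour_per_cpu(jobs_completed, elapsed_seconds, cpus):
    """Throughput metric: jobs finished per hour, normalized by CPU count."""
    if elapsed_seconds <= 0 or cpus <= 0:
        raise ValueError("elapsed_seconds and cpus must be positive")
    hours = elapsed_seconds / 3600.0
    return jobs_completed / hours / cpus

# Hypothetical measurements for one vendor's system.
serial_time = 1200.0      # seconds, single-CPU run
parallel_time = 200.0     # seconds, same job on 8 CPUs
print(speedup(serial_time, parallel_time))                 # 6.0
print(parallel_efficiency(serial_time, parallel_time, 8))  # 0.75
print(jobs_per_hour_per_cpu(48, 7200.0, 8))                # 3.0
```

Applying the same small set of formulas to each vendor's raw timings keeps the number of reported measurements low and makes the results directly comparable.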