Upgrading and Repairing PCs Tip #2: Comparing Processor Performance

How can two processors that run at the same clock rate perform differently, with one running “faster” than the other? In this excerpt from the 22nd edition of Scott Mueller's Upgrading and Repairing PCs, Scott discusses how to compare processor performance.

Processor speed ratings are a common source of confusion. This section covers processor speed in general.

A computer system’s clock speed is measured as a frequency, usually expressed as a number of cycles per second. A crystal oscillator controls clock speeds using a sliver of quartz sometimes housed in what looks like a small tin container. Newer systems include the oscillator circuitry in the motherboard chipset, so it might not be a visible separate component on newer boards. As voltage is applied to the quartz, it begins to vibrate (oscillate) at a harmonic rate dictated by the shape and size of the crystal (sliver). The oscillations emanate from the crystal in the form of a current that alternates at the harmonic rate of the crystal. This alternating current is the clock signal that forms the time base on which the computer operates. A typical computer system runs millions or billions of these cycles per second, so speed is measured in megahertz or gigahertz. (One hertz is equal to one cycle per second.) An alternating current signal is like a sine wave, with the time between the peaks of each wave defining the length of one cycle and the number of cycles per second defining the frequency (see Figure 3.1).
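Because one hertz is one cycle per second, the length of a single cycle is simply the reciprocal of the clock frequency. The following is a minimal Python sketch of that conversion; the function name and the sample frequencies are chosen here for illustration and do not come from the book.

```python
def cycle_time_ns(frequency_hz: float) -> float:
    """Return the duration of one clock cycle in nanoseconds."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    # One second is 1e9 nanoseconds; one cycle lasts 1 / frequency seconds.
    return 1e9 / frequency_hz


# Illustrative clock rates (assumed sample values).
for label, hz in [("100 MHz", 100e6), ("1 GHz", 1e9), ("3.4 GHz", 3.4e9)]:
    print(f"{label}: {cycle_time_ns(hz):.3f} ns per cycle")
```

At 1 GHz a cycle lasts one nanosecond, which is why every extra cycle an operation needs becomes cheaper in absolute time as clock rates rise.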

A single cycle is the smallest element of time for the processor. Every action requires at least one cycle and usually multiple cycles. To transfer data to and from memory, for example, a processor such as the Pentium 4 needs a minimum of three cycles to set up the first memory transfer and then only a single cycle per transfer for the next three to six consecutive transfers. The extra cycles on the first transfer typically are called wait states. A wait state is a clock tick in which nothing happens. This ensures that the processor isn’t getting ahead of the rest of the computer.

Figure 3.1 Alternating current signal showing clock cycle timing.

◊◊ See the Chapter 6 section “Memory Modules,” p. 375.

The time required to execute instructions also varies:

  • 8086 and 8088—The original 8086 and 8088 processors take an average of 12 cycles to execute a single instruction.
  • 286 and 386—The 286 and 386 processors improve this rate to about 4.5 cycles per instruction.
  • 486—The 486 and most other fourth-generation Intel-compatible processors, such as the AMD 5x86, drop the rate further, to about 2 cycles per instruction.
  • Pentium/K6—The Pentium architecture and other fifth-generation Intel-compatible processors, such as those from AMD and VIA/Cyrix, include twin instruction pipelines and other improvements that provide for operation at one or two instructions per cycle.
  • P6/P7 and newer—Sixth-, seventh-, and newer-generation processors can execute as many as three or more instructions per cycle, with multiples of that possible on multicore processors.
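To see what these figures mean in practice, the approximate cycles-per-instruction values from the list can be turned into instruction throughput at a common clock rate. The sketch below is only illustrative: the 100MHz clock is a hypothetical value chosen to put every generation on the same footing (the older chips never ran that fast), the Pentium is taken at the low end of its range (one instruction per cycle), and the newest generation at three instructions per cycle.

```python
# Approximate average cycles per instruction (CPI), taken from the list above.
# Where the text gives instructions per cycle (IPC), CPI is computed as 1 / IPC.
CYCLES_PER_INSTRUCTION = {
    "8086/8088": 12.0,
    "286/386": 4.5,
    "486": 2.0,
    "Pentium/K6 (1 IPC)": 1.0,
    "P6 and newer (3 IPC)": 1.0 / 3.0,
}


def mips(clock_hz: float, cpi: float) -> float:
    """Millions of instructions per second for a given clock rate and CPI."""
    return clock_hz / cpi / 1e6


CLOCK_HZ = 100e6  # hypothetical common clock, for comparison only

for name, cpi in CYCLES_PER_INSTRUCTION.items():
    print(f"{name:22s} {mips(CLOCK_HZ, cpi):7.1f} MIPS at 100 MHz")
```

The clock rate is identical on every row, so any difference in the output comes from efficiency alone.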

Different instruction execution times (in cycles) make comparing systems based purely on clock speed or number of cycles per second difficult. How can two processors that run at the same clock rate perform differently, with one running “faster” than the other? The answer is simple: efficiency.

The main reason the 486 is considered fast relative to the 386 is that it executes twice as many instructions in the same number of cycles. The same thing is true for a Pentium; it executes about twice as many instructions in a given number of cycles as a 486. The Pentium II and III are about 50% faster than an equivalent Pentium at a given clock speed because they can execute about that many more instructions in the same number of cycles.

Unfortunately, after the Pentium III, it becomes much more difficult to compare processors on clock speed alone. This is because the different internal architectures make some processors more efficient than others, but these same efficiency differences result in circuitry that is capable of running at different maximum speeds. The less efficient the circuit, the higher the clock speed it can attain, and vice versa. Another difference is that later processors include varying sizes of L2 and L3 cache.

The final difference in modern processors is the use of multiple processor cores. High-end processors such as the Intel Core i7-5960X and the AMD FX-9590 include eight processor cores. The Intel Core i7-5960X also features 20MB of cache RAM, while the AMD FX-9590 includes 16MB of cache RAM. Not surprisingly, increasing the number of processor cores can offer a significant boost to overall processor performance.

With single-core processors, one of the biggest factors in efficiency is the number of stages in the processor’s internal pipeline. The Pentium III and AMD Athlon and Athlon XP had 10 stages, while the Pentium 4 Prescott featured 31 stages.

A deeper pipeline effectively breaks down instructions into smaller microsteps, which allows overall higher clock rates to be achieved using the same silicon technology. However, this also means that overall fewer instructions can be executed in a single cycle as compared to processors with shorter pipelines. This is because, if a branch prediction or speculative execution step fails (which happens fairly frequently inside the processor as it attempts to line up instructions in advance), the entire pipeline has to be flushed and refilled. Thus, if you compared a modern Intel Core i7 or AMD FX to a Pentium 4 running at the same clock speed, the Core i7 and FX would execute more instructions in the same number of cycles.
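A rough way to picture the cost of a flush is to add an average misprediction penalty to a processor's base cycles per instruction. The Python sketch below is a deliberately simplified model, not a description of how any real processor behaves: it assumes the flush penalty equals the pipeline depth, and the base CPI, branch fraction, and misprediction rate are invented values for illustration. Only the 10-stage and 31-stage depths come from the text.

```python
def effective_cpi(base_cpi: float,
                  branch_fraction: float,
                  mispredict_rate: float,
                  flush_penalty_cycles: float) -> float:
    """Average cycles per instruction including pipeline-flush stalls.

    Simplified model: every mispredicted branch costs a fixed number of
    extra cycles, spread across all instructions.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


# Assumed workload parameters (hypothetical, for illustration only).
BASE_CPI = 1.0
BRANCH_FRACTION = 0.20   # one instruction in five is a branch
MISPREDICT_RATE = 0.05   # one branch in twenty is mispredicted

# Pipeline depths from the text; penalty is assumed equal to the depth.
for name, stages in [("10-stage pipeline", 10), ("31-stage pipeline", 31)]:
    cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, stages)
    print(f"{name}: {cpi:.2f} cycles per instruction")
```

Under these assumptions the deeper pipeline spends more cycles per instruction on the same workload, which is the efficiency loss the paragraph describes.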

Although it is a disadvantage to have a deeper pipeline in terms of instruction efficiency, processors with deeper pipelines can run at higher clock rates on a given manufacturing technology. Thus, even though a deeper pipeline might be less efficient, it’s possible for the higher resulting clock speeds to make up for it. The deeper 20- or 31-stage pipeline in the P4 architecture enabled significantly higher clock speeds to be achieved using the same silicon die process as other chips. As an example, the 0.13-micron process Pentium 4 ran up to 3.4GHz, whereas the Athlon XP topped out at 2.2GHz (3200+ model) in the same introduction timeframe. Even though the Pentium 4 executed fewer instructions in each cycle, the overall higher cycling speeds made up for the loss of efficiency; the higher clock speed versus the more efficient processing effectively cancelled each other out.
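The trade-off can be expressed as a simple product: relative throughput is roughly instructions per cycle multiplied by clock rate. The short Python sketch below uses the two clock speeds quoted above to estimate how much more work per cycle the slower-clocked chip would need to do to break even. The function name is illustrative, and the calculation ignores cache, memory, and workload effects.

```python
def break_even_ipc_ratio(fast_clock_ghz: float, slow_clock_ghz: float) -> float:
    """How many times more instructions per cycle the slower-clocked chip
    must execute to match the faster-clocked chip's throughput."""
    if fast_clock_ghz <= 0 or slow_clock_ghz <= 0:
        raise ValueError("clock rates must be positive")
    return fast_clock_ghz / slow_clock_ghz


# Clock speeds from the text: Pentium 4 at 3.4GHz, Athlon XP at 2.2GHz.
ratio = break_even_ipc_ratio(3.4, 2.2)
print(f"The Athlon XP needs about {ratio:.2f}x the Pentium 4's "
      f"instructions per cycle to break even")
```

In other words, the two chips perform about the same only if the lower-clocked one is roughly half again as efficient per cycle.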

Unfortunately, the deep pipeline combined with high clock rates did come with a penalty in power consumption, and therefore heat generation as well. Eventually, it was determined that the power penalty was too great, causing Intel to drop back to a more efficient design in its Core microarchitecture processors. Rather than solely increase clock rates, performance was increased by combining multiple processors into a single chip, thus improving the effective instruction efficiency even further. This began the push toward multicore processors.

One thing is clear in all of this confusion: Raw clock speed is not a good way to compare chips, unless they are from the same manufacturer, model, and family.

To fairly compare various CPUs at different clock speeds, in 1992 Intel devised a specific series of benchmarks called the Intel Comparative Microprocessor Performance (iCOMP) index. The iCOMP index benchmark was released in original iCOMP, iCOMP 2.0, and iCOMP 3.0 versions.

The iCOMP 2.0 index was derived from several independent benchmarks as an indication of relative processor performance. The benchmarks balance integer with floating-point and multimedia performance. The iCOMP 3.0 index was based on processor performance in productivity, multimedia, 3D, and the Internet.

The iCOMP 2.0 index comparison for Pentium 75 through Pentium II 450 is available in Chapter 3 of Upgrading and Repairing PCs, 19th Edition, found in its entirety on the disc packaged with this book.

Until it became controversial, Intel and AMD both rated their latest processors using the commercially available BAPCo SYSmark benchmark suites. BAPCo, the Business Applications Performance Corporation, is a non-profit consortium that develops benchmark applications for PC and tablet testing. SYSmark is an application-based benchmark that runs various scripts to do actual work using popular applications. Many companies use it to test and compare PC systems and components. The SYSmark benchmark is a much more modern and real-world benchmark than the iCOMP benchmark Intel previously used, and because it is available to anybody, the results can be independently verified. You can purchase the SYSmark benchmark software from BAPCo at http://www.bapco.com.

SYSmark 2012 (the current version is SYSmark 2014) became controversial because AMD, NVIDIA, and VIA resigned from BAPCo in 2011. These companies withdrew from BAPCo because they believed that this version of the SYSmark benchmark was optimized for Intel processors rather than being processor neutral. AMD’s recent processor designs have emphasized the role of the integrated GPU and heterogeneous computing (the use of both CPU and GPU for calculations) in its consumer-level designs, while Intel, though its recent processors include integrated GPUs, stresses CPU performance in its designs. Meanwhile, VIA emphasizes ultra-low power consumption and optimization for basic computer tasks. As several technology columnists have noted, Intel, AMD, and VIA are no longer pursuing the same goals in processor design, so a common benchmark might no longer make much sense.

Despite the controversy, AnandTech, a leading technology website, continues to use SYSmark 2014 to rate the latest AMD and Intel processors. You can view the benchmark results for older processors using SYSmark 2012 (the previous benchmark) and many of the most recent processors using SYSmark 2014 at http://www.anandtech.com/bench/CPU/. Tom’s Hardware uses PCMark as one of its benchmark apps and provides results at http://www.tomshardware.com/charts/processors,6.html.

Regardless of the benchmark apps you rely on, focus on the scores for specific scenarios that match the work you plan to do with the computers for which you are responsible. For example, SYSmark 2014 focuses on performance using Adobe photo and video editing apps, Microsoft Office apps, and other popular utilities. See http://bapco.com/products/sysmark-2014 and click the Applications tab for the complete list. PCMark 8’s Applications benchmark measures the performance of the most common Adobe Creative Suite or Creative Cloud apps and Microsoft Office 2010 or 2013 apps. See the PCMark 8 Technical Guide (PDF) available at http://www.futuremark.com/support/guides for details.
