3.2 The G5: Lineage and Roadmap

As we saw earlier, the G5 is a derivative of IBM's POWER4 processor. In this section, we will briefly look at how the G5 is similar to and different from the POWER4 and some of the POWER4's successors. This will help us understand the position of the G5 in the POWER/PowerPC roadmap. Table 3–2 provides a high-level summary of some key features of the POWER4 and POWER5 lines.

Table 3–2. POWER4 and Newer Processors

Feature              POWER4          POWER4+         POWER5                  POWER5+
Year introduced      2001            2002            2004                    2005
Lithography          180 nm          130 nm          130 nm                  90 nm
Cores/chip           2               2               2                       2
Transistors          174 million     184 million     276 million/chip [a]    276 million/chip
Die size             415 mm²         267 mm²         389 mm²/chip            243 mm²/chip
LPAR [b]             Yes             Yes             Yes                     Yes
SMT [c]              No              No              Yes                     Yes
Memory controller    Off-chip        Off-chip        On-chip                 On-chip
Fast Path            No              No              Yes                     Yes
L1 I-cache           2x64KB          2x64KB          2x64KB                  2x64KB
L1 D-cache           2x32KB          2x32KB          2x32KB                  2x32KB
L2 cache             1.41MB          1.5MB           1.875MB                 1.875MB
L3 cache             32MB+           32MB+           36MB+                   36MB+

3.2.1 Fundamental Aspects of the G5

All POWER processors listed in Table 3–2, as well as the G5 derivatives, share some fundamental architectural features. They are all 64-bit and superscalar, and they perform speculative, out-of-order execution. Let us briefly discuss each of these terms.

3.2.1.1 64-bit Processor

Although there is no formal definition of what constitutes a 64-bit processor, the following attributes are shared by all 64-bit processors:

  • 64-bit-wide general-purpose registers
  • Support for 64-bit virtual addressing, although the physical or virtual address spaces may not use all 64 bits
  • Integer arithmetic and logical operations performed on all 64 bits of a 64-bit operand—without being broken down into, say, two operations on two 32-bit quantities
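
The third attribute can be illustrated with a small sketch. The function names below are hypothetical and the code only models the arithmetic in Python; it is not PowerPC code. It contrasts a single 64-bit addition with the two 32-bit additions plus carry propagation that a 32-bit design would need for the same operands.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def add64_native(a: int, b: int) -> int:
    """Model one add on the full 64-bit operand width (wraps at 2**64)."""
    return (a + b) & MASK64


def add64_via_32bit_halves(a: int, b: int) -> int:
    """Model the same add as two 32-bit adds with an explicit carry.

    Assumes 0 <= a, b < 2**64.
    """
    lo = (a & MASK32) + (b & MASK32)      # first 32-bit add (low halves)
    carry = lo >> 32                      # carry out of the low half
    hi = (a >> 32) + (b >> 32) + carry    # second 32-bit add (high halves)
    return ((hi & MASK32) << 32) | (lo & MASK32)


# Both approaches agree; the 64-bit design simply needs one operation.
x, y = 0x0000_0001_FFFF_FFFF, 0x0000_0000_0000_0001
assert add64_native(x, y) == add64_via_32bit_halves(x, y) == 0x0000_0002_0000_0000
```

The result is the same either way. The difference is that a 64-bit processor performs the operation in one step on one register, whereas the split version needs two dependent operations and two registers per operand.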

The PowerPC architecture was designed to support both 32-bit and 64-bit computation modes—an implementation is free to implement only the 32-bit subset. The G5 supports both computation modes. In fact, the POWER4 supports multiple processor architectures: the 32-bit and 64-bit POWER; the 32-bit and 64-bit PowerPC; and the 64-bit Amazon architecture. We will use the term PowerPC to refer to both the processor and the processor architecture. We will discuss the 64-bit capabilities of the 970FX in Section 3.3.12.1.

3.2.1.2 Superscalar

If we define a scalar processor as one that issues one instruction per clock cycle, then a superscalar processor is one that can issue a variable number of instructions per clock cycle, allowing a cycles-per-instruction (CPI) ratio of less than 1. Note that whether a superscalar processor can actually issue multiple instructions in a given clock cycle depends on several factors, such as whether the instructions depend on each other and which specific functional units they use. Superscalar processors typically have multiple functional units, including multiple units of the same type.
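
As a rough illustration of how dependencies limit multiple issue, the following toy model counts cycles for a short instruction sequence. It is a deliberately simplified sketch, not a model of the G5: it assumes in-order issue, single-cycle latencies, and no functional-unit limits, and the instruction encoding and the name count_cycles are invented for the example.

```python
from typing import List, Tuple

# Hypothetical encoding: (destination register, source registers)
Instr = Tuple[str, Tuple[str, ...]]


def count_cycles(program: List[Instr], width: int) -> int:
    """Count cycles when up to `width` instructions may issue per cycle.

    An instruction cannot issue in the same cycle as an earlier instruction
    that writes one of its sources or its destination.
    """
    cycles = 0
    i = 0
    while i < len(program):
        written = set()  # registers written by instructions issued this cycle
        issued = 0
        while i < len(program) and issued < width:
            dest, srcs = program[i]
            if dest in written or any(s in written for s in srcs):
                break    # depends on this cycle's group: wait for the next cycle
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


independent = [("r1", ("r10", "r11")), ("r2", ("r12", "r13")),
               ("r3", ("r14", "r15")), ("r4", ("r16", "r17"))]
chain = [("r1", ("r0",)), ("r2", ("r1",)), ("r3", ("r2",)), ("r4", ("r3",))]

cpi_independent = count_cycles(independent, width=2) / len(independent)  # 0.5
cpi_chain = count_cycles(chain, width=2) / len(chain)                    # 1.0
```

With two issue slots, the four independent instructions complete in two cycles (CPI 0.5), whereas the dependent chain still takes four cycles (CPI 1.0) because each instruction must wait for its predecessor.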

3.2.1.3 Speculative Execution

A speculative processor can execute instructions before it is determined whether those instructions will need to be executed (instructions may not need to be executed because of a branch that bypasses them, for example). Therefore, instruction execution does not wait for control dependencies to resolve—it waits only for the instruction's operands (data) to become available. Such speculation can be done by the compiler, the processor, or both. The processors in Table 3–2 employ in-hardware dynamic branch prediction (with multiple branches "in flight"), speculation, and dynamic scheduling of instruction groups to achieve substantial instruction-level parallelism.
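
To make dynamic branch prediction concrete, here is a sketch of a textbook two-bit saturating-counter predictor. This is an assumption chosen for illustration only; it is not the prediction scheme used by the processors in Table 3–2, which is considerably more elaborate. The class and function names are hypothetical.

```python
from typing import Iterable


class TwoBitPredictor:
    """Textbook 2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self) -> None:
        self.counter = 2  # start in the "weakly taken" state

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes: Iterable[bool]) -> int:
    """Run the predictor over a sequence of branch outcomes for one branch."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1  # speculative work past this branch would be discarded
        predictor.update(taken)
    return misses


# A loop-closing branch: taken nine times, then falls through once.
loop_branch = [True] * 9 + [False]
misses = count_mispredictions(loop_branch)
```

For the loop-closing branch above, only the final fall-through is mispredicted, so instructions fetched speculatively along the predicted path are useful in nine of ten cases.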

3.2.1.4 Out-of-Order Execution

A processor that performs out-of-order execution includes additional hardware that can bypass instructions whose operands are not available (say, because of a cache miss that occurred while loading a register). Thus, rather than always executing instructions in the order in which they appear in the program being run, the processor may execute instructions whose operands are ready, deferring the bypassed instructions until their operands become available.
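
The effect can be sketched with a toy scheduler. The model below is an illustration under simplifying assumptions, not a description of the G5's hardware: at most one instruction starts per cycle, every register is written exactly once and before it is read (as if registers had already been renamed), and the names Instr and simulate are invented for the example.

```python
from typing import Dict, List, NamedTuple, Tuple


class Instr(NamedTuple):
    name: str
    dest: str
    srcs: Tuple[str, ...]
    latency: int  # cycles until the result becomes available


def simulate(program: List[Instr], out_of_order: bool) -> Tuple[List[str], int]:
    """Return (start order, cycle at which all results are available)."""
    # Results of instructions that have not started yet are never ready.
    ready_at: Dict[str, float] = {ins.dest: float("inf") for ins in program}
    pending = list(program)
    order: List[str] = []
    cycle = 0
    while pending:
        # In-order: only the oldest instruction may start.
        # Out-of-order: the oldest instruction whose operands are ready may start.
        candidates = pending if out_of_order else pending[:1]
        for ins in candidates:
            # Registers not written by the program are inputs, ready at cycle 0.
            if all(ready_at.get(s, 0) <= cycle for s in ins.srcs):
                ready_at[ins.dest] = cycle + ins.latency
                order.append(ins.name)
                pending.remove(ins)
                break
        cycle += 1
    return order, int(max(ready_at.values()))


program = [
    Instr("load", "r1", ("r9",), 5),        # long latency, e.g. a cache miss
    Instr("use", "r2", ("r1",), 1),         # must wait for the load
    Instr("add_a", "r3", ("r7", "r8"), 1),  # independent of the load
    Instr("add_b", "r4", ("r3",), 1),
]
in_order = simulate(program, out_of_order=False)
ooo = simulate(program, out_of_order=True)
```

In this example the in-order run stalls behind the load and finishes at cycle 8, while the out-of-order run starts add_a and add_b during the load's latency and finishes at cycle 6.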

3.2.2 New POWER Generations

The POWER4 contains two processor cores in a single chip. Moreover, the POWER4 architecture has features that help in virtualization. Examples include a special hypervisor mode in the processor, the ability to include an address offset when using nonvirtual memory addressing, and support for multiple global interrupt queues in the interrupt controller. IBM's Logical Partitioning (LPAR) allows multiple independent operating system images (such as AIX and Linux) to be run on a single POWER4-based system simultaneously. Dynamic LPAR (DLPAR), introduced in AIX 5L Version 5.2, allows dynamic addition and removal of resources from active partitions.

The POWER4+ improves upon the POWER4 by reducing its size, consuming less power, providing a larger L2 cache, and allowing more DLPAR partitions.

The POWER5 introduces simultaneous multithreading (SMT), wherein a single processor supports multiple instruction streams—in this case, two—simultaneously.

The POWER5 supports other important features such as the following:

  • 64-way multiprocessing.
  • Subprocessor partitioning (or micropartitioning), wherein multiple LPAR partitions can share a single processor. [19] Micropartitioned LPARs support automatic CPU load balancing.
  • Virtual Inter-partition Ethernet, which enables a VLAN connection between LPARs—at gigabit or even higher speeds—without requiring physical network interface cards. Virtual Ethernet devices can be defined through the management console. Multiple virtual adapters are supported per partition, depending on the operating system.
  • Virtual I/O Server Partition, [20] which provides virtual disk storage and Ethernet adapter sharing. Ethernet sharing connects virtual Ethernet to external networks.
  • An on-chip memory controller.
  • Dynamic firmware updates.
  • Specialized circuitry that detects and corrects errors in transmitted data.
  • Fast Path, the ability to execute some common software operations directly within the processor. For example, certain parts of TCP/IP processing that are traditionally handled within the operating system using a sequence of processor instructions could be performed via a single instruction. Such silicon acceleration could be applied to other operating system areas such as message passing and virtual memory.

Besides moving to 90-nm technology, the POWER5+ extends the POWER5's feature set with, for example, 16GB pages, 1TB segments, multiple page sizes per segment, a larger (2048-entry) translation lookaside buffer (TLB), and a larger number of memory controller read queues.
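
One way to see why both the page size and the TLB size matter is to compute the TLB "reach", that is, the amount of memory that can be mapped without a TLB miss. The sketch below is back-of-the-envelope arithmetic only: it assumes every entry maps a page of the same size, which need not hold in practice, and the 4KB figure is the conventional base page size, used here as an assumption for comparison. The function name is hypothetical.

```python
KB, MB, GB, TB = 1 << 10, 1 << 20, 1 << 30, 1 << 40


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes covered if every TLB entry maps one page of `page_size` bytes."""
    return entries * page_size


reach_small_pages = tlb_reach(2048, 4 * KB)   # 8MB with 4KB pages (assumed size)
reach_huge_pages = tlb_reach(2048, 16 * GB)   # 32TB with 16GB pages
```

Under these assumptions, the same 2048 entries cover 8MB with 4KB pages but 32TB with 16GB pages, which is why very large pages are attractive for workloads with large memory footprints.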

The POWER6 is expected to add evolutionary improvements and to extend the Fast Path concept even further, allowing functions of higher-level software—for example, databases and application servers—to be performed in silicon. [21] It is likely to be based on a 65-nm process and is expected to have multiple ultra-high-frequency cores and multiple L2 caches.

3.2.3 The PowerPC 970, 970FX, and 970MP

The PowerPC 970 was introduced in October 2002 as a 64-bit high-performance processor for desktops, entry-level servers, and embedded systems. The 970 can be thought of as a stripped-down POWER4+. Apple used the 970—followed by the 970FX and the 970MP—in its G5-based systems. Table 3–3 contains a brief comparison of the specifications of these processors. Figure 3–3 shows a pictorial comparison. Note that unlike the POWER4+, whose L2 cache is shared between cores, each core in the 970MP has its own L2 cache, which is twice as large as the L2 cache in the 970 or the 970FX.

Table 3–3. POWER4+ and the PowerPC 9xx

Feature              POWER4+             PowerPC 970     PowerPC 970FX   PowerPC 970MP
Year introduced      2002                2002            2004            2005
Lithography          130 nm              130 nm          90 nm [a]       90 nm
Cores/chip           2                   1               1               2
Transistors          184 million         55 million      58 million      183 million
Die size             267 mm²             121 mm²         66 mm²          154 mm²
LPAR                 Yes                 No              No              No
SMT                  No                  No              No              No
Memory controller    Off-chip            Off-chip        Off-chip        Off-chip
Fast Path            No                  No              No              No
L1 I-cache           2x64KB              64KB            64KB            2x64KB
L1 D-cache           2x32KB              32KB            32KB            2x32KB
L2 cache             1.41MB shared [b]   512KB           512KB           2x1MB
L3 cache             32MB+               None            None            None
VMX (AltiVec [c])    No                  Yes             Yes             Yes
PowerTune [d]        No                  No              Yes             Yes

Another noteworthy point about the 970MP is that both its cores share the same input and output busses. In particular, the output bus is shared "fairly" between cores using a simple round-robin algorithm.
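
Round-robin arbitration of a shared resource can be sketched as follows. This is a generic illustration of the algorithm, not the 970MP's actual bus logic; the class name and interface are hypothetical.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Grant a shared resource to requesters in rotating priority order."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.last = n - 1  # so that requester 0 has priority on the first grant

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the index of the requester granted this cycle, or None."""
        # Start searching just after the most recently granted requester.
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None


arbiter = RoundRobinArbiter(2)  # two cores sharing one output bus
both_busy = [arbiter.grant([True, True]) for _ in range(4)]  # [0, 1, 0, 1]
```

When both cores request the bus continuously, grants alternate between them, so neither core can starve the other. When only one core is requesting, it receives every grant.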

Figure 3–3 The PowerPC 9xx family and the POWER4+

3.2.4 The Intel Core Duo

In contrast, the Intel Core Duo processor line used in the first x86-based Macintosh computers (the iMac and the MacBook Pro) has the following key characteristics:

  • Two cores per chip
  • Manufactured using 65-nm process technology
  • 90.3 mm² die size
  • 151.6 million transistors
  • Up to 2.16GHz frequency (along with a 667MHz processor system bus)
  • 32KB on-die I-cache and 32KB on-die D-cache (write-back)
  • 2MB on-die L2 cache (shared between the two cores)
  • Data prefetch logic
  • Streaming SIMD [22] Extensions 2 (SSE2) and Streaming SIMD Extensions 3 (SSE3)
  • Sophisticated power and thermal management features