3.5 Gigabit Ethernet
Gigabit Ethernet owes its existence to the technical innovations of Fibre Channel transport and the historical momentum of Ethernet and IP networking. From Fibre Channel, Gigabit Ethernet has taken the breakthrough technology of gigabit physical specifications, fiber optics, CDR, 8b/10b data encoding, and ordered sets for link commands and delimiters. From Ethernet, it has inherited mainstream status and seamless integration to a vast installed base of operating systems and network infrastructures. Although Fibre Channel has had to struggle for credibility as an emergent technology, Gigabit Ethernet's credibility was established before it was even implemented. Today, 10 Gigabit Ethernet and higher speeds are assumed to be the logical evolution of the technology and of future enterprise networks. In the process, Ethernet is shedding some of its characteristic attributes such as collision detection and shared topology, but is retaining its name.
3.5.1 Gigabit Ethernet Layers
The reference model for Gigabit Ethernet is defined in the Institute of Electrical and Electronics Engineers (IEEE) 802.3z standard. Like 10/100 Ethernet, Gigabit Ethernet is a physical and data link technology, corresponding to the lower two OSI layers, as shown in Table 3-4.
Table 3-4 Gigabit Ethernet physical and data link layers

OSI Reference Layer | Gigabit Ethernet Layer
Data link layer     | Media access control (MAC) client sublayer
                    | MAC control (optional)
                    | MAC
Physical layer      | Reconciliation
                    | Gigabit media-independent interface
                    | Media-dependent PHY group
                    | Medium-dependent interface
                    | Medium
The Gigabit Ethernet physical layer contains both media-dependent and media-independent components. This allows the gigabit media-independent interface to be implemented in silicon and still interface with a variety of network cabling, including long- and shortwave optical fiber and shielded copper. The reconciliation sublayer passes signaling primitives between upper and lower layers, including transmit and receive status as well as carrier sense and collision detection. In practice, Gigabit Ethernet switching relies on dedicated, full duplex links and does not need a collision detection method. Carrier sense multiple access with collision detection (CSMA/CD) is incorporated into the standard to provide backward compatibility with standard and Fast Ethernet.
Unlike Fibre Channel, Gigabit Ethernet performs 8b/10b encoding at the physical layer, via sublayers in the media-dependent physical (PHY) group. As shown in Figure 3-14, the functions of Fibre Channel layers FC-0 and FC-1 are brought into the lower layer physical interface, whereas traditional 802.3 Ethernet provides MAC and logical link control (LLC; or its offspring, the MAC client) to support the upper layer protocols.
Figure 3-14 Gigabit Ethernet/Ethernet and Fibre Channel.
To facilitate its integration into conventional Ethernet networks and wide area transports, Gigabit Ethernet uses standard Ethernet framing, as shown in Figure 3-15. The preamble and SOF delimiter are followed by the destination (DA) and source (SA) MAC addresses of the communicating devices. Creative use of bytes within the length/type field enables enhanced functionality such as VLAN tagging, as discussed later. The data field may contain as many as 1,500 bytes of user data, with pad bytes added if required. The CRC is carried in the frame check sequence field. Optional frame padding is provided by the extension field, although this is required only for gigabit half-duplex transmissions.
Figure 3-15 Standard Ethernet frame format.
IP over Ethernet is carried in the data field and provides the network layer routing information needed to move user data from one network segment to another. TCP/IP provides higher level session control for traffic pacing and the ability to recover from packet loss. Although IP can be carried in other frame formats, link-layer enhancements for Ethernet offer additional reliability and performance capabilities unmatched by other transports, including Fibre Channel. These include VLANs, QoS, link-layer flow control, and trunking. Collectively, these functions provide a set of powerful tools for constructing storage networks based on IP and Gigabit Ethernet.
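As a rough illustration of the frame layout in Figure 3-15, the following Python sketch assembles a minimal untagged Ethernet frame: DA, SA, length/type field, data field padded to the 46-byte minimum, and a CRC-32 frame check sequence. The MAC addresses and payload are hypothetical, the preamble/SOF and carrier extension handled by the hardware are omitted, and zlib.crc32 is assumed to match the IEEE 802.3 FCS polynomial.

import struct
import zlib

def build_ethernet_frame(dst_mac: bytes, src_mac: bytes, ethertype: int, payload: bytes) -> bytes:
    """Assemble DA + SA + length/type + data (padded to 46 bytes) + FCS."""
    if len(payload) < 46:                                # minimum data field size
        payload = payload + bytes(46 - len(payload))     # pad with zeros
    header = dst_mac + src_mac + struct.pack("!H", ethertype)
    fcs = zlib.crc32(header + payload) & 0xFFFFFFFF
    # The FCS is appended least significant byte first
    return header + payload + struct.pack("<I", fcs)

# Hypothetical addresses, carrying an IPv4 datagram (EtherType 0x0800)
frame = build_ethernet_frame(
    dst_mac=bytes.fromhex("00A0C9112233"),
    src_mac=bytes.fromhex("00A0C9445566"),
    ethertype=0x0800,
    payload=b"IP datagram goes here",
)
print(len(frame), "bytes on the wire (excluding preamble)")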
3.5.2 802.1Q VLAN Tagging
Virtual LANs (VLANs) in a switched Ethernet infrastructure enable the sharing of network resources such as large Gigabit Ethernet switches while segregating traffic from designated groups of devices. Members of a VLAN can communicate among themselves but lack visibility to the rest of the network. Sensitive information (for example, financial or human resources data) can thus be isolated from other users, even though the traffic runs through a common infrastructure. VLAN tagging was standardized in 1998 by the IEEE 802.1Q committee. An analogous capability is provided in Fibre Channel through a technique called zoning. Standards for zoning are still under construction, however, and only address the exchange of zone information between Fibre Channel switches. How zones are actually implemented within a switch is still proprietary. Consequently, there is no direct Fibre Channel equivalent to 802.1Q's more open and flexible format.
VLAN tagging is accomplished by manipulating the length/type field in the Ethernet frame. To indicate that the frame is tagged, a unique 2-byte descriptor of hex "8100" is inserted into the field. This tag type field is followed by a 2-byte tag control information field, as shown in Table 3-5, which carries the VLAN identifier and user priority bits as described later. The 12-bit VLAN identifier allows for as many as 4,096 VLANs to be assigned on a single switched infrastructure, far more than the number of zones typically offered by Fibre Channel switch vendors.
Table 3-5 IEEE 802.1Q VLAN tag fields

802.1Q Tag Type Field | Tag Control Information Field
81-00                 | User Priority | Canonical format indicator (CFI) | VLAN Identifier
16 bits               | 3 bits        | 1 bit                            | 12 bits
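A minimal sketch, in Python, of how the tag fields in Table 3-5 pack into the frame: the tag type 0x8100 is followed by a 16-bit tag control information field holding the 3-bit user priority, the CFI bit, and the 12-bit VLAN identifier. The VLAN number and priority value shown are purely illustrative.

import struct

TAG_TYPE_8021Q = 0x8100   # hex "8100" identifies a tagged frame

def build_vlan_tag(vlan_id: int, priority: int = 0, cfi: int = 0) -> bytes:
    """Pack the 4-byte 802.1Q tag: tag type + tag control information."""
    assert 0 <= vlan_id < 4096 and 0 <= priority < 8 and cfi in (0, 1)
    tci = (priority << 13) | (cfi << 12) | vlan_id
    return struct.pack("!HH", TAG_TYPE_8021Q, tci)

def parse_vlan_tag(tag: bytes):
    """Recover priority, CFI, and VLAN identifier from a 4-byte tag."""
    tag_type, tci = struct.unpack("!HH", tag)
    assert tag_type == TAG_TYPE_8021Q
    return (tci >> 13) & 0x7, (tci >> 12) & 0x1, tci & 0xFFF

# Hypothetical example: VLAN 100 reserved for storage traffic, priority 5
tag = build_vlan_tag(vlan_id=100, priority=5)
print(parse_vlan_tag(tag))   # (5, 0, 100)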
From a performance standpoint, VLAN tagging is a highly efficient means of segregating network participants into communicating groups without incurring the overhead of MAC address filtering. Intervening switches use the logical VLAN identifier, rather than the MAC address, to route traffic properly from switch to switch, and this in turn simplifies the switch decision process. As long as the appropriate switch port is associated with the proper VLAN identifier, no examination of the MAC address is required. Final filtering against the MAC address occurs at the end point.
All major Gigabit Ethernet switch vendors support the 802.1Q standard. This makes it a very useful feature not only for data paths that must cross switch boundaries, but for heterogeneous switched networks as well. For IP storage network applications, 802.1Q facilitates separation of storage traffic from user messaging traffic, as well as segregation of different types of storage traffic, for example, keeping on-line transaction processing apart from tape backup streams. Compared with Fibre Channel zoning, 802.1Q VLANs offer more flexibility and lack the complexity of vendor-specific implementations.
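As a simplified illustration, not any particular vendor's implementation, of how a switch can forward on VLAN membership without examining MAC addresses in the core, the following Python sketch assigns hypothetical ports to a storage VLAN and a messaging VLAN and selects egress ports only within the frame's VLAN.

# Hypothetical port-to-VLAN assignment: VLAN 100 for storage, VLAN 200 for messaging
PORT_VLAN = {1: 100, 2: 100, 3: 100, 4: 200, 5: 200}

def eligible_egress_ports(ingress_port: int, frame_vlan_id: int):
    """Forward only within the frame's VLAN; MAC filtering is left to the end point."""
    if PORT_VLAN.get(ingress_port) != frame_vlan_id:
        return []   # drop frames that arrive on a port outside their VLAN
    return [port for port, vlan in PORT_VLAN.items()
            if vlan == frame_vlan_id and port != ingress_port]

print(eligible_egress_ports(ingress_port=1, frame_vlan_id=100))   # [2, 3]
print(eligible_egress_ports(ingress_port=1, frame_vlan_id=200))   # []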
3.5.3 802.1p/Q Frame Prioritization
The 802.1Q VLAN tag control information field allocates 3 bits for user priority. The definition of these user priority bits is provided by IEEE 802.1p/Q, which enables individual frames to be marked for priority delivery. The QoS supported by 802.1p/Q allows for eight levels of priority assignment. This ensures that mission-critical traffic receives preferential treatment in potentially congested conditions across multiswitch networks, and thus minimizes frame loss resulting from transient bottlenecks.
For storage network applications, the ability to prioritize transactions in an IP-based SAN is a tremendous asset. Storage networks normally support a wide variety of applications, not all of which require high priority. Updating an on-line customer order or a financial transaction between banks, for example, rates a much higher priority for business operations than a tape backup stream. The class of service provided by 802.1p/Q allows storage administrators to select the applications that should receive higher priority transport and assign them to one of the eight available priority levels. In a multiswitch network, class of service ensures that prioritized frames will have preference across interswitch links. Except for a few proprietary port-based implementations, Fibre Channel currently does not support frame prioritization and thus cannot distinguish between mission-critical and less essential storage applications.
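To make the preferential treatment concrete, here is a simplified Python sketch, not taken from the standard, of strict-priority queuing as a switch port might apply it: frames are sorted into eight queues by their 802.1p value, and the highest-priority nonempty queue is always serviced first. Real switches typically use more elaborate scheduling to avoid starving low-priority traffic.

from collections import deque

class PriorityPort:
    """Illustrative strict-priority output port with eight 802.1p queues."""
    def __init__(self):
        self.queues = [deque() for _ in range(8)]   # index 0 = lowest priority

    def enqueue(self, frame: bytes, priority: int) -> None:
        self.queues[priority].append(frame)

    def dequeue(self):
        # Service the highest-priority nonempty queue first
        for priority in range(7, -1, -1):
            if self.queues[priority]:
                return self.queues[priority].popleft()
        return None   # port idle

port = PriorityPort()
port.enqueue(b"tape backup block", priority=1)
port.enqueue(b"OLTP write", priority=6)
print(port.dequeue())   # b'OLTP write' is transmitted ahead of the backup block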
3.5.4 802.3x Flow Control
Flow control at the data link level helps to minimize frame loss and avoids the latency that results when error recovery must be performed by higher layer protocols. In Fibre Channel, flow control for class 3 service is provided by a buffer credit scheme. As buffers become available to receive more frames, the target device issues receiver ready (R_RDY) signals to the initiator, one per available buffer. In Gigabit Ethernet, link-layer flow control is provided by the IEEE 802.3x standard. The 802.3x implementation uses a MAC control PAUSE frame to hold off the sending party if congestion is detected. If, for example, receive buffers on a switch port are approaching saturation, the switch can issue a PAUSE frame to the transmitting device so that the receive buffers have time to empty. Typically, the PAUSE frame is issued when a certain high-water mark is reached, but before the switch buffers are completely full.
Because the PAUSE frame is a type of MAC control frame, its structure is slightly different from that of a conventional data frame. As with VLAN tagging, the length/type field is used to indicate the special nature of the frame, in this case hex "8808" for a MAC control frame. As shown in Table 3-6, this indicator is followed by an opcode of hex "0001", which further defines the MAC control frame as a PAUSE frame. The amount of time that a transmitting device should cease issuing frames is specified by the opcode's parameter field. The pause_time cannot be specified in fixed units such as microseconds, because this would prove too inflexible for backward compatibility and future Ethernet transmission rates. Instead, pause_time is specified in pause_quanta, with one pause_quantum equal to 512 bit times at the link speed being used. The timer value can be between 0 and 65,535 pause_quanta, or a maximum of approximately 33 msec at Gigabit Ethernet speed. If the device that issued the PAUSE frame empties its buffers before the stated pause_time has elapsed, it can issue another PAUSE frame with pause_time set to zero. This signals the transmitting device that frame transmission can resume.
Table 3-6 IEEE 802.3x PAUSE frame format

Length/Type | MAC Control Opcode | Parameters
88-08       | 00-01              | pause_time
16 bits     | 16 bits            | 16 bits
Because PAUSE frames are exchanged between a device and the switch port to which it is attached, and because Gigabit Ethernet allows only one device per port, there is no need to personalize the PAUSE frame with the recipient's MAC address. Instead, a universal, well-known multicast address of 01-80-C2-00-00-01 is used in the destination address field. When a switch port receives a PAUSE frame with this address, it processes the frame but does not forward it to the network.
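Pulling the fields of Table 3-6 together, the Python sketch below builds the MAC client portion of a PAUSE frame and converts a pause_time into wall-clock delay using the 512-bit-time quantum. The source MAC address is hypothetical, and the padding and FCS added by the MAC are omitted.

import struct

PAUSE_DA = bytes.fromhex("0180C2000001")   # well-known PAUSE destination address
MAC_CONTROL_TYPE = 0x8808                  # length/type value for MAC control frames
PAUSE_OPCODE = 0x0001                      # opcode identifying a PAUSE frame

def build_pause_frame(src_mac: bytes, pause_time: int) -> bytes:
    """DA + SA + length/type + opcode + pause_time (padding and FCS omitted)."""
    assert 0 <= pause_time <= 0xFFFF
    return PAUSE_DA + src_mac + struct.pack("!HHH", MAC_CONTROL_TYPE, PAUSE_OPCODE, pause_time)

def pause_duration_seconds(pause_time: int, bits_per_second: float = 1e9) -> float:
    """One pause_quantum = 512 bit times at the link's data rate."""
    return pause_time * 512 / bits_per_second

frame = build_pause_frame(bytes.fromhex("00A0C9445566"), pause_time=0xFFFF)
print(round(pause_duration_seconds(0xFFFF) * 1000, 2), "msec maximum pause")   # ~33.55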
The 802.3x flow control provided by Gigabit Ethernet switches creates new opportunities for high-performance storage traffic over IP. Fibre Channel class 3 service has already demonstrated the viability of a connectionless, unacknowledged class of service, provided there is a flow control mechanism to pace frame transmission. In Fibre Channel fabrics using class 3, as with 802.3x in Ethernet, the flow control conversation occurs between the switch port and its attached device. As the switch port buffers fill, the port stops sending R_RDYs until additional buffers are freed. In Gigabit Ethernet, this function is performed with PAUSE frames, with the same practical result. In either case, buffer overruns and the consequent loss of frames are avoided, and this is accomplished with minimal impact on performance.
The reliability provided by the gigabit infrastructure through data link flow control enables streamlined protocols to be run at the upper layer. For IP storage, the equivalent to Fibre Channel class 3 is UDP. UDP is connectionless and unacknowledged, and thus is unsuited to very congested environments such as the Internet. For contained data center storage applications, however, 802.3x flow control and storage over UDP/IP can offer a reliable and extremely high-performance solution without incurring the protocol overhead of TCP/IP.
3.5.5 802.3ad Link Aggregation
Link aggregation, or trunking, provides higher bandwidth for switched networks by provisioning multiple connections between switches or between a switch and an end device such as a server. Link aggregation also facilitates scaling the network over time, because additional links can be added to a trunked group incrementally as bandwidth requirements increase. In Figure 3-16, two Gigabit Ethernet switches share three aggregated links for a total available bandwidth of 7.5 Gbps full duplex (three links at 1.25 Gbps, counting both directions).
Figure 3-16 Link aggregation between two Gigabit Ethernet switches.
Originally, the 802.3ad standards initiative was promoted as a means to provide higher bandwidth for standard 10/100-Mbps Ethernet networks. Link aggregation was a means of satisfying higher bandwidth requirements while Gigabit Ethernet was still being developed. As with memory, CPUs, and storage, however, whatever performance or capacity is reached at any given point in time is never sufficient for the increasing demands of users and applications. Consequently, bundled Ethernet links have been replaced with bundled Gigabit Ethernet links, which at some point will be superseded by bundled 10 Gigabit and higher Ethernet links. Replicators, for example, will no doubt require bundled 100 Gigabit Ethernet links.
Link aggregation must resolve several issues to avoid creating more problems than it fixes. In normal bridged environments, the spanning tree algorithm would, on encountering multiple links between two devices, simply disable the redundant links and allow only a single data path. This would prevent duplication of frames and potential out-of-order delivery. Link aggregation must therefore make multiple links between two devices appear as a single path, while simultaneously providing a mechanism to avoid frame duplication and ensure in-order frame delivery. This could be accomplished by manipulating MAC addresses (for example, assigning the same MAC address to every trunked link) or by inserting link aggregation intelligence between the MAC client and MAC layers. The status of link availability, current load, and conversations through the trunk must be monitored to ensure that frames are not lost or inadvertently reordered.
In-order delivery of frames is guaranteed if a conversation between two end devices is maintained across a single link in the trunk. Although this does not utilize the links as efficiently as simply shipping each frame over any available connection, it avoids the extra logic required to monitor frame ordering and to resequence frames before delivery to the recipient. At the same time, additional transactions by other devices benefit from the availability of the aggregated interswitch links, and switch-to-switch bottlenecks are avoided.
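One common way to keep a conversation on a single link, offered here as an illustrative sketch rather than anything mandated by 802.3ad, is to hash the frame's address pair onto one member of the trunk. The Python example below uses hypothetical MAC addresses.

import zlib

def select_trunk_link(src_mac: bytes, dst_mac: bytes, num_links: int) -> int:
    """Map a conversation (address pair) to one physical link in the trunk."""
    # Hashing the pair keeps all frames of a conversation on one link,
    # preserving order without resequencing logic at the receiver.
    return zlib.crc32(src_mac + dst_mac) % num_links

server = bytes.fromhex("00A0C9445566")
array_a = bytes.fromhex("00A0C9112233")
array_b = bytes.fromhex("00A0C9778899")

# Two conversations from the same server may land on different links,
# spreading load while each conversation stays in order.
print(select_trunk_link(server, array_a, num_links=3))
print(select_trunk_link(server, array_b, num_links=3))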
Link aggregation as specified in 802.3ad is almost mandatory for IP-based storage networks, particularly when multiple Gigabit Ethernet switches are used to build the SAN backbone. Along with 802.1p/Q prioritization, link aggregation can ensure that mission-critical storage traffic has an available path through the network and that multiple instances of mission-critical transactions can occur simultaneously. This requirement will be satisfied temporarily by the arrival of 10-Gbps uplinks between switches, but these will inevitably be "trunked" to provide even higher bandwidth over time.
3.5.6 Gigabit Ethernet Physical Layer Considerations
Gigabit Ethernet has borrowed so heavily from the Fibre Channel physical layer that there are relatively few differences between them. Gigabit Ethernet has a slightly higher transmission rate of 1.25 Gbps, compared with Fibre Channel's 1.0625 Gbps. For storage applications, Gigabit Ethernet's faster clock can drive approximately 15 MBps more bandwidth over interswitch links.
Transceivers for Gigabit Ethernet applications may also vary, although some GBICs can interoperate at both Fibre Channel and Gigabit Ethernet transmission speeds. Gigabit Ethernet has introduced support for new cable types for gigabit transport, including category 5 unshielded twisted pair. As shown in Table 3-7, cable distances are comparable with Fibre Channel, with the exception of long-wave, single-mode cabling (10 km for Fibre Channel).
Table 3-7 Gigabit Ethernet cable specifications

Cable Type  | Medium/Diameter | Laser Type | Maximum Distance (m)
1000BASE-T  | CAT-5 UTP       | N/A        | 100
1000BASE-CX | STP             | N/A        | 25
1000BASE-LX | 10 µm fiber     | Long wave  | 5,000
1000BASE-SX | 50 µm fiber     | Short wave | 500
1000BASE-LX | 50 µm fiber     | Long wave  | 550