Chapter 4. Spanning Tree

The Spanning Tree Protocol (STP) is a Layer 2 protocol that prevents loops in transparently bridged networks. Due to the nature of transparent bridging, when an active looped topology exists, a network meltdown generally occurs in a matter of seconds. STP builds a logical loop-free topology, ensuring the network does not suffer from major problems such as broadcast storms or bridge table corruption.

STP was originally developed by Digital Equipment Corporation in 1983 to address the issues of running transparent bridging in a looped Layer 2 topology. Today, STP exists in two flavors:

  • DEC— The original Spanning Tree Protocol, created by Digital Equipment Corporation.
  • IEEE— Standards-based Spanning Tree Protocol, specified in the 802.1d standard, initially developed by Radia Perlman. This protocol was developed from the DEC STP implementation; however, the versions are incompatible with each other. The IEEE 802.1d version is almost exclusively used in today's networks.

This chapter focuses exclusively on the IEEE 802.1d version of STP. After some initial introductory material, this chapter presents the following configuration scenarios, which provide you with the practical knowledge required to implement spanning tree:

  • Scenario 4-1: Configuring the Root Bridge
  • Scenario 4-2: Configuring STP Load Sharing
  • Scenario 4-3: Configuring Root Guard
  • Scenario 4-4: Configuring Spanning Tree PortFast
  • Scenario 4-5: Configuring PortFast BPDU Guard
  • Scenario 4-6: Configuring PortFast BPDU Filter
  • Scenario 4-7: Configuring UplinkFast
  • Scenario 4-8: Configuring BackboneFast
  • Scenario 4-9: Improving Convergence and Load Sharing by Using a Multilayer Topology
  • Scenario 4-10: Troubleshooting Spanning Tree

Introduction

Spanning tree is designed to ensure a loop-free forwarding topology is generated in a multi-switch LAN. Due to the nature of transparent bridging, if any loops are in the active Layer 2 topology, frames such as broadcast frames, multicast frames, and unknown unicast frames will continuously circle the looped network.

Transparent bridging defines no mechanism such as the TTL (time-to-live) field used in IP packets to prevent frames from continuously circling a looped network. This fact causes a snowball effect, with the number of broadcast, multicast, and unknown unicast frames looping the network increasing. Because broadcast frames must also be processed by the CPU of every device receiving the frame, CPU usage on every device increases as more and more frames loop the network. Eventually (normally within a matter of seconds), the entire network goes into a meltdown. CPU time and memory on each switch are consumed just processing each broadcast frame, and the available bandwidth on each link for valid traffic becomes less and less. As you can see, a looped Layer 2 topology is a catastrophe for any network, and you definitely need to prevent loops in the topology, while still providing redundant paths.

In this section, you are introduced to the concepts of spanning tree and how it can generate a loop-free topology that dynamically reconverges to a new loop-free topology in the event of failures.

Spanning Tree Operation

A looped Layer 2 topology causes serious issues. Simply blocking a port from sending and receiving data can prevent a looped topology. Spanning tree is the protocol responsible for determining a loop-free topology and blocking the appropriate ports as required. To create a loop-free topology, spanning tree forms a tree structure that is generated from a root node or root bridge.

The root bridge is the heart of the spanning-tree topology, and is used as a reference point to generate a loop-free topology. Once the root bridge is selected, each bridge determines the best path to reach the root bridge and blocks any other paths that introduce loops. In a converged spanning-tree topology, a port can be either in a forwarding state or in a blocking state. Only ports that are considered the best path to the root bridge are placed into a forwarding state; all other ports are placed into a blocking state.

When spanning tree first initializes, each switch generates a unique bridge ID per VLAN, which is used by spanning tree to uniquely identify the switch. The bridge ID consists of the bridge MAC address plus a 2-byte field called the bridge priority, which can be altered to directly affect whether or not a bridge becomes the root bridge. The bridge priority can be configured as any value between 0 and 65535 and defaults to 32768 on Cisco switches. Figure 4-1 shows the structure of the bridge ID.

Figure 4-1 Bridge ID

The bridge ID is used to select the root bridge; the bridge with the lowest bridge ID always becomes the root bridge. An example of a bridge ID is 32768.000d.7903.0c00. The first portion (32768) is the bridge priority, represented in decimal, while the remaining portion of the bridge ID (000d.7903.0c00) is the hexadecimal representation of the bridge MAC address. Once the bridge ID has been determined, each bridge starts out by assuming that it is the root bridge and begins to generate configuration bridge protocol data units (BPDUs). Configuration BPDUs are the main communication mechanism for spanning tree and are used to determine the root bridge as well as whether or not a port should be forwarding or blocking. A configuration BPDU has various fields that are used to indicate parameters that are important to the generation of the final spanning-tree topology. Table 4-1 describes the important fields that are present in each configuration BPDU.

Table 4-1. Important Configuration BPDU Fields

Field | Description
Type | Determines whether the BPDU is a configuration BPDU or a topology change BPDU.
Root bridge ID | Lists the bridge ID of the bridge that the sender of the BPDU considers to be the root bridge. Once spanning tree has converged, the root bridge ID should be identical for all BPDUs in the network.
Root path cost | Lists the cost to the root bridge of the bridge sending the configuration BPDUs. This cost helps the receiving bridge determine the shortest path to the root bridge when multiple configuration BPDUs are received from multiple bridges.
Sender bridge ID | Lists the bridge ID of the device that sent the configuration BPDU. This value is the same for all BPDUs sent by the same switch.
Port ID | A value that describes the port from which the BPDU was sent.
Max age (seconds) | Determines the maximum time a BPDU is considered valid. By default, this value is 20 seconds. If BPDUs stop being received, after 20 seconds the last BPDU received is considered invalid, which means the existing root bridge is considered down.
Hello time (seconds) | The interval at which configuration BPDUs are generated. By default, this value is 2 seconds. Note that only the root bridge ever originates configuration BPDUs; each non-root bridge merely propagates the configuration BPDUs that are generated by the root bridge.
Forward delay (seconds) | Indicates the time spent in each of the listening and learning phases. These phases allow time for a root bridge to be elected and for a loop-free topology to be determined.
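The fields in Table 4-1 can be pictured as a small data structure. The following Python sketch is illustrative only: the class and function names (BridgeID, ConfigBPDU, initial_bpdu) are hypothetical, the Type field is omitted for brevity, and the MAC address is kept as a fixed-width lowercase string so that ordinary comparison matches numeric order. The timer defaults are the 20-second max age, 2-second hello time, and 15-second forward delay used in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class BridgeID:
    """Bridge ID: 2-byte priority followed by the 6-byte bridge MAC address."""
    priority: int  # 0-65535
    mac: str       # fixed-width lowercase hex, e.g. "000d.7903.0c00"

    def __str__(self) -> str:
        return f"{self.priority}.{self.mac}"


@dataclass(frozen=True)
class ConfigBPDU:
    """Subset of the configuration BPDU fields described in Table 4-1."""
    root_id: BridgeID      # who the sender believes is the root bridge
    root_path_cost: int    # sender's cost to reach the root bridge
    sender_id: BridgeID    # bridge ID of the sending bridge
    port_id: int           # port the BPDU was sent from
    max_age: int = 20      # seconds
    hello_time: int = 2    # seconds
    forward_delay: int = 15  # seconds


def initial_bpdu(me: BridgeID, port_id: int) -> ConfigBPDU:
    """A bridge that has just initialized claims to be the root at cost 0."""
    return ConfigBPDU(root_id=me, root_path_cost=0, sender_id=me, port_id=port_id)


bridge = BridgeID(32768, "000d.7903.0c00")
print(bridge)                   # the example bridge ID from the text
print(initial_bpdu(bridge, 1))  # the BPDU this bridge would send at startup
```

Because the priority is compared before the MAC address, lowering the priority on one bridge is enough to make its bridge ID the lowest, regardless of MAC addresses.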

With regards to selecting the root bridge, the important field in Table 4-1 is the root bridge ID. If a bridge receives a configuration BPDU that lists a lower root bridge ID than what the bridge considers is the current root bridge ID, the bridge immediately considers the lower root bridge ID as the root bridge and begins propagating configuration BPDUs received from this root bridge. Eventually, in a Layer 2 network with multiple bridges, the bridge with the lowest bridge ID becomes known as the root bridge to all bridges. At this point, the root bridge has been selected, and each non-root bridge now begins the process of generating a loop-free topology. Figure 4-2 demonstrates the selection of a root bridge.

Figure 4-2 Selecting the Root Bridge

In Figure 4-2, Bridge A is selected as the root bridge because it has the lowest bridge ID. Once the root bridge has been selected, non-root bridges no longer originate configuration BPDUs of their own. Each non-root bridge generates configuration BPDUs only when a configuration BPDU originated by the root bridge is received. The non-root bridge updates certain fields in the configuration BPDU (such as root path cost and sender bridge ID) and then propagates the updated configuration BPDU out all ports except the port upon which the BPDU was received. This process ensures that configuration BPDUs are propagated throughout the entire network to all switches.
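The election itself can be sketched in a few lines of Python. This is a simplified model, not a protocol implementation: the bridge names, bridge IDs, and links below are hypothetical (they are not taken from Figure 4-2), and each exchange of BPDUs over a link is reduced to both ends adopting the lower root bridge ID.

```python
def converge_root(bridge_ids, links):
    """Return the root bridge ID that each bridge believes in after convergence.

    bridge_ids: name -> (priority, mac) tuple
    links:      list of (name, name) pairs joining two bridges
    """
    believed = dict(bridge_ids)  # every bridge starts out claiming to be root
    changed = True
    while changed:
        changed = False
        for a, b in links:
            lowest = min(believed[a], believed[b])
            for name in (a, b):
                if believed[name] != lowest:
                    believed[name] = lowest  # adopt the lower root bridge ID
                    changed = True
    return believed


# Hypothetical three-bridge chain: A -- B -- C
example_ids = {
    "A": (32768, "0000.0000.000a"),
    "B": (32768, "0000.0000.000b"),
    "C": (32768, "0000.0000.000c"),
}
example_links = [("A", "B"), ("B", "C")]

result = converge_root(example_ids, example_links)
print(result)
```

With equal priorities, the lowest MAC address decides the election, so every bridge in this example ends up agreeing that bridge A is the root.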

Once the root bridge has been selected, each non-root bridge attempts to build a topology that forms the lowest-cost path to the root bridge. To accommodate this requirement, spanning tree uses the concept of cost, which is a measure of how preferable a link or logical port is in comparison to other links. The lower the cost, the more preferable the link. For example, a 10-Mbps port is considered less preferable than a 100-Mbps port and thus has a higher cost. Each logical port has a default cost associated with it, which is defined in the 802.1d standard and depends on the bandwidth of the link. The cost for a logical port can be modified to influence root port selection. Table 4-2 shows the 802.1d default costs for various bandwidths associated with a link.

Table 4-2. IEEE 802.1d Default STP Costs

Bandwidth (Mbps) | Cost
4 | 250
10 | 100
16 | 62
100 | 19
155 | 14
622 | 6
1000 (1 Gbps) | 4
10000 (10 Gbps) | 2
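To see how these costs combine, the following Python sketch sums the default costs from Table 4-2 along a path to the root bridge. The function name root_path_cost and the example paths are hypothetical, and the sketch assumes every link uses its default cost.

```python
# Default 802.1d costs from Table 4-2, keyed by link bandwidth in Mbps.
DEFAULT_COST = {4: 250, 10: 100, 16: 62, 100: 19, 155: 14, 622: 6, 1000: 4, 10000: 2}


def root_path_cost(link_speeds_mbps):
    """Sum the default cost of every link on a path toward the root bridge."""
    return sum(DEFAULT_COST[speed] for speed in link_speeds_mbps)


one_slow_hop = root_path_cost([10])          # a single 10-Mbps link
two_fast_hops = root_path_cost([100, 1000])  # a 100-Mbps link, then a 1-Gbps link

print(one_slow_hop, two_fast_hops)
```

The two-hop path costs 19 + 4 = 23, which is lower than the cost of 100 for the single 10-Mbps link, so spanning tree prefers the longer but faster path.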

Generating a Loop-Free Topology

It is important to understand that every logical port within a spanning-tree instance transitions through several states upon port initialization. Table 4-3 summarizes each of the STP states a port can be in.

Table 4-3. Spanning Tree Port States

State | Description | User data being forwarded?
Disabled | The port is in a non-functional state, which might be due to a hardware failure or due to the port being administratively shut down. | No
Listening | The port is sending and receiving configuration BPDUs and is determining the root bridge and the role the port should take. | No
Learning | The switch is accepting user data on the port but is not forwarding it; instead, it populates the bridge table with the source MAC addresses of the frames it receives. This ensures that the network is not suddenly flooded with unknown unicast traffic once forwarding begins. | No
Forwarding | This state is transitioned to from the learning state. In this state, the port forwards all traffic. Only ports that represent the shortest path to the root bridge are placed into a forwarding state. | Yes
Blocking | The port is blocked from sending or receiving any user data, but still receives configuration BPDUs. A port is placed into the blocking state if it is determined to not represent the shortest path to the root bridge. | No

As you can see, user data is only forwarded when a port is in the forwarding state. Spanning tree takes this very cautious approach to prevent any loops from forming even for a short time, because a broadcast storm can bring down a network in seconds. Figure 4-3 illustrates how a port transitions through each of the various states to reach either a forwarding state or a blocking state.

Figure 4-3 STP State Transition

In Figure 4-3, you can see the various events that cause a transition in port state. Notice that a port in the Disabled state only ever transitions to a Blocking state (unless Cisco PortFast is configured), which ensures a loop cannot be created before the network topology is learned. Each of the phases listed in Table 4-3 and Figure 4-3 is now described.
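The state transitions can also be written down as a small lookup table. The Python sketch below is based only on the transitions described in the text of this chapter; the event names (port_initialized, forward_delay_expired, and so on) are hypothetical labels chosen for illustration, and PortFast is not modeled.

```python
from enum import Enum


class PortState(Enum):
    DISABLED = "Disabled"
    BLOCKING = "Blocking"
    LISTENING = "Listening"
    LEARNING = "Learning"
    FORWARDING = "Forwarding"


# (current state, event) -> next state. Event names are illustrative only.
TRANSITIONS = {
    (PortState.DISABLED, "port_initialized"): PortState.BLOCKING,
    (PortState.BLOCKING, "begin_topology_calculation"): PortState.LISTENING,
    (PortState.LISTENING, "not_root_or_designated"): PortState.BLOCKING,
    (PortState.LISTENING, "forward_delay_expired"): PortState.LEARNING,
    (PortState.LEARNING, "forward_delay_expired"): PortState.FORWARDING,
    (PortState.FORWARDING, "topology_change"): PortState.LISTENING,
}


def next_state(state, event):
    """Return the state that follows `state` when `event` occurs."""
    if event == "port_down":  # any port can become disabled at any time
        return PortState.DISABLED
    # Events that do not apply to the current state leave it unchanged.
    return TRANSITIONS.get((state, event), state)


def run(events, state=PortState.DISABLED):
    """Apply a sequence of events, starting from the Disabled state."""
    for event in events:
        state = next_state(state, event)
    return state


def forwards_user_data(state):
    """User data is forwarded only in the Forwarding state (Table 4-3)."""
    return state is PortState.FORWARDING
```

For example, run(["port_initialized", "begin_topology_calculation", "forward_delay_expired", "forward_delay_expired"]) walks a newly enabled port through Blocking, Listening, and Learning before it reaches Forwarding, which takes two forward delay intervals.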

Disabled

A port is disabled when the Layer 2 protocol is down on the port, whether it be because the port has been administratively shut down, because it is not connected, or because of some issue with processing BPDUs. A port transitions from the Disabled state to the Blocking state and then immediately to a Listening state after it is initialized at the Layer 2 level.

Listening

The Listening state is the phase where most of the important legwork of generating a loop-free topology is performed. To generate a loop-free topology, spanning tree goes through the following processes:

  1. Elect the root bridge— You have already seen how a root bridge is elected. The bridge with the lowest bridge ID is selected as the root bridge.
  2. Select the root port— Every non-root bridge selects a single root port, which is the port that provides the closest path to the root bridge. The concept of cost is used to determine which path is best: the port that provides the path with the lowest cost to the root bridge is selected as the root port.
  3. Select a designated bridge (port) for each segment— Once the root bridge and root port have been determined, each switch determines whether or not it represents the shortest path to the root for each segment attached to the switch (excluding the segment attached to the root port). If the switch determines it represents the closest path to the root on a segment, it configures itself as the designated bridge for the segment and configures the port as a designated port. Each designated port is placed into a Forwarding state, while all other non-designated ports are placed into a Blocking state. The exception is a root port: although a root port is not a designated port, it remains in a Forwarding state, and the port at the other end of its segment remains the designated port on the neighboring switch.

All of the decisions just listed for spanning-tree topology calculation are based upon the configuration BPDUs that are received by each bridge. Whether the decision is selecting the root bridge or selecting a root port, the same selection process is used. This selection process is known as the Spanning-Tree Algorithm (STA) and is described in Table 4-4.

Table 4-4. The Spanning-Tree Algorithm

Priority | Criteria
1 | Select the lowest root bridge ID
2 | Select the lowest root path cost
3 | Select the lowest sender bridge ID
4 | Select the lowest port ID

Each of the criteria in Table 4-4 is processed one by one, by comparing the configuration BPDUs received on a port with the configuration BPDUs that are sent out a port, until a decision can be made. If parameters are equal, the next criterion is processed until a decision can be made. Referring back to Table 4-1, you can see that each of the selection criteria is a field in configuration BPDUs.

For example, consider the process of selecting the root bridge. If you take the STA and apply it to this process, you can see that the bridge with the lowest root bridge ID becomes the root bridge. When it comes to selecting the root port, because a root bridge has been selected, the root bridge ID on all configuration BPDUs is the same, so this criterion cannot be used to make a selection. This fact means that the next criterion is evaluated (select the lowest root path cost). Again, if the criterion is the same on the configuration BPDUs being compared, the next criterion is evaluated, which is to select the lowest sender bridge ID.
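Because the criteria are evaluated strictly in order, the comparison behaves like an ordered tuple comparison. The following Python sketch illustrates this under simplifying assumptions: the BPDU type and helper names are hypothetical, bridge IDs are (priority, MAC) tuples, the Port ID field from Table 4-1 is used as the final tiebreaker, and the cost of the receiving port is assumed to have already been added to the root path cost.

```python
from typing import NamedTuple, Optional


class BPDU(NamedTuple):
    # Field order matches the order in which the criteria are evaluated.
    root_id: tuple
    root_path_cost: int
    sender_id: tuple
    port_id: int


CRITERIA = ("root bridge ID", "root path cost", "sender bridge ID", "port ID")


def best_bpdu(a: BPDU, b: BPDU) -> BPDU:
    """Return the preferred BPDU: tuples compare field by field, lowest wins."""
    return min(a, b)


def deciding_criterion(a: BPDU, b: BPDU) -> Optional[str]:
    """Name the first criterion on which the two BPDUs differ, if any."""
    for name, x, y in zip(CRITERIA, a, b):
        if x != y:
            return name
    return None


root = (32768, "0000.0000.000a")
via_b = BPDU(root, 19, (32768, "0000.0000.000b"), 1)
via_c = BPDU(root, 19, (32768, "0000.0000.000c"), 1)
via_d = BPDU(root, 4, (32768, "0000.0000.000d"), 1)

print(deciding_criterion(via_b, via_c))  # both agree on root and cost
print(deciding_criterion(via_b, via_d))  # costs differ
```

In this example the BPDUs received through bridges B and C tie on root bridge ID and root path cost, so the sender bridge ID breaks the tie, whereas the BPDU received through bridge D wins on root path cost alone.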

Learning

During the Learning phase, the spanning-tree topology has normally been determined, and the switch is accepting user data. However, it is not forwarding it. The purpose of this phase is to populate the local bridging table on each switch, so that once traffic is actually forwarded, the switch does not need to flood a lot of traffic. Because the bridging table has been populated to a certain extent, the number of unknown unicast destination MAC addresses is reduced, which reduces the amount of flooding in the network.
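The effect of the Learning phase can be sketched as follows. This Python example is a simplified model rather than real switch behavior: the function name handle_frame, the string state names, and the MAC addresses are hypothetical, and the state of the receiving port is passed in directly.

```python
def handle_frame(table, port_state, src_mac, dst_mac, in_port):
    """Process one frame and return what the bridge does with it.

    table maps a MAC address to the port on which it was last seen.
    """
    if port_state not in ("learning", "forwarding"):
        return "discard"             # no learning and no forwarding
    table[src_mac] = in_port         # learn where the sender lives
    if port_state == "learning":
        return "discard"             # learning only, user data is not forwarded
    if dst_mac in table:
        return f"forward to port {table[dst_mac]}"
    return "flood"                   # unknown unicast destination


bridge_table = {}
first = handle_frame(bridge_table, "learning", "0000.0000.00aa", "0000.0000.00bb", 1)
second = handle_frame(bridge_table, "forwarding", "0000.0000.00bb", "0000.0000.00aa", 2)
third = handle_frame(bridge_table, "forwarding", "0000.0000.00bb", "0000.0000.00cc", 2)
print(first, second, third, sep=" | ")
```

The first frame is discarded, but its source address is still recorded, so the later frame destined for that address can be forwarded out a single port instead of being flooded.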

Forwarding

After the Learning phase, if a port has been selected as either a root port or designated port, it is placed into the Forwarding state, which means that the port forwards user data. A port remains in the forwarding state until a topology change occurs where the path to the root bridge is affected or the root bridge itself fails. If this change occurs, the port transitions to the Listening phase and performs the appropriate selection processes.

Spanning-Tree Timers

Spanning-tree timers are important because they determine how quickly or slowly a spanning-tree topology can react to a link or bridge failure and converge to a new topology. As indicated in Table 4-1, there are three spanning-tree timers:

  • Hello timer— The interval at which each configuration BPDU is generated. The default is two seconds, meaning that a configuration BPDU is generated every two seconds.
  • Max age timer— Controls how long a configuration BPDU is valid after being received. The default is 20 seconds, meaning that if a configuration BPDU is not received within 20 seconds of the previous, the previous configuration BPDU is no longer valid and a new root bridge must be selected.
  • Forward delay— Controls the amount of time that a bridge port spends in each of the Listening and Learning phases before transitioning a blocking port to a Forwarding state.

It is important to ensure that the spanning-tree timers implemented are consistent throughout the spanning-tree topology. To ensure this, the root bridge configures the spanning-tree timers and attaches these to each configuration BPDU generated (see Table 4-1). Each non-root bridge inherits the spanning-tree timers in the configuration BPDUs, overriding any local configuration and ensuring the spanning-tree timers are consistent for the entire topology.

If a failure occurs in the spanning-tree topology, the various STP timers control how quickly the spanning-tree topology can converge. The following describes how to calculate the convergence time for different types of failures:

  • Direct failure— A direct failure is detected immediately and enables a switch to immediately expire the Max Age timer, invalidating all current configuration BPDUs. At this point, the switch announces itself as the root bridge and must pass through the Listening and Learning phases before forwarding traffic. Because the forward delay timer determines how long the Listening and Learning phases are, the convergence time for a direct failure is defined as 2 x forward delay. For example, if the forward delay timer is the standard 15 seconds, the convergence time of a direct failure will be 2 x 15 seconds or 30 seconds.
  • Indirect failure— An indirect failure is not detected immediately and relies upon configuration BPDUs not being received for the duration of the Max Age timer. Once the Max Age timer expires, the root bridge is considered down, and the switch will announce itself as the root bridge and must pass through the listening and learning phases before forwarding traffic. The convergence time for an indirect failure can be calculated as the Max age timer + 2 x forward delay. For example, if using the default STP timers, the convergence time of an indirect failure is 20 + (2 * 15) seconds or 50 seconds.
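The two convergence calculations can be expressed directly. In this Python sketch the function names are hypothetical; the defaults are the 20-second max age and 15-second forward delay used in the examples above.

```python
def direct_failure_convergence(forward_delay=15):
    """Listening plus Learning, each lasting one forward delay interval."""
    return 2 * forward_delay


def indirect_failure_convergence(max_age=20, forward_delay=15):
    """Wait for max age to expire, then pass through Listening and Learning."""
    return max_age + 2 * forward_delay


print(direct_failure_convergence())    # seconds, with default timers
print(indirect_failure_convergence())  # seconds, with default timers
```

With the default timers these functions return 30 and 50 seconds respectively, matching the figures given in the list above.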

You can optimize spanning-tree timers to reduce the default convergence times, depending on your spanning-tree topology. Spanning-tree timers are dependent upon the network diameter of the Layer 2 network, which is defined as the maximum number of bridge hops between any two devices. The timers also depend on the value of the Hello timer, which can be reduced so that topology changes are detected faster than with the standard Hello timer value. Each timer is calculated so as to ensure that configuration BPDUs can be propagated throughout the network fully before decisions are made about forwarding or blocking ports. Clearly, if there are more bridge hops for a configuration BPDU to travel, the time required for propagation of BPDUs throughout the entire network is higher.

The default spanning-tree timers are designed to accommodate a spanning-tree topology that has a network diameter of seven. For some topologies, the network diameter might be lower than this; in these cases, the spanning-tree timers can safely be reduced. The 802.1d specification includes the correct formula for calculating spanning-tree timers based upon the Hello timer used and the network diameter. Cisco Catalyst switches provide tools that calculate the correct spanning-tree timers based upon network diameter and Hello timer interval.
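As a rough illustration of how the timers scale with diameter, the Python sketch below uses a commonly quoted simplification of the timer formulas. These formulas are an assumption brought in for illustration and are not taken from this chapter; the function name stp_timers is hypothetical, and any values should be checked against the standard and your platform before being deployed.

```python
import math


def stp_timers(diameter=7, hello=2):
    """Estimate max age and forward delay (seconds) from diameter and hello time.

    Assumed simplification (not from this chapter):
      max age       = 4 * hello + 2 * diameter - 2
      forward delay = (4 * hello + 3 * diameter) / 2, rounded up
    """
    max_age = 4 * hello + 2 * diameter - 2
    forward_delay = math.ceil((4 * hello + 3 * diameter) / 2)
    return {"hello": hello, "max_age": max_age, "forward_delay": forward_delay}


print(stp_timers())            # diameter of seven, 2-second hello
print(stp_timers(diameter=4))  # a smaller network
```

With a diameter of seven and a 2-second hello time, the sketch reproduces the default max age of 20 seconds and forward delay of 15 seconds; a smaller diameter yields shorter timers and therefore faster convergence.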

Recent Spanning Tree Developments

The IEEE has been busy at work recently and has released new specifications relating to spanning tree. Two important specifications are now supported by certain Cisco Catalyst switches:

  • Rapid Spanning Tree Protocol (RSTP)
  • Multiple Spanning Tree (MST)

Each of these new protocols is now discussed.

RSTP

The most significant development for spanning tree in recent times is the 802.1w specification, which is also known as Rapid Spanning Tree Protocol (RSTP). RSTP is intended to replace the 802.1d standard and redefines the states that switch ports can be in, as well as how switches detect failure and the associated convergence time. With the advent of Layer 3 switching and the use of multilayer design to reduce the convergence times for modern switched networks, a primary goal of RSTP is to reduce convergence times to at least similar levels. RSTP achieves this and also includes standards-based implementations of PortFast, UplinkFast, and BackboneFast.

RSTP is supported from CatOS 7.1 and native IOS 12.1(11)EX on the Catalyst 6000/6500 platform. RSTP support is present from CatOS 7.2 on the Catalyst 4000, and at the time of this writing, it is not supported on the Cisco IOS-based Catalyst 4000 with Supervisor 3. It is supported from Cisco IOS 12.1(9)EA on the Cisco 2950 and 3550 platforms.

MST

The other important specification relating to spanning tree is the 802.1s specification, which is also known as Multiple Spanning Tree (MST). MST relates to how spanning tree interoperates with topologies that include multiple VLANs. On Cisco Catalyst switches, you can define the mode of spanning tree operation, which determines how the switch maintains STP for multiple VLANs. The following lists the common STP modes of operation:

  • CST (Common Spanning Tree)— Prior to 802.1s, the only standards-based interpretation of STP and its relation to multiple VLANs was available in the 802.1q specification, which dictated that a single spanning-tree instance should be used for all VLANs. This feature is also known as CST. The reason for defining CST is to ensure interoperability with non-802.1q bridges, as all STP communication is sent untagged on the native VLAN. Having only a single spanning-tree instance means that each switch CPU needs to deal only with a single STP instance; however, you cannot implement load sharing (multiple STP instances are required for load sharing), which is a major drawback for many networks.
  • Cisco PVST+ (Per-VLAN spanning tree)— Cisco developed the proprietary PVST+ mode of operation, which allows multiple STP instances to operate in a Layer 2 network, allowing for STP load sharing. PVST+ operates a unique STP instance per VLAN, which means that if you have 500 VLANs active in the Layer 2 network, 500 STP instances exist. Of course, although 500 VLANs might exist in the network, only a handful of different paths through the network normally exist, meaning that you might require only a few different STP topologies to implement load sharing. Thus, although PVST+ allows you to implement load sharing, the implementation is flawed in that a separate STP instance is required for each VLAN, even if VLANs share the same STP topology. This flaw can have a detrimental effect on CPUs in environments that support hundreds or thousands of VLANs.
  • Multiple Spanning Tree (MST)— MST combines the best of both 802.1Q and PVST+. MST allows you to map a configurable number of VLANs to a single STP instance, which means that all VLANs that share the same STP topology can be supported by just one STP instance. Load sharing is achieved by having multiple STP instances, but the number of STP instances that must be maintained on each switch can be matched to the number of different logical topologies required for your network to implement load sharing.

Figure 4-4 demonstrates a simple STP topology that includes 1000 VLANs and shows how load sharing is achieved for each of the technologies just discussed.

Figure 4-4 STP Load Sharing and 802.1Q, PVST+, and MST

In Figure 4-4, the STP instances for each spanning tree mode are shown. In CST (802.1q) mode, a single STP instance exists for all VLANs, and only one active STP topology exists. This arrangement means that Switch-C can have only one active uplink.

In PVST+ mode, a single STP instance exists for each VLAN, which means that 1000 STP instances exist in total. This arrangement allows for load sharing to be implemented by configuring 500 STP instances to use one uplink on Switch-C as the active forwarding path and the remaining 500 STP instances to use the other uplink on Switch-C. Although STP load sharing is now possible with PVST+, it comes at the expense of significant CPU load on every switch in the network because 1000 STP instances need to be maintained.

Finally, with MST (802.1s) mode, only two STP instances are required, because you need only two separate STP topologies to implement load sharing. The first STP instance is used for VLANs 1-500, and the second STP instance is used for VLANs 501-1000. MST achieves the same load sharing results as PVST+ (traffic for 500 VLANs is forwarded over each uplink on Switch-C), but requires only two STP instances, which significantly reduces CPU load on all switches throughout the network.
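The difference in the number of instances can be checked with a short calculation. In this Python sketch the function name stp_instance_count and the mode strings are hypothetical; the VLAN-to-instance mapping mirrors the example above, with VLANs 1-500 in the first MST instance and VLANs 501-1000 in the second.

```python
def stp_instance_count(mode, vlans, mst_map=None):
    """Return how many STP instances a switch must maintain in each mode."""
    if mode == "cst":
        return 1                                 # one instance for all VLANs
    if mode == "pvst+":
        return len(vlans)                        # one instance per VLAN
    if mode == "mst":
        return len({mst_map[v] for v in vlans})  # one per mapped MST instance
    raise ValueError(f"unknown mode: {mode}")


vlans = range(1, 1001)
mst_map = {v: 1 if v <= 500 else 2 for v in vlans}

for mode in ("cst", "pvst+", "mst"):
    print(mode, stp_instance_count(mode, vlans, mst_map))
```

For the 1000 VLANs in this example, the counts are 1 for CST, 1000 for PVST+, and 2 for MST.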

MST is supported from CatOS 7.1 and native IOS 12.1(11)EX on the Catalyst 6000/6500 platform. MST support is present from CatOS 7.1 on the Catalyst 4000, and at the time of writing, it is not supported on the Cisco IOS-based Catalyst 4000 with Supervisor 3. MST is supported from Cisco IOS 12.1(9)EA on the Cisco 2950 and 3550 platforms.
