This chapter is from the book

NSF/SSO, NSR, Graceful Restart to Ensure Robust Routing

Nonstop forwarding (NSF) refers to the capability of the data plane to keep forwarding hitlessly when the routing plane disappears momentarily, most likely because it is failing over to a standby route processor (RP). The routing information and topology might change during this time and leave the forwarding information base (FIB) stale or invalid, so switchover times should be as short as possible. The Cisco ASR 1000 provides switchover times of less than 50 ms RP to RP (or IOS daemon [IOSD] to IOSD on the ASR 1002-F, ASR 1002, and ASR 1004).

Stateful switchover (SSO) refers to the capability of the control plane to preserve configuration and various protocol states across this switchover, which reduces the time before the newly active control plane can be used. SSO is also useful for scheduled hitless upgrades within the in-service software upgrade (ISSU) execution path. The time the newly active RP takes to reach SSO may vary with the type and scale of the configuration.

Graceful restart (GR) refers to the capability of the control plane to delay advertising the absence of a peer that is going through a control-plane switchover for a "grace period," which helps minimize disruption during that time (assuming the peer's standby control plane comes up). GR is implemented as per-protocol extensions (RFC 3623 for OSPFv2, for example), which are interoperable across vendors. The grace period has a significant downside when the peer fails completely and never comes back: its neighbors keep using its routes until the grace period expires, which slows overall network convergence. This brings us to the final concept: nonstop routing (NSR).

NSR is an internal (vendor-specific) mechanism that extends routing awareness to the standby routing plane, so that on failover the newly active routing plane can take charge of the already established sessions without needing help from its neighbors.
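The contrast between GR and NSR can be sketched as a toy Python model. Everything in it is hypothetical and heavily simplified: `HelperNeighbor` stands in for a GR helper that keeps a restarting peer's routes until a grace period expires, and `NsrPair` stands in for an NSR-capable router whose active routing plane checkpoints session state to its standby. Neither class follows the RFC 3623 state machine or Cisco's internal NSR implementation, and times are abstract seconds.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HelperNeighbor:
    """Hypothetical, simplified GR helper: how one neighbor treats a restarting peer."""
    routes_via_peer: set[str]
    grace_period_s: float
    restart_started_at: float | None = None

    def peer_restarting(self, now: float) -> None:
        # The peer signaled a graceful restart: keep forwarding on its routes
        # and start the grace-period timer instead of flushing them.
        self.restart_started_at = now

    def peer_returned(self) -> None:
        # The peer came back within the grace period: nothing is withdrawn.
        self.restart_started_at = None

    def tick(self, now: float) -> None:
        # The grace period ran out and the peer never returned: only now are
        # its routes withdrawn, so the failure is detected late (the GR downside).
        if (self.restart_started_at is not None
                and now - self.restart_started_at > self.grace_period_s):
            self.routes_via_peer.clear()
            self.restart_started_at = None


@dataclass
class RoutingPlane:
    """Hypothetical routing plane holding per-peer session state."""
    sessions: dict[str, str] = field(default_factory=dict)


class NsrPair:
    """Hypothetical, simplified NSR: the active plane mirrors session state to the standby."""

    def __init__(self) -> None:
        self.active = RoutingPlane()
        self.standby = RoutingPlane()

    def update_session(self, peer: str, state: str) -> None:
        self.active.sessions[peer] = state
        # Checkpoint every change so the standby stays in sync.
        self.standby.sessions[peer] = state

    def failover(self) -> None:
        # The standby takes over with the sessions it already holds; neighbors
        # need no helper support. A fresh, empty standby replaces the failed one.
        self.active, self.standby = self.standby, RoutingPlane()


if __name__ == "__main__":
    helper = HelperNeighbor({"10.0.0.0/8"}, grace_period_s=300)
    helper.peer_restarting(now=0)
    helper.tick(now=301)  # the peer never came back
    print("GR helper routes after grace period:", helper.routes_via_peer)

    pair = NsrPair()
    pair.update_session("192.0.2.1", "FULL")
    pair.failover()
    print("NSR sessions on the new active plane:", pair.active.sessions)
```

The demo mirrors the trade-off described above: with GR, a peer that never returns keeps its routes alive for the whole grace period, whereas with NSR the newly active plane already holds the sessions and the neighbors see nothing.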

Table 12-1 shows the compatibility and support matrix for ASR 1000 IOS XE software 2.2, and outlines the various states that are preserved during FP/ESP failover.

Table 12-1. Protocols and Their State Preservation via NSF/SSO

Technology                            | Focus
Routing protocols                     | Enhanced Interior Gateway Routing Protocol (EIGRP), Open Shortest Path First Version 2 (OSPFv2), OSPFv3, Intermediate System-to-Intermediate System (IS-IS), and Border Gateway Protocol Version 4 (BGPv4)
IPv4 services                         | Address Resolution Protocol (ARP), Hot Standby Router Protocol (HSRP), IPsec, Network Address Translation (NAT), IPv6 Neighbor Discovery Protocol (NDP), Unicast Reverse Path Forwarding (uRPF), Simple Network Management Protocol (SNMP), Gateway Load Balancing Protocol (GLBP), Virtual Router Redundancy Protocol (VRRP), Multicast (Internet Group Management Protocol [IGMP])
IPv6 services                         | IPv6 Multicast (Multicast Listener Discovery [MLD], Protocol Independent Multicast-Source Specific Multicast [PIM-SSM], MLD Access group)
L2/L3 protocols                       | Frame Relay, PPP, Multilink PPP (MLPPP), High-Level Data Link Control (HDLC), 802.1Q, bidirectional forwarding detection (BFD)
Multiprotocol Label Switching (MPLS)  | MPLS Layer 3 VPN (L3 VPN), MPLS Label Distribution Protocol (LDP)
SBC Data Border Element (DBE)         |

See the "Further Reading" section at the end of this chapter to find out where to look for complete route scale testing details.

Use Case: Achieving High Availability Using NSF/SSO

To command higher revenues and consistent profitability, service providers and enterprises are increasingly putting mission-critical, time-sensitive services on their IP infrastructure. A key challenge is achieving and delivering high network availability under strict service level agreement (SLA) requirements. Network availability is also directly linked to the overall total cost of ownership (TCO).

An enterprise has an ASR 1006 router with an ASR1000-ESP10 in the core of its network. The router runs OSPF to connect to multiple distribution hub routers, not all of which are necessarily Cisco devices.

The goal is to reduce the route/prefix recomputation churn caused by RP switchover and reestablishment of OSPF peers.

To meet these requirements, implement Internet Engineering Task Force (IETF) NSF for OSPF, because it interoperates with any vendor's NSF-aware routers (an NSF-aware router is a neighbor that understands the GR protocol extensions). When the NSF-capable ASR 1000 then switches over from the active RP to the standby RP, no packets are lost and the downstream neighbors do not restart their adjacencies.

Figure 12-1 shows the ASR 1000 core router and its neighbors, which are all NSF-aware and can act as helpers during RP SSO.

Figure 12-1. Logical view of many regional WAN aggregation routers coming into a consolidated WAN campus edge router.

To turn on IETF NSF and helper mode on all the distribution hub routers, including the Cisco ASR 1000, perform the following configuration steps:

  • Step 1. Configure NSF within the given OSPF process ID:
    ASR1006# configure terminal
    ASR1006(config)# router ospf 100
    ASR1006(config-router)# nsf ietf restart-interval 300
  • Step 2. Verify that NSF is enabled on both the helper router and the ASR 1000:
    Router-helper# show ip ospf 100
     Routing Process "ospf 100" with ID
     ----output truncated----
     IETF Non-Stop Forwarding enabled
        restart-interval limit: 300 sec
     IETF NSF helper support enabled
     Cisco NSF helper support enabled
     Reference bandwidth unit is 100 mbps
        Area BACKBONE(0)
    ASR1006# show ip ospf 100
     Routing Process "ospf 100" with ID
     ----output truncated----
     IETF Non-Stop Forwarding enabled
        restart-interval limit: 300 sec
     IETF NSF helper support enabled
     Cisco NSF helper support enabled
  • Step 3. Verify that both RPs are up, one active and one standby (using the show platform command), and that the OSPF neighbor relationships are established (using the show ip ospf neighbor command). Then check the forwarding tables on the active and standby ESPs:
    ! active ESP:
    ASR1006# show platform software ip fp active cef summary
    Forwarding Table Summary
    Name       VRF id   Table id    Protocol      Prefixes    State
    Default    0        0           IPv4           10000       cpp:
    ! standby ESP:
    ASR1006# show platform software ip fp standby cef summary
    Forwarding Table Summary
    Name       VRF id   Table id   Protocol    Prefixes  State
    Default    0        0          IPv4        10000     cpp: 0x10e265d8

    The preceding output shows the prefixes downloaded into both the active and the standby Embedded Services Processor (ESP) before the router fails over: about 10,000 routes exist in each ESP.

  • Step 4. Induce an RP SSO failover by entering the redundancy force-switchover command in privileged EXEC (enable) mode on the active RP. The following output, taken on the newly active RP, shows the result:
    ASR1006# show ip ospf 100
     ----output truncated----
     IETF Non-Stop Forwarding enabled
        restart-interval limit: 300 sec, last IETF NSF restart 00:00:10 ago
     IETF NSF helper support enabled
     Cisco NSF helper support enabled
  • Step 5. RP SSO does not cause any packet loss, because forwarding continues on the ESP throughout the process. During the switchover, you can run the show platform command to confirm that the formerly active RP is rebooting (it is listed in the "booting" state).
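The checks in Steps 2 through 4 are easy to script once the CLI output has been captured. The following Python sketch parses that text, assuming the line formats match the sample output in this section (other IOS XE releases may format these lines differently, and the hh:mm:ss form of the restart timestamp is an assumption). The function names are hypothetical, and the sketch does not connect to the router; collecting the output is left out.

```python
from __future__ import annotations

import re

# Assumption: field layout matches the sample output shown in this section.
NSF_INTERVAL_RE = re.compile(r"restart-interval limit:\s*(\d+)\s*sec")
NSF_LAST_RE = re.compile(r"last IETF NSF restart (\d+):(\d+):(\d+) ago")
CEF_DEFAULT_RE = re.compile(r"^\s*Default\s+\d+\s+\d+\s+IPv4\s+(\d+)", re.MULTILINE)


def parse_nsf_status(show_ip_ospf: str) -> dict:
    """Pull the NSF fields out of 'show ip ospf <process-id>' text."""
    interval = NSF_INTERVAL_RE.search(show_ip_ospf)
    last = NSF_LAST_RE.search(show_ip_ospf)
    last_s = None
    if last:
        hours, minutes, seconds = (int(g) for g in last.groups())
        last_s = hours * 3600 + minutes * 60 + seconds
    return {
        "ietf_nsf": "IETF Non-Stop Forwarding enabled" in show_ip_ospf,
        "ietf_helper": "IETF NSF helper support enabled" in show_ip_ospf,
        "restart_interval_s": int(interval.group(1)) if interval else None,
        "last_restart_s": last_s,  # seconds since the last IETF NSF restart
    }


def parse_cef_prefixes(cef_summary: str) -> int | None:
    """Return the IPv4 prefix count of the default table in a CEF summary."""
    match = CEF_DEFAULT_RE.search(cef_summary)
    return int(match.group(1)) if match else None


def check_switchover(active_cef: str, standby_cef: str, ospf_after: str) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    active = parse_cef_prefixes(active_cef)
    standby = parse_cef_prefixes(standby_cef)
    if active is None or active != standby:
        problems.append(f"ESP prefix counts differ: active={active}, standby={standby}")
    status = parse_nsf_status(ospf_after)
    if not status["ietf_nsf"]:
        problems.append("IETF NSF is not enabled on the newly active RP")
    if status["last_restart_s"] is None:
        problems.append("no IETF NSF restart recorded after the switchover")
    return problems


if __name__ == "__main__":
    # Sample text copied from the outputs shown in Steps 3 and 4.
    active_cef = "Default    0        0           IPv4           10000       cpp:"
    standby_cef = "Default    0        0          IPv4        10000     cpp: 0x10e265d8"
    ospf_after = (
        " IETF Non-Stop Forwarding enabled\n"
        "    restart-interval limit: 300 sec, last IETF NSF restart 00:00:10 ago\n"
        " IETF NSF helper support enabled\n"
    )
    print(check_switchover(active_cef, standby_cef, ospf_after) or "all checks passed")
```

Returning a list of problems rather than raising on the first failure lets one run report every mismatch, for example a prefix-count difference and a missing restart record together.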

If the ASR1000-ESP10 itself fails over, a small amount of packet loss occurs (the packets being processed inside the QuantumFlow Processor [QFP] at that moment), although this amounts to much less than 1 ms worth of transit traffic.
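To put the 1-ms bound into packets, the following Python arithmetic sketch assumes, for illustration only, that the ESP10 is forwarding at its full 10 Gbps and that all traffic is Ethernet (each frame also occupies 20 bytes of preamble, start-of-frame delimiter, and inter-frame gap on the wire). Under those assumptions, 1 ms at 10 Gbps carries roughly 14,881 frames of 64 bytes or roughly 822 frames of 1500 bytes; because the text says the actual loss is much less than 1 ms worth, these figures are upper bounds, not measurements. The function name is hypothetical.

```python
# Assumptions (illustrative, not measured): full 10-Gbps load, Ethernet framing.
ETHERNET_OVERHEAD_BYTES = 20  # preamble + SFD (8 bytes) + minimum inter-frame gap (12 bytes)


def frames_in_window(rate_bps: float, window_s: float, frame_bytes: int) -> float:
    """Upper bound on Ethernet frames crossing a fully loaded link during a time window."""
    bits_per_frame = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return rate_bps * window_s / bits_per_frame


if __name__ == "__main__":
    rate_bps = 10e9
    # Compare the 1-ms bound from the text with the 50-ms APS benchmark.
    for window_ms in (1, 50):
        for frame_bytes in (64, 1500):
            n = frames_in_window(rate_bps, window_ms / 1000, frame_bytes)
            print(f"{window_ms:>2} ms, {frame_bytes:>4}-byte frames: at most {n:,.0f} frames")
```

Small frames are the worst case because the fixed per-frame overhead makes them the most numerous at a given bit rate.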

NSF/SSO allows RPs to fail over without any packet loss, and ESPs to fail over with extremely small packet loss. The Cisco ASR 1000 thus delivers a core benefit of a carrier-class router: failover times that beat even the 50-ms gold standard of SONET Automatic Protection Switching (APS).

In today's networks, where SLAs are enforced and networks carry life- and mission-critical traffic, a robust infrastructure with fast failover based on modern architectures is a must.
