This chapter gives an overview of VRRP as a redundancy protocol without going into the details of the messages used, the states the protocol specifies, or the considerations that apply when operating the protocol. Chapter 4 provides these details. Here we first put VRRP in context to articulate the significance of the function VRRP protects; then we discuss the special circumstances that necessitate a protocol such as VRRP. After establishing the context, we introduce basic VRRP concepts through a series of simple configurations chosen for didactic purposes. This approach helps us explore some basic characteristics of the protocol: What is the typical VRRP configuration for establishing some level of load sharing? How is M-to-N redundancy established with VRRP?
The coverage of the basic elements of the protocol makes it possible to study some more realistic cases in the last section of this chapter. That section discusses some typical VRRP deployment configurations before summarizing the topics covered.
2.1 The Case for VRRP
To provide a usage context for VRRP, we consider an enterprise network connecting a corporate office to multiple branch offices in different regions. Figure 2-1 represents such an enterprise network.
FIGURE 2-1. An enterprise network
A cloud (that is, a network of unspecified type and topology, but typically a Wide Area Network, or WAN) connects the branch offices to the corporate office, the headquarters of the enterprise. This WAN may consist of leased circuits, an X.25 packet network, a Frame Relay network, or an ATM network. Another possibility is a traditional IBM Systems Network Architecture (SNA) network. But the cloud may also represent an IP-based network of networks: a private internet or the Internet with a capital I.
The data center containing all business-critical databases resides in the corporate office, as does the network management center providing services essential to the operation of the corporate network. The finance department, payroll, human resources, and some other business functions are centralized at the headquarters. Different departments in the corporate office have their own interconnected LANs and are all connected to the external world through a device that we shall temporarily call a gateway, since it opens the gate to the external world. In our example, a router of some kind provides the gateway function. You may also hear the term default router used for this device. Another name for routers in this role is first hop router, since this is the first router the hosts need to use to reach any destination they cannot reach through direct routing.
Given the centralized setup of our illustration, branch offices of the enterprise depend heavily on the computer resources residing in the corporate office. Without access to these resources, the branch offices may fail to perform even their most basic functions.
The network settings and computer equipment of the branch offices are more modest. Depending on the specifics of the business, a branch office may contain just a series of locally networked PCs. Connection to the cloud is established through a router acting as a gateway device, that is, a first hop router.
A Closer Look at the Branch Office
Given the importance of the corporate resources that branch offices can access only through the cloud, the availability of the network services is critical to the business. In our scenario, the first hop routers constitute single points of failure.
VRRP, the protocol we are going to discuss in this book, provides a scheme intended to avoid this specific type of single point of failure. To understand how the need for this scheme arises and how VRRP achieves this avoidance, we zoom in on one of the branch offices.
In this branch office configuration, we have a series of general-purpose computers (PCs, workstations, laptops) interconnected via a LAN (typically Ethernet) to exchange local information and to share local resources. In this configuration, it is clear that the first hop router constitutes the single point of failure for the network access of the branch office. To avoid this problem, the first step would be to introduce a redundant first hop router. Figure 2-2 depicts a configuration with two routers positioned to act in the first hop function.
Now several questions arise: How do these hosts decide which one of the routers (R1 or R2) to use initially or in normal circumstances? How do they switch to the available one should the one they were using fail? Hosts thus need a mechanism to decide which one of the routers to use. They also need a mechanism to decide when to perform a switch. Moreover, when there is more than one backup router, they need a mechanism to decide which one of the backups to switch to. Thus the questions at hand are threefold:
How do the hosts discover or select the first hop routers?
How do the hosts switch to backups when the master fails?
How do the hosts decide which backup to switch to when there is more than one?
There are various ways of implementing a mechanism to answer the above questions. The solution depends first of all on a series of assumptions surrounding the questions at hand. Maybe the most fundamental one of these assumptions is that the network depicted in Figure 2-2 (network N1) is an IP network, that the communication between the hosts and the router relies on IP, and that the router under discussion is an IP router. Note that the cloud (network N2) may be running a variety of protocols: ATM, Frame Relay, and so on. Network N2 is most typically the network of a service provider, the transport mechanisms of which are transparent to the IP connectivity of the network N1.
FIGURE 2-2. A router pair
Furthermore, there are additional questions that are of prime importance to network administrators in particular:
What is the status of backups when they act as backups? Are they just on standby, or are they also handling some portion of the traffic? What are the advantages and disadvantages of both approaches?
If the backups are just on standby, how is this wasteful use of resources justified?
If the backups are incorporated in a load-sharing arrangement, what are the mechanisms for achieving that?
One category of solutions consists of installing additional networking software within the hosts. Introducing dynamic routing software (such as RIP or OSPF) would fall into this category. With the routing software enabled, hosts can behave as routers in addition to their end-system behavior. They can discover the first hop routers available in the LAN, for example, by listening to RIP updates on UDP port 520. In case of failure, they can switch to alternate routers based on the routing protocol at hand. To follow the RIP example, they may receive a triggered update indicating the failure of the current first hop router. Moreover, the metric mechanisms of the protocol can establish the switching priority among alternative first hop routers. By assigning different costs to different links, the network administrator can create a hierarchy of alternative first hop RIP routers.
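The host-side behavior described above can be sketched as a toy model. The following Python fragment is only an illustration, not a RIP implementation: the class name `PassiveRipHost` and its methods are hypothetical, no packets are parsed, and time is passed in explicitly. It assumes the standard RIP conventions that a metric of 16 means "unreachable" and that a route expires after 180 seconds without an update.

```python
from dataclasses import dataclass

RIP_INFINITY = 16      # RIP metric value meaning "unreachable"
ROUTE_TIMEOUT = 180.0  # seconds of silence before a route expires (RIP default)


@dataclass
class Candidate:
    metric: int        # cost advertised by the router
    last_heard: float  # time of the most recent update, in seconds


class PassiveRipHost:
    """Hypothetical model of a host that only listens to RIP updates."""

    def __init__(self):
        self.candidates = {}  # router name -> Candidate

    def on_update(self, router, metric, now):
        """Record an update (periodic or triggered) heard from a router."""
        self.candidates[router] = Candidate(metric, now)

    def first_hop(self, now):
        """Pick the live router with the lowest metric, or None if none is usable."""
        usable = {
            name: c for name, c in self.candidates.items()
            if c.metric < RIP_INFINITY and now - c.last_heard < ROUTE_TIMEOUT
        }
        if not usable:
            return None
        # Ties on metric are broken by name so the choice is deterministic.
        return min(usable, key=lambda name: (usable[name].metric, name))


if __name__ == "__main__":
    host = PassiveRipHost()
    host.on_update("R1", 1, now=0.0)   # the administrator made R1 the cheaper path
    host.on_update("R2", 2, now=0.0)
    print(host.first_hop(now=10.0))    # R1
    host.on_update("R1", RIP_INFINITY, now=40.0)  # triggered update: R1's route is gone
    print(host.first_hop(now=41.0))    # R2
```

The metric plays the role of the administrator-defined hierarchy: the host always prefers the cheapest router that is still being heard. If R1 fails silently instead of sending a triggered update, the host in this model keeps using it until the 180-second timeout expires.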
This solution may not be the best, or may not be feasible, for several reasons. The hosts may not have the capacity to run routing software, in particular sophisticated and therefore demanding software such as OSPF. But even RIP in its passive mode, where the host just listens to the updates without sending messages of its own, can be too much for an underpowered PC. Even if the hosts were able to run RIP, RIP might turn out to be too slow to adapt to topology changes. In some cases, implementations of these protocols may not be available at all on certain platforms.
Administrative concerns constitute another major obstacle to the use of dynamic routing protocols in the setting under discussion: the installation, configuration, and overall management of these protocols may be entirely unworkable given the centralization of network management in our illustrative enterprise. Many administrators consider running routing protocols on the desktops a management nightmare. Keeping track of the routing software on different machines and making sure that all of them are configured properly and interoperable are next to insurmountable challenges. Centralizing the routing intelligence at the edge of the outbound pipe brings substantial simplicity to overall availability solutions in a way totally transparent to the end stations.
Security considerations can be show stoppers in most networking matters, and running a dynamic routing protocol in a remote branch office ill equipped to handle basic protection requirements is no exception.
As an alternative to dynamic routing, we can look at ICMP and consider running an ICMP router discovery client on our hosts. In fact, some of the newer IP hosts use the ICMP Router Discovery Protocol (RDP) to find a new router when a route becomes unavailable. A host that runs RDP listens for multicast hello messages from its configured router and uses an alternative router when it no longer receives those messages. With its default timer values, however, RDP is not suitable for quick detection of a first hop failure: the default advertisement rate is once every 7 to 10 minutes, and the default lifetime is 30 minutes.
Moreover, this approach would require the active participation of all hosts. An increasing number of hosts would require larger timer values to minimize the protocol overhead in the network. Larger timers, on the other hand, would lead to longer delays in discovering failing neighbors, most importantly the neighbor in the first hop router role. The result of these delays can be unacceptably long black hole periods.
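To put rough numbers on the black hole period, the default timers quoted above can be turned into a small calculation. This is a back-of-the-envelope sketch under two stated assumptions: the host abandons its router only when the advertised lifetime expires, and the router was advertising at the slow end of its range (every 10 minutes). The function name `black_hole_window` is hypothetical.

```python
def black_hole_window(lifetime_s, advert_interval_s):
    """Return (shortest, longest) outage in seconds for a silent router failure.

    Assumed model: the router last advertised at time t0 and fails at
    t0 + d, with 0 <= d < advert_interval_s. The host keeps using the
    dead router until the entry expires at t0 + lifetime_s, so the
    outage lasts lifetime_s - d.
    """
    if advert_interval_s >= lifetime_s:
        raise ValueError("the lifetime must exceed the advertisement interval")
    longest = lifetime_s                       # failure right after an advertisement
    shortest = lifetime_s - advert_interval_s  # failure just before the next one was due
    return shortest, longest


if __name__ == "__main__":
    # Defaults quoted in the text: 30-minute lifetime, advertisements up to 10 minutes apart.
    shortest, longest = black_hole_window(30 * 60, 10 * 60)
    print(f"{shortest // 60} to {longest // 60} minutes")  # prints "20 to 30 minutes"
```

Under these assumptions a host would keep sending traffic to a failed first hop router for roughly 20 to 30 minutes, which illustrates why the default timers are unsuitable for quick failure detection.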
Proxy Address Resolution Protocol (ARP)
Some IP hosts use proxy ARP to select a router. When a host runs proxy ARP, it sends an ARP request for the IP address of the remote host it wants to contact. A router on the network, let us say R1, replies on behalf of the remote host and provides its own MAC address. With proxy ARP, the host behaves as if the remote host were connected to the same segment of the network. If the router R1 fails, the host continues to send packets destined for the remote host to the MAC address of R1, even though those packets have nowhere to go and are lost. You can either wait for ARP to acquire the MAC address of another router (say, R2 on the local segment) by sending another ARP request, or reboot the host to force it to send an ARP request. In either case, for a significant period of time the host cannot communicate with the remote host, even though the routing protocol has converged and R2 is prepared to transfer packets that would otherwise go through R1.
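The stale-cache problem described above can be illustrated with a toy simulation. Everything in it is hypothetical: the `Router` and `Host` classes, the 1200-second cache timeout, and the addresses (taken from ranges reserved for documentation) are assumptions for the sketch, and no real ARP traffic is involved.

```python
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    mac: str
    alive: bool = True


class Host:
    """Hypothetical host that resolves remote destinations through proxy ARP."""

    def __init__(self, arp_timeout):
        self.arp_timeout = arp_timeout  # seconds before a cache entry is refreshed
        self.cache = {}                 # remote IP -> (MAC address, time learned)

    def send(self, dst_ip, now, routers):
        """Return True if a packet to dst_ip would reach a live router."""
        entry = self.cache.get(dst_ip)
        if entry is None or now - entry[1] >= self.arp_timeout:
            # ARP request: the first live router answers with its own MAC.
            mac = next((r.mac for r in routers if r.alive), None)
            if mac is None:
                return False
            entry = (mac, now)
            self.cache[dst_ip] = entry
        # The frame goes to the cached MAC whether or not that router still works.
        return any(r.alive and r.mac == entry[0] for r in routers)


if __name__ == "__main__":
    r1 = Router("R1", "00:00:5e:00:53:01")
    r2 = Router("R2", "00:00:5e:00:53:02")
    host = Host(arp_timeout=1200)
    remote = "198.51.100.7"
    print(host.send(remote, 0, [r1, r2]))     # True: R1 answered the ARP request
    r1.alive = False
    print(host.send(remote, 10, [r1, r2]))    # False: the cache still points at R1
    print(host.send(remote, 1200, [r1, r2]))  # True: entry aged out, R2 answered
```

Between the failure and the cache expiry, every packet is sent to R1's MAC address and lost, even though R2 is ready to forward it. Rebooting the host corresponds to emptying `cache` early.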
Dynamic Host Configuration Protocol (DHCP)
An interesting alternative is DHCP, a very popular and common method for providing configuration information to hosts on IP networks. A host running a DHCP client requests configuration information from a DHCP server when it boots onto the network. This configuration information typically comprises an IP address for the host and an IP address for a first hop router. Once the host is configured, DHCP provides no mechanism for switching to an alternative router if the default router fails. From the point of view of our stated problem, DHCP helps with the discovery or selection of the first hop routers, but not with creating a redundancy scheme or establishing a switchover mechanism.
Static Configuration and VRRP
The final alternative under consideration is reliance on static routing. In this scenario, a network administrator configures the IP address of the host and designates a first hop router as the host's default router, or uses DHCP, as discussed, to assign the client an IP address and to tell it the IP address of the default router.
This is by far the most feasible of the discussed alternatives. Static configurations are supported almost without exception by all TCP/IP implementations. This solution becomes even more attractive with the continuous deployment of DHCP clients and servers that substantially facilitates the configuration of the default routers.
The obvious shortcoming of this approach, though, is the same as the one we noted in our conclusion about DHCP. Static configuration helps with the discovery of the first hop router but does not help with the switchover or with the selection of a new master from among multiple backups. This very shortcoming led to the creation of VRRP.
Having the first hop router configured as the default router establishes access to the external network, but another mechanism is needed to keep the access available. If the default router fails, the hosts have no means of switching to an alternative router available in the network. Under these circumstances, unfortunately, the statically configured first hop router becomes the single point of failure for network availability. VRRP enables additional routers to take over the role of a failing first hop router, thus keeping the first hop router from being a single point of failure for network services. Table 2-1 summarizes the advantages and disadvantages of the different approaches to protecting first hop (gateway) routers.
TABLE 2-1. Different Approaches to Protect the Gateway Routers
Approach: Dynamic routing in the hosts
Advantages: flexible (different failover hierarchies can be established); more failure points protected
Disadvantages: requires routing software in the hosts; demanding on the hosts; challenging to administer; security concerns

Approach: ICMP router discovery
Advantages: may come with new IP stacks in the hosts
Disadvantages: requires discovery software in the hosts with older IP stacks; not responsive, may lead to black holes

Approach: Proxy ARP
Advantages: no special software required in the host
Disadvantages: not responsive, may lead to black holes; may require rebooting of the hosts

Approach: DHCP or static configuration alone
Disadvantages: does not help with failover

Approach: VRRP with Static Routing
Advantages: static route configuration commonly available with TCP/IP; does not require any special software in the host; based on an industry standard; responsive or can be fine-tuned to be responsive
Disadvantages: protects only the default router's local interface, but the mechanism is extendable to trigger a switchover in the case of other interfaces; applicable only to IP networks, but other protocols such as IPX on a VRRP interface can be piggy-backed with IP for VRRP
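The takeover idea that distinguishes VRRP from the other approaches in Table 2-1 can be sketched with a toy model. The fragment below is a deliberate simplification and not the protocol itself: the class names and the address are hypothetical, and choosing the live router with the highest priority stands in for the election that Chapter 4 describes in detail. What it illustrates is that the hosts' static configuration never changes, while the router behind the configured address does.

```python
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    priority: int       # assumed rule: the higher priority wins among live routers
    alive: bool = True


class VirtualRouter:
    """Hypothetical model of a set of routers sharing one gateway address."""

    def __init__(self, virtual_ip, routers):
        self.virtual_ip = virtual_ip
        self.routers = list(routers)

    def master(self):
        """Return the live router with the highest priority, or None."""
        live = [r for r in self.routers if r.alive]
        return max(live, key=lambda r: r.priority, default=None)

    def forward(self, gateway_ip):
        """Name the router that would forward a packet sent to gateway_ip."""
        if gateway_ip != self.virtual_ip:
            return None
        current = self.master()
        return current.name if current else None


if __name__ == "__main__":
    r1 = Router("R1", priority=200)
    r2 = Router("R2", priority=100)
    gateway = VirtualRouter("192.0.2.1", [r1, r2])
    host_default_router = "192.0.2.1"         # statically configured, never changes
    print(gateway.forward(host_default_router))  # R1
    r1.alive = False                          # the first hop router fails
    print(gateway.forward(host_default_router))  # R2 takes over
```

In this model the host needs no special software and no reconfiguration, which corresponds to the advantages listed for VRRP with static routing in the table.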