The Solution Space
A number of solutions have been developed over the years to limit control plane state, including summarization, aggregation, filtering, layering, caching, and back-off timers. All of these solutions fall into one of two different ways to limit control plane state—reducing the scope or the speed of control plane information. Each of these, in turn, solves a specific problem, such as
Reducing the scope of control plane information improves security by controlling the set of devices through which a view of the network can be obtained.
Reducing the scope of control plane information improves convergence by controlling the set of devices that must recalculate loop-free paths through the network because of any individual change.
Reducing the scope of control plane information reduces the chance of positive feedback loops by preventing state from “looping back” through the control plane.
Reducing the scope of control plane information reduces the chance of resource exhaustion in any particular device (and potentially lowers the cost of any particular device) by reducing the size of any tables held in memory and across which the set of loop-free paths must be calculated.
Reducing the speed of control plane information traveling through the network, or the velocity of state, reduces the chance of positive feedback loops forming and reduces the chance of resource exhaustion in any individual device.
The following sections consider several widely implemented and deployed techniques used to control the scope and velocity of state.
Summarizing Topology Information
Topological information can be summarized by making destinations that are physically (or virtually) connected several hops away appear to be directly attached to a local node, and then removing the information about the links and nodes in any routing information carried in the control plane from the point of summarization. Figure 19-6 illustrates this concept from the perspective of F, with E summarizing.
Figure 19-6 Summarization of topology information in the control plane
Before the topology is summarized (the upper network), F might (depending on the protocol) know A is connected to B, B is connected to C and D, and C and D are connected to E. If E begins to summarize the topology information (shown in the lower network), each of these other nodes appears, from F’s perspective, to be directly connected to E. The physical topology does not change, of course, but F’s view of the topology does change.
Summarization is a form of abstraction over the network topology; the set of reachable destinations is abstracted from the network and connected so that loop-free paths are preserved, but not detailed topology information. The way this is normally done is to remove actual link information while preserving the metric information associated with each destination, as the metric information alone can be used to calculate loop-free paths.
Distance vector protocols essentially summarize topology information at every hop, as they transmit each destination with a metric between devices. In Bellman-Ford, the local device examines its local view of the network to calculate the set of loop-free paths through the network. In Garcia-Luna’s Diffusing Update Algorithm (DUAL), the device keeps (in effect) one hop of topology information, the cost to each destination as seen from each of its neighbors, and uses this information to calculate alternate loop-free paths to each destination. Link state protocols carry full topology information, including links and metrics, within a single flooding domain.
Aggregating Reachability Information
If you take a trip to a distant city through a series of flights, you will need
Directions from your home to the local airport
Directions within the local airport to the correct gate to board the aircraft
Directions from gate to gate within the airport where each flight connection is made
Directions from the gate to the place where you pick up a rental car, or to a taxi, or to some form of public transportation
Directions to the hotel where you will be staying
Directions from the hotel to the site of the meeting or conference you will be attending
What would happen if you called your destination hotel and asked for full directions to its location from yours? Assuming the hotel staff even know how you are traveling, the directions would easily overwhelm you. Maybe they would look something like this:
Walk out your front door and get into your car.
Turn left out of your driveway, go to the first stop sign, turn left.
Proceed three blocks and turn right onto the entrance ramp onto the highway.
Merge into traffic and stay on this road for 4.1 miles.
When you disembark from the plane, turn left on exiting the gate.
Travel 400 yards to the internal airport transportation station.
Ascend the steps or escalator to the second level, turn left, and board the first train arriving there.
On the third stop, exit the train, turn left, and proceed down the steps or escalator to the first floor.
You can see how such a set of directions might be overwhelming in their scope. In fact, they would be so overwhelming as to be confusing.
The way travelers really navigate is in stages, or segments. A broad set of directions is given (board flight 123, which will take you to Chicago; then flight 456, which will take you to San Jose; rent a car; and drive to the hotel). At each of these steps, you assume there will be directions available locally to take you between any two points. For instance, you assume there will be signs on the local highway, or some software or map you can consult to provide you with directions from your home to the local airport, and then there will be signs within the airport where you are connecting between flights to guide you between the gates, etc.
This process of taking a trip in stages is, in reality, a form of abstraction. You know, when you travel, that information will become available as you proceed through the trip, and hence you do not need it right now. What you need is enough information to get you into a general area and then access to more detailed information when you get there.
This is precisely how aggregation in network protocols works. Aggregation removes more specific information about a particular destination as topological distance is covered in the network. Figure 19-7 illustrates.
Figure 19-7 Aggregation of reachability information
In Figure 19-7, there are three hosts connected to a single shared link (broadcast domain) attached to an interface on A. Each of these hosts has its own physical Media Access Control (MAC) address, which is related to an Internet Protocol (IP) address, which has been assigned either manually or through the Dynamic Host Configuration Protocol (DHCP). These addresses all fall within a single /64 range of addresses. A aggregates these host addresses into a single advertisement, traditionally considered the address of the “wire” in IP networks: 2001:db8:3e8:100::/64.
Two other routers, B and C, are advertising two other /64s; the three /64s advertised by A, B, and C fall within the same /60 address range. Router D is configured to aggregate these three /64s to the /60. E, in turn, advertises a default route (::/0) to F, which means “any IP address you do not know about, you can reach through me.” This is an aggregate sitting “above” 2001:db8:3e8:100::/60. Some useful terminology:
Supernet or aggregate: An address that covers, or represents, a set of longer prefix, or more specific, destinations
Subnet: An address that is covered, or represented by, a longer prefix, or less specific, destination in the routing table
Subnets and aggregates look identical in the routing table of any individual device. The only way you can see if a particular route is either a supernet or subnet is if the longer and shorter routes both exist in the routing table of the aggregating device at the same time. Without the subnet, you cannot tell whether a route is an aggregate or not.
A, in advertising 2001:db8:3e8:100::/64, does not remove any reachability from the network; rather it adds unreachable destinations that appear to be reachable to the control plane. Router A is advertising reachability to a large number of hosts, such as 2001:db8:3e8:100:4//64, even though this host doesn’t exist. In the same way, D is advertising unreachable address space into the network by advertising 2001:db8:3e8:100::/60, and E is advertising unreachable address space into the network by advertising ::/0.
Packets transmitted to a nonexistent host are normally just dropped by the first device with specific enough routing information to know the host doesn’t exist. For instance:
If a packet is forwarded by F toward E with a destination address of 2001:db8:3e8:110::1, E can drop this packet, as this destination does not fall within any of the available destinations in E’s routing table.
If a packet is forwarded by F toward E with a destination address of 2001:db8:3e8:103::1, D can drop the packet, as this destination does not fall within any of the available destinations in D’s routing table.
If a packet is forwarded by F toward E with a destination address of 2001:db8:3e8:100::100, A would need to drop the packet, as this destination is not in the local Address Resolution Protocol (ARP) cache at A’s connection to 2001:db8:3e8:100::/64.
There is another place where aggregation can be configured in a network: between the routing table (Routing Information Base, or RIB) and the forwarding table (Forwarding Information Base, or FIB), within an individual network device. This type of aggregation is fairly unusual; it is primarily used in situations where a device’s forwarding table is restricted to a particular size because of memory limitations.
Filtering Reachability Information
Filtering reachability information, unlike aggregation, does remove reachability information from the control plane; hence filtering is normally used as an aid or part of a layered defense for network security. Figure 19-8 is used to illustrate.
Figure 19-8 Route filtering
In Figure 19-8, A should be able to reach E within the organization (to the right of the organizational boundary line) and no destinations outside the organization. Host A definitely should not be able to reach G, for instance, or any of the transit links or routers within the organization’s network. There are several ways to accomplish this, of course. The network administrator could place a stateful packet filter at the edge of the network to block traffic that is not part of a session originating from inside the network, or the network administrator could configure a packet filter to block A from accessing any destination other than E. While these are, of course, good ideas, it is often best to combine such filters with some control plane filter to prevent any routers in the network that A is attached to (within the cloud) from learning about these destinations. To accomplish this, the network administrator can place a filter at B blocking the advertisement of any reachable destination within the network other than the subnet that E is attached to.
At D, all routes are also filtered toward F—except the default route. While this is configured as a route filter on D, it acts like route aggregation; the default still allows G to reach E, even though F does not have a specific route, by following the default route. It is important to differentiate between the two cases: a route filter being used like aggregation and a route filter being used to prevent or block reachability to or from a particular device (or set of devices).
Layering Control Planes
In Chapter 9, “Network Virtualization,” the case for building virtual topologies was laid out from the perspective of the data plane: primarily to provide traffic separation, reachability separation, and to provide “over the top” network services, particularly encryption and tunneled protocol support. There is an entirely separate case to be made for layering control planes, either with virtualized topologies, or without. Consider the security example set out previously in Figure 19-8; another way to solve the same problem might be to provision an overlay network, as shown in Figure 19-9.
Figure 19-9 An overlay as control plane information hiding
In Figure 19-9, A needs to access H and K, but not M; N needs to access all three. Router B is a smaller device, perhaps a small home office router, which can support just a handful of routes. It is possible, of course, to filter routing information at C such that B has just the one or two routes it needs, but this may not be scalable from a network management perspective. Nor does this provide traffic separation, which is a requirement in many places where overlay networks are used. Meeting any traffic separation requirements would necessitate building packet filters at every device along the path, adding further to the network management load.
A better option, in many cases, is to create a virtual overlay network including just the devices that need to communicate. In this case, the dashed gray lines represent the virtual overlay network created to fulfill the requirements given. From an information hiding perspective, what is important to note is the following:
B does not need to know about D or G, the links connecting them, nor the 2001:db8:3e8:102::/64 subnet; information about these topology elements and reachable destinations are hidden from the control plane at B by building a tunnel, or virtual topology, with one end at B and the other ends at E and F.
The second control plane can run as a different process on C, E, and F; this second control plane also does not need to know about these topology elements or reachable destinations.
Some information about topology and reachability, then, is hidden from B entirely, and some processes on C, E, and F, without reducing the required reachability. To connect this back to the concept of failure domains, routers that do not know about specific topology elements and/or reachable destinations do not need to recalculate the set of loop-free paths through the network when those (hidden) elements change. Because of this, B can be said to be in a different failure domain than D and G. Virtualization, then, can often be treated as another form of information hiding.
Caching begins with a simple observation: not all forwarding information is used all the time. Rather, particular flows pass along particular paths in a network, and particular pairs of devices (typically) only communicate for short periods of time. Storing forwarding information for short-lived flows, and in devices far off the path any particular flow might use, is a waste of resources. Figure 19-10 is used to illustrate.
Figure 19-10 Considering why caching works
In Figure 19-10, the path from A to 2001:db8:3e8:100::/64 does not pass through C, E, or F; if A is the only device that ever originates paths toward this destination, it is a waste of memory and processing power for C, E, and F to calculate shortest paths to the 100::/64 destination. But how would E know no host attached to 101::/64 is going to send traffic to some device connected to 100::/64? There is no way, from a control plane perspective, to know this.
Instead, E must rely on traffic as it passes through the network. For instance, E could calculate a route toward 100::/64 when some packet is transmitted from a locally attached host toward some destination on the 100::/64 subnet. This is a reactive control plane. Caching is not restricted to reactive control planes, however. It is possible for E to calculate a loop-free route to 100::/64, but to not install this information into its local FIB. This is another form of FIB compression, which can be used when the size of the RIB is not limited, but the size of the FIB is (for instance, when there is a limited hardware forwarding table). FIB compression was once quite common in network devices but has generally fallen out of favor as the cost of memory has decreased and other techniques to store more forwarding information in smaller amounts of memory have been developed and deployed.
The key question in any caching scheme is: how long should the cached information be held? There are at least two answers to this question:
Remove a cache entry some specific time after it has been installed, or some specific time after its last use to forward a packet; this is timer based.
Remove the oldest or most specific cache entries when the cache reaches some percentage of its capacity; this is capacity based.
Normally these are combined, with the first being the “normal” process for removing stale cache information, and the second used as a “safety valve” to prevent the cache from overflowing. Caches normally rely on the number of forwarding table entries in use being some small percentage of the reachable destinations. Generally, the rule of thumb is somewhere around 80/20—80% of the traffic will be directed at 20% of the destinations, or, in other situations, about 20% of the total reachable destinations will need to be stored at any given time.
There are a number of problems designers face when caching forwarding information in this way. Figure 19-11 is used to illustrate one interesting failure mode.
Figure 19-11 An interesting cache failure mode
In Figure 19-11, E has 100 hosts attached; at the same time, C and D can support 70 entries in their forwarding table and will start removing items from cache when their forwarding table is 80% full (so when the cache reaches 56 entries, the caching algorithm begins removing the oldest entries to bring the cache under some number of total entries, say 50 for the purposes of this example). Assume caching is taking place at the individual destination IP address level, rather than at the subnet level (the reason for this will be explained in a following example). The situation that caching solutions normally assume is that A will communicate with a limited number of the 100 possible destinations at once. If A builds sessions with 20 of these destination devices for one minute, then another 20 the next minute, and so on, the cache can be “tuned” to carry information about any particular reachable destination for just a few seconds after its last use.
The worst possible case, from a caching perspective, is that A attempts to communicate with all 100 reachable hosts at once, or the cache timers are set long enough to cause every one of these destinations to remain in the cache at all times. Two problems are going to develop in this case. First, the cache at B is going to overflow. When B receives a packet that triggers caching of the 57th destination, it will begin removing older cache entries in order to protect the cache from failing entirely. The flow dependent on the removed cache entries will, of course, continue sending packets (or perhaps reset, and begin sending packets again), again causing the cache to reach the 57th entry, and hence the oldest entries to be removed again. This is a straightforward problem, easily detected, even if it is not easily mitigated.
Second, the caches at C and D are likely to develop problems. It is possible to build a stable system if B splits the load perfectly between C and D. However, this is rarely going to happen in real life. Instead, what is likely to happen at A is, at best, a 60/40 split; so traffic sent by B toward 40 of the destinations is sent to C, while traffic sent by A toward the other 60 destinations is sent toward D. The result is the cache on D overflows (there would need to be 60 cache entries, which is more than the 56 allowed by the caching algorithm), causing D to start removing cache entries. The removal of this caching information will cause the session to reset, as well.
The cache churn at B, C, and D can easily develop into a positive feedback loop, where dropped packets and sessions cause a refactoring of where traffic flows in the network, in turn causing different caches to overflow, in turn (again) causing dropped packets and session resets. There are few ways to resolve this sort of problem other than the obvious ones: increase the cache size, or reduce the number of concurrent flows through the network.
One apparently obvious answer—caching to the subnet level, rather than individual hosts—will not work. Figure 19-12 is used to explain why this will not work.
Figure 19-12 Caching to the subnet level
Figure 19-12 shows two networks: one (the upper) labeled before and the other (the lower) labeled after. Assume B, C, D, and E cache to the subnet of the destination, rather than the individual host information. What happens in this network is
A sends a packet to 2001:db8:3e8:101::1.
B receives this packet and discovers (through some mechanism—it does not matter what this mechanism is) that the destination is reachable through C and D.
B determines (perhaps based on load sharing) that the traffic should travel through C; it builds a cache entry toward 2001:db8:3e8:100::/60 through C in its local forwarding table.
A now sends a packet to 2001:db8:3e8:100::1.
B forwards this traffic along the path toward 100::/60, so the traffic is sent to C, then forwarded to E, where it is dropped.
Why does E drop this traffic? The packet destined to 100::1 “lives” in two different network address spaces: the 100::/60 and the 100::/64. E knows about the 100::/60 address space, so it should know about every reachable destination in this space. Because E believes it knows about every destination in this address space, there is no reason for E to ask any of its neighbors about 100:1; it should already know about this specific destination. This destination, however, is connected to D, so there is no way for E to have 100::1 in its local forwarding table. In effect, E believes it knows 100::1, as an individual host, does not exist, so it will drop any traffic destined to this address.
Because of this, A has no effective way to reach any device attached to 100::/64 network; it might be that when (or if) the cache entry times out at B, the next packet will happen to be for a destination within the 100::/64 network, causing the correct set of cache entries to be built at B. Whether or not this is likely to happen, it is never a good thing for control planes to have possible states, such as this one, where reachability is variable or unpredictable.
There are a number of ways this problem could be fixed, none of which appear to be deployable in the real world. For instance, you could dictate that every prefix in the network must have the same prefix length, but this would rule out aggregation, which is problematic.
Everyone in the modern world should know the value of slowing down sometimes—it can reduce information overload. It is no different for a control plane; slowing down the pace at which information is presented to a device does not really reduce the processing and memory requirements so much as spread them out over time. Another point in favor of slowing down state velocity is that it can allow multiple state changes to be “gathered,” or “bunched,” into a single processing cycle. Figure 19-13 illustrates these concepts.
Figure 19-13 Examples of slowing down state velocity
In Figure 19-13, timeline 1 illustrates the actual order in which the links between F and another router fail; [A,F] and [B,F] fail relatively close to one another, and the remaining links fail a bit farther apart (or spread out in time). In timeline 2, F waits to advertise the control plane state change for a fixed amount of time. Because of this delay between the event occurring and reporting the event, the failures of the [A,F] and [B,F] links are reported at the same time, or in the same update. This allows G to process both events at the same time, which (should) require less processor and memory resources.
Finally, in timeline 3, an exponential backoff timer is shown. Essentially, the first time an event occurs, a timer is set, and the event is reported after the timer has expired. In timeline 3, this timer is set to 0 seconds, so the event is reported immediately (a common configuration for exponential backoffs). Once the event has been reported, a separate timer is set that must expire (or wake up) before the next event can be reported. Each event occurring after this increases this timer exponentially, causing the reporting of events to be spread out over ever-increasing amounts of time.