2.3 Predictable Edge-to-Edge Behavior
As noted earlier, any end-to-end service is constructed from both the concatenation and layering of edge-to-edge and per-hop behaviors. Network operators, focusing on the edge-to-edge capabilities of the networks under their control, have a range of possible per-hop behaviors to mix and match together. Over the years a number of solution spaces have emerged, each one reflecting a different set of assumptions and compromises with respect to the CQS and routing capabilities of routers within the network.
The first and most important observation is that network designers face a trade-off between the number of traffic classes carried by their networks and the number of traffic classes that their router's CQS architectures can handle. A number of solutions are based on distributed edge-and-core architectures, where the cores are fast routers with limited CQS capabilities and the edges are slower but with more advanced CQS capabilities.
A second observation is that the Internet's existing shortest-path routing algorithms are not necessarily optimal for different classes of traffic across an arbitrary mesh of routers and links. A single metric may not be appropriate for all traffic traversing a particular section of the network. In addition, the destination-based forwarding paradigm itself makes it difficult to force subsets of available traffic into following alternative, non-shortest paths across any given network topology.
2.3.1 Edge-and-Core Models
Whether in hardware or software, the design of a good CQS architecture is generally nontrivial. In many software implementations, tight processing budgets make classification, queue management, and scheduling difficult to introduce without affecting the overall peak performance of the box. Hardware implementations have only just started to become commonplaceand until recently the development of a CQS implementation for IP locked into hardware was too commercially risky.
The edge-and-core model allows core routers to leverage hardware implementations (for speed), while leaving complex (but slower) processing to software-based edge routers. The edge routers might be able to classify and independently queue hundreds or thousands of traffic classes, whereas the core routers are assumed to be limited to a handful of queues.
Limited numbers of queues in core routers leads to a new requirement that edge routers be able to smooth out the burstiness of traffic entering the network. In the preceding discussion of per-hop QoS control, individual traffic classes were permitted to be completely unpredictable on the assumption that you could accurately isolate and reschedule them at every potential congestion point. However, although a smart edge/dumb core model may have the requisite isolation granularity at the edges, it does not in the core. Multiple traffic classes will find themselves aggregated into shared queues within the core routers. The potential for unpredictable mutual interference is high unless the network imposes some level of predictability before the traffic reaches the core routers. The solution is for edge routers to manipulate the temporal characteristics of individual traffic classes (and, hence, the aggregate of those traffic classes) before they enter the core. The Internet Engineering Task Force (IETF) Differentiated Services model is one example, and it will be discussed in Chapter 4, "Edge-to-Edge Network Models."
Shaping and Policing
The primary focus of a CQS architecture is the protection of traffic in each queue from the burstiness of traffic in another queue. On a per-hop basis, it is clear that, given appropriate isolation of all QoS-sensitive traffic into distinct queues, a scheduler needs to guarantee only a certain worst-case servicing interval (or minimum bandwidth). If spare capacity is available, you might expect "good" scheduler behavior to allocate that capacity to any queues having packets waiting to be forwarded. However, this practice is not always desirable from a network-wide perspective.
Simply emptying a queue as fast as the line rate allows (in the absence of traffic in other queues) can increase the burstiness perceived by routers further downstream. As a result, a serious problem can develop if the downstream routers do not differentiate traffic with as much granularity as the local router does. In addition, service providers may wish to cap the maximum rate that a customer can send packets through the network. If the customer frequently gets significantly better bandwidth than the guaranteed minimum (perhaps because the network is new and/or under loaded), a perception issue surfaces: The customer begins to associate the typical performance with what he or she is paying for. If the spare capacity ever shrinks, the customer will receive edge-to-edge performance closer to the guaranteed minimum. However, the customer simply perceives the service to have degraded and is likely to complain loudly. Managing customer expectations is an important part of running a business, and in this case preemptive rate capping is one of the technology-based tools that may be employed.
Placing an upper bound on the maximum bandwidth (or minimum inter-packet interval) available to a traffic class is known as "traffic shaping." A shaping scheduler is configured to provide both a minimum service interval (the time between pulling packets from the same queue) and a maximum service interval (to guarantee the latency bound or minimum bandwidth). Packets arriving with a shorter inter-packet interval than allowed by the scheduler are queued until transmissionsmoothing out the original burstiness. Figure 2.6 shows a scheduler that never samples the top queue more frequently than once every T secondsno matter how closely bunched up the packets arrive, they are transmitted with at least T seconds between them. (A simple form of shaping scheduler is sometimes referred to as a "leaky bucket," because no matter how fast packets arrive they can only "leak out" at a fixed rate.)
Figure 2.6 Shaping requires a minimum scheduler time on certain queues.
A real-world example of shaping can be seen in some freeway entrance ramps. In California, for example, some entrance ramps are equipped with stop/go lights that alternate every few seconds. The net effect is to allow cars onto the freeway with a known minimum time interval between themregardless of how bursty the car arrival times may have been to the entrance ramp itself. This input traffic shaping improves the capability of cars already on the freeway to interleave with the new traffic.
Shaping is not a simple function to introduce into a Best Effort router because this function presumes the existence of an appropriate CQS architecture. Although not quite so elegant, an alternative solution has been to introduce a packet-dropping behavior that is sensitive to excess burstiness of a traffic class. When too many packets arrive in too short an interval, packets are simply dropped. This process is known as policing.
Policing can be implemented without queues or schedulers, although it typically needs some form of classification to differentiate between the policing rules imposed on different traffic classes. In its simplest form, each traffic class has an associated counter. The counter is incremented regularly every T seconds and decremented whenever a packet (belonging to the counter's class) is forwarded. If a packet arrives to be transmitted when the counter is zero, the packet is dropped instead. When no packets are being transmitted, the counter increments up to a fixed limit L. The net effect is that a packet stream arriving with an average inter-packet interval of T seconds (or greater) passes through untouched. However, if a burst of more than L packets arrive in less than T seconds, the counter reaches zero and extra packets are dropped. The value of L affects the burst tolerance of the policing function, and T sets the rate below which traffic is safe. This practice is a severe, yet effective, way to modify the burstiness of traffic downstream from the policing router!
The utility of policing is based on the assumption that most bursty traffic originates from applications using adaptive end-to-end transport protocols such as TCP. Packet loss is assumed to indicate transient congestion, and TCP reacts by slowing down the rate at which it injects packets into the network. Policing allows the network operator to fake the existence of transient congestion for a particular traffic class before it actually begins to occur further along the packet's path. Even if the traffic class is not using an adaptive end-to-end transport protocol, policing protects the rest of the network by continuing to drop packets that exceed the allowed parameters.
Both shaping and policing are extremely useful tools for network designers who face a trade-off between the number of traffic classes carried by their networks and the number of traffic classes their network's CQS architectures can handle. The basic issue is that individual traffic classes can be permitted to be unpredictable only if you can accurately isolate them at every potential congestion point. If you lack that isolation capability, you must attempt to impose some level of predictability prior to the potential congestion point. In the smart edge/dumb core model, the solution is for each edge router to preemptively shape and/or police the individual traffic classes before they enter the core to impose some overall order, smoothness, and predictability within each traffic class (and, hence, the aggregate of those traffic classes). Shaping may also be useful on the egress from a network in situations where the next network's aggressive policing would be otherwise detrimental.
Marking and Reordering
Although shaping can be a sophisticated solution to smoothing out bursty traffic, simple policing is a blunt instrument. A number of variations have been introduced to soften the effect of edge-router policing. A policing node may choose to only "mark" packets (rather than discarding them immediately) if they exceed a burstiness threshold. Routers further along the path recognize these marked packets as having a lower priority than unmarked packets. If transient congestion begins to fill the queues in a downstream core router, its queue management algorithm can begin dropping marked packets before it begins dropping unmarked packets.
As an additional refinement, the original policing node may implement a staggered set of burst thresholdsif a packet burst exceeds the lower threshold, subsequent packets are marked and transmitted; if the burst continues and exceeds a higher threshold, packets are dropped. Alternatively, the policing node might implement multiple levels of "allowed" average packet arrival ratesa lower rate below which packets are forwarded unmarked, an intermediate range of rates within which packets are marked and forwarded, and an upper threshold above which packets are dropped.
The impact on the core of the network is "softer" than would be achieved by a simple policing because many of the packets in the burst will have been marked instead of dropped. The advantage of such a scheme is that, in the absence of other network congestion in the core, this particular traffic class can utilize more of the available bandwidth.
Many algorithms can be invented to provide multiple marking levels and threshold calculations. However, network designers who plan on using edge marking of traffic also need to carefully choose their core routers. The main point of concern is potential reordering of marked packets relative to unmarked packets within a traffic class. This situation can happen if the core router uses two separate queues to differentiate between marked and unmarked packets in the same traffic class (see Figure 2.7). Because marked packets are of "lower priority," an implementation might choose to effect this relative priority by assigning more scheduler bandwidth to the queue of unmarked packets than for the queue of marked packets.
As a consequence a marked packet arriving before an unmarked packet in the same traffic class may find itself scheduled for transmission after the unmarked packet (or vice-versa). Assuming the marked packet makes it all the way to the other end, the receiving application perceives the traffic to contain out-of-order packets.
Although the IP specifications do not preclude packets being reordered by the network, this practice should be avoided because most end-to-end protocols do not handle this case efficiently. In networks where marking is intended to increase a packet's drop probability, the solution is not too difficult. Let the core router initially ignore the policing marker when classifying packets into queues, ensuring all packets in a traffic class are placed in one queue regardless of drop priority. Then modify the packet drop threshold for that queue on the basis of whether the packet is marked or not. The core router's packet-dropping algorithm, thus, activates more aggressively for marked packets, achieving the desired edge-to-edge behavior.
Figure 2.7 Separate queues for marked and unmarked packets can lead to reordering.
2.3.2 Edge-to-Edge Routing
No particular restrictions dictate how routers and links are interconnected to form an IP network. As discussed in Chapter 1, the Internet's shortest-path routing mechanisms are based on the assumption that a network's topology is rarely static and must be tracked dynamically. In any realistic network, each router may have more than one output interface over which it could send a packetthe role of routing protocols is to establish a single interface over which a packet should be sent. To make the calculations tractable, the choice of appropriate interface has largely been driven by algorithms using only a single metric to define the shortest path.
However, two general concerns have been raised with this approach when it comes to supporting QoS. First is the argument that a single metric may not be appropriate for all traffic traversing a particular section of the network. Second, the destination-based forwarding paradigm itself makes it difficult to force subsets of available traffic into following alternative, non-shortest paths across any given network topology.
QoS-based routing protocols attempt to take multiple metrics into account when building the network's forwarding tables. These protocols have been studied for years and often begin with an assumption that the network is built from conventional Best Effort IP routers. Starting from this assumption, single-metric routing is seen to have a number of limitations when attempting to meet the mixed QoS demands of a multiservice environment.
A metric can be considered a type of cost with each link (hop) having a cost associated with it. The routing protocols attempt to find paths with minimal total cost summed over all the links to possible destinations. However, this cost cannot represent the interests and needs of all traffic types. Should it represent the link's latency, its available bandwidth, its packet loss probability, or perhaps the actual expense of sending packets over the link? Pick one. You will end up with some traffic finding the choice appropriate, whereas for other traffic the choice is wasteful of resources.
For example, consider a network where latency is the metric. Certainly the shortest path now suits applications with tight real-time requirements. But they are not alone. The network is most likely also being used by traditional, bursty data applications that care significantly less about latency. The traffic from these other applications also follows the minimum latency shortest paths, adding to the load on the Best Effort routers along the path. An unfortunate side effect is that the bursty traffic consumes the same buffer space being used by the real-time traffic, increasing the jitter and average latencies experienced by all traffic through the routers. This approach also affects the accuracy of the latency costs that the routing protocols use to determine the shortest paths.
QoS-based routing creates multiple shortest-paths trees, covering the same actual topology of routers and links with each tree using different combinations of parameters as link metrics. The goal is to minimize unnecessary coexistence within routers of traffic with widely different QoS requirements. Packets with strict latency requirements are then forwarded by using the tree built with latency as a metric. Packet's with non-real-time requirements might have a different tree built (for example, to minimize the financial cost of the path). Several practical issues exist with implementation of QoS-based routing:
Each router needs to have multiple forwarding tables (or their functional equivalent) on which to perform each packet's destination-based next-hop lookup, one for each type of shortest-path tree. Additional fields in the packet header are used to select one of the possible next hops associated with the packet's destination address. This situation complicates the design of the next-hop lookup engine.
An increase in routing protocol overhead occurs because the router's CPU must support an instance of each protocol for each unique shortest-path tree. This requirement causes an increase in the time it takes for a network of such routers to converge after a transient in the network topology (for example, when links come or go or their costs somehow change). The convergence time increases further if the routing protocol is being asked to calculate trees based on multiple metrics simultaneously.
Metrics such as latency or available bandwidth are highly dependent on the actual traffic flowing across the network. A shortest-path tree built with statically configured latency values could become outdated when traffic begins to flow across the network. The alternative, of updating each link's cost with regular real-time measurements, poses a real control-theory nightmareevery cost update would result in a recalculation of the associated shortest-path tree, leading to continual processing load on all the routers.
Interestingly, the development of routers with CQS architectures somewhat reduces the need for QoS-based routing. For example, consider the example that uses latency as a cost metric. Now consider that every router has at least two queues per output interfaceone for latency sensitive traffic and the other for all remaining traffic. All traffic is routed along lowest-latency paths. Assuming the routers appropriately classify traffic into the two queues, the service received by latency-sensitive traffic is independent of the burstiness of all other traffic types. Arguably then, any conventional, single-metric IP routing protocol, when coupled with routers based on a CQS architecture, can support multiple levels of service differentiation. The main caveat here is that sufficient capacity exists along the single tree to provide adequate service to all participants.
Explicit Path Control
The internal topologies of many networks are such that multiple paths can be found between most points. A major limitation of conventional IP forwarding is that single- metric, shortest-path trees use only one of the possible paths toward any given destination. Because lightly loaded alternative paths are not utilized, routers that exist on the shortest-path trees for many different network destinations can be subjected to high-average loadthey become hot spots, potentially limiting the capability of the network to provide adequate service differentiation even if the router itself has a CQS architecture.
As the average load on a hot-spot router rises, the probability of random packet losses and jitter increases. Although this observation is most evident for networks containing regular Best Effort routers, it also holds true (albeit to a lesser degree) when the network consists of multiple queue routers. To combat this problem, a network operator has two alternatives:
Upgrade the routers and links to operate faster
Utilize additional packet-forwarding mechanisms that allow the traffic to be split across alternative paths (some of which may be just as short as the "official" shortest path and others that may be longer according to the prevailing metric)
When the network itself is built from cheap, low- to middle-bandwidth technology, the former approach may be entirely suitable. This description is most likely going to apply to enterprise environments where traffic growth has outpaced the deployed technology and a successor technology is easily deployable (for example, a 10Mbit per second Ethernet environment, where the upgrade to 100Mbit per second or 1Gbit per second Ethernet solutions are available).
Simply upgrading equipment and/or links may not be an option when your network is already pushing the limits of available technology. High-performance IP backbones have this problemtheir routers are usually pushed hard to support OC-12 and OC-48 rate interfaces, and to buy or provision such circuits across today's traditional carrier infrastructures presents a serious problem. In addition, although prices are dropping for OC-12 and OC-48 circuits, they remain an expensive resource.
A preferable alternative is to build the equivalent aggregate capacity through parallelisman IP topology rich in routers and lower-speed links across which the aggregate load can be distributed. Overriding shortest-path routing to more optimally utilize the underlying infrastructure of routers and links is often referred to as traffic engineering. Figure 2.8 shows a simplistic example.
Figure 2.8 Overriding a shortest-path route to balance the load.
Access networks A1 and A2 both source traffic to destination D, which is reachable through Access network A3. A3 has two attachment points to the IP backbone, through R6 and R5. Conventional IP forwarding would cause packets from both A1 and A2 to converge to the same forwarding path at interior/core router R3, and be forwarded to R6 (because the path is shorter than R3[→]R4[→]R5). A good way to reduce the average load on R6 is to force some portion of the load (for example, the packets coming from network A1) to follow the R3[→]R4[→]R5 path instead. A network operator may also want to override shortest-path forwarding for policy reasons (for example, the external link between R6 and A3 may have been funded solely by A2 and A3, and therefore A1's traffic must not be allowed to traverse it).
Traffic engineering through explicit path control is an important part of any solution to providing QoS, although the main impact is on the overall efficiency of the network itself, rather than directly impinging on the end-users. This approach also raises an interesting routing questionhaving discarded the information being provided by the existing IP routing protocols, network operators need to supply an external source of information to control the traffic-engineered routing within their networks.
Explicit path control can be achieved in a variety of ways, either avoiding or permuting every router's conventional destination-based forwarding decision. The methods available at the IP level include
Strict and loose source routing options
Forwarding tables with lookup on the destination address and other fields in the IP packet header
Multiprotocol Label Switching (MPLS)
In theory, an IP packet can have optional header fields added that specify (either explicitly or approximately) the sequence of routers through which the packet must pass on its way to the destination. However, most routers do not efficiently process packets carrying such optional header fields (the peak performance "fast path" through a router is typically optimized for packets having no additional headers). Packets with optional headers are processed in a parallel "slow path"making this a poor choice if consistent QoS control is desired.
A slightly more feasible method is for the forwarding table to be constructed with regard not only for where the packet is going but also for where it has come. In this manner, it becomes possible to return different next-hop information for the same destination address just by taking the source address into account. However, this approach works only for a very constrained set of topologies and traffic-engineering scenarios. It is also expensive in terms of memory space in the forwarding tables.
Traffic Engineering with IP Tunnels
IP-IP tunneling forces the desired traffic patterns through the use of logical links. An IP packet is tunneled by placing it into the payload of another IP packet (the tunneling packet, as shown in Figure 2.9), which is then transmitted toward the desired tunnel endpoint. When the tunneling packet reaches its destination, the tunnel endpoint extracts the original IP packet and forwards it as though it had arrived over a regular interface.
Figure 2.9 Packet encapsulation for IP-IP tunneling.
Taking the network example in Figure 2.8, R1 would be configured to encapsulate traffic for D inside tunneling packets addressed to R5, and R2 would be configured to encapsulate traffic for D inside tunneling packets addressed to R6. Figure 2.10 shows the effective topology resulting from this arrangement.
Figure 2.10 Traffic engineering with IP-IP tunneling.
Several problems exist with this solution.
Routers do not necessarily perform tunneling encapsulation and decapsulation in their "fast path"this can be a major performance hit at the tunnel endpoints.
The tunneling encapsulation (shown in Figure 2.9) adds overhead to each packet, reducing the Maximum Transmission Unit (MTU) that can be supported by the virtual link represented by the tunnel if fragmentation within the tunnel is to be avoided.
The effective traffic engineering is very coarsean IP-IP tunnel only allows control over the tunneled packet's final destination (the egress routers R5 or R6 in this example). The tunneling packet takes the shortest path across the backbone to R5 or R6 as appropriate.
Traffic Engineering with Label-Switched Paths
Multiprotocol Label Switching (MPLS) is discussed in some detail later in this book, but it is worth noting here that the primary role of MPLS for service providers is traffic engineering. MPLS is a connection-oriented form of IP networkingpackets have labels added and are forwarded along preconstructed label-switched paths (LSPs) by routers modified to switch MPLS frames (label-switching routers, LSRs).
LSPs can mimic the IP-IP tunnels in Figure 2.10one LSP between R1 and R5, and another LSP between R2 and R6. R1 would be configured to label all traffic for D with the label corresponding to the first hop of the LSP from R1 to R5. R2 would be configured to label all traffic for D with the label corresponding to the first hop of the LSP from R2 to R6.
The effective topology in Figure 2.11 between the edge LSRs is identical to that in Figure 2.10, but the solutions are different. First, the overhead per packet is reduced (an MPLS header is 4 bytes, compared to 20+ bytes for a complete encapsulating IP header). Second, the packet's actual hop-by-hop path within the backbone is under the control of the network operator when the LSP is established.
Figure 2.11 Traffic engineering with explicitly routed label-switched paths.
LSP and ATM VC are similar in many ways. Backbone operators who use ATM to transport their wide area IP traffic already utilize explicitly routed permanent virtual connections (PVCs) between the edges of their ATM networks and rely on the edge routers to map the correct traffic onto the appropriate PVCs. For many service providers, the move to MPLS is simply a generalization of ATM but with variable-length packets instead of fixed-length cells.