This chapter is from the book

3.4 Survivability at the Transmission System Layer

Next up from the bottom in Table 3-2 is the "system layer," named in reference to the transmission systems found at this layer. This is the level at which, historically and prior to consideration of mesh-based survivability, almost all active measures to react against single-channel failures or cable cuts have been implemented. This includes mainly linear APS schemes, ring schemes, and the recent p-cycle technique, of which we give an overview in this section.

It is important to note that survivability techniques at the system layer also include basic equipment-level design redundancy. Built-in equipment redundancy includes dual power feed connections and power converters, usually dual maintenance and control processors, and often 1+1 redundant high-speed optical transmit/receive interface cards as well. Standard design methods may also include error-correction coding, in-service bit-error-rate monitoring, loss-of-signal detectors, laser bias current monitors, and so on. These measures all increase system availability by ensuring that the system can carry on performing its function in the face of faults arising within the system itself. This kind of designed-in redundancy is extensive in telecommunications equipment: redundant power supplies, processors, tape drives, frequency allocations, antennas, lasers, etc. All this goes into achieving basic operational availability levels that are often matched only in space, military, and nuclear applications.

Generally, system layer survivability schemes for combating cable cuts are protection schemes (as opposed to the restoration schemes considered later). The main characteristic of a protection scheme is that the protection route and standby capacity are predefined and the mechanism is self-contained within the transmission system layer. It usually involves redirecting the composite optical line signal as a whole, without processing or identifying any of its constituent tributaries. The distinction between protection and restoration is not wholly black and white, however, and we refine these basic categorizations later.

3.4.1 "Linear" Transmission Systems

Historic examples of linear transmission systems include point-to-point digital carrier systems operating at the DS-1 rate on twisted pair, up to DS-4 on coaxial cable, and PDH-based transmission systems operating at 12 and 24 x DS-3 rates (565 Mb/s and 1.2 Gb/s) per fiber using proprietary higher-than-DS3 framing structures. Such systems included a 1:N automatic protection switching (APS) subsystem to protect against single-fiber or regenerator failures, but had no designed-in measures to combat a complete cable cut. SONET not only defined standards for the higher-than-DS3 line rates but also extended the capabilities of transmission systems to include a "nested" 1:N APS configuration, which allowed add/drop multiplexing within the span of the protection channel. In effect, the one protection system was shared by all the subtending add/drop segments, which broke into and out of the protection channel where needed, as opposed to switching end-to-end over to protection. [Wu92] provides more details. With any of these so-called "linear" transmission systems (in contrast to the later ring systems), the only way to withstand cable cuts was to use 1+1 APS and route the protection system over a physically diverse route. This establishes a "1+1 DP" APS arrangement, DP standing for diverse protection. 1+1 DP is effectively two parallel instances of the same linear transmission system.

3.4.2 Automatic Protection Switching (APS)

Let us now give meaning to the notations 1:1, 1+1, etc., which are widely used in the context of APS systems. 1+1 denotes a dedicated standby arrangement: one working system and a completely reserved backup system, in which the transmit line signal is copied (called head-end bridging) and drives both signal paths. 1+1 DP implies that the protection channel is routed over rights-of-way physically diverse from those of the working system. The fastest possible switching speed is obtained with 1+1 because the receivers need only monitor both receive signal copies and switch from one to the other if either fails. The receiver switch-over is called "tail-end transfer." 1:1 APS is like 1+1, but the transmit signal is not kept in a bridged state. The resources needed are the same as in 1+1, but 1:1 allows other uses of the protection channel when the working channel does not need it. 1:1 operation can be advantageous in providing "extra traffic" services (below) or in maintenance and trouble testing, because one of the two signal paths can be taken off line for intrusive testing, then the signal "force switched" and the other path separately tested. The switching speed of 1:1 APS is, however, slightly slower than that of 1+1 because the transmit signal is not bridged at all times: the receiver's detection of a working channel failure must be signalled back to the head end to request establishment of the head-end bridge.

1:N APS denotes that N working systems share one standby "protection" system in an arrangement such as that in Figure 3-5. The intent and ability of a 1:N APS system is only to protect against single channel or fiber failures by using a standby channel within the system itself. The standby need not be diverse routed because there is no ability or intent with 1:N APS to protect against a complete cable cut. The protection or spare channel system in 1:N APS is inherently shared since N > 1. In 1:N APS the receiving end of a failed channel detects the failure and checks whether the spare span is available. If so, it signals to the other end of the system to request a head-end bridge of the failed channel onto the spare span. Return signaling confirms the number of the selected channel, and once the spare-span receiver is in lock at a suitably low BER on the new spare-span signal, a tail-end transfer relay substitutes the spare-span signal for the original working channel signal at the system output port. A converse head-end bridge and tail-end transfer are also set up for the other direction of transmission on the failed channel. In SONET this is implemented in a simple state-driven protocol using the K1-K2 overhead bytes on the protection channel. The same protocol also drives the SONET BLSR ring switching mechanism. Although the K1-K2 byte signaling protocol is simple, it is conceptually significant in that it stands as an alternative paradigm to explicit inter-processor data messaging for performing certain time-critical rerouting functions in the most robust and reliable real-time way possible. The distributed restoration algorithms (DRAs) mentioned later for span restoration (Chapter 5) use an extension of this form of signaling to realize generalized mesh rerouting as opposed to APS.
The K1-K2 byte protocol involves two finite state machines at each end of the APS system, one associated with its transmit direction, for which it has head-end bridge responsibility, the other for its receive direction, for which it has tail-end transfer responsibility. If the spare span is not already in use, the protocol is:

Figure 3-5. Head-end bridge and tail-end transfer functions illustrated in a 1:N APS system.

1:N APS Protocol

Tail-end role:
   {state = idle; event = receive failure on Ch x;
    action = transmit "x" on K1 byte on spare span (bridge request);
    next state = wait}
   {state = wait; event = receive "x" on spare-span K2 byte (bridge confirm);
    action = tail-end transfer (substitute spare span output for Ch x system output);
    next state = protected}

Head-end role:
   {state = idle; event = receive "x" on spare-span K1 byte (bridge request);
    action = {set up head-end bridge for Ch x;
              transmit "x" on K2 byte on spare span (bridge confirm);
              transmit "x" on K1 byte on spare span (bridge request for the reverse direction);}
    next state = wait}
   {state = wait; event = receive "x" on spare-span K2 byte (bridge confirm);
    action = tail-end transfer (substitute spare span output for Ch x system output);
    next state = protected}
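The state table above can be exercised with a small simulation. This is a minimal sketch, not a SONET implementation: it assumes a hypothetical `ApsEnd` class in which each K byte is reduced to a bare channel number (0 meaning no request), the spare span is free, and both ends exchange K bytes once per time step. Real K1/K2 bytes also carry request priorities and other fields that are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class ApsEnd:
    """One terminal of a hypothetical 1:N APS model exchanging K1/K2 on the spare span."""
    name: str
    k1_out: int = 0                                # bridge request we transmit (0 = none)
    k2_out: int = 0                                # bridge confirm we transmit (0 = none)
    bridged: set = field(default_factory=set)      # channels with a head-end bridge set up
    transferred: set = field(default_factory=set)  # channels with tail-end transfer done

    def receive_failure(self, ch: int) -> None:
        # Tail-end role, idle state: request a head-end bridge of Ch x on K1.
        if self.k1_out == 0:
            self.k1_out = ch

    def receive_k_bytes(self, k1_in: int, k2_in: int) -> None:
        # Head-end role, idle state: bridge Ch x, confirm on K2, and
        # request the reverse-direction bridge on K1.
        if k1_in and k1_in not in self.bridged:
            self.bridged.add(k1_in)
            self.k2_out = k1_in
            self.k1_out = k1_in
        # Wait state (either role): the far end confirms our own request on K2.
        if k2_in and k2_in == self.k1_out and k2_in not in self.transferred:
            self.transferred.add(k2_in)


def run(west: ApsEnd, east: ApsEnd, steps: int = 4) -> None:
    """Synchronous exchange: each end sees the K bytes the other sent on the previous step."""
    for _ in range(steps):
        w_k1, w_k2, e_k1, e_k2 = west.k1_out, west.k2_out, east.k1_out, east.k2_out
        east.receive_k_bytes(w_k1, w_k2)
        west.receive_k_bytes(e_k1, e_k2)


west, east = ApsEnd("west"), ApsEnd("east")
west.receive_failure(3)        # west receiver loses working channel 3
run(west, east)
print(west.bridged, west.transferred, east.bridged, east.transferred)   # → {3} {3} {3} {3}
```

After three exchanges both ends have bridged channel 3 onto the spare span and performed their tail-end transfers, so both directions of the failed channel are protected.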
   

The extension of 1:N APS protection to k:N APS where k > 1 follows directly except that the protection control logic must then manage allocation of single channel failures on N working channels to k available protection channels.
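The allocation task mentioned above can be sketched as follows. The first-come, first-served policy and the function name `assign_protection` are assumptions for illustration; the text only says that the control logic must map single-channel failures onto the k protection channels, and a real system may also weigh channel priorities.

```python
from typing import Dict, List, Tuple


def assign_protection(failures: List[int], k: int) -> Tuple[Dict[int, int], List[int]]:
    """Hypothetical k:N allocation: map failed working channels onto protection
    channels 1..k in order of detection (first come, first served)."""
    free = list(range(1, k + 1))           # protection channels not yet in use
    assigned: Dict[int, int] = {}
    unprotected: List[int] = []
    for ch in failures:
        if ch in assigned or ch in unprotected:
            continue                       # repeated alarm for a channel already handled
        if free:
            assigned[ch] = free.pop(0)     # lowest-numbered free protection channel
        else:
            unprotected.append(ch)         # all k protection channels are busy
    return assigned, unprotected


print(assign_protection([5, 2, 9], k=2))   # → ({5: 1, 2: 2}, [9])
```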

A possibly confusing industry trend is to refer to shared mesh restoration schemes as achieving "m for n protection," and even to denote this m:n. In this context, however, people do not mean to suggest that m:n APS systems are literally being employed. Rather, they are referring in general to the attribute that mesh networks achieve a certain characteristic sharing of protection capacity on an overall network basis; that is, for every n units of working capacity there are, on average, m units of spare capacity (n > m) in the network. This will come to have more meaning as we look at mesh restoration schemes and capacity design; it will become apparent how such sharing of all protection channels occurs over all working channels of the network without implying specifically established m:n APS subgroups or systems.

Note that if APS methods are intended to protect against cable cuts, they have to be either 1+1 or 1:1 DP APS, not 1:N APS. 1:N APS is usually employed as a high-availability, equipment-level design method to protect against internal failures of the system, and all channels (the N working and one standby) are routed together. Protection against externally imposed failures, however, requires 1+1 or 1:1 DP (or the other schemes that follow). Typically, therefore, one finds equipment designs employing combinations of 1:1 APS on the high-speed "line" side (where the risk is a complete cable cut) and something like 1:7 APS on lower-speed circuit packs, cross-office interconnection interfaces, and so on (where the risk is primarily a single electronics or connector failure). Figure 3-6 illustrates 1:N APS versus 1+1 DP APS.

Figure 3-6. (a) 1:N (N=2) co-routed APS and (b) 1+1 diverse-protection (DP) APS.

3.4.3 Reversion

In a 1:N protection switching system, and more generally in restoration or protection of any kind using spare capacity sharing, the protected signal path must be returned to its normal working path (or another working path) following physical repair so that the protection capacity is accessible again for the next failure that might arise. This is usually done with care to minimize subsequent "hits" on the customer payload. To minimize such a reversion hit the usual procedure is to set up a head-end bridge to supply a copy of the signal to the repaired signal path. This does not interrupt the restored signal path. Tests are done while in the bridged state to validate the receive signal quality and then a tail-end transfer switch substitutes the repaired working path signal for the restored signal. A hit, usually only of 10 ms or so in duration, arises only due to the tail-end transfer. This is called a "bridge and roll" process.

In systems such as 1+1 APS or a UPSR, reversion is not actually necessary; if done at all, it is only for administrative reasons. In any mesh protection or restoration scheme, by contrast, reversion will generally be required after physical repair, because we always want to return the system to a state of readiness for the next failure. Since we are always using shared capacity for efficiency, this means returning signals to their pre-failure routing. This practice also avoids the accumulation of longer-than-required working routes that would result from a non-reversion policy in a mesh restorable network. Unlike the failure that triggered a restoration event, however, reversion itself is a process whose timing and pace the network operator controls following physical repair; it can be scheduled late at night and/or coordinated directly with the service users to minimize customer impact.

3.4.4 "Extra Traffic" Feature

1:N or 1:1 APS systems, including their extensions into BLSR rings (to follow shortly), support a practice called "extra traffic." This feature allows the network operator to transport any other lower-priority traffic (in a format compatible with the APS or ring's line-rate signal) over the protection channel. Extra traffic is bumped off if the APS or ring system switches to protect its own working channels.⁵ The ability to access the protection channels of a ring via the extra traffic inputs of an APS or ring terminal is employed later, in Section 11.7 (and Section 11.8), as part of a strategy called "ring-mining" for ring-to-mesh (or ring-to-p-cycle) evolution.

3.4.5 AIS Concept

One of the features of transmission systems that helps isolate the location of failures is the generation of an Alarm Inhibit Signal (AIS), also called Alarm Indication Signal. The idea is to prevent the loss of a valid signal in one section of a transmission system from setting off alarms all the way down the rest of the path. OXC nodes can also perform AIS insertion. To illustrate the concept, consider two OXC nodes in a mesh network connected by a WDM transmission system with several OAs. If one of the OAs fails catastrophically, or if a cable cut occurs, the adjacent OXC nodes will register the physical loss of signal and insert an AIS on the surviving directions of the failed paths. AIS is a standardized dummy payload structure, such as a fully framed SONET signal with an easily recognized "all-ones" payload. As a dummy payload with proper framing, timing, power level, etc., it suppresses downstream alarms and indicates to each downstream node that the signal path has failed upstream but that another node has already recognized this. AIS is relevant to survivability strategies in at least three regards:

  1. For span restoration, AIS techniques ensure that only one pair of custodial nodes is activated.

  2. In end-to-end path-oriented survivability schemes, the appearance of AIS at path end nodes is what initiates restoration.

  3. In some strategies for mesh restoration, the appearance of AIS on a working channel anywhere in the network can indicate that the channel can be taken over as equivalent to spare capacity for restoration. This is part of the later concept of "stub release" in Chapter 6.
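The alarm-suppression effect can be sketched with a one-direction chain of nodes. In this hypothetical model (the function `propagate` and the string states are illustrative only), a node that sees a physical loss of signal raises the alarm and inserts AIS, while a node that sees AIS passes it on without alarming.

```python
from typing import List, Tuple

SIGNAL, LOS, AIS = "signal", "loss-of-signal", "AIS"


def propagate(n_nodes: int, cut_after: int) -> Tuple[List[str], List[int]]:
    """Follow one transmission direction through hypothetical nodes 0..n_nodes-1.

    The span leaving node `cut_after` is cut. Returns what each node receives
    and the list of nodes that raise a loss-of-signal alarm.
    """
    received: List[str] = []
    alarms: List[int] = []
    incoming = SIGNAL                      # node 0 is the path origin
    for node in range(n_nodes):
        received.append(incoming)
        if incoming == LOS:
            alarms.append(node)            # physical loss of signal is detected here...
            outgoing = AIS                 # ...and AIS is inserted toward downstream nodes
        else:
            outgoing = incoming            # a valid signal or AIS passes on, with no alarm
        incoming = LOS if node == cut_after else outgoing
    return received, alarms


received, alarms = propagate(5, cut_after=1)
print(received)   # → ['signal', 'signal', 'loss-of-signal', 'AIS', 'AIS']
print(alarms)     # → [2]
```

Only node 2, the first node past the cut, raises an alarm; the path end node (node 4) receives AIS, which in a path-oriented scheme is the event that would initiate restoration.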

3.4.6 Hitless Switching

"Hitless switching" is a special technique that can be used in conjunction with 1+1 DP APS so that a cable cut would not cause even a single bit error. It is not often implemented in practice, as the costs are high and such an extreme performance guarantee is seldom really required. However, hitless switching has often been specified, in a "wish list" sense, in at least the first drafts of the APS subsystem requirements in transmission system designs. More often it has been implemented on digital radio to hide the effects of rather frequent 1+1 APS switching actions in combating multipath fading. It is of interest to recognize this scheme, as it defines the ultimate quality of system-level hiding of physical layer disruptions.

The technique uses an adaptive delay buffer switched into the shorter of the two (1+1) signal paths at each receiver. In conjunction with a suitably long masterframe alignment sequence, the receiver in Figure 3-7 frames on both incoming signals and adaptively delays the signal copy that arrives with less propagation delay (A) to bit-align it in time with the later-arriving signal copy (B). If signal A is considered the normal working signal, and if the buffer delay is greater than the alarm detection time for a loss of signal on either signal feed, then it is possible to switch from signal A to signal B at the buffer output before damaged bits from A reach the output. Alternatively, an error-checking code on each signal path can give a byte-, column-, or frame-level selection between the delay-aligned A and B outputs. Both of these schemes would be realizations of hitless switching, in which not one bit error occurs during protection switching. The delay alignment process is called differential absolute delay equalization (DADE). DADE has been employed in digital radio systems to combat fading but seems not to have been used on fiber systems to date.
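The timing argument behind DADE can be checked with a bit-slot simulation. This is a minimal sketch under assumed conditions: time is measured in bit slots, only path A is cut, path B is error-free, and the selector switches as soon as the loss of signal is detected. The function `dade_output` is hypothetical, and the byte-, column-, or frame-level selection variant is not modeled.

```python
from typing import List, Optional


def dade_output(bits: List[int], d_a: int, d_b: int,
                t_cut: int, t_detect: int) -> List[Optional[int]]:
    """Receiver output of a hypothetical 1+1 DP pair with delay equalization (DADE).

    Times are in bit slots. Path A is the shorter path (d_a < d_b) and is
    delayed by d_b - d_a so both copies leave the buffer bit-aligned. Path A is
    cut so that nothing valid arrives on it from receive time t_cut; the loss
    is detected t_detect slots later, when the selector moves to B.
    None marks a damaged output bit.
    """
    buffer_delay = d_b - d_a
    out: List[Optional[int]] = []
    for t in range(d_b, d_b + len(bits)):        # output slots once both copies are aligned
        a_arrival = t - buffer_delay              # when this A bit reached the receiver
        a_bit = bits[a_arrival - d_a] if a_arrival < t_cut else None
        b_bit = bits[t - d_b]
        use_b = t >= t_cut + t_detect             # selector has switched to B
        out.append(b_bit if use_b else a_bit)
    return out


bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
print(dade_output(bits, d_a=2, d_b=7, t_cut=10, t_detect=3) == bits)   # → True
print(dade_output(bits, d_a=2, d_b=7, t_cut=10, t_detect=8))
# → [1, 0, 1, 1, 0, 0, 1, 0, None, None]
```

With a 5-slot equalizing buffer, a detection time of 3 slots gives an error-free (hitless) output, while a detection time of 8 slots lets damaged bits from A reach the output before the switch.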

Figure 3-7. The ultimate: 1+1 DP / DADE: "Hitless switching" (only one direction shown).

3.4.7 Unidirectional Path-Switched Rings (UPSR)

Ring-based transmission systems are an evolution from APS systems. This is perhaps easiest to see with the UPSR, which can be viewed as packaging several logical 1+1 DP APS systems to share a common higher-speed transmission system. A drawback of "standalone" 1+1 DP APS is that the optical line signal on each fiber (a single-fiber OC-n or a DWDM waveband, for instance) is delivered to the other end as a complete unit, without the possibility of adding or dropping individual STS or wavelength channels at intermediate locations. Another issue is that unless a large point-to-point demand exists, a dedicated 1+1 APS transmission system may not be economically justifiable. We need some way to fill these large transmission capacities to exploit the economy of scale in cost vs. capacity for transmission. This is in essence what the UPSR [Bell95] does. To understand the role of the UPSR, imagine a number of nodes exchanging, say, single STS-3c demands. Conceptually one could serve each demand with its own 1+1 DP arrangement of OC-3 transmission systems. But if an OC-48 transmission system (with 16 times the capacity of an OC-3) costs, say, only four times as much as an OC-3 system, then it would be better to combine all the individual 1+1 DP requirements to take advantage of a single, proportionately cheaper OC-48 transmission system. This is precisely the idea of the unidirectional path-switched ring (UPSR).

Thus UPSRs comprise a number of logical 1+1 APS systems on a set of nodes, aggregated onto a common closed-loop path that provides each with the disjoint A and B signal feeds for 1+1 DP APS operation. Nodes in a ring are connected by equal-capacity working and protection fibers (or fiber pairs) in a closed loop or cycle. The diverse route for every working system is the remaining part of the ring of which it is a member. Each unit demand is transmitted in opposite directions around the ring on the working and protection fibers. As in 1+1 APS, the receiving node independently selects the better of the two received signals and has no need to signal to any other node. A UPSR also requires just two fibers. Nodes X and Y exchange a bidirectional working demand pair by X sending clockwise to Y and Y likewise sending clockwise to X around the remainder of the same ring. Thus, each bidirectional signal exchange (X to Y plus Y to X) completes a path around the whole unidirectional ring. ADM nodes are connected by a single working and a single protection fiber, the two fibers carrying their line signals in opposite directions. (Really, under UPSR the distinction between one fiber as working and the other as protection is arbitrary, and it is also a per-channel attribute, not an overall system attribute.) Figure 3-8 illustrates. Under normal conditions, the demand between pairs of nodes in the ring is transmitted on the working fiber in one direction around the ring. A copy of each demand is also transmitted on the protection fiber in the opposite direction. At the receiving node, a path selector continuously monitors the working and protection signals and switches from the working to the protection fiber when the working signal is lost or degraded. Protection switching decisions are made individually for each path rather than for the entire line.
SONET standards call for a protection switching time less than 50 ms after detection of signal loss or degradation in the UPSR. This UPSR specification seems to be the general source of the belief that all rings give 50 ms switching notwithstanding that BLSR systems in general do not necessarily provide 50 ms switch times.
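Per-path selection in a UPSR can be sketched as below. The sketch assumes a hypothetical ring numbering in which node k and node k+1 (mod n) are joined by span k, working copies travel clockwise, and a cable cut takes out both fibers of one span; the function names are illustrative.

```python
from typing import Dict, List, Tuple


def cw_spans(i: int, j: int, n: int) -> List[int]:
    """Spans crossed going clockwise from node i to node j; span k joins nodes k and (k+1) % n."""
    spans, node = [], i
    while node != j:
        spans.append(node)
        node = (node + 1) % n
    return spans


def upsr_selectors(demands: List[Tuple[int, int]], n: int, cut_span: int) -> Dict[Tuple[int, int], str]:
    """Which copy each receiving node selects after both fibers of `cut_span` are cut."""
    choice: Dict[Tuple[int, int], str] = {}
    for i, j in demands:
        # The working copy of i->j travels clockwise; the protection copy travels
        # counterclockwise over exactly the other spans, so a single cut hits only one.
        hit = cut_span in cw_spans(i, j, n)
        choice[(i, j)] = "protection" if hit else "working"
    return choice


print(upsr_selectors([(0, 2), (2, 0), (1, 3)], n=4, cut_span=0))
# → {(0, 2): 'protection', (2, 0): 'working', (1, 3): 'working'}
```

Only the 0-to-2 signal switches to protection; its return direction (2 to 0) and the 1-to-3 signal never crossed span 0 and stay on the working fiber, illustrating that switching decisions are made per path.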

Figure 3-8. UPSR protection switching operation.

An important capacity-planning principle for the UPSR is that, because (when both A and B signal feeds are considered) the working signal for each demand pair is transmitted all the way around a UPSR, the total demand on any span is the sum of all the demands between all nodes on the ring. This implies that the UPSR line transmission rate must be at least the sum of all demands served by the ring (regardless of the specific demand exchanged by each node pair). That is:

Equation 3.1

$$C_{\mathrm{UPSR}} \ge \sum_{(i,j) \in D} d_{ij}$$


where D is the demand matrix and $d_{ij}$ is the demand from node i to node j. Conversely, this means the total demand served cannot exceed the number of channels (i.e., time slots or wavelengths) provided by the ring. SONET OC-48 and OC-192 UPSR rings have been fairly widely deployed for metro area customer access applications, where the hubbed nature of the demand pattern makes the UPSR essentially as efficient as the BLSR, which is otherwise generally more efficient. Recently, the UPSR logical structure has been implemented in DWDM technology, where each channel comprises a lightpath or a waveband. The optical version is called an Optical Path Protected Ring (OPPR). In Europe, the UPSR logical structure is often called an SNCP-ring, standing for subnetwork connection protection ring.
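The capacity principle behind Equation 3.1 can be checked numerically. This sketch assumes a hypothetical 4-node ring, with each entry in `demands` taken as a bidirectional demand between a node pair and sent clockwise in both directions, as described for the UPSR; the demand values are invented for illustration.

```python
from typing import Dict, List, Tuple


def cw_spans(i: int, j: int, n: int) -> List[int]:
    """Spans crossed going clockwise from node i to node j; span k joins nodes k and (k+1) % n."""
    spans, node = [], i
    while node != j:
        spans.append(node)
        node = (node + 1) % n
    return spans


def upsr_span_loads(pair_demands: Dict[Tuple[int, int], int], n: int) -> List[int]:
    """Working-fiber load on each span when every bidirectional demand d_ij is sent
    i->j and j->i, both clockwise around the same unidirectional ring."""
    load = [0] * n
    for (i, j), d in pair_demands.items():
        # The two directions together traverse every span exactly once.
        for span in cw_spans(i, j, n) + cw_spans(j, i, n):
            load[span] += d
    return load


demands = {(0, 1): 2, (0, 2): 3, (1, 3): 1}   # hypothetical node-pair demands, in channels
print(upsr_span_loads(demands, 4), sum(demands.values()))
# → [6, 6, 6, 6] 6
```

Every span carries the same load, equal to the total demand of 6 channels, however that total is split among the node pairs.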

3.4.8 Bidirectional Line-Switched Rings (BLSR)

A more efficient arrangement under general demand patterns is obtained when a linear multi-point SONET 1:1 nested APS system is closed on itself. One then obtains what was initially called a shared protection ring (SPring), now referred to in North America as the bidirectional line-switched ring (BLSR) and in Europe as the multiplex-section protected ring (MSPRing). A basic reference documenting the BLSR is [Bell95b]. Unlike the UPSR, which uses receive path selection, any line-switched ring protects demands by looping the entire working line signal back onto the protection fiber system at both nodes adjacent to a failure. This is a bit more like 1:N APS in that access to the protection facility must be coordinated at both ends of the failure and signaling is involved. Unlike 1:N APS, however, the protection fiber system must have capacity equal to that of the working system to be 100% restorable. In a 4-fiber bidirectional line-switched ring (BLSR/4), a separate pair of bidirectional fibers is used for working and for protection. Working demands are not permanently bridged to the protection fiber. Instead, service is restored by looping back the working demand from the working fiber to the protection fiber at the nodes adjacent to the failed segment, as shown in Figure 3-9.
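The loopback rerouting can be traced node by node. This is a sketch assuming a hypothetical ring numbering (span k joins nodes k and k+1 mod n), a working route that runs clockwise, and the failure of one whole span; it follows only the ring-switch loopback described above, and `blsr_loopback_route` is an illustrative name.

```python
from typing import List


def blsr_loopback_route(i: int, j: int, n: int, cut_span: int) -> List[int]:
    """Nodes visited by a clockwise working demand i->j after a BLSR loopback.

    Span k joins nodes k and (k+1) % n. If the route reaches node k = cut_span,
    that node loops the signal onto the protection fiber, which carries it
    counterclockwise the long way round to node k+1; node k+1 loops it back
    onto the working fiber, and the route then continues clockwise to j.
    """
    path = [i]
    node = i
    while node != j:
        if node == cut_span:
            far_side = (cut_span + 1) % n
            while node != far_side:            # protection fiber, long way round
                node = (node - 1) % n
                path.append(node)
        else:
            node = (node + 1) % n              # working fiber, clockwise
            path.append(node)
    return path


print(blsr_loopback_route(1, 4, n=6, cut_span=2))   # → [1, 2, 1, 0, 5, 4, 3, 4]
print(blsr_loopback_route(1, 4, n=6, cut_span=5))   # → [1, 2, 3, 4]
```

With span 2 cut, the demand from node 1 to node 4 runs 1, 2 on working, then 2, 1, 0, 5, 4, 3 on protection, and finally 3, 4 back on working; the looped signal passes through its own destination node on the way, while a demand that does not cross the failed span is untouched.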

Figure 3-9. BLSR protection switching operation.

A failed segment may include a span, a node, or several spans and nodes. The SONET K1, K2 line-level overhead bytes perform signaling in the SONET BLSR. Because the protection fiber passes through one or more intermediate nodes before reaching its destination, addressing is required to ensure that the APS message is recognized by the proper node and that protection switching is initiated at the right pair of nodes. For this purpose, SONET reserves four bits in the K1 byte for the destination node's ID and four bits in the K2 byte for the originating node's ID. Thus, the maximum number of nodes in a SONET BLSR is 16. A two-fiber bidirectional line-switched ring (BLSR/2) operates in the same logical fashion as a BLSR/4, except over predefined working and protection channel groups on each bidirectional fiber pair. As an example that also shows the direct extension to DWDM, a logical BLSR/2 system using 12 wavelengths per fiber might define wavelengths 1 through 6 to protect wavelengths 7 through 12 on the reverse-direction fiber. Standards for DWDM optical rings can be expected to emerge just as for SONET rings, but the DWDM version of the BLSR has so far been called the optical shared protection ring (OSPR).
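The 16-node limit follows directly from the 4-bit node ID fields. The sketch below packs and unpacks K1/K2 bytes under an assumed bit layout (request code in the upper four bits of K1 and the destination ID in the lower four; source ID in the upper four bits of K2, followed by a long-path flag and three status bits). The layout should be checked against the SONET BLSR specification before being relied on, and the function names are hypothetical.

```python
from typing import Tuple


def pack_k_bytes(request: int, dest_id: int, src_id: int,
                 long_path: bool = False, status: int = 0) -> Tuple[int, int]:
    """Pack K1/K2 under an assumed layout (bit 1 = most significant bit):
    K1 = request code (bits 1-4) | destination node ID (bits 5-8)
    K2 = source node ID (bits 1-4) | long-path flag (bit 5) | status (bits 6-8)
    """
    for value, width in ((request, 4), (dest_id, 4), (src_id, 4), (status, 3)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
    k1 = (request << 4) | dest_id
    k2 = (src_id << 4) | (int(long_path) << 3) | status
    return k1, k2


def unpack_node_ids(k1: int, k2: int) -> Tuple[int, int]:
    """Return (destination ID, source ID) carried in a K1/K2 pair."""
    return k1 & 0x0F, k2 >> 4


k1, k2 = pack_k_bytes(0b1011, dest_id=5, src_id=12, long_path=True)
print(k1, k2, unpack_node_ids(k1, k2))   # → 181 200 (5, 12)
# A node ID of 16 raises ValueError: four bits only address IDs 0..15.
```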

An advantage of BLSRs over UPSRs is that channels can be reused around the ring and the protection bandwidth is shared among all working span sections. Because a demand occupies a channel only between its entry and exit nodes, and is usually routed on the shortest path between nodes on the ring, the same channel can be reused for other demands on unused spans of the given path. Since the demands travel directly between the entry and exit nodes (and not all the way around the ring, as in a UPSR), the load on any one span is the sum of demands that are routed over that span. Thus, because the line rate of the BLSR is the same on all spans, the required capacity (i.e., the line transmission rate) of the working and protection rings (or channel groups in the case of a BLSR/2) must be:

Equation 3.2

$$C_{\mathrm{BLSR}} \ge \max_{k} w_k, \qquad w_k = \sum_{(i,j) \in D} \delta_{ij}^{k}\, d_{ij}$$


where $w_k$ is the total demand routed over span k based on the choice of bidirectional routing made for each demand, i.e., either clockwise or counterclockwise around the ring. The indicator parameters $\delta_{ij}^{k}$ in Equation 3.2 allow calculation of these span-wise working capacity totals, which, unlike in the UPSR, may differ from span to span: $\delta_{ij}^{k} = 1$ if the route chosen for demands on node pair (i,j) crosses span k, and zero otherwise. Simply put, the capacity of the protection ring has to meet or exceed the largest total demand flow crossing any span of the ring. A consequence of this is that (unlike the UPSR) the total demand-serving capacity of a BLSR depends on the demand pattern and on the routing choice for each demand on the ring. In a pure-hubbed demand pattern, BLSR efficiency is no better than that of a UPSR. At the other extreme, the ideal (but essentially fictitious) demand pattern for a BLSR is one in which all non-zero demands are exchanged only between adjacent nodes, and all exchange equal demand totals. Under these ideal conditions the BLSR reaches its best possible redundancy of 100%. In random mesh-like demand patterns, BLSRs can be 200 to 300% redundant (i.e., the ratio of total protection plus unused working span capacity to the total working capacity required using the shortest paths on the graph for the demands served).
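Equation 3.2 can be evaluated directly once a routing is chosen for each demand. In this sketch, the shortest-side routing rule (ties go clockwise), the ring numbering, and both demand patterns are hypothetical; the spans on each chosen route are exactly those for which the indicator parameter of Equation 3.2 equals one.

```python
from typing import Dict, List, Tuple


def cw_spans(i: int, j: int, n: int) -> List[int]:
    """Spans crossed going clockwise from node i to node j; span k joins nodes k and (k+1) % n."""
    spans, node = [], i
    while node != j:
        spans.append(node)
        node = (node + 1) % n
    return spans


def blsr_span_loads(pair_demands: Dict[Tuple[int, int], int], n: int) -> Tuple[List[int], int]:
    """Span working totals w_k and the required line capacity max_k w_k.

    Each bidirectional demand is routed on the shorter side of the ring
    (ties clockwise); that routing rule is an assumption of this sketch.
    """
    w = [0] * n
    for (i, j), d in pair_demands.items():
        clockwise = cw_spans(i, j, n)
        other_side = cw_spans(j, i, n)           # the complementary set of spans
        route = clockwise if len(clockwise) <= len(other_side) else other_side
        for k in route:                          # indicator = 1 exactly on these spans
            w[k] += d
    return w, max(w)


adjacent = {(k, (k + 1) % 6): 1 for k in range(6)}   # hypothetical: neighbours only
hubbed = {(0, j): 1 for j in range(1, 6)}            # hypothetical: everything to node 0
print(blsr_span_loads(adjacent, 6))   # → ([1, 1, 1, 1, 1, 1], 1)
print(blsr_span_loads(hubbed, 6))     # → ([3, 2, 1, 0, 1, 2], 3)
```

The adjacent-only pattern loads each span with exactly its own demand, the ideal case; the hub-like pattern concentrates load on the spans next to node 0 while span 3 carries nothing, yet the line capacity must still be sized for the busiest span.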

Two key issues in planning and operating networks based on BLSR rings are the capacity exhaustion of a span and the related concept of "stranded capacity." When a single span exhausts (i.e., reaches full working capacity utilization), the whole ring is in effect exhausted with respect to its ongoing usefulness for taking up more growth. The condition of equality in Equation 3.2 is then reached on at least one span of the ring. At that point no further demands can be routed through the ring in a way that would cross that span. The side effect can be that other spans still have remaining working capacity, which is "stranded."
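Span exhaustion and potentially stranded capacity can be sketched from the span totals $w_k$. The span loads below are a hypothetical example (a 6-node ring with hub-like demand and 3 working channels per span); the sketch reports the exhausted spans, the idle working capacity left elsewhere, and whether a particular new route would still fit. Whether idle capacity is truly stranded depends on whether any future demand can use it without crossing an exhausted span. The function names are illustrative.

```python
from typing import List, Tuple


def exhausted_and_idle(w: List[int], capacity: int) -> Tuple[List[int], int]:
    """Spans at full working utilization, and the idle working capacity on all other spans."""
    exhausted = [k for k, load in enumerate(w) if load >= capacity]
    idle = sum(capacity - load for load in w if load < capacity)
    return exhausted, idle


def can_route(route: List[int], w: List[int], capacity: int, d: int = 1) -> bool:
    """True if a new demand of size d fits on every span of the given route."""
    return all(w[k] + d <= capacity for k in route)


w = [3, 2, 1, 0, 1, 2]      # hypothetical span totals w_k
print(exhausted_and_idle(w, capacity=3))   # → ([0], 9)
print(can_route([0, 1, 2], w, 3))          # → False (crosses exhausted span 0)
print(can_route([3, 4, 5], w, 3))          # → True
```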

For more details of UPSR and BLSR as well as APS systems, in both SONET and ATM contexts, see [Wu92] [WuNo97].

3.4.9 Resilient Packet Rings (RPR)

With the dominance of IP packet data as the primary traffic type, a packet-oriented evolution of the BLSR has been developed [Cisc99]. For packet data flows, the channelized nature of the line capacity of SONET rings, where capacity units are parceled out to each node pair, restricts the maximum burst rate available to any one data connection. It is also fairly management intensive to establish a full set of logical point-to-point OC-12 connections within, say, an OC-192 BLSR ring. Data-centric applications may also want to have many more than 16 nodes on the ring and leave the total line transmission capacity available for sharing among all data sources. The 100% reservation of protection capacity is also seen as inefficient when data applications could be using that capacity as well, simply for added performance, during non-failure times. These are some of the motivations for the IEEE 802.17 initiative to develop a Resilient Packet Ring (RPR) standard.

An RPR is a kind of "hollowed out" 2-fiber BLSR. It uses OC-n, wavelength, dark fiber, or other physical layer options for bidirectional transmission on each span and a line-level loopback mechanism for protection. It is "hollowed out" in the sense that circuit-like channelization of the line capacity is abandoned and the entire line-rate transmission capacity is available at each node for packet access. The BLSR attribute of capacity reuse on spans is also achieved by the RPR under a spatial reuse protocol (SRP). Under SRP, destination nodes strip packets off the ring, rather than the source node removing them after a full ring transit as in prior token-ring and FDDI ring standards.

The nodes of an RPR are essentially routers that connect LANs, enterprise data centers, web server farms, etc. to the ring via an SRP Media Access Control (MAC) interface that terminates each bidirectional span. Figure 3-10 illustrates. An access router can decide to send new packets onto the ring via either of its line transmit interfaces. This inserts new packets onto the ring in either the long or the short direction to their destination, which facilitates load leveling. In the receive direction, each MAC unit can either receive a packet destined for it or forward packets en route. Normally a received packet is stripped off the ring, but for multicast it can be received and forwarded as well. A pass-through mode also allows express flows to be defined that are essentially invisible to the RPR node, passing directly through at the link layer.

Figure 3-10. Resilient Packet Ring: a combined Layer 2–Layer 3 packet ring survivability technique (amenable to the over-subscription based capacity planning method of Chapter ).

Failures are handled by "wrapping" the ring at both nodes adjacent to the failure. This is conceptually the same as the BLSR's loopback mechanism, but the wrapping happens by redirection at the packet level, inboard of the MAC interfaces (rather than at the line signal level as in the BLSR). This allows protection status to be allocated selectively by priority or by other traffic attributes. Obviously, if both counter-rotating fibers are in use for traffic in normal times, the wrapping for protection imposes a sudden additional packet load on the surviving ring spans. Unlike with a BLSR, the performance of an RPR under a fiber cut is therefore a matter of oversubscription-based planning of protection capacity, a topic introduced in Chapter 7. At first glance, it appears that there could be up to 100% oversubscription of capacity in the wrapped state, but the packet-level congestion effects depend on the actual utilization of capacity at the failure time, the mix of protected and unprotected flows, the adjustments that the SRP fairness protocol will make, and the backoff effects that user application protocols may undertake. RPR can thus exploit the soft degradation of services to avoid needing a 100% reservation of strictly unused protection capacity.

For the same reasons, RPR is not easily classified as purely a system layer survivability scheme. It uses a link-level packet loopback mechanism as well as service-level means to accomplish the overall recovery from failure. In reference to the classical data protocol layering it is therefore often referred to as a combined layer 2 and 3 (L2/3) scheme.

3.4.10 Ring Covers

A single BLSR ring is often said to be 100% redundant in the sense that each pair of bidirectional working fibers (or channels) is exactly matched by a protection fiber or channel pair. This simple definition is not entirely adequate, because it disregards the fact that the ring constraint imposed on the routing of demands may make it impossible to usefully fill each span of working fiber. In practice, when whole transport networks are designed with multiple interconnected rings, the total installed capacity is usually much more than two times the capacity needed only to route all demands via shortest paths over the graph (i.e., the standard working capacity in the sense of Section 1.5.3). Such networks are thus much more than 100% redundant as a whole. Three of the main reasons for this are:

  1. Demand routing has to follow ring-constrained paths, not shortest paths over the graph, so demands take longer routes in the first place than they otherwise would, even before their subsequent matching with 100% protection is considered.

  2. A set of rings that covers every span with exactly one ring is usually not possible. In other words, ring covers usually involve some span overlaps.

  3. When the working capacity on one span of a ring is filled, it blocks the still-available working capacity on the other spans of that ring from being used for routing.

Unidirectional ring covers are a way of addressing the second of these contributing factors: the problem of span overlaps when laying out a set of rings on the graph. A span overlap is a span whose working capacity could be handled by one ring alone but where, for purely topological layout reasons, the solution requires two rings (each with its own protection) to overlie that span. The inefficiency of such overlaps can be avoided by using unidirectional rings instead of bidirectional ring covers. Formally, the technique involves finding an oriented cycle double cover (O-CDC) of the graph [ElHa00]. If we temporarily ignore the other two factors above that bear on the true redundancy of ring networks, the O-CDC principle can achieve ring-type network protection with exactly 100% redundancy in the restricted sense mentioned above, i.e., the simple ratio of protection to working fibers or channels over the network as a whole.

To explain this, let us start by considering an ordinary (i.e., bidirectional) cycle cover, which essentially represents a ring network design based on the span coverage principle. The network demands are routed over working fibers on each span, and an overlay of dedicated bidirectional protection fiber pairs, connected in cycles, "covers" each span. If a span fails, its working fiber pair is looped back onto the protection fiber pair, just as in a BLSR. Figure 3-11 shows an example. To cover every span with at least one BLSR ring (or, more generally, any type of bidirectional cycle, which would include a UPSR), the two required cycles unavoidably overlap on span (B-C) in Figure 3-11(a). In general, wherever an odd-degree node is involved, an ordinary bidirectional cycle cover (where the two directions are locked together on the same cycle) will not be possible without at least one span overlap: every cycle passing through a node uses exactly two of its spans, so the spans of an odd-degree node cannot each be covered exactly once. The problem with such overlaps is that they lay down two working fibers and two protection fibers on a span where (as previously postulated) working demand flow requires at most one of each. With a single overlap, such a span is effectively 300% redundant instead of 100% redundant (three fibers not needed for working flow against the one that is, in the simple sense of counting non-working to working fiber or channel ratios).
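The parity argument above can be turned into a simple lower bound on the number of overlapped spans. The sketch below is illustrative only: the function names and the two example graphs are hypothetical (not taken from the figures), and it assumes a two-edge-connected graph so that a cycle cover exists at all. The bound says how many overlaps are unavoidable, not how to achieve it.

```python
from collections import Counter

def odd_degree_nodes(spans):
    """Return the odd-degree nodes of an undirected graph given as a list of spans."""
    degree = Counter()
    for u, v in spans:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)

def min_overlaps_lower_bound(spans):
    """Lower bound on the number of overlapped spans in a bidirectional cycle cover.

    Every cycle through a node uses exactly two of its spans, so the cycles of a
    cover give every node an even total span count. Covering a span an extra time
    changes the parity at both of its end-nodes, so each overlapped span can fix
    the parity of at most two odd-degree nodes.
    """
    return len(odd_degree_nodes(spans)) // 2

# Hypothetical graphs (not taken from the figures).
theta = [("A", "B"), ("B", "C"), ("C", "A"), ("B", "D"), ("D", "C")]
k4 = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
print(odd_degree_nodes(theta), min_overlaps_lower_bound(theta))  # ['B', 'C'] 1
print(odd_degree_nodes(k4), min_overlaps_lower_bound(k4))        # ['A', 'B', 'C', 'D'] 2
```

Because the number of odd-degree nodes in any graph is even, the integer division is exact; a graph with no odd-degree nodes gets a bound of zero.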

Figure 3-11. Showing how oriented unidirectional cycle covers can avoid the span overlaps that occur in bidirectional cycle covers or BLSR multi-ring network designs.

The motivation behind directed or "oriented" cycle double covers is to improve on conventional ring covers by avoiding such overlaps when planning the cycle cover, so as to reach exactly 100% redundancy (in the sense above) over the network as a whole. Figure 3-11(b) shows how the overlap can be avoided if a set of three unidirectional cycles is used instead of two bidirectional cycles. The total capacity provisioned in (a) is 12 fiber-hops, whereas in (b) it is only 10, because the double coverage of span (B-C) is avoided by the unidirectional cycles. Thus, with no overlaps and exactly one protection link for each working link on each edge of the graph (at the fiber, wavelength, or waveband level as applicable), we arrive at a class of networks that are exactly 100% redundant in terms of protection to working fiber counts. If planned at the whole-fiber level, this gives rise to "4-fiber" networks, which have exactly two working and two protection fibers on each span. The practical idea behind this is that with, say, up to 128 or 256 wavelengths per fiber, it becomes possible to consider networks that are based entirely and uniformly on just four fibers per span. The 100% redundancy is not retained, however, if some spans need multiple fiber pairs while others do not.
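The defining property of an O-CDC, that every span is traversed exactly once in each direction by the set of directed cycles, is easy to check mechanically. The sketch below assumes a hypothetical five-span graph (nodes A to D) chosen only because it reproduces the 12 versus 10 fiber-hop counts quoted above; the actual topology of Figure 3-11 may differ, and the function names are illustrative.

```python
from collections import Counter

def arcs(cycle):
    """Directed arcs (u, v) of a cycle given as its node sequence (closing arc implied)."""
    return [(cycle[i], cycle[(i + 1) % len(cycle)]) for i in range(len(cycle))]

def is_ocdc(spans, cycles):
    """True if the directed cycles traverse every span exactly once in each direction."""
    span_set = {frozenset(s) for s in spans}
    used = Counter(a for c in cycles for a in arcs(c))
    if any(frozenset(a) not in span_set for a in used):
        return False  # a cycle uses a span that is not in the graph
    return all(used[(u, v)] == 1 and used[(v, u)] == 1 for u, v in spans)

def fiber_hops(cycles):
    """Protection fiber-hops for unidirectional cycles: one fiber per hop."""
    return sum(len(c) for c in cycles)

# Hypothetical five-span graph (an assumption, chosen to match the 12 vs 10 counts).
SPANS = [("A", "B"), ("B", "C"), ("C", "A"), ("B", "D"), ("D", "C")]

# (a) Two bidirectional rings; each needs a fiber in each direction, so list both orientations.
bidirectional = [["A", "B", "C"], ["C", "B", "A"], ["B", "D", "C"], ["C", "D", "B"]]
# (b) Three consistently oriented unidirectional cycles.
oriented = [["A", "B", "C"], ["C", "B", "D"], ["A", "C", "D", "B"]]

print(fiber_hops(bidirectional), is_ocdc(SPANS, bidirectional))  # 12 False
print(fiber_hops(oriented), is_ocdc(SPANS, oriented))            # 10 True
```

In the oriented case, the 10 protection fiber-hops exactly match the 10 working fiber-hops (two working fibers on each of five spans), which is the 100% figure; the bidirectional cover fails the check because span B-C is traversed twice in each direction.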

The key result in [ElHa00] is to show that, with an oriented cycle double cover, the resulting designs can be exactly 100% redundant at the fiber level. A more detailed explanation is deferred to Chapter 10, where O-CDCs are compared to p-cycles. For reference, other works on the problem of ring covers include [GHS94] and [KeNa97].

3.4.11 Generalized Loopback Networks

"Generalized loopback" networks (GLBN) were introduced in [MeBa02] and are conceptually related to OCDCs. As with OCDCs the basic idea is to eliminate the use of bidirectional rings or bidirectional cycle covers, while arriving at an overall design that is exactly 100% redundant at the fiber level (or waveband or wavelength level) on every span. Like OCDCs, a GLBN also assumes a uniform "4-fiber" logical span model: one bidirectional working fiber pair is assumed adequate for all capacity on each span, and a matching protection fiber pair is also provided on each span. Thus, each span level cross-section is 100% redundant in the same (limited) sense as we would say a 4-fiber BLSR is 100% redundant. The difference from OCDCs is that the protection and working fibers of a GLBN are not preconnected into (unidirectional or bidirectional) rings. Instead, a simple flooding-type protocol finds and cross-connects a single replacement path through the protection fibers upon failure. Figure 3-12 shows a small network on which it is impossible to avoid having at least one span-overlap under a bidirectional ring cover, and illustrates how a GLBN works without any predefined cyclic structures.

Figure 3-12. How generalized loopback works to avoid needing more than 100% redundancy in a 4-fiber span protection environment.

The idea is to divide the bidirectional flow of working demands over the basic graph into two directed graphs (digraphs), each of which has only one directed working flow on each of its edges. To do this, each direction of every span must be assigned (appropriately, not arbitrarily) to one or the other working digraph, as in Figure 3-12(b), where the digraphs are denoted "dashed" and "solid." Once this is done, a protection copy of each working digraph is identically defined. Each of the directed "primary" (i.e., working) digraphs is then protected by the protection copy of the other primary digraph. For the failure in Figure 3-12(b), the node that was normally transmitting on the "dashed" working link now loops back onto the "solid" protection digraph (Figure 3-12(c)), and vice versa at the other end of the failed span (Figure 3-12(d)). The transmission from the nodes next to the failure is actually a flooding copy of the working signal into all outgoing fibers of the anti-directional protection digraph. Other nodes also flood, but under a protocol that suppresses duplicate copies, so that only the single shortest replacement route in the directed protection digraph results.
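The effect of duplicate-suppressing flooding can be modeled as a breadth-first search over the protection digraph. This is a simplified sketch, not the protocol of [MeBa02]: it assumes every hop has the same delay and that each node relays only the first copy it receives, so the first copy to reach the far end has followed a fewest-hop route (ties are broken by arc order). The four-node graph, its "dashed"/"solid" assignment, and the function name are hypothetical.

```python
from collections import deque

def flood_replacement_path(protection_arcs, failed_span, source, target):
    """Model duplicate-suppressing flooding in a protection digraph as BFS.

    Arcs on the failed span are excluded, because a span cut severs its
    protection fibers too. Returns the node sequence of the surviving copy,
    or None if `target` cannot be reached.
    """
    cut = set(failed_span)
    out = {}
    for u, v in protection_arcs:
        if {u, v} != cut:
            out.setdefault(u, []).append(v)
    parent = {source: None}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in out.get(node, []):
            if nxt not in parent:  # later (duplicate) copies are suppressed
                parent[nxt] = node
                frontier.append(nxt)
    return None

# Hypothetical 4-node example: spans A-B, B-C, C-D, D-A, A-C.
solid_protection = [("B", "A"), ("C", "B"), ("D", "C"), ("A", "D"), ("A", "C")]
dashed_protection = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("C", "A")]
# Span A-B fails: A (which sent on dashed A->B) floods into the solid protection digraph,
print(flood_replacement_path(solid_protection, ("A", "B"), "A", "B"))   # ['A', 'C', 'B']
# and B (which sent on solid B->A) floods into the dashed protection digraph.
print(flood_replacement_path(dashed_protection, ("A", "B"), "B", "A"))  # ['B', 'C', 'A']
```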

Not every initial assignment of the two directional working links of each original bidirectional link to the two primary digraphs protects all failures, however. In fact, the directional decomposition of Figure 3-12(b) works for the failure shown, but not for a failure of span (B-C). The key to assigning the working link directions to the two primary digraphs is that each digraph must remain strongly connected, meaning that at least one directed path must exist from every node to every other node. In Figure 3-12(b) the "solid" digraph is not strongly connected because node B has no path from itself to other nodes on "solid" edges. Medard et al. [MeBa02] give an algorithm for assigning directions that ensures the required properties, based on finding a directed cycle that visits all nodes.
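Whether a candidate assignment satisfies the connectivity requirement can be checked with the standard strong-connectivity test: every node must be reachable from one root both in the digraph and in its reverse. This check is only a validator; it is not the [MeBa02] assignment algorithm. The four-node graph and both assignments below are hypothetical; the good one is built around a directed cycle visiting all nodes, in the spirit of the approach cited above.

```python
def reachable(arcs, start):
    """Return the set of nodes reachable from `start` along directed arcs."""
    out = {}
    for u, v in arcs:
        out.setdefault(u, []).append(v)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in out.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def strongly_connected(nodes, arcs):
    """True if a directed path exists from every node to every other node.

    Standard test: every node is reachable from one root both in the digraph
    and in its reverse (all arcs flipped).
    """
    nodes = set(nodes)
    if not nodes:
        return True
    root = next(iter(nodes))
    reverse = [(v, u) for u, v in arcs]
    return nodes <= reachable(arcs, root) and nodes <= reachable(reverse, root)

NODES = ["A", "B", "C", "D"]
# Hypothetical assignment built around the directed cycle A->B->C->D->A.
dashed = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("C", "A")]
solid = [(v, u) for u, v in dashed]  # the opposite direction of every span
# Hypothetical bad assignment: in bad_solid, node B has no outgoing arc.
bad_dashed = [("B", "A"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]
bad_solid = [(v, u) for u, v in bad_dashed]

print(strongly_connected(NODES, dashed), strongly_connected(NODES, solid))  # True True
print(strongly_connected(NODES, bad_solid))                                 # False
```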

The result is a network with four fibers on every span, exactly two working and two protection, which resembles the spans of a single 4-fiber BLSR in that every span consists of a bidirectional working fiber pair and a matching pair of backup fibers. Unlike in a BLSR, however, protection inherently takes more generalized routes over the equal-capacity backup network, rather than being restricted to following a particular ring structure. Generalized loopback networks are thus in effect "BLSNs," where "N" stands for network instead of ring, and the efficiency they offer relative to a typical 4-fiber BLSR-based network is the removal of the overlapping ring spans that are usually unavoidable in ring planning. In other words, exactly 100% matching of working and protection resources is achieved, but nothing better.

Table 3-3 summarizes the schemes discussed so far, all of which are "ring-like" by virtue of having 100% or higher redundancy. In Table 3-3, logical redundancy refers to the ratio of total non-working fibers or channels to working fibers or channels.

Table 3-3. Overview of ring-like schemes for network protection at the system layer

Logical Redundancy | Scheme or Principle | Logical Equivalences | Notes
-------------------|---------------------|----------------------|------
> 100% | 1+1 DP APS | | Basic parallel hot standby redundancy model (head-end bridged)
> 100% | 1:1 DP APS | UPSR, SNCP | Permits extra traffic on standby
> 100% | UPSR | OPPR, SNCP | Modularized assembly of 1+1 DP APS arrangements
> 100% | BLSR | OSPR, SRING, MSCP ring | Nested linear APS arranged in a closed loop
> 100% | FDDI | ULSR | Unidirectional ring LAN with "wrapping"
> 100% | Cycle cover | Protection fiber pair ring overlay | BLSRs of dark-fiber pairs protecting working fiber pairs
Exactly 100% | Generalized Loopback | Unidirectional fiber-level span protection | Like directionally planned SR (without flow spreading) in 1+1 ("4-fiber") span model
Exactly 100% | Oriented CDC | Unidirectional planning of shared dark-fiber rings | Assumes "4-fiber" network model

3.4.12 System Layer Protection Without 100% Redundancy: p-Cycles

p-Cycles provide another technique that is applicable at the system level, using nodal devices that are counterparts to the ADMs of conventional rings, but p-cycles break below the 100% "redundancy barrier" that characterizes other system-level techniques. We introduce them here as a system-level protection option, but because of their mesh-like capacity efficiency and their applicability at either the logical or the system layer, we later summarize them with the family of mesh-type schemes. p-Cycles appear to be unique in that they are applicable as a system layer solution but are mesh-like, not ring-like, in their spare capacity requirements. They are the only system-level protection technique that does not require a direct matching of working and protection fibers on each span, and they are typically well under 100% redundant in both logical and true redundancy measures (i.e., the "standard redundancy" of Section 1.5.3).

The easiest way initially to think of p-cycles is as a BLSR to which protection of straddling failure spans is added. A straddling span is one that has its end-nodes on the p-cycle but is not itself part of the p-cycle; it is like a chord on a circle. The usual ring-like loopback protection of spans on the ring itself still applies, but failures of those spans are called "on-cycle" failures to distinguish them from straddling span failures. Upon the failure of a straddling span, the p-cycle (which is undamaged in such circumstances) provides two protection paths. The simplest and most important feature distinguishing a p-cycle-based network from rings, cycle covers of any kind, and generalized loopback networks is the protection of straddling spans, which may themselves bear two units of protected working capacity for each unit of capacity on the p-cycle while requiring zero spare capacity on those spans.
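The distinction between on-cycle and straddling failures can be made concrete by listing the protection paths a single p-cycle offers for a given span. The sketch below is illustrative only: the function name and the six-node cycle are hypothetical, and it assumes a simple graph (no parallel spans), so a span whose end-nodes are adjacent on the cycle is treated as an on-cycle span.

```python
def protection_paths(cycle, span):
    """Protection paths a p-cycle offers for the failure of span (u, v).

    `cycle` is the p-cycle's node sequence with the closing hop implied.
    On-cycle span: one path, the rest of the cycle from u around to v.
    Straddling span: two paths, one along each arc of the cycle between u and v.
    Span with an end-node off the cycle: no protection from this p-cycle.
    """
    u, v = span
    if u not in cycle or v not in cycle:
        return []
    n = len(cycle)
    i, j = cycle.index(u), cycle.index(v)

    def walk(a, b):
        # Walk forward around the cycle from position a to position b, inclusive.
        path = [cycle[a]]
        while a != b:
            a = (a + 1) % n
            path.append(cycle[a])
        return path

    forward = walk(i, j)         # u -> v in the cycle's orientation
    backward = walk(j, i)[::-1]  # u -> v against the cycle's orientation
    if len(forward) == 2:        # u -> v is itself a cycle hop: on-cycle failure
        return [backward]
    if len(backward) == 2:       # v -> u is a cycle hop: on-cycle failure
        return [forward]
    return [forward, backward]   # straddling failure: two protection paths

cycle = ["A", "B", "C", "D", "E", "F"]    # hypothetical Hamiltonian p-cycle
print(protection_paths(cycle, ("A", "D")))  # [['A', 'B', 'C', 'D'], ['A', 'F', 'E', 'D']]
print(protection_paths(cycle, ("B", "C")))  # [['B', 'A', 'F', 'E', 'D', 'C']]
print(protection_paths(cycle, ("A", "X")))  # []
```

Counting the returned paths gives the per-span protection capability: one unit per unit of p-cycle capacity for an on-cycle span, two for a straddler.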

Figure 3-13 illustrates how a p-cycle provides single-span failure protection to a small network without covering all spans directly. Figure 3-13(a) shows the example network; we assume that all spans shown bear working capacity to be protected. Figure 3-13(b) shows how rings or any (conventional) fiber-level cycle cover can provide such protection, although in this example it is impossible to use fewer than three simple cycles to do so. The particular ring cover shown is actually an efficient one because it matches up odd-degree nodes to share the same span overlaps (four odd-degree nodes are handled with two span overlaps). An O-CDC or GLBN solution can do better than the bidirectional cycle cover shown but will still be 100% redundant, because each span failure is protected by a cycle in a ring-like way or by a directional loopback that forms a cycle with the failed span. In each case the protection is by a direct covering of cycles.

Figure 3-13. A network (a) with a (bidirectional) cycle cover or ring cover (b) in comparison to a single p-cycle providing the same protection against single-span failures (c).

But the same set of single-span failures can be protected by a single p-cycle, as in Figure 3-13(c). (Optimal p-cycle designs will not in general use only one cycle, although one is adequate here using the Hamiltonian cycle shown.) Should any of the spans on the p-cycle fail, the p-cycle acts just like a BLSR, protecting against on-cycle failures through loopback onto protection capacity on the same cycle: the failed signals reverse away from the break and go the other way around the cycle. If any of the three straddling spans shown fails, the p-cycle is accessed at the end nodes of the failed straddling span and can provide a protection path in each direction around the cycle. For that reason, the efficiency of protecting straddling failures is twice that of protecting on-cycle failures.

In a sense, the addition of straddling failure protection is only a minor technical variation on BLSR rings: Chapter 10 shows the nodal structure in more detail, but the nodal elements remain almost as simple as ring ADMs, and the switching function needed is exactly that of the BLSR. Nonetheless, even in a network as simple as Figure 3-13(a), the difference in protection efficiency can be dramatic. The ring cover is assuredly over 100% redundant because it has three (unavoidable) span overlaps. The ring cover shown protects nine working hops with 5 + 4 + 3 = 12 ring protection hops, so on a purely fiber-count basis it is 12 / 9 = 133% redundant.

In contrast, the p-cycle uses six hops of protection capacity but protects up to 6 + 2(3) = 12 hops of working capacity, making it only 50% redundant. The true efficiency relative to rings is actually higher than this simple comparison suggests, because with p-cycles the working demands also go via shortest paths over the graph; only the protection structure itself is formed in a cycle. It has been shown in general that p-cycle-based networks are essentially as efficient as the span-restorable mesh networks covered in Chapter 5, yet, as detailed further in Chapter 10, they require only ring-like, fully preconfigured switching actions known in advance of any failure. Chapter 10 is dedicated to p-cycle-based network planning and implementation at either the system or the logical layer, where p-cycles can be hosted on OXC nodes and dynamically reconfigured to adapt to changing demand patterns.
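Written out, the two ratios make the factor of two for straddling spans explicit. The symbols are notation introduced here for illustration: R is redundancy (protection hops over protected working hops), N_c is the number of on-cycle spans, and N_s is the number of straddling spans, assuming one unit of p-cycle capacity protects one unit of working capacity on each on-cycle span and two units on each straddling span, as described above.

```latex
R_{\text{ring}} = \frac{5 + 4 + 3}{9} = \frac{12}{9} \approx 133\%
\qquad
R_{p\text{-cycle}} = \frac{N_c}{N_c + 2N_s} = \frac{6}{6 + 2(3)} = \frac{6}{12} = 50\%
```

The more straddling spans a p-cycle has relative to its own length, the lower its redundancy; with no straddlers (N_s = 0) the ratio returns to the ring-like 100%.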
