3.14 Expected Loss of Traffic and of Connectivity
Let us now return to the two most commonly used ROF-type measures of survivability: the expected annual loss of traffic (ELT) [ArPa00] and the annual expected downtime of connection (AEDC) [T1A193]. An appreciation of these measures is aided by the background above on availability and network reliability. ELT is like a traffic-weighted path unavailability and AEDC is a certain type of two-terminal network reliability measure. Both are path-oriented measures in that they pertain to a transport signal path between a stipulated pair of nodes. The path orientation follows the philosophy of using hypothetical reference paths (HRP) for various performance assessments. This type of measure allows network designs and/or restoration techniques to be compared on the basis of one or a few path models that are of specific relevance and that stakeholders agree are representative of either worst or average cases. In principle the corresponding path measures can be computed for all possible end-node pairs to obtain statistics of all individual paths in the network. A path in this context is constituted by a demand unit between nodes i,j that is a contiguous transmission signal.
3.14.1 Expected Loss of Traffic (ELT)
For a given pair of nodes exchanging demand over possibly several paths through a network, ELT asks what the expected number of lost demand-minutes will be over a year. The total demand between a node pair need not follow a single route. The total demand between i and j, denoted $d_{ij}$, may be realized by routing over a set of diverse routes P as long as $\sum_{p \in P} d_{ij}^{p} = d_{ij}$. One way of implementing the ELT calculation is then:
$$ELT_{ij} = M_o \cdot \sum_{p \in P} d_{ij}^{p} \sum_{k} \delta_{ij,k}^{p} \, U_k \qquad (3.38)$$
where $d_{ij}^{p}$ is the amount of (i,j) demand assigned to the pth route and $\delta_{ij,k}^{p} = 1$ if span k is in the pth route for demand pair (i,j), and zero otherwise. The constant $M_o = 5.26 \times 10^{5}$ (the number of minutes in a year) gives ELT units of demand-minutes/year of "traffic" loss. Thus ELT is the sum of the demand-weighted unavailability of each distinct (not disjoint) path employed to bear the total demand between nodes i,j. In the inner sum every node and span in the network is indexed by k and the routing function $\delta_{ij,k}^{p}$ answers the 1 / 0 question: is network element k a constituent of the pth diverse route used for demands between i and j? If so, the unavailability of that element, $U_k$, contributes to the ELT. The benefit of ELT beyond a single path availability analysis is that it reflects the size and number of demand units affected, allowing apples-versus-apples comparison across alternatives involving different signal levels, diversity routing and/or restoration techniques. Note that Equation 3.38 has some of the same numerical approximations and assumptions as mentioned above for availability analysis. In particular it strictly overestimates the total by virtue of the addition of unavailabilities for each distinct path as if they were independent.
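As an illustration only, the ELT sum described above can be sketched in a few lines of Python. The function name, the element names, and every numerical value below are hypothetical; the routing indicator is represented implicitly by listing the elements on each route.

```python
# Minutes in a year (the text's constant Mo, about 5.26e5).
MINUTES_PER_YEAR = 5.26e5


def expected_loss_of_traffic(routes, unavailability):
    """ELT in demand-minutes/year for one node pair.

    routes: list of (demand, elements) pairs, one per route p, where
        elements lists the nodes and spans on that route.
    unavailability: dict mapping element name k to its U_k.
    """
    total = 0.0
    for demand, elements in routes:
        # Series approximation: add the U_k of every element on the route.
        path_unavailability = sum(unavailability[k] for k in elements)
        total += demand * path_unavailability
    return MINUTES_PER_YEAR * total


# Hypothetical example: 12 demand units split over two routes that are
# distinct but not disjoint (both use span "A-B" and node "B").
U = {"A-B": 3e-4, "B": 1e-5, "B-D": 2e-4,
     "B-C": 2.5e-4, "C": 1e-5, "C-D": 1.5e-4}
routes = [
    (8, ["A-B", "B", "B-D"]),
    (4, ["A-B", "B", "B-C", "C", "C-D"]),
]
elt = expected_loss_of_traffic(routes, U)
```

Because the shared elements are counted once per route that uses them, the sketch reproduces the additive (overestimating) character of the formula rather than an exact joint probability.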
Also, as written, the ELT formula is most applicable to "passive" point-to-point transmission networks where the network elements in each path directly contribute their true Uk values to the total (although the Uk for a transmission span may include the built-in benefit of co-routed APS against internal failures). In a network that embodies active restoration mechanisms and designed-in spare capacity, however, the Uk values should be those already reflecting any net benefit, in equivalent unavailability terms, of the designed-in survivability methods. For instance, in a span-restorable mesh network, the Uk values for spans could be the equivalent unavailability of spans defined in Chapter 8. Alternatively, the native path unavailabilities could be used with a simple extension to Equation 3.38 so that unavailability is contributed to the sum only when a path and its known backup are both unavailable. This implicitly recognizes the switch-over from working to protection that happens to avoid "lost traffic" in a survivable network with active restoration or protection. Ultimately, however, if the details of modeling the effects of active restoration measures into the framework of Equation 3.38 become unmanageable, Equation 3.38 still guides how one would approach the calculation of ELT by simulation.
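One possible reading of the backup extension mentioned above is sketched here in Python. It assumes, beyond what the text states, that each working path has at most one known backup, that the two are disjoint, and that they fail independently, so that traffic is lost only when both are down. All names and values are hypothetical.

```python
MINUTES_PER_YEAR = 5.26e5


def path_unavailability(elements, unavailability):
    """Series approximation: sum of the elemental U_k on a path."""
    return sum(unavailability[k] for k in elements)


def elt_with_backup(routes, unavailability):
    """ELT where a route may name a backup path.

    routes: list of (demand, working_elements, backup_elements_or_None).
    """
    total = 0.0
    for demand, working, backup in routes:
        u = path_unavailability(working, unavailability)
        if backup is not None:
            # Assumption: working and backup are disjoint and independent,
            # so loss requires both to be down at the same time.
            u *= path_unavailability(backup, unavailability)
        total += demand * u
    return MINUTES_PER_YEAR * total


# Hypothetical values: 10 demand units on a two-span working path.
U = {"s1": 1e-3, "s2": 2e-3, "s3": 4e-3}
unprotected = elt_with_backup([(10, ["s1", "s2"], None)], U)
protected = elt_with_backup([(10, ["s1", "s2"], ["s3"])], U)
```

With these made-up numbers the protected case loses a small fraction of the demand-minutes of the unprotected one, which is the effect the extension is meant to capture.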
3.14.2 Annual Expected Downtime of Connection (AEDC)
AEDC is a Random Occurrence of Failure (ROF) measure of survivability defined in [T1A193]. It is more like a network reliability measure: it asks for how much of a year one would expect total disconnection of all the paths for communication between nodes i and j. [T1A193] states:
"Connection between two nodes is lost if, for all paths, there exists no working or protection channels able to carry the demand. The consequences of losing connection between two nodes can be more serious than the consequences of losing an equivalent amount of traffic throughout the network without losing connectivity. Loss of connectivity can lead to the loss of important emergency and high priority traffic or create a situation of isolation."
Thus AEDC specifically relates to the two-terminal reliability of a network. The main difference is that classical network reliability addresses whether there exists any topologically possible route between nodes, whereas the intent with AEDC is to consider details or limitations of the actual restoration mechanism and network capacity constraints that would determine how many individual prefailure paths between i and j could feasibly be restored. The point is that while the graph may remain connected, specific rerouting mechanisms may be starved of capacity or constrained in routing so that they cannot emulate the routing generality assumed in classical network reliability. It is safe to say, however, that the two-terminal unreliability of the network graph (one minus its two-terminal reliability, multiplied by the number of minutes in a year) would be a lower bound on the AEDC (in minutes).
A more exact evaluation of the AEDC can also be conducted with the availability analysis methods above. The approach is as follows for a given node pair:
1. Identify all distinct routes between the nodes which would be eligible for use for either the normal working path or a protection or restoration path.
2. Represent the set of distinct routes as a series-parallel availability block diagram.
3. Apply series-parallel availability reductions to the current availability block diagram.
4. When step 3 halts, but the reduction is not complete, select a cross-over element in the block diagram and apply a conditional decomposition.
5. Repeat steps 3-4 until the block diagram is completely reduced.
6. The AEDC is the resulting unavailability value times the number of minutes in a year.
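The procedure above can be sketched in Python as follows. This is a minimal illustration rather than the book's own algorithm: instead of drawing a block diagram, it works directly on the list of routes, conditions on any element shared by two or more routes (the conditional decomposition), and applies series and parallel reductions once the remaining routes are disjoint. It assumes independent element failures. The route list is loosely modeled on the four A-to-D routes discussed in this section, with spans only and hypothetical unavailability values.

```python
from collections import Counter
from math import prod

MINUTES_PER_YEAR = 5.26e5


def series_unavailability(route, U):
    """Series reduction: a route is down unless every element on it is up."""
    return 1.0 - prod(1.0 - U[k] for k in route)


def all_routes_down(routes, U):
    """Probability that every route is down (independent element failures)."""
    routes = [frozenset(r) for r in routes]
    # A route with no undecided elements left is known to be up.
    if any(len(r) == 0 for r in routes):
        return 0.0
    counts = Counter(k for r in routes for k in r)
    shared = [k for k, n in counts.items() if n > 1]
    if not shared:
        # Disjoint routes: parallel reduction of the series blocks.
        return prod(series_unavailability(r, U) for r in routes)
    # Conditional decomposition on the most widely shared element.
    k = max(shared, key=lambda e: (counts[e], e))
    if_failed = [r for r in routes if k not in r]   # routes through k are down
    if_working = [r - {k} for r in routes]          # k is certainly up
    return (U[k] * all_routes_down(if_failed, U)
            + (1.0 - U[k]) * all_routes_down(if_working, U))


# Hypothetical span unavailabilities; node failures are ignored here.
U = {s: 1e-3 for s in ["AB", "BC", "CD", "AE", "EF", "FD", "CE"]}
routes = [
    ["AB", "BC", "CD"],
    ["AE", "EF", "FD"],
    ["AE", "CE", "CD"],
    ["AB", "BC", "CE", "EF", "FD"],
]
p_disconnect = all_routes_down(routes, U)
aedc_minutes = MINUTES_PER_YEAR * p_disconnect
```

Restrictions of a particular restoration mechanism would enter through the route list: a route the mechanism cannot actually use is simply left out.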
The basic process is illustrated in Figure 3-25 for a small network where the AEDC for nodes A to D is calculated. In the availability block diagram the dashed block names (such as node B') represent the unavailability of the network node and the physical span leading up to it; i.e., B' represents node B and link A-B as a single block. A way to develop the availability block diagram (c) from the network diagram (a) is to start from the list of all distinct routes. The first route becomes the pure series path model seen at the top (A B C D) in Figure 3-25(c). If the next route listed were fully disjoint from the first, it too would be drawn in full, in perfect parallelism to the first route. More generally, where the routes are not disjoint, one tries to draw the path in parallel but has to obey the rule that an element that has already been drawn cannot be drawn again. Thus, for the second route listed we drop down from A to represent E and F, then return up to D. The third route adds no new elements, just the vertical link between rows joining the output of E to the input of C. Note that this link is in effect unidirectional in that it represents the route option going from E to C but does not imply a corresponding direct linkage between B and F. The last route lays down the link joining the output of C back to the input of E to represent the route (ABC)-(EFD).
Figure 3-25. Example using availability analysis to estimate AEDC.
As the example shows, the primary complication is that the "diverse" paths between A and D are not disjoint paths. There is thus a correlation of failures affecting each path. This is true for ELT as well, but ELT is an expected sum of demand-weighted outage on all paths, a probability union type of construct which is numerically insensitive to this at typical Uk values. In contrast, AEDC is a question of probability intersection: all paths being down defines the condition for i,j "disconnection." In this case any elements common to two or more paths can drastically alter the probability of all paths failing together. Consider for instance a case of |P|=5 but with all paths sharing one node in common (span disjoint but not node disjoint). Then the AEDC would be almost literally just the Uk of the one node common to all paths, expressed in minutes per year. The general case of the paths not being fully disjoint does not lend itself to direct analytical expression. Rather, the methods of series-parallel reduction and decomposition can be used to address the question, and any specific limitations to the rerouting capability of the restoration method are reflected in the set of distinct routes represented.
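The five-path case above can be checked numerically with a short Python sketch. It assumes independent failures and that the paths are disjoint apart from the one shared node; the unavailability values are hypothetical.

```python
from math import prod

MINUTES_PER_YEAR = 5.26e5


def disconnection_with_common_node(path_unavailabilities, node_unavailability):
    """P(all paths down) when the paths share exactly one node.

    Either the shared node is down, or it is up and every one of the
    otherwise disjoint path segments is down at the same time.
    """
    all_segments_down = prod(path_unavailabilities)
    return (node_unavailability
            + (1.0 - node_unavailability) * all_segments_down)


u_paths = [1e-3] * 5   # hypothetical unavailability of each path, shared node excluded
u_node = 1e-5          # hypothetical unavailability of the shared node
p_down = disconnection_with_common_node(u_paths, u_node)
aedc_minutes = MINUTES_PER_YEAR * p_down
```

With these values the product term is about 1e-15, so the result is dominated by the shared node and the AEDC comes out at roughly 5.26 minutes per year.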
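If a set of fully disjoint paths P between (i,j) is identified (or span-disjoint paths where node failures are not being considered), then the AEDC can be lower-bounded (i.e., an optimistic bound) as:
$$AEDC_{ij} \geq M_o \cdot \prod_{p \in P} \left( \sum_{k} \delta_{ij,k}^{p} \, U_k \right)$$
where $\delta_{ij,k}^{p} = 1$ if element k is on path p for node pair (i,j) and zero otherwise, as in Equation 3.38.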
This is "exact" (i.e., except for the series addition of elemental unavailabilities on each path) for the fraction of time during the year that the set of all mutually disjoint paths between i and j would all be down simultaneously.
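A minimal Python sketch of the disjoint-path bound described above: the product, over mutually disjoint paths, of the summed elemental unavailabilities on each path, scaled by the number of minutes in a year. The function name, span names, and values are hypothetical, and the caller is assumed to supply paths that really are disjoint.

```python
from math import prod

MINUTES_PER_YEAR = 5.26e5


def aedc_disjoint_bound(paths, U):
    """Disjoint-path bound on AEDC, in minutes per year.

    paths: list of element lists, assumed mutually disjoint.
    U: dict mapping element name k to its unavailability U_k.
    """
    # Series addition of elemental unavailabilities on each path.
    per_path = [sum(U[k] for k in path) for path in paths]
    # All disjoint paths down at once: product of the path unavailabilities.
    return MINUTES_PER_YEAR * prod(per_path)


# Hypothetical example: two span-disjoint paths of two and three spans.
U = {"s1": 1e-3, "s2": 1e-3, "s3": 1e-3, "s4": 1e-3, "s5": 1e-3}
bound_minutes = aedc_disjoint_bound([["s1", "s2"], ["s3", "s4", "s5"]], U)
```

For these values the bound is about 3.16 minutes per year; adding a third disjoint path would multiply it by that path's unavailability.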