Technology Support for Business Needs
In Chapter 3, "The Generic Provisioning Problem," we looked at the business needs and configuration information of different network environments. The business needs of the network environment drive the high-level policies in the environment, and the configuration information drives the low-level policies within the environment. We also looked at the structure of a generic policy management tool which can be used to translate the high-level policies into low-level policies.
Figure 3.1 showed the policy application matrix, with the different business needs along the vertical axis and the different technologies along the horizontal axis. Chapter 3 discussed the appropriate policy specification along each of these axes. However, it did not discuss how to deploy the technologies (shown along the horizontal axis) in order to satisfy the business needs (shown along the vertical axis). The details of that deployment are discussed in this chapter.
Support of Business SLAs in the Enterprise Network
Within the enterprise environment, I use an example to show how different technologies can be used to support business SLAs. Consider a small grocery chain whose network consists of several campuses interconnected by a wide-area network. This environment is shown in Figure 4.1. One of the campuses (campus D) is the data center of the enterprise and hosts its application servers. Campuses A and B represent the outlets of the grocery chain, which access the data center for their various transactions. Campus C is the administrative office that houses the various administrative and accounting departments. Two main types of applications are deployed in this environment. One application is used to look up the prices of items as they are scanned at a checkout counter. The other is used to process credit card transactions. One form an SLA can take in this environment is that of an assured response time for a given application. For example, all price lookups must complete within 100ms, and all credit card transactions must complete within 500ms. The employees at campus C use various office applications, but no specific SLAs are in place for these applications.
The following sections look at some of the ways such an SLA can be satisfied.
SLA Support Using Capacity Planning
The basic idea behind supporting SLAs using capacity planning is fairly simple: Provide enough link bandwidth and processing capacity that the SLA requirements are satisfied. If an SLA is feasible at all—that is, if it can be met on an unloaded network with unloaded servers—it should be possible to determine the link bandwidth and processing capacity that satisfy the performance objectives under normal operating conditions.
The response time of an application in the network is determined by many factors, but the most important ones are the following:
The processing capacity at the routers used in the network
The link bandwidth on the wide-area network
The processing capacity of the server at the data center
The goal of capacity planning is to make sure that none of these three components are overloaded.
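As a rough illustration of the kind of check this involves, the following Python sketch treats a single WAN link as an M/M/1 queue and computes the mean delay a packet would see. This is a deliberate simplification—real planning tools use far more detailed models—and the link speeds and packet size are invented for the example.

    def mm1_delay_seconds(link_bps, offered_bps, packet_bits=12000):
        """Mean time a packet spends on the link (queueing plus transmission)."""
        service_rate = link_bps / packet_bits     # packets/sec the link can carry
        arrival_rate = offered_bps / packet_bits  # packets/sec offered to the link
        if arrival_rate >= service_rate:
            return float("inf")                   # overloaded: delay grows without bound
        return 1.0 / (service_rate - arrival_rate)

    # Example: 0.9 Mbps of traffic offered to a 1.5 Mbps WAN link.
    delay_ms = mm1_delay_seconds(1_500_000, 900_000) * 1000
    print(f"per-link delay: {delay_ms:.1f} ms")   # compare against the 100 ms budget

Summing such per-hop delays with the server processing time, and comparing the total against the SLA target, is the essence of the feasibility check.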
In order to ensure that overload does not occur, you need to measure and monitor the usage of the bandwidth on the different links, as well as the server processing capacity. Most routers export information in the form of SNMP MIB variables (see the overview of SNMP in Chapter 2, "IP Architecture Overview," for details), from which the number of packets and bytes transmitted over any link can be estimated. Most servers also provide information and logs that give an estimate of the transactions processed per second and the percentage utilization of the processor. You should also measure the performance of the actual applications to ensure that the SLA limits are being met.
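For instance, two successive readings of an interface's ifInOctets counter can be turned into a utilization estimate. The sketch below assumes the readings have already been retrieved with whatever SNMP library is in use, and it accounts for the fact that the standard 32-bit counter wraps at 2^32.

    COUNTER32_MAX = 2 ** 32

    def link_utilization(octets_t0, octets_t1, interval_s, link_bps):
        """Fraction of capacity used between two ifInOctets samples."""
        delta = (octets_t1 - octets_t0) % COUNTER32_MAX  # tolerates one counter wrap
        return (delta * 8 / interval_s) / link_bps

    # Example: samples taken 300 seconds apart on a 1.5 Mbps link.
    util = link_utilization(10_000_000, 40_000_000, 300, 1_500_000)
    print(f"link utilization: {util:.0%}")   # 53% in this example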
In addition to measuring the current utilization of the various components, you need to estimate their utilization over the course of the next few months. If the estimates show that one of the components is likely to become the bottleneck, that component is replaced with a faster one.
If the load generated by the applications is reasonably stable and can be predicted with reasonable accuracy, capacity planning works well to ensure satisfactory operation of the network. In an enterprise environment, where the set of applications running on the network is limited, the traffic load depends largely on the number of installations, and that number is relatively stable. Under these conditions, planning for adequate capacity is usually the best approach.
Applying these concepts to the grocery store example, we need to predict the expected traffic and server load generated by the two key applications, credit card processing and price lookups, as well as by the other applications that are expected to share the network or the servers. A network configuration is defined by the speeds of the links, routers, and servers deployed in the network. If the number of customers expected at the various stores can be predicted accurately, the expected system response for any specific network configuration can be determined. You can then select a configuration in which the desired SLA parameters can be satisfied.
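A back-of-the-envelope version of that prediction might look like the following sketch. The per-transaction message sizes and per-counter rates are placeholders; in practice they would be measured from the deployed applications.

    LOOKUP_BYTES = 400   # assumed bytes on the wire per price lookup (request + reply)
    CARD_BYTES = 2000    # assumed bytes per credit card transaction

    def peak_demand_bps(active_counters, scans_per_sec=0.5, card_txns_per_sec=0.02):
        """Estimated peak traffic (bits/sec) from one store's checkout counters."""
        per_counter = scans_per_sec * LOOKUP_BYTES + card_txns_per_sec * CARD_BYTES
        return active_counters * per_counter * 8

    # Example: campus A at its busiest hour with 20 active checkout counters.
    print(f"peak demand: {peak_demand_bps(20) / 1000:.0f} kbps")   # about 38 kbps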
However, there are some situations in which capacity planning fails to work properly. If the traffic load on the servers and the networks is very erratic, capacity planning is hard to do. Many capacity planning tools and algorithms depend on the expected mean rate of the traffic. If the variance of the traffic is too high (there are many fluctuations in the traffic load, so at any point in time, the existing load can differ significantly from the mean traffic), the values obtained from capacity planning tools become suspect. For example, in anticipation of a sudden snowstorm, the number of customers in the grocery store might increase dramatically, and the response times might degrade. Similarly, some event on the Web might cause many of the office employees to access the Internet, and the resulting traffic surge might affect the SLAs that are in place.
Similarly, capacity planning does not work well if the network traffic growth rate is very high. When you expect a modest growth in capacity requirements, you can plan so that an installed network will operate properly for a long period, such as until the next year. However, if the traffic growth follows the pattern of Internet traffic, which has been doubling every few months, the delays inherent in upgrading a network can render obsolete any capacity plans you have developed.
In cases where capacity planning fails to solve the problem, techniques such as rate control, DiffServ, or IntServ can be used to address SLA concerns.
SLA Support Using DiffServ Networks
The basic idea behind supporting SLAs using DiffServ [KILKKI] is as follows:
You can't ensure specific performance levels for all applications when capacity is scarce; however, a preferred set of applications can have their performance objectives satisfied at the cost of other applications.
The capacity bottleneck can arise either in the network bandwidth or in the processing capacity at the servers. If the bottleneck is in the network, a DiffServ approach can be used to protect the performance of the preferred key applications. If the bottleneck is in the servers, a similar differentiation can be supported in the end servers. The upcoming sidebar describes some of the techniques that are available for this purpose.
Using the DiffServ approach, all the traffic in the network is divided into multiple classes. Each class is marked with a separate code point (see Chapter 2) by an access router. The appropriately marked packets then obtain a higher priority in the queuing that occurs at each router, or they get a proportionately larger share of the bandwidth.
The DiffServ architecture consists of core routers that process marked packets and edge routers that mark the packets according to their perceived priority. The core routers need to be told how much bandwidth to allocate to each type of packet marking. The edge routers need to be told how to mark the various packets as they see them in the network.
In the grocery store example, let's assume that the network is the bottleneck and that we need to preserve the performance of the two key applications—namely, price lookups and credit card processing. You can adopt the approach of supporting two classes of traffic in the DiffServ network; in DiffServ jargon, we would deploy the class-selector PHBs. One class carries the two key applications and is given the higher-priority queuing behavior in the network. The other carries the rest of the applications, which are transmitted at a lower priority.
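To make the marking concrete, the following sketch shows how a host-resident application (or a simple edge device) might set the code point on its outgoing connections, assuming the two key applications can be recognized by their TCP ports. The port numbers are hypothetical, and the IP_TOS socket option is only settable this way on platforms, such as Linux, that expose it.

    import socket

    CS5 = 0b101000   # class-selector code point assumed for the key applications
    CS0 = 0b000000   # default, best-effort code point for everything else

    KEY_APP_PORTS = {5051, 5052}   # hypothetical lookup and card-processing ports

    def open_marked_connection(host, port):
        """Open a TCP connection whose packets carry the appropriate DSCP."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        dscp = CS5 if port in KEY_APP_PORTS else CS0
        # The DSCP occupies the top six bits of the old IPv4 TOS byte.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
        sock.connect((host, port))
        return sock

In a real deployment the marking would more likely be done by the access router itself, with the hosts left unchanged.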
Note that the use of DiffServ networks protects the key applications from a surge in the traffic from the other applications. However, the capacity planning and prediction techniques for the key applications still need to be in place if their performance needs, as specified by the SLAs, are to be satisfied.
Operating Systems Differentiation Mechanisms
The ability to assign different priorities to executing processes is available in some form in most operating systems. In most operating systems, the assignment of processes and applications to a priority level is done automatically by the operating system on the basis of processor utilization and similar factors. However, some level of user control is also permitted. UNIX systems permit executing a process at a priority different from the default one by using the nice command. Any user can use nice to lower the priority of his processes; the superuser can use nice to raise the priority of any process.

A more sophisticated mechanism for differentiating among applications is found in the IBM mainframes. This feature, called Work Load Management (WLM) in OS/390, allows the system administrator to specify business objectives for specific applications. The administrator can set goals for each application, specified in terms of relative importance and performance goals. An example of a performance goal is a target response time for completion of an application. Another type of performance goal specifies how often a task must be executed. This goal, which OS/390 calls velocity, is defined for long-lived processes that need to run periodically. For any application, the goals might be defined differently over the different time periods during which the application is active.
As soon as the administrator has specified the performance goals, the system uses them to divide the processes into internal service classes. Each service class is associated with specific performance targets, and the system tracks the performance of each class to see whether it is meeting its goals. At 10-second intervals, the allocation of system resources (processor, disk I/O) to each service class is updated on the basis of how close the class is to meeting its service goals.
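The following toy sketch captures the shape of such a feedback loop: every cycle, classes that are missing their response-time goals take a small share of resources from classes that are beating theirs. The class names, goals, and adjustment rule are invented for illustration; they are not the actual OS/390 algorithm.

    import time

    service_classes = {
        # name: [resource share, response-time goal (s), measured response time (s)]
        "key_transactions": [0.5, 0.10, 0.15],
        "batch_reports":    [0.5, 5.00, 2.00],
    }

    def rebalance(classes, step=0.05):
        """Shift a small resource share from classes meeting goals to those missing them."""
        missing = [c for c, (_, goal, seen) in classes.items() if seen > goal]
        meeting = [c for c, (_, goal, seen) in classes.items() if seen <= goal]
        for donor in meeting:
            for receiver in missing:
                moved = min(step, classes[donor][0])
                classes[donor][0] -= moved
                classes[receiver][0] += moved

    for cycle in range(3):                # a real monitor would run indefinitely
        rebalance(service_classes)        # measurements would be refreshed each cycle
        print({c: round(v[0], 2) for c, v in service_classes.items()})
        time.sleep(10)                    # WLM re-evaluates on a 10-second cycle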
A more detailed description of WLM, including its use in a cluster of OS/390 processors, can be found in the paper by Aman et al. [AMAN]. OS/390 is one of the few commercial general-purpose operating systems that offer such a sophisticated level of support for meeting a task's requirements. However, many experimental and research operating systems provide support for different types of QoS.
The support of different priority levels is almost standard across embedded operating systems. These are lightweight operating systems that are typically used in small processors embedded within larger systems, such as routers, sensor controllers in industrial complexes, digital televisions, cars, and other consumer equipment. These systems typically have to execute tasks in real time. Different task priorities are used to ensure that the real-time constraints are met for the different functions that the embedded system is expected to perform.
SLA Support Using IntServ Networks
The use of DiffServ in the network only ensures that a set of preferred applications receives better performance than the set of nonpreferred applications. However, if too many preferred applications are active, the relative priority ordering might not be adequate to deliver the required level of performance.
Continuing with our grocery store example, let's assume that an anticipated winter storm suddenly forces many customers to stock up on essentials. As a result, each store has more checkout counters active than average. If the servers and network were designed for a smaller number of active checkout counters, the requests from the various checkout terminals would cause performance to degrade. This degradation would occur even if the grocery store managers ensured that none of the non-key applications were active in the network.
One way around this situation is to reserve capacity within the network (as well as at the servers) before actually initiating any transactions. Before a clerk starts to check out a customer's groceries, the system would perform an RSVP exchange to make sure that there is adequate capacity in the network to ensure satisfactory performance. If the RSVP request succeeds, the transaction will complete in a reasonable amount of time, and the count of satisfied customers will increase.
The tricky part is dealing with transactions whose RSVP reservations fail within the network. In these cases, the reservation could be retried in the hope that it would eventually succeed; of course, the customer would have to wait while these attempts are made. An alternative is to let the transaction proceed without the reservation, in which case delays are unpredictable. Either situation would probably result in an unhappy customer, and it is hard to say which is the better option overall.
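The trade-off can be seen in the control flow itself. In the sketch below, reserve() stands in for a successful RSVP exchange; the retry count, wait time, and best-effort fallback are all policy choices invented for the example.

    import time

    def run_transaction(reserve, transact, retries=3, wait_s=0.5):
        """Try to reserve capacity first; retry briefly, then fall back to best effort."""
        for _ in range(retries):
            if reserve():          # stands in for a successful RSVP exchange
                return transact()  # runs with reserved network capacity
            time.sleep(wait_s)     # the customer waits while we retry
        # All attempts failed: proceed anyway, accepting unpredictable delays.
        return transact()

    # Example with stubbed-in behavior: reservation always fails, fallback runs.
    result = run_transaction(reserve=lambda: False, transact=lambda: "done")

Either branch leaves someone waiting, which is exactly the policy question raised next.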
If some customers end up unhappy either way, is the reservation process really helping anyone? With reservations, at least some subset of the transactions is carried out with reasonable overall performance. Without any reservations, chances are high that most of the transactions would experience some performance degradation, and you run the risk of making all customers unhappy.