The Road to Effective Capacity Management
Of all the best practices recommended by the IT Infrastructure Library (ITIL), capacity management holds the most promise for immediately saving your organization money. Effective capacity management can easily save 20-30 percent of your current hardware and software budget. So why do so few organizations manage capacity effectively? Unfortunately, this seems to be a discipline that the IT industry lost in the move from mainframes to distributed servers.
Fortunately, capacity management is making a comeback due to world economic conditions and the need for every organization to eliminate cost wherever possible. IT organizations are under pressure to make fewer purchases while increasing the utilization of the hardware and software they already have. This article describes how your organization can manage capacity effectively to keep IT costs to a minimum.
Good capacity management won’t happen all at once. There are incremental steps you must take to grow this important capability. Each step will bring you closer to maturity, but more importantly, each step will help you save money.
Step One: Define Capacity Pools
The first step in any project is to define your scope. The scope of your capacity management program is defined by the set of resource types you wish to manage. If you want to manage disk space, that is one capacity pool. Want to manage the number of VMware guests? That is another pool. Each category of item you wish to manage must be listed as a separate capacity pool.
Of course you’ll quickly realize that you should not extend your scope beyond those capacity pools that will have value to your organization. Just because you can manage the pool of mainframe logical partitions (LPARs) doesn’t mean you should. The pool doesn’t change often, and the planning involved in creating a new LPAR is usually complex enough that the acquisition of new capacity doesn’t slow the plan. You need to understand your organization well enough to manage only those things that return real value.
Step Two: Start Gathering Trend Data
After you’ve determined what you will manage, it is time to understand how you will manage it. In the best case, you already have some kind of tooling that can gather significant capacity statistics and forward them to a central database. For example, if you are running Solaris servers, you might have some scripts that execute the iostat command and gather the data it provides into a centralized server. This kind of tooling will help you gather the capacity data you need.
You need to find some way to collect technical-level details for each capacity pool. Disk allocation and utilization can be gathered from the SAN management tools. Server CPU and memory utilization might come from various operating system utilities or from your monitoring tool set. Network packet rates and utilization numbers can come from a network management solution. The key is to get data on each capacity pool and to ensure that the data flows to a centralized location.
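As a minimal sketch of what a single collected reading could look like, the Python below samples one filesystem with the standard library. It only stands in for whatever SAN, operating system, or monitoring tooling you really use. The pool name, the field names, and the sample_disk function are illustrative assumptions, not part of any particular product.

```python
import shutil
import time


def sample_disk(pool, path):
    """Take one reading of the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return {
        "pool": pool,  # hypothetical capacity pool name
        "taken_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "used_bytes": usage.used,
        "capacity_bytes": usage.total,
    }


# One reading for the filesystem holding the current directory.
reading = sample_disk("local-disk", ".")
print(reading)
```

In practice a scheduler would run a collector like this at a fixed interval and forward each reading to the central store, so that every pool ends up with an evenly spaced history.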
Once you have data, you can start to observe the trends in the data. For each capacity pool, document the trend over time so you can understand whether utilization of that pool is growing or shrinking.
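One simple way to turn a series of readings into a documented trend is to estimate a compound growth rate per period. The sketch below assumes equally spaced, positive readings and fits a least-squares line to their logarithms. The function name and the sample figures are hypothetical, and other trending methods may suit your data better.

```python
import math


def monthly_growth_rate(readings):
    """Estimate compound growth per period from equally spaced readings.

    Fits a least-squares line to log(reading) against the period index and
    converts the slope to a fractional rate (0.03 means 3 percent growth;
    a negative value means the pool is shrinking).
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    if any(r <= 0 for r in readings):
        raise ValueError("readings must be positive")
    n = len(readings)
    xs = list(range(n))
    ys = [math.log(r) for r in readings]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return math.exp(covariance / variance) - 1


# Hypothetical month-end usage of a disk pool, in TB.
usage_tb = [40.0, 40.8, 41.6, 42.4, 43.3, 44.2]
print(f"{monthly_growth_rate(usage_tb):.1%} per month")
```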
Step Three: Make Predictions Using Trends
Once you have good trend data for each of your capacity pools, you are well on the way to a solid capacity management program. The next key step is to start using trend data to make predictions. It is one thing to say that the server utilization has been growing at 3 percent per month, but quite another to step out and predict that the server will be out of capacity in nine months at the current rate.
While predictions can certainly be made with simple mathematics, it is much more useful to base them on a combination of past history and knowledge of what might happen in your IT environment. For example, even if your disk space is only growing at a rate of 2 percent per month, you might know that a major new program is beginning, which will need a significant number of new databases that will take much more storage. So your prediction for storage should not simply be based on 2 percent each month, but also on your knowledge that the new program will make that rate jump to 6 percent for the next three months. The better your ability to predict future resource utilization, the more valuable your capacity management program will be.
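The storage example above can be sketched as a month-by-month projection in which known events override the baseline rate. The function name, the overrides dictionary, and the 75 TB and 100 TB figures are assumptions made for illustration. Only the 2 percent baseline and the three months at 6 percent come from the example.

```python
def months_until_full(current, capacity, base_rate, overrides=None, horizon=120):
    """Return the first month in which projected usage reaches capacity.

    `overrides` maps a month number (1 = next month) to the growth rate
    expected in that month; all other months grow at `base_rate`.
    Returns None if capacity is not reached within `horizon` months.
    """
    overrides = overrides or {}
    usage = current
    for month in range(1, horizon + 1):
        usage *= 1 + overrides.get(month, base_rate)
        if usage >= capacity:
            return month
    return None


# Hypothetical pool: 75 TB used out of 100 TB, growing 2 percent per month.
trend_only = months_until_full(75, 100, 0.02)

# Same pool, but a new program pushes growth to 6 percent for three months.
with_program = months_until_full(75, 100, 0.02, overrides={1: 0.06, 2: 0.06, 3: 0.06})

print(trend_only, with_program)  # prints: 15 9
```

Under these assumed figures, knowledge of the new program moves the predicted exhaustion date forward by six months, which is exactly the kind of difference simple trend arithmetic would miss.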
Step Four: Scale Up and Scale Out
Making accurate predictions is important, but capacity managers really start to earn their money when they can do analysis to indicate what should be done with the predictions. If you know a server will run out of memory in four months, the logical question will be how much more memory you should add and how long that additional memory will last before you run out again. This is the science of scaling your resources.
IT architects like to talk about “scaling up” and “scaling out”. Scaling up generally means adding more resources within an environment. You scale up when you add more memory to a server, bandwidth to a network link, or disk drives to a SAN array. Scaling out generally means duplicating part of the environment to bring in additional resources. You scale out by adding another server, another network link, or another SAN array. These are the tools that a capacity manager has to increase the capacity of a system, and each technique impacts the way you make predictions about future capacity utilization.
Learning when to scale up and when to scale out takes experience, but over time you will develop the ability to use both of these techniques when appropriate and to predict the future behavior of a system based on which scaling model you choose to implement. This kind of prediction enables you to know exactly how much capacity to add, how to add it and, most importantly, when it is really needed.
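To illustrate the question of how long added capacity will last, here is a minimal sketch that assumes usage keeps compounding at a steady monthly rate. The function name and the server figures are hypothetical. Real workloads rarely grow this smoothly, and scaling out may not add capacity as linearly as scaling up.

```python
import math


def months_of_headroom(used, capacity, monthly_growth):
    """Months until `used` grows to `capacity` at a compound monthly rate."""
    if used >= capacity:
        return 0.0
    if monthly_growth <= 0:
        return math.inf  # flat or shrinking usage never fills the pool
    return math.log(capacity / used) / math.log(1 + monthly_growth)


# Hypothetical server: 48 GB in use out of 64 GB, growing 4 percent per month.
before = months_of_headroom(48, 64, 0.04)
after = months_of_headroom(48, 64 + 32, 0.04)  # scale up by adding 32 GB
print(f"now: {before:.1f} months, after upgrade: {after:.1f} months")
```

Comparing the two results shows how many extra months a given upgrade buys, which you can weigh against its cost before deciding how much to add.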
Step Five: Establish the CMIS
If you’ve followed the road through the first four steps, you now have a lot of data about each of your capacity pools and you may be wondering how to organize all of this data into something useful. ITIL answers this question with the capacity management information system or CMIS. The CMIS should become your single source of truth for raw capacity readings, summarized capacity trends, and predictions of future utilization. By establishing this CMIS, you allow others outside the capacity management team to start using capacity data effectively to make business decisions.
As an example, before an effective CMIS is implemented, an IT portfolio planner may have to get access to the server monitoring tool, the network management console, and the disk array management tool to get enough raw data to determine whether a particular application will require new hardware in order to add users. Most portfolio planners or project managers don’t have the necessary skills to find this data in multiple sources and then use it to formulate a business decision.
With a good CMIS in place, however, that portfolio planner can see exact dates when the current environment will run out of disk space, server capacity, and network bandwidth if nothing at all is done. They can then use this data to make an informed decision about what will happen if more users drive up transaction rates and utilization of the infrastructure.
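ITIL does not prescribe how a CMIS is built, so the following is only a sketch of the idea of a single store that anyone can query. It uses SQLite from the Python standard library. The table layout, column names, and helper functions are all assumptions made for illustration, and a real CMIS would also hold summarized trends and predictions.

```python
import sqlite3


def open_cmis(path=":memory:"):
    """Open (or create) a tiny capacity store with one table of raw readings."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS readings (
               pool     TEXT NOT NULL,
               taken_on TEXT NOT NULL,
               used     REAL NOT NULL,
               capacity REAL NOT NULL,
               PRIMARY KEY (pool, taken_on)
           )"""
    )
    return conn


def record(conn, pool, taken_on, used, capacity):
    """Store one reading; `taken_on` is an ISO date string such as 2024-02-01."""
    conn.execute(
        "INSERT OR REPLACE INTO readings VALUES (?, ?, ?, ?)",
        (pool, taken_on, used, capacity),
    )
    conn.commit()


def latest_utilization(conn):
    """Return {pool: used / capacity} based on each pool's newest reading."""
    rows = conn.execute(
        """SELECT r.pool, r.used / r.capacity
           FROM readings AS r
           JOIN (SELECT pool, MAX(taken_on) AS taken_on
                 FROM readings GROUP BY pool) AS m
             ON m.pool = r.pool AND m.taken_on = r.taken_on
           ORDER BY r.pool"""
    ).fetchall()
    return dict(rows)


conn = open_cmis()
record(conn, "disk", "2024-01-01", 40, 100)  # hypothetical readings
record(conn, "disk", "2024-02-01", 50, 100)
record(conn, "memory", "2024-02-01", 32, 64)
print(latest_utilization(conn))  # prints: {'disk': 0.5, 'memory': 0.5}
```

The point of the design is that a planner queries one place rather than three separate tools.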
Step Six: Build and Manage Capacity Plans
Once effective data is available throughout your organization, you will find people asking for analysis of that data. The analysis of the data, along with a set of recommendations for what to do about the data, makes up a capacity plan. Typically, a capacity plan is built for important applications, IT services, and even infrastructure components. The goal of a capacity plan is to avoid surprises such as running out of capacity at an inopportune moment but also to avoid purchasing and deploying more capacity than is needed.
A capacity plan typically consists of a summary of the known data trends, predictions of where each trend is going, and recommendations for what needs to be done and when. For example, a capacity plan might say that a network link is using 8 percent more bandwidth each month and in six months will be completely saturated. The recommendation might be to reroute some network traffic, increase the bandwidth of the link, or implement a second, parallel link and balance the load between them. This kind of detailed capacity plan can then flow into the budgeting or portfolio management process to recommend projects that need to be initiated.
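The three parts of a capacity plan (trend, prediction, recommendations) can be captured in a small record, sketched below with the network link example. The class names, the lead times attached to each option, and the start_by calculation are assumptions added for illustration. The source gives only the growth rate, the six-month horizon, and the three options.

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    action: str
    lead_time_months: int  # hypothetical time needed to approve and deploy


@dataclass
class CapacityPlan:
    pool: str
    monthly_growth: float      # the observed trend
    months_to_saturation: int  # the prediction
    options: list = field(default_factory=list)  # the recommendations

    def start_by(self, option):
        """Latest month, counted from now, to begin an option so it lands in time."""
        return max(0, self.months_to_saturation - option.lead_time_months)

    def summary(self):
        lines = [
            f"{self.pool}: growing {self.monthly_growth:.0%} per month, "
            f"saturated in {self.months_to_saturation} months"
        ]
        for option in self.options:
            lines.append(f"  {option.action}: start within {self.start_by(option)} months")
        return "\n".join(lines)


plan = CapacityPlan(
    pool="Network link",
    monthly_growth=0.08,
    months_to_saturation=6,
    options=[
        Option("Reroute some traffic", 1),
        Option("Increase link bandwidth", 3),
        Option("Add a parallel link", 5),
    ],
)
print(plan.summary())
```

Attaching a lead time to each recommendation turns the prediction into a deadline that budgeting and portfolio planning can act on.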
Step Seven: Grow Into Service Capacity Management
Thus far all of your steps have been about managing what ITIL calls component capacity. That is, you are managing discrete IT things at the lowest level. As your ability to effectively manage components improves, you might want to start managing IT services instead. An IT service is a complex set of components that creates business value which can be consumed by the organization outside of IT. For example, a single server can be a component, but the set of servers and applications that support product design might together be grouped into an IT service.
Measuring utilization, tracking trends, and predicting future utilization are fairly easy at a component level, but become much more complex for an IT service. You need to consider how various elements of the service contribute to the overall capacity of the service. For example, if you have two web servers, one database server, and one application server all sharing a common network, how can you determine the capacity of the overall service that all of these components support? When one component is completely full, the service can be said to be out of capacity, even though other components could still take on additional workload.
Service capacity management is a challenge in many parts of the IT industry today. The most successful techniques use mathematical models to combine the capacity statistics for the various components making up a service. A lack of quality tools in this area has caused many organizations to decide that true service capacity management is too expensive to implement.
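The observation that a service is out of capacity as soon as one component is full suggests a simple bottleneck model, sketched here. It assumes every component's capacity can be expressed in the same unit (requests per second in this example), that components within a tier share load evenly, and that every request touches every tier. The tier names and figures are hypothetical.

```python
def service_capacity(tiers):
    """Return (capacity, bottleneck_tier) for a service.

    `tiers` maps a tier name to the capacities of its components. A tier's
    capacity is the sum of its components; the service is limited by the
    smallest tier.
    """
    tier_capacity = {name: sum(components) for name, components in tiers.items()}
    bottleneck = min(tier_capacity, key=tier_capacity.get)
    return tier_capacity[bottleneck], bottleneck


# Hypothetical service, capacities in requests per second.
service = {
    "web": [400, 400],     # two web servers
    "application": [600],  # one application server
    "database": [900],     # one database server
    "network": [2000],     # shared network
}
print(service_capacity(service))  # prints: (600, 'application')
```

In this assumed example the service tops out at the application server's limit even though the web, database, and network tiers still have room, which is why adding web servers alone would not help.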
Step Eight: Strive for Business Capacity Management
Finally, you might want to consider business capacity management. Just as a set of components can make up an IT service, so too can a set of IT services combine to provide a business service. If you have effective management of the IT services, you can probably use similar techniques to track and report on the capacity of the business service. It is a very powerful tool when IT can approach the VP of manufacturing with real data about the capacity of IT to support that business service.
Because it is abstracted two layers above what most tools can directly measure today, business capacity management is considered a nirvana that many organizations will never attain. Granted, a lot of maturity must be gained in the component and IT service capacity management spaces before business capacity management can even be attempted. But if your journey takes you past effective management of the basic components and you are serious about becoming a service-aligned IT organization, the benefits of business capacity management might well be worth the effort it takes to achieve.
Summary
Starting from the very basics, I’ve described how to build an effective ITIL-aligned capacity management program. The journey isn’t easy, but each step will add incremental value as you avoid early purchases, retire unused equipment, and enable critical projects to have the capacity they need to proceed on schedule. Capacity management is talked about in almost every organization, but only a few have a solid, systematic program to realize its ultimate potential. Hopefully now your organization can join that elite group.