Creating a Disaster Recovery Plan
Last updated Mar 28, 2003.
As I write this tutorial, the United States is still in the throes of cleaning up after one of the most devastating natural disasters ever to strike our shores: Hurricane Katrina. After reaching Category 5 strength over the Gulf of Mexico, Katrina slammed into the Gulf Coast (Louisiana, Mississippi and Alabama), causing massive damage. Watching the scenes of destruction on TV made me think about the many storms my own state (Florida) has faced, and how we've learned to cope with uncertain weather over the years. We have learned, sometimes the hard way, that a disaster lives up to its name, especially when you lose more than you thought you would.
One of the main criticisms of the city, state and federal response to Katrina was a lack of planning, even after it became apparent that the storm was headed toward a vulnerable area.
That's what I'll cover today: planning. Specifically, I'll explain what you need to do to create an effective Disaster Recovery (DR) Plan. As the DBA in your shop, you should take a central role in this effort, since you are responsible for securing the organization's data. That responsibility doesn't stop at having a recent backup.
Disaster response isn't something you should do alone. You'll need to involve business representatives and the other members of the IT team. Chances are your firm already has such a plan in place, but if it doesn't, you should lead the charge.
Don't assume that you're off the hook, or that you aren't vulnerable, just because your area doesn't see many natural disasters. A good DR plan is like insurance: worthless until you need it, invaluable when you do. A DR exercise will often expose other weaknesses in your IT infrastructure, and that's a good thing.
I've developed five overall steps you can follow to create your DR plan, but this is just one method. In the Online Resources section below, I'll point to a site that explains the COBRA method, another popular way of creating a plan.
The method I'll show you covers broad areas. If you're in a small firm, your plan may be much less complex than one for a large enterprise. In larger organizations, break each step out into more detailed tasks, and get input from everyone you can. Do this now, while the sun is shining.
Step One — Identification of Risk
The first part of creating a disaster recovery (also called disaster response) plan is to analyze the risks your company faces. These include not only natural disasters but also man-made disasters and machine failures.
In this section of the plan, you'll list all of the assets your company has. A document of this type usually already exists somewhere in your company for tax or legal reasons. What that list may not include are the information technology assets beyond the physical computers. As part of the IT staff, you need to provide management with both a general and a specific list of what data the company stores and where it is stored.
You'll want to make sure you know what each application does, and whether it is the source of record for its data. An application is the source of record when its data is stored only on that system and couldn't be derived from any other source.
For instance, you may have your personal financial records stored on your home computer, but you could recreate some or all of that data from the bank, credit card companies and so forth. Losing that data might be painful, but by and large you could get it back. Follow the same decision process for your firm's data, and list where and how that is stored.
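If you keep the inventory as a script rather than a spreadsheet, the source-of-record test above reduces to one question per application: could this data be rebuilt from anywhere else? The following is a minimal Python sketch under that assumption. The `DataAsset` class, its fields and the sample entries are hypothetical illustrations, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """One row of a hypothetical IT asset inventory."""
    application: str
    purpose: str    # what the application does
    location: str   # where its data is stored
    # Other systems or outside parties the data could be recreated from
    derivable_from: list = field(default_factory=list)

    @property
    def is_source_of_record(self) -> bool:
        # Source of record: the data lives only here and can't be derived elsewhere.
        return not self.derivable_from


# Hypothetical sample inventory
inventory = [
    DataAsset("Payroll", "Calculates employee pay", "HQ server room, rack 2"),
    DataAsset("Sales dashboard", "Summarizes daily sales", "HQ server room, rack 4",
              derivable_from=["Order entry"]),
    DataAsset("Order entry", "Records customer orders", "HQ server room, rack 2"),
]

sources_of_record = [a.application for a in inventory if a.is_source_of_record]
for asset in inventory:
    status = "source of record" if asset.is_source_of_record else "derivable"
    print(f"{asset.application}: {status} ({asset.location})")
```

The derivable entries still belong in the plan; like the personal financial records in the example, losing them would hurt, but they could be rebuilt.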
Now that you've identified all of the applications and their data locations, list the possible risks to that data. For instance, if your data is stored in a single location in your company, identify the risks to the room, building, area and state where the servers live.
To find these risks, you'll need to contact the power company to find out how likely a brownout is, how long they take to restore service after a disaster, and so forth. Check the weather risks for your area, such as flood-plain levels, tornado frequency and other meteorological data. If you're in an area that is susceptible to earthquakes, include that as a risk as well.
How far is the building from a fire station? The police? Note that information, too.
Moving on to man-made disasters, detail the ways data could be lost through human activity, such as a disgruntled worker hacking into the system, or simple negligence by an operator who erases an entire year's worth of entries with a single click.
Now detail the hardware failures you face. Servers and network lines are physical objects, and they will eventually fail. You might be tempted to say, "We've got that covered; we have backups," but don't. Write these failures down anyway.
With the assets and their risks detailed, assign a probability to each risk. If you are in an earthquake area, for instance, but there hasn't been an earthquake for 200 years, the risk may be low. Or maybe you feel you're due!
Work out these numbers with the other members of IT and the business representatives. You rank the risks this way because protecting against each one will have a cost, so you want to cover the most probable events first.
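Once the group has agreed on the numbers, ranking is simply a sort by probability. Here is a minimal Python sketch, assuming each risk has been given a probability between 0 and 1. The `Risk` fields and the sample values are hypothetical placeholders for whatever your team settles on.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    asset: str
    description: str
    category: str       # "natural", "man-made" or "hardware"
    probability: float  # agreed estimate between 0.0 and 1.0 (hypothetical scale)

    def __post_init__(self):
        # Reject estimates outside the agreed 0-1 scale.
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError(f"probability out of range: {self.probability}")


def rank_risks(risks):
    """Return the risks ordered from most to least probable."""
    return sorted(risks, key=lambda r: r.probability, reverse=True)


# Hypothetical risk register, entered in no particular order
register = [
    Risk("Payroll", "Hurricane floods the server room", "natural", 0.05),
    Risk("Order entry", "Disk failure on the database server", "hardware", 0.30),
    Risk("Payroll", "Earthquake damages the building", "natural", 0.01),
    Risk("Order entry", "Operator deletes a year of entries", "man-made", 0.10),
]

ranked = rank_risks(register)
for risk in ranked:
    print(f"{risk.probability:>5.2f}  {risk.category:<9} {risk.asset}: {risk.description}")
```

The top of the ranked list is where the protection budget should go first.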
Step Two — Determine Business Impact
Now that you have listed the assets, the possible risks and their probabilities, the next step is to explore how much each risk would affect your business if it came true. You're examining worst-case scenarios here, so once again you'll need to involve everyone as you write down the possible business effects of each risk.
In this step, write the information down in a vacuum, assuming that you have no backups or other recovery methods. You're documenting what will happen to the business if that data goes away, not what you've done or will do to prevent that from happening. That comes later.
The organization won't always know what is affected when a particular system goes down. They will need you to do a dependency analysis, which traces the data path of an operation from system to system.
For instance, a financial transaction might start out in the inventory control system, pass to the project planning system, then to the HR system, and finally to the financial system. The principals in each area may know or care only about their own system, so it's up to you to explain what touches what. This is often an eye-opening experience for the business.
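A dependency analysis is essentially a walk over a map of which system feeds which. The minimal Python sketch below models the example path above (inventory control to project planning to HR to financial) as a hypothetical `FEEDS` mapping, and uses a breadth-first search to list every system downstream of a failed one. It assumes that a failed system stops the flow of data to everything after it; a real map will usually branch rather than form a single chain.

```python
from collections import deque

# Hypothetical map of the example data path: each system lists the systems it feeds.
FEEDS = {
    "inventory control": ["project planning"],
    "project planning": ["HR"],
    "HR": ["financial"],
    "financial": [],
}


def downstream_of(failed, feeds):
    """Breadth-first search for every system that receives data,
    directly or indirectly, from the failed system."""
    affected = []
    seen = {failed}
    queue = deque([failed])
    while queue:
        system = queue.popleft()
        for target in feeds.get(system, []):
            if target not in seen:  # skip systems already counted (handles cycles)
                seen.add(target)
                affected.append(target)
                queue.append(target)
    return affected


print(downstream_of("project planning", FEEDS))  # ['HR', 'financial']
```

Running it for each system in the inventory gives the business a concrete answer to "what stops working if this goes down?"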
Step Three — Securing the Assets
Now you're on to adding columns to your spreadsheet or other document that explain what you need to do to secure each area affected by a risk. These measures include backups, sending tapes off-site and perhaps even relocating the data in an emergency. Some of the plans I've worked on also rate each option as low-, medium- or high-cost, but I don't like doing this, because the business sometimes takes the lowest-cost option regardless of the impact.
Everything can be secured, for a price. You have to weigh the risks and their probabilities and decide with your company what you can do with the budget you have. Know this right at the outset: it will cost money, and it will be overhead. But also know that not having recovery methods in place could lead to your company's demise.
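To make the budget conversation concrete, you can attach a cost to each protective measure and see how far the money goes when the most probable risks are covered first. The Python sketch below uses a simple greedy rule, which is an illustrative assumption rather than part of the method described here; the `Mitigation` fields, costs and budget are all hypothetical. The unfunded list is as useful as the funded one, because it documents the risks the business is choosing to accept.

```python
from dataclasses import dataclass


@dataclass
class Mitigation:
    risk: str
    measure: str
    probability: float  # probability of the risk this measure addresses
    cost: int           # hypothetical annual cost in dollars


def fund_by_probability(mitigations, budget):
    """Greedy sketch: fund measures for the most probable risks first,
    skipping any measure that no longer fits in the remaining budget."""
    funded, unfunded = [], []
    remaining = budget
    for m in sorted(mitigations, key=lambda item: item.probability, reverse=True):
        if m.cost <= remaining:
            funded.append(m)
            remaining -= m.cost
        else:
            unfunded.append(m)
    return funded, unfunded, remaining


# Hypothetical measures and budget
options = [
    Mitigation("Hurricane", "Standby servers at a reciprocal site", 0.05, 40000),
    Mitigation("Disk failure", "Backups with tapes sent off-site", 0.30, 8000),
    Mitigation("Earthquake", "Relocate a copy of the data out of state", 0.01, 25000),
    Mitigation("Operator error", "Tighter permissions and tested restores", 0.10, 3000),
]

funded, unfunded, left_over = fund_by_probability(options, budget=20000)
print("Funded:   ", [m.risk for m in funded])
print("Accepted: ", [m.risk for m in unfunded])
print("Left over:", left_over)
```

With these made-up figures, the two most probable risks are covered and $9,000 is left unspent, while the hurricane and earthquake measures are not funded. That result is a starting point for the discussion with the business, not the final decision.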
Step Four — Enabling Business Continuity
Many firms already have something similar to what I've described above, but stop at Step Three. That won't work unless you plan to cease operations until the disaster is over. In the case of Katrina, some businesses are predicted to take years to recover, which has proven true here in Florida for some businesses that didn't plan.
You need to plan to continue the business. At one firm I worked for, we had a rotation of core business people and technical representatives from my staff who would relocate to New York to continue operations in an emergency. We had a reciprocal agreement with a firm there that let us stage smaller servers and duplicate backup tape hardware in their building, and 50 to 100 of us would go there when a warning was given. We took tapes with us and began running parallel operations immediately. Just before the disaster was confirmed to be heading our way, we would transfer control to the northern site and run operations from there. That allowed our Tampa employees to evacuate and care for their families, while the business continued, albeit in an abbreviated form.
The point is that you need to provide an exit strategy so that after the disaster hits you know how your organization will survive.
Step Five — Running Audits
You're not through yet. Although it is very painful to do, you simply must practice the plan, as realistically as business permits. I can't emphasize this enough.
The first time we practiced our recovery plan at one company I worked with, it was an absolute circus. Almost nothing worked as planned. But we took notes, made adjustments, and ran the exercise again. By the third time, management was convinced we could keep the business running in a disaster.
Don't neglect this step. All that hard work you've done in creating your plan needs to be proved.
Good luck creating your plans. Whatever you do and wherever you work, remember that a lot of people are counting on their jobs, and your plan helps the business keep contributing to society by ensuring that it can continue through almost any catastrophe. Here's hoping you never have to use your plan.
InformIT Articles and Sample Chapters
You can get a free chapter of the book "Disaster Recovery Planning: Preparing For The Unthinkable" by Jon Toigo here.
Online Resources
One of the better DR sites I've seen is here. It also explains the COBRA approach to creating your analysis. Be warned: they are also selling a product.