SQL Server Reference Guide

The SQL Server Runbook

Last updated Mar 28, 2003.

In the "old days" of computing, meaning the days when mainframes ruled the earth, each of the shops I worked in set up a "Runbook." Our Runbooks were usually a set of three-ring binders containing a written history and record of the configuration and state of the system, and more importantly, instructions for running the systems (hence the name). We did this for a couple of reasons. For one, the computers were expensive, and so were the processes that depended on them. For another, we had an army of people working on the systems around the clock. When someone came on shift, they needed to know where the previous person had left off. Even if someone left the company, the next person could read the Runbook and know just what to do.

The Runbook helped us stay consistent, and it also helped us to avoid error. It was a little like a recipe — as long as you follow the recipe exactly, the meal turns out perfectly. Of course, that only works when the recipe is correct in the first place! But when we took our time, and documented the processes properly, we created a Runbook that helped us all get the job done.

It seems this practice has gone out of vogue. That's understandable, since much of the information we had to store in documentation is now inherent in the software. That means that the software guides you through many of the processes you need to keep the system running — you just follow whatever Wizards or property panels the graphical interface of the system provides, and everything will work as it should.

But in some cases that's a little more difficult to do. Any system that acts as a platform, such as an operating system, a mail service, and yes, a database system, doesn't have a comprehensive Wizard that guides you through running it. That makes sense when you think about it, since a platform doesn't do one single thing; it gives you the ability to do lots of things. For instance, there is no "one way" to set up and run the maintenance required for all systems, because it depends heavily on what each system needs.

Even in today's modern IT environment, your shop can benefit from a Runbook. Interestingly enough, I think they are still around — although in a different format. Web sites, like InformIT, have many processes, procedures, checklists and information that will help you run your IT systems. Walk into any IT professional's office, and you'll no doubt see lots of books on various platforms, including SQL Server. That doesn't even include the documentation that comes with each product, such as Books Online.

So what sets a Runbook apart? What might you put in a Runbook that you don't already have in these other sources?

Rather than just covering sample sections that I include in my Runbooks, it's probably better to discuss the kinds of things to include in one, and why. Sure, I'll mention some sections in a moment, but rather than just copy mine, you need to think about what you want to include, and most importantly why you would record it. You certainly don't need another task at work with all you have to do, so anything new you take on should help you be more efficient, not get in your way. Keep that in mind as you develop your Runbook.

The strict definition of a Runbook is that it includes information needed to run a system. I'll agree to that definition, but broaden it a bit to include sections that help do that. For instance, I include a section on the system's configuration, since knowing that information can help get the system back into shape should a problem arise.

And that is the driver for a Runbook. The point is that a Runbook will help with Business Continuity, a fancy term that just means making sure the business keeps going. If you don't have a Runbook, ask your IT manager about the Business Continuity plan. Often this is the same thing. Many shops have a Business Continuity plan, but most of the ones I've seen don't go far enough, at least for the IT shop.

So the first place to start when you're designing your Runbook is to decide what keeps your business running. There are some fairly basic things that keep all businesses running (from an IT perspective), so those certainly become sections in your guide, right at the beginning.

First, you need a building, power, connectivity and so on to house and run the hardware, and then you need the hardware to install software on. You'll need to configure the software, and then you need to start the processes that make it work. But that isn't all — you need to talk about how to stop the software, and how to restart it. You'll want to record the checklists you use to govern the system, and any calendar-based activities you schedule. Finally, you need to document how to handle special requests, and perhaps most importantly, how to recover from a disaster.

Let's take a look at these areas, which actually make up my Runbook. I break them into three sections: Facilities, Systems and Processes. The first two, Facilities and Systems, are usually more historical in nature, since they don't change that often. I still document them, as I mentioned, since I think that information is crucial to recovering a system.

Facilities

In my Runbook I document where each physical plant of the business is located. You might think, "Well, of course I know where my business is! Why would I write that down?" In the case of a fire or other emergency, though, it might not be you who calls the authorities. By documenting your layout, especially the rooms where the servers and wiring closets are, others can help you when a problem strikes.

I include in my Runbook how to access the physical plant. No, I don't document the room codes or key locations, just who has access to the rooms and how that is controlled.

I also document the power subsystems my servers and network equipment use. I document the power path and the power company (or companies), along with their phone numbers. This becomes important when there's an outage, since you might not have the Internet handy to look up the phone number to call! I document when the batteries in my UPS systems were purchased, when they need to be replaced, and where I (or someone else) can buy them.
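If the battery records are kept electronically, a few lines of code can flag which units are coming due. The following is only a minimal sketch: the unit names, purchase dates, three-year service life, and 90-day warning window are all made-up assumptions, not values from any real Runbook.

```python
from datetime import date, timedelta

def add_years(d, years):
    """Return d shifted by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 does not exist in the target year.
        return d.replace(year=d.year + years, day=28)

# Hypothetical records: names, dates, and service lives are illustrative only.
UPS_BATTERIES = [
    {"unit": "server-room-ups-1", "purchased": date(2000, 6, 15), "life_years": 3},
    {"unit": "wiring-closet-ups", "purchased": date(2002, 2, 1), "life_years": 3},
]

def due_for_replacement(batteries, today, warn_days=90):
    """List (unit, due_date) pairs whose replacement date is within warn_days."""
    cutoff = today + timedelta(days=warn_days)
    due = []
    for battery in batteries:
        due_date = add_years(battery["purchased"], battery["life_years"])
        if due_date <= cutoff:
            due.append((battery["unit"], due_date))
    # Soonest (or most overdue) replacement first.
    return sorted(due, key=lambda pair: pair[1])

if __name__ == "__main__":
    for unit, due_date in due_for_replacement(UPS_BATTERIES, today=date.today()):
        print(f"{unit}: replace by {due_date.isoformat()}")
```

Keeping the supplier's name and phone number in the same record would put everything needed for a replacement in one place.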

Along with the power documentation, I include the same kind of information for connectivity. In addition to the contact information for the current Internet provider, I include one more thing: the names and numbers of two rivals. Once, when I couldn't get a network system back, I was able to get another vendor to switch me over in less time than the first one could fix the problem.

Systems

I always define the servers I have in my system, including information about the physical setup, the replaceable components like memory and hard drive types and so on. A very important piece of information is the definition of the tape drives — this is huge, because if you ever lose the system to a catastrophe such as a fire or hurricane, you'll have tapes, but will you be able to find the hardware those tapes use?

I do the same with the software installed on the server. This is the part of the document that probably turns over the fastest, since I document every service pack, hotfix and even driver update I install. Although this changes often, it's pretty easy to document with software that inventories the system for you.

Not only do I document the software and hardware, but I also document the configuration of each, from the BIOS to the database settings. Again, this makes it easy for the person following me to put things back in case of an emergency.
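One way to keep the configuration section current is to save a snapshot of the settings from time to time and record only what changed. The sketch below assumes the snapshots have already been collected into plain dictionaries by some other means; the setting names and values are illustrative placeholders, and `diff_settings` is a hypothetical helper, not part of any product.

```python
def diff_settings(baseline, current):
    """Return {setting: (old, new)} for each added, removed, or changed setting.

    None stands in for "not present in that snapshot".
    """
    changes = {}
    for name in sorted(set(baseline) | set(current)):
        old = baseline.get(name)
        new = current.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes

# Hypothetical snapshots: names and values are for illustration only.
baseline = {
    "max server memory (MB)": 2048,
    "fill factor (%)": 0,
    "recovery interval (min)": 0,
}
current = {
    "max server memory (MB)": 4096,
    "fill factor (%)": 0,
    "max degree of parallelism": 4,
}

if __name__ == "__main__":
    for name, (old, new) in diff_settings(baseline, current).items():
        print(f"{name}: {old} -> {new}")
```

Logging the differences with a date gives the next person a history of what was changed, not just the current state.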

Processes

The most important part of the Runbook is the processes and procedures section. This is where you define all the steps you need to run the system. But where do you start with this? Surely everyone knows how to turn on a system, right? Certainly that doesn't need to be documented. Or does it?

Here's the rule of thumb: if you had to learn to do it once, you need to write it down. Is the power switch in a weird place? Does one service need to start before another?

I normally document everything anyone might need to know to stop, start, or restart a component or an entire system. This includes not only the steps you need to take on the server, but also whom you need to notify on the business side or in facilities.
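When one service has to start before another, the dependencies can be written down once and the order derived from them. This is a minimal sketch using Python's standard `graphlib` module (Python 3.9 or later); the service names and their dependencies are hypothetical and only illustrate the idea.

```python
from graphlib import TopologicalSorter

# Hypothetical services: each maps to the set of services that must start first.
START_AFTER = {
    "storage": set(),
    "network": set(),
    "database": {"storage", "network"},
    "app_server": {"database"},
    "reporting": {"database"},
}

def start_order(dependencies):
    """Return a start order in which every service follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())

def stop_order(dependencies):
    """Return the reverse of the start order, so dependents stop first."""
    return list(reversed(start_order(dependencies)))

if __name__ == "__main__":
    print("Start:", " -> ".join(start_order(START_AFTER)))
    print("Stop: ", " -> ".join(stop_order(START_AFTER)))
```

Services that do not depend on each other may appear in either order, and `TopologicalSorter` raises `CycleError` if the recorded dependencies form a loop, which is itself worth knowing before an emergency.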

Most of us have lots of checklists we use, and I have even published a few here at InformIT. Make sure you include those in your Runbook so that others can use them. Along with that you need to include any scheduled activities such as periodic backups, maintenance, and other calendar-based activities.

It's also important to think about and document how your organization handles special requests. Having buy-in from your boss and other bosses will help the poor junior DBA who is on call when a senior executive demands that they hand over sensitive data to someone else.

Of primary importance is the section on disaster recovery. I've got an entire set of articles here at InformIT that will help you do that, so make sure to check those out.

Don't forget to include your on-call lists and pager rotation. It might not be someone in your group that faces an issue, so having this information generally available is invaluable.
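If the rotation follows a fixed weekly cycle, the person on call for any date can be worked out from the list itself. The sketch below assumes a simple weekly rotation that starts on a Monday; the names and the start date are placeholders, and real rotations with swaps or holidays would need more than this.

```python
from datetime import date

# Hypothetical rotation: names and start date are placeholders.
ROTATION = ["Alice", "Bob", "Carol"]
ROTATION_START = date(2003, 1, 6)  # a Monday; the first person's first week

def on_call(day, rotation=ROTATION, start=ROTATION_START):
    """Return who is on call on the given day, assuming weekly hand-offs."""
    weeks_elapsed = (day - start).days // 7
    # The modulo wraps around the list and also handles dates before the start.
    return rotation[weeks_elapsed % len(rotation)]

if __name__ == "__main__":
    today = date.today()
    print(f"On call for {today.isoformat()}: {on_call(today)}")
```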

One final word here — the information in your Runbook can be stored physically or electronically. In either case, you need to control who has access to the information. Knowing the information inside would be very useful to a hacker, so make sure you protect it like you would your backup tapes.

InformIT Articles and Sample Chapters

ITIL is one way you can standardize your IT systems, and Runbooks play right into it. Read more here.

Books and eBooks

I didn't mention security here, since I don't always include it in my Runbooks. If you do, make sure you check out this free book section.

Online Resources

If my checklists aren't enough, here's another.