Nightmare Incidents with Disaster-Recovery Plans

During my 20 years of managing and consulting on IT infrastructures, I have experienced, either directly or through individuals with whom I have worked, a number of nightmarish incidents involving disaster recovery. Some are humorous, some are head-scratching, and some are just plain bizarre. In every case, the incident undermined what would otherwise have been a successful recovery from either a real or simulated disaster. Fortunately, no single client or employer with whom I was associated ever experienced more than two of these, but in their eyes even one was unacceptable. These incidents, listed below, illustrate how critical the planning, preparation, and performance of a disaster-recovery plan really are.

  • Backup tapes have no data on them.

  • Restore process has never been tested, and eventually was found not to work.

  • Restore tapes are mislabeled.

  • Restore tapes cannot be found.

  • Offsite tape supplier has not been paid and cannot retrieve tapes.

  • Graveyard-shift operator does not know how to contact recovery service.

  • Recovery service to a classified defense program is not cleared.

  • Recovery service to a classified defense program is cleared, but individual personnel are not cleared.

  • Operator cannot fit tape canister onto the plane.

  • Tape canisters are mislabeled.

The first four incidents all involve the handling of the backup tapes required to restore copies of data rendered inaccessible or damaged by a disaster. Verifying that the backup process and, more importantly, the restore process complete successfully should be one of the first requirements of any disaster-recovery program. While most shops verify the backup portion of the process, more than a handful never test that the restore also works. Labels and locations can also cause problems when tapes are marked or stored improperly.
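As a rough illustration of what testing the restore (and not only the backup) can look like, here is a minimal Python sketch. It assumes a file-based backup in which a gzip-compressed tar archive stands in for the tape; the function names `file_digests` and `verify_restore` are hypothetical and not taken from any particular backup product. The sketch writes the backup, restores it into a scratch directory, and compares checksums of every file against the source.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def file_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def verify_restore(source_dir: Path, archive_path: Path) -> list[str]:
    """Back up source_dir, restore it to a scratch area, and list any problems.

    archive_path is assumed to live outside source_dir so the archive
    does not end up backing up itself.
    """
    # Backup step: write the whole directory into one archive.
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_dir, arcname=".")

    # Restore step: unpack into a throwaway directory, never over live data.
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive_path, "r:gz") as archive:
            archive.extractall(scratch)
        restored = file_digests(Path(scratch))

    original = file_digests(source_dir)
    problems = []
    if not restored:
        problems.append("restored backup contains no files")
    for name, digest in original.items():
        if name not in restored:
            problems.append(f"missing after restore: {name}")
        elif restored[name] != digest:
            problems.append(f"content differs after restore: {name}")
    return problems
```

An empty list from `verify_restore` means every source file came back intact. A non-empty list is the kind of finding that is far cheaper to discover in a scheduled test than during a real recovery.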

Although such cases are rare, I did know of a client who was denied retrieval of a tape because the offsite tape-storage supplier had not been paid in months. Fortunately, it was not during a critical recovery.

Communicating the proper recovery procedures to all shifts, documenting them, and training every shift on them are necessities. Third-shift graveyard operators often receive the least of this attention because of their off hours and higher-than-normal turnover. These operators especially need to know whom to call and how to contact offsite recovery services.

Classified environments can present their own brand of recovery nightmares. One of my classified clients had applied for a security clearance for its offsite tape-storage supplier and had begun using the service before the clearance was granted. When the client's military customer found out, the tapes were confiscated. In a related issue, a separate defense contractor cleared its offsite vendor for a secured program but failed to clear the one individual who worked nights when a tape was requested for retrieval. The uncleared worker could not retrieve the classified tape that night, delaying the retrieval of the tape and the restoration of the data for at least a day.

The last two incidents involve tape canisters used during a full dry-run test of restoring and running critical applications at a remote hot site 3,000 miles away. The airline in question had just changed its policy on carry-on baggage, which prevented the canisters from staying with the recovery team. Making matters worse, the canisters were mislabeled, causing over six hours of restore time to be lost. The lessons-learned debriefing had much to talk about during its marathon postmortem session.
