Emergency Boot and Recovery
The events that orchestrate a proper, clean boot sequence are numerous. Proper daily care of your server prevents many of the potential pitfalls. Murphy’s Law, which states that if anything can go wrong, it will at the most inopportune time, certainly applies to the computing environment. Before this happens, you should have a plan.
Backups and disaster recovery will be covered in depth in Chapter 10. Before recovery can happen, you will need a bootable system and access to the devices. In most cases, you will need to get access to the system to investigate the cause of the outage before you worry about possible recovery.
The simplest cases are usually self-inflicted. For example, a kernel name was mistyped in the GRUB configuration file (see Figure 3.2). Because GRUB lets you edit this information at boot time, you can correct the entry, boot the system, and then edit the configuration file to make the correction permanent. Though GRUB is gaining in popularity, LILO is not subject to this type of misconfiguration. With LILO, the kernel file must be accessible in order for it to be added to the map file. This prevents a mistyped or nonexistent kernel from being presented at boot time.
Figure 3.2 Mistyped kernel name in the GRUB configuration file.
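The kind of existence check that LILO performs when it builds its map file can be approximated for GRUB before you reboot. The following is a minimal sketch, not a SUSE or GRUB tool: the function name missing_kernels is hypothetical, and the code assumes a GRUB legacy style configuration in which each boot entry has a line beginning with the word kernel, optionally followed by a device prefix such as (hd0,0). It also assumes that every entry refers to the same partition, mounted at the directory passed as root.

```python
import os
import re

# Matches GRUB legacy "kernel" lines such as
#   kernel (hd0,0)/boot/vmlinuz root=/dev/hda1
# The optional "(device)" prefix is skipped; the path is captured.
KERNEL_LINE = re.compile(r"^\s*kernel\s+(?:\([^)]*\))?(\S+)")


def missing_kernels(menu_text, root="/"):
    """Return the kernel paths named in menu_text that are not files under root.

    Assumption: all entries live on the partition mounted at root.
    """
    missing = []
    for line in menu_text.splitlines():
        match = KERNEL_LINE.match(line)
        if not match:
            continue  # comments, titles, initrd lines, and so on
        path = match.group(1)
        # Resolve the GRUB path relative to the mount point.
        if not os.path.isfile(os.path.join(root, path.lstrip("/"))):
            missing.append(path)
    return missing
```

To use it, read the contents of your GRUB configuration file into a string and pass it to missing_kernels. An empty list means every kernel named in the file was found. Any path returned is a candidate for the kind of typing error shown in Figure 3.2. If /boot is a separate partition on your server, pass its mount point as root, because GRUB paths are then relative to that partition.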
In more difficult cases, something has happened to impede the normal boot process. The server may be unable to invoke the normal boot loader at all or, more commonly, the boot process starts but does not complete successfully. In these instances, a separate boot environment is required. The YaST configuration tool can be used to create a rescue floppy, module floppies for nonstandard drivers, and a set of standard SUSE boot floppies.
Once a system is properly configured, you are strongly advised to create a rescue floppy. The recovery process for your server will depend on it. Every time you make a significant change to your system configuration, create a new rescue floppy. It is a good idea to retain the previous one in case the new configuration proves problematic.
Similarly, if your configuration requires specialized drivers or additional modules to function properly, you will need to make floppy versions of them as well.
In the event of a disaster, booting from the distribution DVD/CD or the YaST boot floppies should be possible. During the boot process, you are prompted for the type of installation you would like to perform. One of the options presented is to rescue the system. Once initiated, the rescue system prompts for the rescue disk and additional module disks as required.
At the conclusion of this phase, the disk subsystem of the server should be accessible. Investigating the cause of the failure can now begin. In the simplest case, you can open the appropriate configuration file and fix your error. In most cases, however, things tend not to be so simple.
Boot failures on static, stable systems are usually caused by hardware failures. It is beyond the scope of this chapter, and possibly this book, to explore the various causes of a system outage. Chapter 10 explores the recovery process in more detail.