Recovering from Faulty Kernels
It happens. You execute an orderly shutdown and reboot, the monitor flashes (or your connection goes dead), and you wait for the boot, only to be greeted with a partial LILO prompt…or worse.
Typically, a faulty kernel will exhibit one of the following behaviors:
- The machine cycles through repeated rebooting.
- You see some substring of the LILO prompt, such as LIL- followed by a halt.
- Linux begins to load but halts at some point during the kernel messages.
- Linux loads but ends in a kernel panic message.
- Linux loads, runs, lets you log in, and then dies when it is least convenient.
If you are prepared, your prognosis for a full recovery is very good. If you can get up to the LILO: prompt, the most convenient recovery is to load your backup kernel by specifying its label to the boot loader:
LILO: backup
This will boot from your previous kernel and let you in so you can fix the problem and try your luck again. If you cannot get to the LILO: prompt, your only alternatives are your boot disk or a rescue disk. The boot disk makes life much easier because the running system will be identical to your normal system. If you use a rescue disk, you must manually mount your system partitions and enable any extra modules.
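The backup label only works if it was defined before the trouble started. A minimal sketch of how such an entry might look in /etc/lilo.conf, assuming the previous kernel was saved as /boot/vmlinuz.old and the root filesystem lives on /dev/hda1 (both names are hypothetical):

```text
# /etc/lilo.conf (excerpt; paths, labels, and root device are hypothetical)
image=/boot/vmlinuz
    label=linux
    root=/dev/hda1
    read-only

# Previous known-good kernel, selected by typing "backup" at the LILO: prompt
image=/boot/vmlinuz.old
    label=backup
    root=/dev/hda1
    read-only
```

After editing the file, lilo must be run again so that the boot loader records where both images are stored.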
Where alternate kernels and boot disks are not practical, for instance on thin clients with limited disk space, and you can still reach the LILO: prompt, try starting your system in single-user mode. This prevents the probing and loading of many modules, such as the one for your network card (a frequent culprit). The default configuration for single-user mode (a.k.a. runlevel 1) is specified by the files in /etc/rc.d/rc1.d, and it is a good idea to double-check the symlinks in that directory after each system upgrade to ensure that the choices are sensible for the purpose. Single-user mode puts you directly into a system shell; once the problem has been corrected, you can either reboot the system or exit the shell to return to multiuser mode.
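One way to review the runlevel 1 symlinks is to list which entries are started (names beginning with S) and which are stopped (names beginning with K). The function below is only a sketch; show_runlevel_links is a hypothetical helper name, and the default directory is the one named above.

```shell
# Sketch: list what a runlevel directory starts (S*) and stops (K*).
# show_runlevel_links is a hypothetical helper, not a standard command.
show_runlevel_links() {
    dir=${1:-/etc/rc.d/rc1.d}
    for link in "$dir"/S* "$dir"/K*; do
        # Skip a glob pattern that matched nothing; keep dangling symlinks.
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=${link##*/}
        case $name in
            S*) printf 'start %s\n' "$name" ;;
            K*) printf 'stop %s\n' "$name" ;;
        esac
    done
}

# Usage: show_runlevel_links /etc/rc.d/rc1.d
```

Anything listed as "start" that loads a suspect driver is a candidate for removal from single-user mode.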
Repeated Rebooting
Nine times out of ten, repeated rebooting is caused by changing the kernel file and forgetting to run lilo to register the new image with the boot loader. lilo needs the raw sector location of the kernel; copying a kernel image will move it to a new disk sector and leave the previous pointer stored by lilo dangling over an abyss.
This problem can be corrected by booting from the boot floppy and running lilo, or by using a rescue disk, mounting the boot partition under /mnt, and running lilo with the -r option so that it treats /mnt as the root directory and resolves its paths relative to it:
lilo -r /mnt
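The rescue-disk steps can be sketched as a small shell function. The device name /dev/hda1 is an assumption for illustration, as is the helper name relilo_from_rescue; substitute the partition that holds your /etc/lilo.conf and /boot. Setting DRY_RUN prints the commands instead of running them, which is a convenience added here rather than a feature of lilo.

```shell
# Sketch: re-register the kernel with LILO from a rescue disk.
# /dev/hda1 is a hypothetical device name; relilo_from_rescue is a
# hypothetical helper, not part of the lilo package.
relilo_from_rescue() {
    root_dev=${1:-/dev/hda1}
    mount_point=${2:-/mnt}
    # With DRY_RUN set, ${DRY_RUN:+echo} expands to "echo" and the commands
    # are only printed; otherwise it expands to nothing and they run.
    ${DRY_RUN:+echo} mount "$root_dev" "$mount_point" &&
    ${DRY_RUN:+echo} lilo -r "$mount_point"
}

# Preview:  DRY_RUN=1 relilo_from_rescue /dev/hda1 /mnt
# Execute:  relilo_from_rescue /dev/hda1 /mnt      (as root)
```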
Partial LILO Prompt
A partial LILO prompt is the most terrifying of all kernel boot errors. Each letter of L-I-L-O signifies a stage in the boot process, so the point where the prompt stops tells you which stage failed:
- L- or LIL: Usually a media error or failure to include the boot partition or filesystem support (or including it as a module).
- LI or LIL?: /boot/boot.b is missing, moved, or corrupt.
The solution is the same for all: re-run lilo.
More information on using lilo and the diagnosis of lilo error codes can be found in /usr/doc/lilo-0.x/TechnicalGuide.ps.
Kernel Halts While Loading
Device probing is risky business and a frequent cause of kernel halts while loading. For example, if you are configuring for a gateway/firewall machine with two network interfaces, the second probe may cause the kernel to halt. Other causes of kernel halts are IRQ conflicts, memory conflicts, and mismatched devices selecting similar but not-quite-identical drivers.
You can avoid probing, memory, and IRQ conflicts for most kernel modules and devices by supplying configuration parameters in the /etc/lilo.conf append line or at the LILO: prompt. The exact parameters to use depend on your device, but you can find advice in the README files, either in linux/Documentation or in the subdirectories of the driver source code.
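As an illustration for the two-interface gateway case, the kernel's ether= boot parameter can pin the IRQ and I/O base address of each card so that nothing has to be probed. The sketch below assumes the network drivers are compiled into the kernel, and the IRQ and address values are hypothetical; take the real ones from your hardware and the driver README.

```text
# /etc/lilo.conf (excerpt; IRQ and I/O address values are hypothetical)
image=/boot/vmlinuz
    label=linux
    # ether=IRQ,BASE_ADDR,NAME for each interface
    append="ether=10,0x300,eth0 ether=11,0x340,eth1"
```

The same parameters can be tried once, without editing the file, by typing them after the label at the boot prompt, for example: LILO: linux ether=10,0x300,eth0 ether=11,0x340,eth1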
Kernel Panic
A kernel panic message has a certain cryptic poetry to it, like a robotic haiku, a snapshot testament to the last moments of a Linux kernel. A kernel panic usually has the following form:
unable to handle kernel paging request at address C0000010
Oops: 0002
EIP: 0010:XXXXXXXX
eax: xxxxxxxx ebx: xxxxxxxx ecx: xxxxxxxx edx: xxxxxxxx
esi: xxxxxxxx edi: xxxxxxxx ebp: xxxxxxxx
ds: xxxx es: xxxx fs: xxxx gs: xxxx
Pid: xx, process nr: xx
xx xx xx xx xx xx xx xx xx xx
For most practical purposes, knowing where the panic occurs is more useful than interpreting the message itself. The leading text tells what triggered the event, and this is followed by the addresses held in various registers. Intrepid readers can find detailed instructions on decoding this message in the linux/Documentation/oops-tracing.txt file.
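One common way to find out where a panic occurred is to look up the EIP address in a sorted symbol table for the running kernel. The sketch below assumes a System.map-style file with lines of the form "address type name", with addresses written as fixed-width hexadecimal of the same width as the EIP. The helper name nearest_symbol and the sample addresses in the usage line are hypothetical.

```shell
# Sketch: find the kernel symbol whose address is the closest one at or
# below a given EIP. nearest_symbol is a hypothetical helper name.
# Assumes "address type name" lines with fixed-width hex addresses.
nearest_symbol() {
    eip=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    map=$2
    # Sort by address, then remember the last symbol with address <= EIP.
    # The "x" prefix forces awk to compare the values as strings, which is
    # correct for equal-width lowercase hexadecimal.
    LC_ALL=C sort "$map" | LC_ALL=C awk -v eip="$eip" '
        ("x" tolower($1)) <= ("x" eip) { name = $3 }
        END { if (name != "") print name }'
}

# Usage: nearest_symbol C0100210 /boot/System.map
```

The symbol printed is the function the kernel was executing when it gave up, which is usually enough to identify the driver or subsystem involved.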
In production kernels, panic messages are rare and are usually due to a misconfiguration problem, missing modules, failure to load a module before using some essential feature, or using hardware not supported by the current kernel. With development kernels, kernel panics can become a way of life.
Kernel Oops and Bug Reporting
Linux is highly stable and resilient to application failures, but when you start experimenting with odd kernel combinations or experimental editions, hardware, and configurations, stuff happens. In the parlance of the kernel developers, an "oops" is a kernel panic message that occurs spontaneously, often mercilessly, and for no apparent reason. The message reported is similar to the kernel panic that can occur during the boot, but it may not be visible if you are running the X Window System. The cause of both a halt during boot and a spontaneous oops is the same: the kernel has reached an impasse.
When an oops occurs during a user session, the kernel panic message may be displayed on one of the Linux virtual consoles (reached by pressing Ctrl+Alt and a function key) or recorded in the system log file, /var/log/messages (another good reason to echo all system messages to /dev/tty12). If you can see the panic report, the log activity just before it may give some clues to the cause of the panic.
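To see the activity just before a panic, you can pull the oops line and the lines preceding it out of the log. This is a sketch only: log_before_oops is a hypothetical helper name, the search strings are taken from the sample panic message shown earlier, and the default of 20 lines of context is an arbitrary choice.

```shell
# Sketch: show an oops in the system log with the lines leading up to it.
# log_before_oops is a hypothetical helper name.
log_before_oops() {
    log=${1:-/var/log/messages}
    lines=${2:-20}
    # -B prints that many lines of context before each matching line.
    grep -i -B "$lines" -e 'oops' -e 'unable to handle kernel' "$log"
}

# Usage: log_before_oops /var/log/messages 20
```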
Linux is maintained and developed by volunteers, so the first advice for reporting problems and bugs is to be polite. Chances are good someone will take personal interest in this bug, and you will have a fix or a workaround in record time. You are far less likely to get a timely response if you take your frustrations out on the developers. Unlike with proprietary systems, when you deal with the Linux community you are not dealing with underpaid droogs. You are dealing with the masters themselves, the people who take personal ownership and pride in their work. Show some respect and they will repay your thoughtfulness.
Before you report any bug, you should always check to see if this bug is known. If you have access to a Web browser, look into the linux-kernel archive (http://www.tux.org/lkml/). If you have IRC access, you can ask directly on one of the #linux, #linuxOS, or #kernelnewbie channels.
If you still think you have found a new bug in the kernel, the kernel development community will be more than interested…providing you can supply enough information to lead to a fix. If you can isolate the module where the oops occurred, you can locate the author of that module either in the linux/MAINTAINERS file or in the source code of the module itself. You can also post your report to the linux-kernel mailing list.
When reporting a suspected bug, you should specify which kernel you are using, outline your hardware setup (RAM, CPU, and so on), and describe the situation where the problem occurred. If there is a kernel panic message, copy the message exactly as displayed on your screen.
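Much of this information can be collected with a few commands. The function below is a sketch, assuming a typical /proc layout; collect_bug_info is a hypothetical helper name, and the field names it searches for (model name, MemTotal) may differ or be absent on some kernels and architectures.

```shell
# Sketch: gather the basics for a kernel bug report.
# collect_bug_info is a hypothetical helper name.
collect_bug_info() {
    printf 'Kernel: %s\n' "$(uname -r)"
    printf 'Machine: %s\n' "$(uname -m)"
    # Read /proc only where it is available; a missing field is not an error.
    if [ -r /proc/cpuinfo ]; then
        grep -i -m 1 '^model name' /proc/cpuinfo || true
    fi
    if [ -r /proc/meminfo ]; then
        grep -i '^MemTotal' /proc/meminfo || true
    fi
    return 0
}

# Usage: collect_bug_info > bugreport.txt
```

The output covers only the kernel version, CPU, and RAM; the description of what you were doing and the exact panic text still have to be added by hand.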