Linux is a fast-growing operating system with power and appeal, and enterprises worldwide are quickly adopting it to take advantage of its benefits. But as with all operating systems, performance problems do occur, sending system administrators scrambling into action. Finally, there is a complete reference for troubleshooting Linux quickly! Linux Troubleshooting for System Administrators and Power Users is THE book for locating and solving problems and maintaining high performance in Red Hat® Linux and Novell® SUSE® Linux systems.
This book not only teaches you how to troubleshoot Linux, it also shows you how the system works, so you can attack any problem at its root. Should you reinstall if Linux does not boot? Or can you save time by troubleshooting the problem? Can you enhance performance when Linux hangs or runs slowly? Can you overcome problems with printing or accessing a network? This book provides easy-to-follow examples and an extensive look at the tools, commands, and scripts that make Linux run properly.
Gone are the days of searching online for solutions that are out of date and unreliable. Whether you are a system admin, developer, or user, this book is an invaluable resource for ensuring that Linux runs smoothly, efficiently, and securely.
Chapter 1 System Boot, Startup, and Shutdown Issues
Chapter 2 System Hangs and Panics
Chapter 3 Performance Tools
Chapter 4 Performance
Chapter 5 Adding New Storage via SAN with Reference to PCMCIA and USB
Chapter 6 Disk Partitions and Filesystems
Chapter 7 Device Failure and Replacement
Chapter 8 Linux Processes: Structures, Hangs, and Core Dumps
Chapter 9 Backup/Recovery
Chapter 10 cron and at
Chapter 11 Printing and Printers
Chapter 12 System Security
Chapter 13 Network Problems
Chapter 14 Login Problems
Chapter 15 X Windows Problems
My good friend, James Kirkland, sent me an instant message one day asking if I wanted to write a Linux troubleshooting book with him. James has been heavily involved in Linux at the HP Response Center for several years. While troubleshooting Linux issues for customers, he realized there was not a good troubleshooting reference available. I remember a meeting discussing Linux troubleshooting. Someone asked what the most valuable Linux troubleshooting tool was. The answer was immediate. Google. If you have ever spent time trying to find a solution for a Linux problem, you know what that engineer was talking about. A wealth of great Linux information can be found on the Internet, but you can't always rely on this strategy. Some of the Linux information is outdated. A lot of it can't be understood without a good foundation of subject knowledge, and some of it is incorrect. We wanted to write this book so the Linux administrator will know how Linux works and how to approach and resolve common issues. This book contains the information we wish we had when we started troubleshooting Linux.
Greg and Chris are identical twins and serious Linux hobbyists. They have been Linux advocates within HP for years. Yes, they both run Linux on their laptops. Chris is a member of the Superdome Server team (http://www.hp.com/products1/servers/scalableservers/superdome/index.html). Greg works for the XP storage team (http://h18006.www1.hp.com/storage/xparrays.html). Their Linux knowledge is wide and deep. They have worked through SAN storage issues and troubleshot process hangs, Linux crashes, performance issues, and everything else for our customers, and they have put their experience into the book.
I am a member of the HP escalations team. I've primarily spent my time resolving HP-UX issues. I've been a Linux hobbyist for a few years, and I've started working Linux escalations, but I'm definitely newer to Linux than the rest of the team. I tried to give the book the perspective of someone who is fairly new to Linux. I tried to remember the questions I had when I first started troubleshooting Linux issues and included them in the book. We sincerely hope our effort is helpful to you.
Chapter 1: System Boot, Startup, and Shutdown Issues
Chapter 1 discusses the different subsystems that make up Linux startup. These include the bootloaders GRUB and LILO, the init process, and the rc startup and shutdown scripts. We explain how GRUB and LILO work along with the important features of each. The reader will learn how to boot when there are problems with the bootloader. There are numerous examples. We explain how init works and what part it plays in starting Linux. The rc scripts are explained in detail as well. The reader will learn how to boot to single user mode, emergency mode, and confirm mode. We include examples of using a recovery CD when Linux won't boot from disk.
Chapter 2: System Hangs and Panics
This chapter explains interruptible and non-interruptible OS hangs, kernel panics, and IA64 hardware machine checks. A Linux hang takes one of two forms. In an interruptible hang, Linux seems frozen but still responds to some events, such as a ping request. In a non-interruptible hang, Linux does not respond to any actions. We show how to use the Magic SysRq keystroke to generate a stack trace to troubleshoot an interruptible hang. We explain how to force a panic when Linux is in a non-interruptible hang. An OS panic is a voluntary shutdown of the kernel in response to something unexpected. We discuss how to obtain a panic dump from Linux. The IA64 architecture dump mechanism is also explained.
Chapter 3: Performance Tools
In Chapter 3, we explain how to use some of the most popular Linux performance tools including top, sar, vmstat, iostat, and free. The examples show common syntaxes and options. Every system administrator should be familiar with these commands.
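As a small illustration of the kind of data these tools report, the sketch below parses text in the format of /proc/meminfo, the kernel file that free reads. It is not taken from the book; the function names parse_meminfo and used_percent are hypothetical, and the fallback to MemFree assumes an older kernel that does not report MemAvailable.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most values are reported in kB; a few (such as HugePages_Total)
    are plain counts, so the unit suffix is simply ignored here.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def used_percent(meminfo):
    """Return the percentage of memory in use.

    Assumption: MemAvailable is the better estimate of reclaimable
    memory when present; otherwise fall back to MemFree.
    """
    total = meminfo["MemTotal"]
    available = meminfo.get("MemAvailable", meminfo["MemFree"])
    return 100.0 * (total - available) / total
```

On a Linux system, passing the contents of /proc/meminfo to parse_meminfo and then to used_percent gives a rough figure comparable to what free shows.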
Chapter 4: Performance
Chapter 4 discusses different approaches to isolating a performance problem. In the majority of performance issues, storage seems to draw significant attention. The goal of this chapter is to provide a quick understanding of how a storage device should perform and easy ways to get a performance measurement without expensive software. In addition to troubleshooting storage performance, we touch on CPU bottlenecks and ways to find them.
Chapter 5: Adding New Storage via SAN with Reference to PCMCIA and USB
Linux is moving out from under the desk and into the data center. An essential feature of an enterprise computing platform is being able to access storage on the SAN. This chapter provides a detailed walkthrough and examples of installing and configuring Fibre Channel cards. We discuss driver issues, how the device files work, and how to add LUNs.
Chapter 6: Disk Partitions and Filesystems
Master Boot Record (MBR) basics are explained, and examples show how bootloader programs such as LILO and GRUB manipulate the MBR. We explain the partition table and give many examples so that the reader will understand how the disk is carved up into extended and logical partitions. Many scenarios are provided explaining common disk and filesystem problems and their solutions. After reading this chapter, the reader will understand not only what MBR, LBA, extended partitions, and all the other buzzwords mean, but also how they look on the disk and how to fix problems related to them.
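To make the on-disk layout concrete, here is a minimal sketch, not taken from the book, that decodes the four primary partition entries from a classic 512-byte MBR sector. It relies on the standard layout: boot code in the first 446 bytes, four 16-byte partition entries, and the 0x55 0xAA signature in the last two bytes. The function name parse_mbr and the dictionary keys are hypothetical, chosen for illustration.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446      # the table follows the boot code
ENTRY_SIZE = 16                   # four entries of 16 bytes each
BOOT_SIGNATURE = b"\x55\xaa"      # last two bytes of the sector
EXTENDED_TYPES = {0x05, 0x0F, 0x85}  # partition types used for extended partitions


def parse_mbr(sector):
    """Decode the non-empty primary partition entries of an MBR sector."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("an MBR sector must be at least 512 bytes")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")

    partitions = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        raw = sector[start:start + ENTRY_SIZE]
        # Entry layout: status byte, 3 CHS bytes (skipped), type byte,
        # 3 CHS bytes (skipped), starting LBA, sector count (little-endian).
        status, ptype, lba_start, num_sectors = struct.unpack("<B3xB3xII", raw)
        if ptype == 0x00:
            continue  # type 0 marks an unused entry
        partitions.append({
            "index": index + 1,
            "bootable": status == 0x80,
            "type": ptype,
            "extended": ptype in EXTENDED_TYPES,
            "lba_start": lba_start,
            "sectors": num_sectors,
        })
    return partitions
```

The input could be the first 512 bytes of a disk image file opened in binary mode. Reading a real device node requires root privileges and should be done read-only. Logical partitions inside an extended partition are described by a chain of further boot records, which this sketch does not follow.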
Chapter 7: Device Failure and Replacement
This chapter explains how to identify problems with hardware devices and how to fix them. We begin with a discussion of supported devices. Whether a device is supported by the Linux distribution is a good thing to know before spending a lot of time trying to get it working. Next we show where to look for indications of hardware problems. The reader will learn how to decipher the hexadecimal error messages from dmesg and syslog. We explain how to use the lspci tool for troubleshooting. When the error is understood, the next goal is to resolve the device problem. We demonstrate techniques for determining what needs to be done to fix device issues, including those involving SAN devices.
Chapter 8: Linux Processes: Structure, Hangs, and Core Dumps
Process management is the heart of the Linux kernel. To troubleshoot process issues, a system administrator should know what happens when a process is created. This chapter explains process creation and provides a foundation for troubleshooting. Linux is a multithreaded kernel. The reader will learn how multithreading works and what heavyweight and lightweight processes are. The reader also will learn how to troubleshoot a process that seems to be hanging and not doing any work. Core dumps are also covered. We show you how to learn which process dumped core and why. This chapter details how cores are created and how best to use them to understand the problem.
Chapter 9: Backup/Recovery
Creating good backups is one of the most important tasks, if not the most important, that a system administrator must perform. This chapter