This is the most complete book available on performance optimization—featuring coverage of UNIX, networking (TCP/IP), hardware architecture, and program optimization—all in one volume. KEY TOPICS: Covers performance basics; understanding UNIX; BSD instrumentation; System V instrumentation; system tuning; optimizing user programs written in high-level languages; and making accurate measurements. Explains in detail the output from each command—along with “real-life” rules of thumb on what value is “good” and what is not. MARKET: For System Administrators, application programmers, MIS managers, and general users of UNIX systems who are interested in learning about and/or optimizing the performance of their UNIX system and networks.
General style. Organization. Terminology.
Why a System Runs Slowly. Amdahl's Law. Estimating percentage speed-up. The bubble effect. Response time versus throughput. Psychology of the user response time. The Heisenberg Uncertainty Principle. Smoothed versus peak statistics. Caches and the Proximity Principle. Summary.
The basic components. The CPU.
The Clock Speed. Semiconductor Technology. Performance Characteristics. Integer Performance. Superscalar Execution. Floating-Point Performance. Impact of Memory Speed on FP Operations. The Bus Interface. Manufacturing and Cost Factors.
The Memory Controller.
DRAMs. DRAM Alternatives. Memory Subsystem Speed Specification. SRAMs.
Virtual Memory Management.
The Translation Lookaside Buffer.
Direct Mapped Caches. Set Associative Caches.
Cache Write Policies.
Write Buffers and FIFOs. The I/O Subsystem.
Polling. DMA Devices. Bus Mastering.
Expansion I/O Buses.
Sun SBUS. ISA. MicroChannel. EISA. High-Performance Local Buses. VESA VL. PCI. Other Buses.
Caching Disk Controllers. Dumb Disk Controllers. RAID Controllers. Software Disk Striping.
Disk Drive Performance Metrics. Disk Drive Interfaces. The IDE Interface. The SCSI Interface.
Other Storage Peripherals. The Graphics Subsystem.
Video Update and Refresh. Graphics Operations.
The Basics. Determining the Kernel Size. User Processes. System Calls. The Buffer Cache.
Buffer Cache Memory Allocation.
The File System Layer.
The UNIX Directory Structure. The Inode Structure. The Name Lookup Cache. Traditional UNIX File System. The BSD Fast File System. Symbolic Links. Journalled File Systems. Raw Partitions.
BSD's vfork. System V's Copy-on-Write Optimization.
Process Termination. Process scheduling.
The Idle Loop. Process Suspension and Restart.
The BSD Scheduler.
Priority Migration. System Level Priorities.
The SVR4 schedulers.
Global Priorities in SVR4. The Time-Sharing Scheduler. Priority Migration Under the TS Scheduler. The Impact of the Nice Parameter. The Priocntl Interface. The Real-Time Scheduler.
The IBM AIX scheduler. Memory Management.
Demand Loading of User Programs. Zero-Fill-on-Demand. Shared Memory. Page Caching.
Paging and Swapping.
Performance Implications of Swapping and Paging. Swapping and Paging in SCO UNIX. Swap Allocation. The Paged Buffer Cache.
Multiprocessing (MP) Support. Threads. Kernel tables.
The CPU Statistics. Memory Statistics. Paging and Swapping. Disk I/O. Process Information. System Call Statistics. Hardware Interrupts. Context Switch Statistics. Sun Name Look-Up Cache Information. Other vmstat Features.
DEC vmstat. SCO vmstat. iostat.
Detailed CPU Usage. Disk Statistics. Terminal Activity.
Per Process CPU Utilization. Per Process Memory Utilization. Memory Sorted Process Display.
DEC UNIX ps. uptime. pstat. Sun multiprocessor statistics.
CPU Utilization. Buffer Cache Hit Rates. System Call Rates. Disk Statistics. pageout Activity. Pagein Activity. Swapping Activity. Free Memory and Swap Space. CPU Run Queue Statistics. Table Sizes. Name Cache Statistics. Solaris Kernel Memory Allocator Statistics.
System V ps.
DEC UNIX ps.
The Top Utility. Printing System Configuration. Conclusions.
The Kernel Tty Support.
cbreak and raw Character Processing.
Serial Port Input Mechanism.
Serial Port Output Mechanism.
Monitoring Terminal Traffic.
The Streams Facility.
Identifying the Memory Shortfall. Solving Memory Shortages. Streamlining the Kernel.
Tuning the Kernel Tables. Tuning the MAXUSERS Parameter. Removing Kernel Modules. Limiting the Paged Buffer Cache Size.
Reducing the User Memory Usages.
Changing the Work Load. Tuning the Paging/Swapping Subsystem.
Lowering CPU Usage.
Postponing or Suspending Program Execution. Changing Process Priorities. Tuning the Scheduler. Optimizing Programs. Reducing the exec and fork Rates. The Console Output Overhead. The Inode and Name Lookup Caches. Reducing the Interrupt Rate.
Increasing CPU Cycles.
When to Add More CPUs to a System. Degree of Speed-Up in Multiprocessor Systems.
Improving the File System Performance.
Balancing Multiple Drives. Using Disk Striping or RAID to Balance Disk Traffic. Optimizing Single-Drive Systems. Sun's tmpfs File System. Tuning the Paged Buffer Cache Size. Tuning the Metadata Buffer Cache Size. Adjusting the Buffer Cache Size in SCO UNIX. Optimizing the Cache Flush Parameters. Choosing Between a RAM Disk and the Buffer Cache.
Tuning the System V File System. Tuning the Fast File System.
Optimizing the Block and Fragment. Using tunefs to Set FFS Parameters. Rotational Latency Optimization.
Upgrading the Disk Subsystem.
TCP/IP basics. Sockets.
Reading and Writing Sockets.
The UDP Protocol. The Internet Protocol. The Transmission Control Protocol.
TCP Flow Control and Acknowledgment Scheme.
Ethernet. Performance Considerations of Ethernet. FDDI. 100-Mbit Ethernet. ISDN.
Networking Applications. Monitoring the Network with netstat.
Real-Time Network Monitoring. Network Summary Information.
The ping Command. Monitoring Burst Response Using spray. Network Optimization Basics. Breaking Up the Network to Improve Performance. Performance Summary.
The Basics. RPC Performance Considerations. Impact of the NFS Block Size. NFS Caching.
Attribute Caching. Server Processing. Monitoring NFS Performance Using nfsstat. Optimizing NFS Servers. Optimizing the Network for NFS Usage. Optimizing NFS Clients. Automounted File Systems. Other Network File Systems.
X-Window Implementation Under Unix. Client-Server Communication in X. The Window Manager and Toolkits. Performance Considerations in X. Optimizing Your System for X. Optimizing X and Motif. X Terminals.
Benchmarks Versus Real Applications. The Megahertz Rating. Simple MIPS. The Whetstone Benchmark. The Linpack Floating Point Benchmark. Dhrystone and the New MIPS. The SPEC CPU Benchmark. SPEC SDM Multiuser Suite. SPEC SFS NFS Benchmark. X Window Benchmarks. Database Benchmarks. Proprietary Benchmarks. Conclusions.
Understanding Design Cycles. Survival of the Fittest. Selecting the Best Architecture. X86 PCs. Selecting RISC Systems. References.
Basic Time Measurements. Profiling Programs. Simple Profiling Using prof. Call-graph Profiling with gprof. Opportunities for Optimization. The Optimizer. Other Optimization Techniques. Dynamic Shared Libraries. Process Timing Mechanism. Results and Conclusions. Additional Information.
Optimizing the performance of computer systems has always been an art relegated to a few individuals who happen to have the "right skills." UNIX systems have not escaped this syndrome. It is rare to find anyone who knows how to instrument the system, let alone tune it. This is by no means a fault of the general user community. The problem turns out to be rather complex, requiring good knowledge of computer architecture, UNIX design, and performance-monitoring tools.
Due to a lack of standards in the system performance management area, vendors often take liberties with substituting, enhancing, or altogether removing system-monitoring tools. Even when a familiar command does exist on a system, it may have subtle differences that can easily mislead you. One such example is the unit for some of the fields. In a typical manual page, you see frequent references to units of "blocks" or "pages." Yet there rarely is an indication of how big these things are. As you will see later in this book, a page can be anywhere from 512 bytes to 8 kilobytes, making it very hard to interpret such data correctly.
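The unit problem described above can be made concrete with a small sketch. It is only an illustration: the function name `pages_to_bytes` and the sample figure of 1200 pages are hypothetical, and it assumes that the monitoring tool counts pages in the same size that the system reports through the POSIX `SC_PAGESIZE` query, which should be checked against the manual page for each tool.

```python
import os


def pages_to_bytes(pages, page_size=None):
    """Convert a page count into bytes.

    page_size is given in bytes; when it is omitted, the running
    system is asked for its page size.
    """
    if page_size is None:
        # SC_PAGESIZE is the POSIX sysconf name for the VM page size.
        page_size = os.sysconf("SC_PAGESIZE")
    return pages * page_size


if __name__ == "__main__":
    # Hypothetical value, as if read from a "free pages" column.
    reported_free_pages = 1200
    for size in (512, 4096, 8192):
        kbytes = pages_to_bytes(reported_free_pages, size) / 1024
        print(f"{reported_free_pages} pages at {size} bytes/page = {kbytes:.0f} Kbytes")
    print("page size on this system:", pages_to_bytes(1), "bytes")
```

With 512-byte pages the same 1200-page figure stands for 600 Kbytes, while with 8-Kbyte pages it stands for 9600 Kbytes, a sixteenfold difference that a bare "pages" column does nothing to reveal.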
Beyond the tools, there are also a number of limitations in the UNIX architecture itself. Without knowing about these deficiencies, you could easily chase the wrong problem. A classic example is when people blame the hardware instead of UNIX and vice versa. In the end, we hope that you do not misinterpret our criticisms of one of the best operating systems around. Perhaps our only excuse for pointing out these deficiencies stems from a wise saying that states:
If you cannot criticize something, you do not understand it well enough!
I.1 General Style
In this book, we take a system approach to performance optimization by covering everything from user applications all the way down to the hardware. At the same time, we try not to assume that you have a strong background in either hardware architecture or UNIX internals or, for that matter, extensive experience with UNIX itself. In case you have dabbled seriously in any of these areas, we explain each topic in a separate chapter, making it easy to skip the ones you already know.
You will probably also notice that we have dedicated considerably more space to analysis than to simple cookbook procedures. While cookbook procedures do have their place (and we have included a fair number in this text), they are of little use unless you know when to apply them. Armed with an in-depth knowledge of what is going on inside your system, you will be better able to identify the true nature of its performance bottlenecks. As a bonus, you will be in a position to solve a wider set of problems than what is covered here.
In a departure from other texts on this topic, we have taken a very pragmatic view by emphasizing modern techniques for tuning UNIX systems. Had this book been written in the early 1980s, we would have focused heavily on how to modify the operating system parameters to either squeeze the last byte out of it or save a few CPU cycles. The advice would have been sound in that time frame because the average machine ran at well under 5 MIPS and had around 8 megabytes of memory. Any amount of savings would have seemed significant. Current CPUs are orders of magnitude faster and come with tens or even hundreds of megabytes of memory. The result is that the benefits of many of these optimization techniques are simply "lost in the noise." So, rather than relying on obsolete advice, we focus on higher-level approaches to system optimization. These approaches include optimization of the system hardware, general techniques for resource utilization, and more efficient usage of the system and network.
Alas, old habits die hard, and users have a fondness for "poking" values into their system. For this reason, we also cover those parameters and tuning methods that have at least some noticeable impact on system performance. But we would like to recommend again that you stay away from them, if for no other reason than portability. Higher-level techniques work across different UNIX implementations and, for that matter, other operating systems. With their larger impact on system throughput and response time, they are also more rewarding to implement.
I.2 Organization
We start this book in Chapter 1 by covering the basic principles behind performance monitoring and optimization. They are helpful in forming a strategy for attacking performance problems and steering you clear of potential pitfalls. Although the information presented in this chapter may seem simple in nature, its impact is significant.
Chapter 2 is aimed at giving you a pragmatic overview of the hardware architecture. We are not too worried about the theoretical aspects of this field, about which there are many excellent texts. Instead, we cover the major components in a high-performance computer system and show how design decisions made by the system and chip vendors have an impact on the performance of your system. The information should help you determine when a performance problem is a result of the inherent design of the hardware and not UNIX.
Chapter 3 is dedicated to an architectural overview of modern implementations of UNIX as it relates to system performance. Our focus is not to teach you the entire operating system (which would occupy a book larger than this one) but to point out those aspects that have an impact on monitoring and optimization of the system. As a result, the topics are presented in fairly terse form, which may be hard to understand. We have made sure, however, that all the necessary facts are highlighted so that complete understanding of the material is not necessary.
Armed with basic knowledge of the hardware and UNIX, you are now ready to start instrumenting your system and to look for performance bottlenecks. We have opted to divide the material into two chapters, each dedicated to one of the traditional implementations of UNIX today, namely System V and BSD. Alas, vendors routinely mix and match BSD and System V tools, so it may be necessary to read both chapters. To make it easier, we have listed the tools available in the most common versions of UNIX in Table I.1 along with the relevant chapter in this book.
Because there are still a large number of users connected to UNIX systems through serial ports and ASCII terminals, we have dedicated Chapter 6 to UNIX terminal support. Because the same code deals with modems and sometimes networking, the information should also be useful to those who use workstations and other UNIX systems. Also included is coverage of tools that let you instrument the terminal subsystem.
Once you find the system bottlenecks by using the monitoring tools, it is time to eliminate or reduce their impact on system performance. Chapter 7 covers the best techniques for dealing with typical shortages of resources such as memory, disk bandwidth, and CPU cycles. Again, we cover both high-level techniques for reconfiguration of the system and detailed fine-tuning of each subsystem.
Because it is rare to find UNIX systems that run stand-alone these days, Chapter 8 focuses on basic UNIX networking. This includes the complete suite of TCP/IP along with coverage of various networks and topologies. Because the networking implementation in UNIX is very monolithic with very little room for fine-tuning, we have focused the material on the best ways to configure the network and system to avoid performance problems from the start.
Given the widespread usage of NFS, we have dedicated Chapter 9 to its operation and optimization techniques. We point out some major deficiencies in the NFS design and ways to sidestep them.
The X Window System is covered in Chapter 10, starting with an in-depth overview of its architecture. True to our form, we point out its deficiencies as implemented on top of UNIX. Even though X is not generally tunable, we have nevertheless uncovered a few techniques for optimizing it.
Computer marketing is full of buzzwords describing the speeds and feeds of various components of the system. Invariably, these terms are derived from some set of benchmarks. To prepare you for your next computer purchase, Chapter 11 covers the most popular industry-standard benchmarks. We describe not only what the benchmarks purport to measure but also what the results actually reflect. Because benchmarks are based on pieces of code that are bound to be different from your application, we also attempt to correlate the results to real-life applications.
Chapter 12 is dedicated to the ins and outs of selecting systems and hardware components for best performance. We cover a broad range of systems from PCs to high-end RISC systems. With the information in this chapter, you should be able to select the best hardware for your application so that performance problems do not surface later.
Chapter 13, which covers optimization of UNIX programs, may seem out of place in such a text. However, these techniques give you an additional and powerful tool for getting the most performance out of your system and applications. We cover the standard UNIX profiling and timing tools, which help you identify what parts of an application can benefit from optimization. This discussion is followed by some common techniques for speeding up typical code sequences and algorithms. The coverage in this area remains brief to keep it from filling the entire text. References are provided, however, for those interested in more detailed information.
I.3 Terminology
Throughout this book, we use the System V and SysV designations in reference to all variants of System V from Release 3.2 to 4.X. Even though these releases share many components, we make sure to point out if a feature is specific to a particular version of System V. A case in point is System V Release 4.X (commonly abbreviated to SVR4), which is quite a departure from older releases of System V. Unless we state otherwise, the SVR4 designation applies only to versions of UNIX that are "pure" implementations of System V Release 4.X.
As of this writing, Sun (with Solaris), SONY, TANDEM, NEC, Pyramid Technology, and Novell (with UnixWare) are some of the vendors that fall in this category. Others, such as SGI, have operating systems that are compatible with SVR4 from a user point of view, but their kernel does not necessarily match the SVR4 sources. Although there is nothing wrong with their approach, the algorithms in these operating systems may not match those used in SVR4.
Being fairly picky about precision of units, we use the designations Kbytes, Mbytes, Gbytes, and Tbytes to refer to kilobytes, megabytes, gigabytes, and terabytes, respectively. Likewise, Kbits, Mbits, Gbits, and Tbits refer to kilobits, megabits, gigabits, and terabits. We stay away from terms such as MB and Mb, which are easily confused with each other.