In this section, we examine just a few of the tools that are likely to be useful to the email administrator. Many possibilities are available, many more than are listed here. Some are very specific, whereas others have broad applications. The tools discussed here are both generally useful and widely available. Email administrators interested in tuning, or just understanding, an email system should not restrict their studies to just the utilities mentioned here. Magazine articles, books, Web sites, and other system administrators can all provide insight into very helpful tools.
Each tool discussed has different options and displays slightly different information on each operating system version. While this inconsistency is annoying, some of the differences are tied to the internal workings of the operating system and are unavoidable. Also, some of the less common options are the most useful. It's just not practical to limit the use of these utilities to their common flag and output subsets, so that won't be done here. Instead, this section will generally provide examples using the FreeBSD (version 4.5) operating system utilities, throwing in some examples specific to other operating systems.
A final note: As we know from science, it is impossible to measure a system without affecting it. Just by running a tool we necessarily change the behavior of the very computer we're monitoring. These utilities consume memory and CPU time, they open sockets and files, and they read data off disks. Therefore, we can never be entirely sure that a problem that we observe on a system isn't at least partially influenced by the fact that we're monitoring it. Although this is rarely the case, it's a good idea not to go overboard by continually running top or by having scripts run ps every five seconds to capture the state of the machine. A much more modest approach to capturing data (running ps every five minutes, for example) will provide equally useful information without adding substantially to the server's load.
The venerable ps utility comes in two flavors: the Berkeley flavor (found on BSD-based systems and Linux) and the System V flavor (found on AIX, HP-UX, and other systems). Solaris provides the System V flavor in /usr/bin, and the Berkeley flavor appears in /usr/ucb. My preference is for the Berkeley-style output of ps; I like the information it provides and the way that the Berkeley ps -u sorts the data. Essentially the same information is available from either version, however, so other than remembering which option does what, one shouldn't be handicapped by any particular flavor.
The ps utility is especially useful for such tasks as tracking the number of certain types of processes running on a machine or seeing which processes are the largest resource consumers. It can report a great deal of information, which varies depending on the option flags selected. Everyone performing system troubleshooting would be well advised to become very familiar with the ps man page for the operating system that runs on their email server.
For both varieties of ps, some command-line flags require more processing to resolve than others. On Berkeley-type systems, it is more computationally intensive to resolve commands with the -u flag than without it. For System V versions, adding the -l flag requires more computational resources than if the command is run without it. Therefore, these flags, which produce extra output, should be used only when they relate important information. One thing that ps provides is rough process counts, for example:
ps -acx | grep -c "sendmail"
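A variation on this count can be sketched as a small shell function. This is only an illustration: count_by_name is a hypothetical helper, not a standard utility, and it counts exact matches on the command name rather than substring matches as grep does.

```shell
# count_by_name NAME
# Reads one command name per line on standard input and prints how many
# lines match NAME exactly. (Hypothetical helper, for illustration only.)
count_by_name() {
    awk -v name="$1" '$1 == name { n++ } END { print n + 0 }'
}

# Example use, assuming a ps that accepts the POSIX -e and -o options
# (the exact flags differ between operating systems):
#   ps -e -o comm= | count_by_name sendmail
```

Because the function only reads text, it can be fed from whichever ps invocation is cheapest on the system at hand.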
These sorts of data are useful, and periodic counts are often scripted. Especially in automated systems, it's worthwhile to make sure that they produce minimal strain on the server. Determining which options are more resource intensive than others isn't always straightforward, but the time command or shell built-in can aid in this calculation. On quiet servers, the response time for this command might be too fast to measure, so the aggregation of several commands may provide a more precise measurement. For example:
/usr/bin/time sh -c 'for i in 1 2 3 4 5 6 7 8 9 10; do ps -aux > /dev/null; done'
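The same idea can be generalized so that two sets of flags are timed under identical conditions. The following is a minimal sketch; repeat_cmd is a hypothetical helper name, and the usage shown in the comments assumes a shell in which time is a built-in keyword that can time functions (bash or ksh, for example).

```shell
# repeat_cmd COUNT COMMAND [ARGS...]
# Runs COMMAND COUNT times, discarding its standard output.
# (Hypothetical helper; the variables count and i are not local.)
repeat_cmd() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" > /dev/null
        i=$((i + 1))
    done
}

# Example use: compare the cost of ps with and without the -u flag.
#   time repeat_cmd 10 ps -ax
#   time repeat_cmd 10 ps -aux
```

In a shell without a time keyword, the loop can instead be wrapped in sh -c and passed to /usr/bin/time.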
Some ps output from the CPU-bound test server during one of the tests cited earlier in this book appears in Table 7.1. At the moment that this snapshot was taken, syslogd was the most active process. While it is a busy process on an email server, it rarely does the most work at any given time. However, unlike the MTA and LDA processes that move data, this persistent process reads data from the IP stack and writes it to disk on every delivery attempt.
Table 7.1. Sample /usr/ucb/ps -uaxc Output from the CPU-Bound Test Server
|% /usr/ucb/ps -uaxc|
Added together, the numbers in the RSS column roughly equal the system's total main memory (only 32MB), and that sum doesn't count RAM consumed by the kernel or the buffer cache. Because much of the memory consumed by the processes is shared, there is still enough space to keep the active parts of the programs resident in memory while leaving room for the kernel and the buffer cache.
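Summing the RSS column by hand is tedious, so a short awk filter can do it. This is a sketch only: sum_rss is a hypothetical name, and it assumes Berkeley-style output whose first line is a header containing the word RSS and whose earlier columns are single whitespace-separated fields.

```shell
# sum_rss
# Reads Berkeley-style ps output on standard input, locates the column
# headed RSS, and prints the sum of that column (typically kilobytes).
sum_rss() {
    awk '
        NR == 1 {
            # Find the RSS column from the header line.
            for (i = 1; i <= NF; i++)
                if ($i == "RSS") col = i
            next
        }
        col { total += $col }
        END { print total + 0 }
    '
}

# Example use:
#   /usr/ucb/ps -uaxc | sum_rss
```

As the text notes, shared pages are counted once per process, so this total overstates the physical memory actually in use.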
On this machine, the script command is used to capture output from the iostat and vmstat commands, which will be discussed shortly. The stat entry is a home-built script that adds date and time information to the output of these two utilities. As we'd expect, most of the CPU time is consumed by sendmail and mail.local processes. Also as we'd expect, concurrent MTA processes outnumber LDA processes, even though the email is sent to this server over a low-latency local area network.
Most of the rest of the processes running on this server are either standard parts of the operating system or processes related to remote connections to the server.
Many UNIX operating systems include the venerable top utility, which is also one of the first Open Source programs installed on many other operating systems. The top utility lists the largest CPU resource consumers on a system and updates this list periodically, typically every few seconds. For understanding the general state of the system, some of the most valuable information appears in the first few lines of the program's display. A system consistently showing a CPU idle state at or near 0% is almost certainly CPU bound. The caveat is that some systems list an iowait state indicating what percentage of processes are waiting for I/O. This number doesn't represent CPU time being consumed, but rather consists of the system's best guess as to the amount of CPU time that would be consumed if no processes were blocked waiting for I/O. If a significant percentage of processes are in the iowait state, then the system may show 0% idle while the CPU is barely being used.
In the upper-left corner is the last process identifier (PID) used by the system. From its rate of change, one can deduce how many new processes are spawned per second, giving some idea of how fast sessions are coming and going on the server. This method isn't useful on those few operating systems, such as OpenBSD, that assign new PIDs randomly rather than sequentially.
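The arithmetic behind this estimate is simple enough to sketch in a few lines of shell. The function name spawn_rate is hypothetical, and the calculation assumes sequentially assigned PIDs that did not wrap around between the two readings.

```shell
# spawn_rate FIRST_PID SECOND_PID SECONDS
# Prints the approximate number of processes created per second between
# two readings of the "last pid" value taken SECONDS apart.
# Assumes sequential PIDs and no wraparound between the readings.
spawn_rate() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / s }'
}

# Example: readings of 4100 and 4250 taken 15 seconds apart.
#   spawn_rate 4100 4250 15    # prints 10.0
```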
The memory information displayed isn't as useful as one would first expect. On nearly any system that has been running for a few minutes, or even a few seconds if it's busy, we should expect the amount of free memory listed to stay very near zero. On contemporary operating systems, any RAM that goes unused by processes will be allocated to caching some data. Thus, just because there is very little memory free, it doesn't mean that the system is memory starved. On some operating systems, top will show more memory information, such as how much RAM is allocated to filesystem caches; if this number drops near zero, it would likely indicate that the server would use additional RAM effectively.
Even more so than with ps, the information displayed via top varies from operating system to operating system. A thorough reading of the utility's man page should be performed before its results are interpreted.
The vmstat utility explores the activity of the virtual memory system, which includes real memory used by processes, memory used for caching, and swap space. The first line of data produced summarizes the activity since the system was booted. Generally, this information should be ignored.
While it's much less impressive than the output that one will find on a true high-performance email server, some example output from the CPU-bound server during one of the test cases discussed in this book can be instructive. This output appears in Table 7.2.
Table 7.2. Sample vmstat 15 Output from the CPU-Bound Test Server
|% vmstat 15|
Excessive memory activity will cause heavy paging, which translates into relatively large numbers in the pi and po columns. Of course, what constitutes a large number depends heavily on the particular system. Interpreting these numbers without a baseline will be next to impossible. In the example case, these numbers are so small that we can safely conclude that the system is not memory bound.
On those systems whose vmstat provides this information, another column worth tracking is de. It gives a system's expected short-term memory deficiency, for which memory space will have to be actively reclaimed. A nonzero entry will show up occasionally in this column on a healthy but busy system. The more often this result appears, though, the more likely the system could use more memory. Our sample data show no deficiencies, another indication that this system is not memory bound.
The first column, labeled r, indicates the number of runnable processes, which provides a snapshot of the system load average. In this example, a number of processes want to run but can't because they have no CPU time slice available to them. The second column, labeled b, gives the number of processes that are blocked from proceeding because they are waiting for I/O. If a significant number of processes are listed in this column, the system is likely I/O bound. In our example, we occasionally see a blocked process, but this event is rare, giving us an indication that this system isn't I/O bound. Yet one more variable worth tracking is the third column, labeled w. It represents the number of processes that are either runnable or have been idle for a short period of time and have now been swapped out. Frequent nonzero numbers in this column also indicate that the server may be desperately short of RAM. The example looks like it's in good shape on that point.
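These three columns are easy to summarize over a long capture. The following sketch assumes the column order described here (r, b, and w first, which is not the layout on every operating system) and uses a hypothetical function name, vmstat_summary. It skips header lines and the first data line, which holds the since-boot summary.

```shell
# vmstat_summary
# Reads vmstat output on standard input and reports the number of samples,
# the average of the r column, and how many samples had a nonzero b or w.
# Assumes r, b, and w are the first three columns.
vmstat_summary() {
    awk '
        $1 !~ /^[0-9]+$/ { next }          # skip header lines
        !skipped { skipped = 1; next }     # skip the since-boot summary line
        {
            n++
            r += $1
            if ($2 > 0) blocked++
            if ($3 > 0) swapped++
        }
        END {
            if (n == 0) { print "no samples"; exit }
            printf "samples=%d avg_r=%.1f blocked=%d swapped=%d\n", n, r / n, blocked, swapped
        }
    '
}

# Example use (interval of 15 seconds, 20 reports):
#   vmstat 15 20 | vmstat_summary
```

A high average in r with few blocked or swapped samples is the pattern the text associates with a CPU-bound rather than an I/O-bound or memory-bound server.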
In the past, one could tell whether a system was memory starved just by looking for swapping activity, as opposed to the more healthy activity of paging. Paging is the process of writing parts of process data to swap space to make room for pages of other data in active memory. An operating system may page out part of a process if that page hasn't been accessed in a while, even if the process is running. This efficient behavior allows new processes to start up more quickly because memory reclamations don't need to occur first, and it leaves more room for caching data, leading to better performance. Some amount of paging will occur on all operating systems and is considered normal and healthy.
Swapping usually refers to taking a process and moving its entire memory image to disk. It might happen if the process has remained idle for a very long time (tens of seconds, which is a very long time in computer terms) or if the system desperately needs to make room for new processes. Desperation swapping and thrashing are terms used to describe a system that is so memory starved that nearly every time a process receives a CPU slice, it must be read in from swap to active memory before it can proceed. This horrible circumstance effectively slows memory access (typically measured in tens of nanoseconds) to disk speeds (measured in anything from one to hundreds of milliseconds, a difference of roughly four to seven orders of magnitude). Once a system starts thrashing, it will not operate efficiently. One should aggressively avoid this situation.
Somewhat unfortunately, as virtual memory algorithms have become more complex and sophisticated over the years, it's become more difficult to tell in a vacuum whether a system is thrashing. In fact, many operating systems don't distinguish between paging and swapping, eliminating the latter behavior altogether. Here is where a baseline becomes crucial. One must understand what sort of paging statistics occur on a heavily loaded but properly operating server before one can determine whether a system is beginning to thrash. However, once the disks with swap on them begin to get loaded, it will be painfully obvious that the system has simply run out of memory. Of course, this behavior will occur beyond the point where a server starts to slow down noticeably.
Solaris 8 introduced a new system for managing the buffer cache. Now the page daemon is no longer needed to free up memory used to cache filesystem information. Consequently, the page daemon does not have to do any work to reclaim memory space for new processes. The upshot is that on Solaris 8, if the sr field of vmstat output is nonzero, running processes are being paged to disk to make room for new processes. On this operating system, it has now become more straightforward to identify significant memory deficiencies. Significant activity in the sr field on other operating systems can indicate that the machine is memory starved, but the demarcation point is not as obvious as it is on Solaris 8.
The iostat tool is similar to vmstat, except that it measures system I/O rather than virtual memory statistics. On many systems, it can measure not only disk-by-disk data transfers, but also I/O information to and from a wide variety of sources, including tape drives, printers, scanners, ttys, and so on. Like vmstat, this command displays CPU information in the last set of columns. On many systems, if one specifies no I/O devices, it can be a good mechanism to track CPU usage in scripts, such as running iostat -c 60 to get basic output of CPU information every minute on a Linux or Solaris system. As with vmstat, the first line of output by the iostat program is a summary since boot time and is effectively useless. Table 7.3 gives some data gathered with iostat while testing earlier examples in this book.
Table 7.3. Sample iostat -cx 15 Output from the CPU-Bound Test Server
|% iostat -cx 15|
|extended device statistics||cpu|
Typically, iostat reports its data as kilobytes per second or transfers per second. In this example, reads and writes per second for each device are listed in the second and third columns, while the amount of data being moved appears in the fourth and fifth columns. Some versions also show how long the average transfer takes, svc_t in this example, which can be a very useful metric for determining loading. If this number starts going up, it indicates that the device is heavily loaded.
On Solaris and some recent versions of Linux, the -x flag gives even more valuable information, as in this example, including the average amount of time each request spends in the wait queue and the percentage of time I/O requests are waiting to be serviced by the disk device. These numbers represent some of the best indicators of disk contention in the absence of a baseline, but they're no substitute for one. A disk can be 100% busy and yet the system can still provide adequate service. In our example, we can clearly see that the two disk devices (sd0 contains the message store and sd3 contains the logs and the email queue) are not saturated and, therefore, this system is not I/O bound.
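When many devices are present, it can help to filter the extended output down to the busiest ones. The sketch below assumes Solaris-style iostat -x output with a header line that includes a %b column; busy_disks is a hypothetical name, and the threshold passed to it is arbitrary, since only a baseline can say what level is abnormal for a given server.

```shell
# busy_disks THRESHOLD
# Reads Solaris-style "iostat -x" output on standard input and prints the
# device name and %b value for every sample at or above THRESHOLD.
busy_disks() {
    awk -v limit="$1" '
        {
            # A header line names the columns; remember where %b is.
            for (i = 1; i <= NF; i++)
                if ($i == "%b") { col = i; next }
        }
        col && NF >= col && $col ~ /^[0-9.]+$/ && $col + 0 >= limit + 0 {
            print $1, $col
        }
    '
}

# Example use (the threshold of 80 is an arbitrary illustration):
#   iostat -x 15 | busy_disks 80
```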
Knowing that a disk always has requests sitting in the wait queue doesn't explain why a change in server behavior has occurred. If kilobytes per second increases while tps remains constant, it would indicate that we're dealing with larger requests, which may alert us to a temporary or permanent change in the type of email flowing through the system.
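The relationship between these two figures is just a division: kilobytes per second divided by transfers per second gives the average request size. A minimal sketch, using a hypothetical function name and made-up numbers in the example:

```shell
# avg_request_kb KB_PER_SEC TPS
# Prints the average size of each transfer in kilobytes, or "n/a" when
# no transfers occurred during the interval.
avg_request_kb() {
    awk -v kbps="$1" -v tps="$2" 'BEGIN {
        if (tps + 0 == 0) { print "n/a"; exit }
        printf "%.1f\n", kbps / tps
    }'
}

# Example with illustrative numbers: throughput doubles at constant tps,
# so the average request size doubles as well.
#   avg_request_kb 480 60    # prints 8.0
#   avg_request_kb 960 60    # prints 16.0
```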
On some operating systems, iostat has problems reporting useful information about disks managed by software RAID or from a hardware RAID system. This is especially true for those numbers indicated on a percentage basis. Absolute throughput numbers such as numbers of reads and writes per second or bytes per second compared against a baseline are likely to be more reliable. Because email servers so often become I/O bound, iostat may be the single most important utility in the email administrator's toolkit. Anyone who expects to maintain such a system would be well advised to become very familiar with it.
In the operating system used in the examples here (Solaris 2.6), note that the CPU loading information given by the iostat command lists an I/O wait state (the wt column), whereas the vmstat command lumps it in with the idle CPU state (the id column). Someone who looked at just the vmstat output might conclude that the system is not quite CPU bound, whereas this result would become more obvious if the CPU loading information were examined via top or iostat.
The third tool in the *stat trio is netstat. As one would expect, netstat provides information about system networking. It can display either a snapshot of very detailed information about nearly every conceivable network parameter (netstat -s) or periodic data like that found with vmstat or iostat (e.g., netstat -w 5 on BSD systems, netstat -i 5 on Solaris, or netstat -c on Linux).
Obviously, in its periodic mode, some of the parameters provided by netstat that we want to carefully observe include the number of packets per second and the number of bytes per second. Both statistics, and especially trends in them, can provide the most direct information on the objective external load on a system, so they should be tracked. How the ratio of input to output statistics might change can also be highly informative.
On some types of shared networks, such as Ethernet, when computers are connected to the network via a hub rather than a switch, two machines could potentially try to send a network packet at the same time. This attempt can result in a collision. Both senders will then wait for a small, random amount of time and try to send their packets again. On a shared network, the number of collisions is a good indicator of general network load. Again, hard and fast numbers are difficult to identify, as they depend on the speed of the network, packet sizes, and the number of other machines on the network, but as a rule of thumb a busy email server should not reside on a network that consistently shows hundreds of collisions per second. On a switched network, no collisions should occur. If they do arise, it might mean that the switch, or the connection between the server and the switch, dropped into a nonswitched mode for some period of time. To avoid this possibility, one can lock network interfaces on switched networks into full-duplex, rather than letting them autonegotiate speed and mode.
The other piece of data of special value from netstat in periodic mode involves the error rates. An error usually indicates that a packet has failed its checksum; that is, its contents don't match what the packet header indicates. An output error indicates that this problem occurred somewhere between the formation of the packet by the operating system and its transmission over the wire. This result is never good. Even a handful of entries in this field can indicate a serious problem with the server's NIC and should be investigated. Input errors are less severe, as a packet might legitimately have become corrupted traveling over a network to the server, but input error rates of even 0.1% may indicate a network problem, such as bad cabling, electrical interference, or a bad NIC. An error rate of 1% means something is seriously wrong with the network somewhere, and this problem should be tracked down and eliminated before it worsens and interferes with operations.
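The input error thresholds mentioned here can be checked mechanically against the packet and error counts that netstat reports. The sketch below uses a hypothetical function name and assumes the rate is measured as input errors divided by input packets over the same interval. It does not apply to output errors, where any nonzero count deserves attention.

```shell
# input_error_rate PACKETS ERRORS
# Prints the input error rate as a percentage, followed by a rough verdict
# based on the 0.1% and 1% rules of thumb.
input_error_rate() {
    awk -v pkts="$1" -v errs="$2" 'BEGIN {
        if (pkts + 0 == 0) { print "n/a"; exit }
        pct = 100 * errs / pkts
        if (pct >= 1)        verdict = "serious"
        else if (pct >= 0.1) verdict = "investigate"
        else                 verdict = "ok"
        printf "%.2f%% %s\n", pct, verdict
    }'
}

# Examples with illustrative counts:
#   input_error_rate 200000 100    # prints 0.05% ok
#   input_error_rate 200000 400    # prints 0.20% investigate
```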
On System V-derived UNIX versions, you can run the System Activity Reporter (sar) program in the background to gather statistics and accounting information, including much of the data reported by the tools that have already been mentioned in this section. It is an excellent baselining tool, and collecting data every 1 to 15 minutes on a system via sar and archiving those data is something that every server administrator should seriously consider. This effort will be worthwhile on any system where performance monitoring is important.
Just about every piece of data one could want to examine is available via sar. In fact, it's more likely that one will miss key information due to the presence of too much data than that information on the nature of a given problem isn't available. This tool provides a superset of the information available from vmstat, iostat, netstat, and other utilities. Any performance-critical server administrator should become very familiar with sar and its affiliated utilities.
7.3.7 Other Utilities
Many other utilities could have been mentioned here, such as pstat, lsof, ifconfig, systat, pstack, ad nauseam. They have been omitted not because they're not valuable, but because a line must be drawn somewhere. Playing around with these other possibilities is worthwhile with the proviso that before one makes a new utility part of the canon, it should be demonstrated that programs that are more familiar and already on the system cannot easily generate the same information.
Finally, if for no other reason than to satisfy the reader's curiosity, I'll explain the stat shell script that appeared on the example ps output. This trivial script receives output from commands such as vmstat and iostat that do not indicate the date and time the data were gathered, and adds this information. Thus, instead of
% vmstat 15
 procs     memory ...
 r b w   swap  free  re ...
 0 0 0   4692  1804   0 ...
 7 1 0  64440  1956   5 ...
 8 0 0  64008  1800   8 ...
 8 0 0  64612  2276  10 ...
we could run
% vmstat 15 | /usr/local/etc/stat
020315 15:14:03  procs     memory ...
020315 15:14:03  r b w   swap  free  re ...
020315 15:14:03  0 0 0   4692  1804   0 ...
020315 15:14:18  7 1 0  64440  1956   5 ...
020315 15:14:33  8 0 0  64008  1800   8 ...
020315 15:14:48  8 0 0  64612  2276  10 ...
Now data from one source can be matched up in time against data from another source.
The stat script is trivial:
#!/bin/sh
# Prefix each line of standard input with the current date and time.
OLDIFS=$IFS
IFS=
while read LINE
do
    echo -n `date "+%y%m%d %H:%M:%S"`
    echo " " $LINE
done
IFS=$OLDIFS
IFS is redefined to be null so that the whitespace isn't adjusted when each line of input is collected by the read command.
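For comparison, the same filter can be written so that the IFS change applies only to the read command itself. This is a sketch of an alternative, not the script used on the test server: the name stamp_lines is hypothetical, and it relies on printf and read -r, which are part of POSIX sh but may be missing from very old shells.

```shell
# stamp_lines
# Copies standard input to standard output, prefixing each line with the
# current date and time. Setting IFS only for read preserves leading
# whitespace, and -r keeps backslashes in the input intact.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s  %s\n' "$(date '+%y%m%d %H:%M:%S')" "$line"
    done
}

# Example use:
#   vmstat 15 | stamp_lines
```

Like the original, this version runs date once per input line, which is a negligible cost at sampling intervals of several seconds.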