Finding and Removing Bottlenecks

By Nick Christenson
Jan 31, 2002

This chapter is from the book sendmail Performance Tuning.

Learn More Buy

7.3 Tools

In this section, we examine just a few of the tools that are likely to be useful to the email administrator. Many possibilities are available—many more than are listed here. Some are very specific, whereas others have broad applications. The tools discussed here are both generally useful and widely available. Email administrators interested in tuning, or just understanding, an email system should not restrict their studies to just the utilities mentioned here. Magazine articles, books, Web sites, and other system administrators can all provide insight into very helpful tools.

Each tool discussed has different options and displays slightly different information on each operating system version. While this inconsistency is annoying, some of the differences are tied to the internal workings of the operating system and are unavoidable. Also, some of the less common options are the most useful. It’s just not practical to limit the use of these utilities to their common flag and output subsets, so that won’t be done here. Instead, this section will generally provide examples using the FreeBSD (version 4.5) operating system utilities, throwing in some examples specific to other operating systems.

A final note: As we know from science, it is impossible to measure a system without affecting it. Just by running a tool we necessarily change the behavior of the very computer we’re monitoring. These utilities consume memory and CPU time, they open sockets and files, and they read data off disks. Therefore, we can never be entirely sure that a problem that we observe on a system isn’t at least partially influenced by the fact that we’re monitoring it. Although this is rarely the case, it’s a good idea to not go overboard by continually running top or by having scripts run ps every five seconds to capture the state of the machine. A much more modest approach to capturing data (running ps every five minutes, for example) will provide equally useful information without adding substantially to the server’s load.
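One way to take the modest approach is a small wrapper that a scheduler such as cron calls every few minutes. The following is only a minimal sketch under stated assumptions: the function name snapshot, the log file path, and the script path in the usage comment are hypothetical and do not come from the book.

```shell
#!/bin/sh
# snapshot LOGFILE CMD [ARGS...]
# Append a timestamped header line, followed by the output of CMD,
# to LOGFILE. (Hypothetical helper, for illustration only.)
snapshot() {
    logfile=$1
    shift
    {
        printf '=== %s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
        "$@"
    } >> "$logfile" 2>&1
}

# Example use from a script run by cron every five minutes
# (paths are assumptions):
#   snapshot /var/log/ps-snapshots.log ps -aux
# with a crontab entry such as:
#   0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/sbin/ps-snapshot.sh
```

Keeping the timestamp in the log makes it possible to line up process snapshots with other measurements later, and a five-minute interval keeps the monitoring load small.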

7.3.1 `ps`

The venerable ps utility comes in two flavors: the Berkeley flavor (found on BSD-based systems and Linux) and the System V flavor (found on AIX, HP-UX, and other systems). Solaris provides the System V flavor in /usr/bin, and the Berkeley flavor appears in /usr/ucb. My preference is for the Berkeley-style output of ps; I like the information it provides and the way that the Berkeley ps -u sorts the data. Essentially the same information is available from either version, however, so other than remembering which option does what, one shouldn’t be handicapped by any particular flavor.

A great deal of information is available from ps, and exactly what is shown depends on the option flags selected. It’s especially useful for such tasks as tracking the number of certain types of processes running on a machine or seeing which processes are the largest resource consumers. Everyone performing system troubleshooting would be well advised to become very familiar with the ps man page for the operating system that runs on their email server.

For both varieties of ps, some command-line flags require more processing to resolve than others. On Berkeley-type systems, it is more computationally intensive to resolve commands with the -u flag than without it. For System V versions, adding the -l flag requires more computational resources than if the command is run without it. Therefore, these flags, which produce extra output, should be used only when they relate important information. One thing that ps provides is rough process counts, for example:

ps -acx | grep -c "sendmail"
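The grep in this pipeline counts every line that contains the string anywhere, which is why the count is only rough. Where an exact match on the command name matters, something like the following sketch could be used instead. The helper name count_cmd is hypothetical, and the sketch assumes that the command name is the last whitespace-separated field of each line, as it normally is in ps -acx output.

```shell
#!/bin/sh
# count_cmd NAME
# Read ps-style output on stdin and print how many lines have NAME
# as their last field. (Hypothetical helper; assumes the command
# name is the final column and contains no spaces.)
count_cmd() {
    awk -v name="$1" '$NF == name { n++ } END { print n + 0 }'
}

# Typical use with the Berkeley-style flags from the text:
#   ps -acx | count_cmd sendmail
```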

These sorts of data are useful, and periodic counts are often scripted. Especially in automated systems, it’s worthwhile to make sure that they produce minimal strain on the server. Determining which options are more resource intensive than others isn’t always straightforward, but the time command or shell built-in can aid in this calculation. On quiet servers, the response time for this command might be too fast to measure, so the aggregation of several commands may provide a more precise measurement. For example:

/usr/bin/time sh -c 'for i in 1 2 3 4 5 6 7 8 9 10; \
   do ps -aux > /dev/null; done'

Some ps output from the CPU-bound test server during one of the tests cited earlier in this book appears in Table 7.1. At the moment that this snapshot was taken, syslogd was the most active process. While it is a busy process on an email server, it rarely does the most work at any given time. However, unlike the MTA and LDA processes that move data, this persistent process reads data from the IP stack and writes it to disk on every delivery attempt.

Table 7.1. Sample `/usr/ucb/ps -uaxc` Output from the CPU-Bound Test Server

`% /usr/ucb/ps -uaxc`
`USER`	`PID`	`%CPU`	`%MEM`	`SZ`	`RSS`	`TT`	`S`	`START`	`TIME`	`COMMAND`
`root`	`11302`	`1.3`	`3.3`	`3480`	`1004`	`?`	`S`	`15:03:18`	`0:23`	`syslogd`
`root`	`23420`	`1.1`	`4.3`	`2296`	`1304`	`?`	`R`	`16:05:56`	`0:02`	`sendmail`
`root`	`24881`	`0.9`	`5.6`	`2392`	`1700`	`?`	`S`	`16:13:11`	`0:00`	`sendmail`
`root`	`24884`	`0.8`	`5.3`	`2352`	`1592`	`?`	`R`	`16:13:11`	`0:00`	`sendmail`
`root`	`24861`	`0.7`	`5.6`	`2392`	`1700`	`?`	`S`	`16:13:07`	`0:00`	`sendmail`
`root`	`11009`	`0.6`	`3.9`	`2012`	`1172`	`?`	`S`	`14:47:30`	`0:08`	`nscd`
`root`	`24871`	`0.6`	`3.4`	`1552`	`1016`	`?`	`S`	`16:13:08`	`0:00`	`mail.local`
`root`	`24886`	`0.5`	`5.0`	`2312`	`1516`	`?`	`R`	`16:13:12`	`0:00`	`sendmail`
`root`	`24892`	`0.5`	`2.9`	`1140`	`860`	`pts/4`	`O`	`16:13:13`	`0:00`	`ps`
`test1`	`24890`	`0.5`	`3.4`	`1552`	`1012`	`?`	`R`	`16:13:12`	`0:00`	`mail.local`
`root`	`24889`	`0.4`	`5.0`	`2312`	`1516`	`?`	`R`	`16:13:12`	`0:00`	`sendmail`
`root`	`18650`	`0.3`	`3.8`	`1712`	`1160`	`?`	`S`	`15:45:14`	`0:00`	`sshd`
`npc`	`18716`	`0.2`	`2.5`	`1016`	`756`	`pts/4`	`S`	`15:45:24`	`0:01`	`csh`
`root`	`24891`	`0.2`	`1.7`	`2296`	`492`	`?`	`S`	`16:13:12`	`0:00`	`sendmail`
`npc`	`23454`	`0.2`	`1.9`	`856`	`580`	`pts/3`	`S`	`16:08:02`	`0:00`	`stat`
`root`	`23449`	`0.1`	`1.9`	`856`	`580`	`pts/2`	`S`	`16:08:00`	`0:00`	`stat`
`npc`	`23434`	`0.1`	`1.9`	`788`	`560`	`pts/0`	`S`	`16:06:48`	`0:00`	`script`
`root`	`3`	`0.0`	`0.0`	`0`	`0`	`?`	`S`	`Feb 04`	`19:27`	`fsflush`
`root`	`0`	`0.0`	`0.0`	`0`	`0`	`?`	`T`	`Feb 04`	`0:00`	`sched`
`root`	`1`	`0.0`	`0.5`	`652`	`132`	`?`	`S`	`Feb 04`	`0:26`	`init`
`root`	`2`	`0.0`	`0.0`	`0`	`0`	`?`	`S`	`Feb 04`	`0:02`	`pageout`
`root`	`156`	`0.0`	`1.8`	`1464`	`548`	`?`	`S`	`Feb 04`	`0:01`	`cron`
`root`	`159`	`0.0`	`2.4`	`1644`	`724`	`?`	`S`	`Feb 04`	`1:46`	`sshd`
`root`	`174`	`0.0`	`1.6`	`852`	`480`	`?`	`S`	`Feb 04`	`0:00`	`utmpd`
`root`	`203`	`0.0`	`2.1`	`1404`	`632`	`?`	`S`	`Feb 04`	`0:00`	`sac`
`root`	`204`	`0.0`	`2.1`	`1496`	`624`	`console`	`S`	`Feb 04`	`0:00`	`ttymon`
`root`	`206`	`0.0`	`2.3`	`1496`	`688`	`?`	`S`	`Feb 04`	`0:00`	`ttymon`
`root`	`10540`	`0.0`	`3.1`	`1800`	`936`	`?`	`S`	`14:19:23`	`0:10`	`sshd`
`npc`	`10543`	`0.0`	`1.5`	`1012`	`448`	`pts/1`	`S`	`14:19:33`	`0:00`	`csh`
`root`	`10554`	`0.0`	`0.0`	`276`	`4`	`pts/1`	`S`	`14:20:03`	`0:00`	`sh`
`root`	`11262`	`0.0`	`3.0`	`1712`	`904`	`?`	`S`	`15:01:22`	`0:02`	`sshd`
`npc`	`11265`	`0.0`	`2.5`	`1028`	`756`	`pts/0`	`S`	`15:01:25`	`0:00`	`csh`
`root`	`11456`	`0.0`	`2.7`	`1052`	`800`	`pts/1`	`S`	`15:05:33`	`0:00`	`csh`
`root`	`23429`	`0.0`	`1.8`	`764`	`536`	`pts/1`	`S`	`16:06:46`	`0:00`	`script`
`root`	`23430`	`0.0`	`1.9`	`788`	`560`	`pts/1`	`S`	`16:06:46`	`0:00`	`script`
`root`	`23431`	`0.0`	`2.4`	`996`	`732`	`pts/2`	`S`	`16:06:46`	`0:00`	`csh`
`npc`	`23433`	`0.0`	`1.8`	`764`	`536`	`pts/0`	`S`	`16:06:48`	`0:00`	`script`
`npc`	`23435`	`0.0`	`2.7`	`1024`	`804`	`pts/3`	`S`	`16:06:48`	`0:00`	`csh`
`root`	`23448`	`0.0`	`2.2`	`840`	`660`	`pts/2`	`S`	`16:08:00`	`0:00`	`vmstat`
`npc`	`23453`	`0.0`	`2.3`	`848`	`684`	`pts/3`	`S`	`16:08:02`	`0:00`	`iostat`

The numbers in the RSS column add up to roughly the system’s total main memory (only 32MB), and that sum doesn’t count RAM consumed by the kernel or the buffer cache. Because much of the memory consumed by these processes is shared, and is therefore counted more than once in this total, the machine still has enough space to keep the active parts of the programs resident in memory while leaving room for the kernel and the buffer cache.
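The RSS total can be computed rather than added by hand. The following is a minimal sketch: the helper name sum_rss is hypothetical, and it assumes Berkeley-style output with a header row that contains an RSS column and no embedded spaces in any column to the left of RSS.

```shell
#!/bin/sh
# sum_rss
# Read ps output (with its header row) on stdin and print the sum of
# the RSS column. The column is located by name from the header, so
# the sketch does not depend on a fixed column position.
sum_rss() {
    awk '
        NR == 1 {                       # header row: find the RSS column
            for (i = 1; i <= NF; i++)
                if ($i == "RSS") col = i
            next
        }
        col { total += $col }           # data rows: accumulate RSS
        END { print total + 0 }
    '
}

# Typical use on a system with the Berkeley-style ps:
#   /usr/ucb/ps -uaxc | sum_rss
```

Summing the 40 RSS values listed in Table 7.1 this way gives 30708, which (taking the unit to be kilobytes) is a little under the 32MB of RAM in the test server.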

On this machine, the script command is used to capture output from the iostat and vmstat commands, which will be discussed shortly. The stat entry is a home-built script that adds date and time information to the output of these two utilities. As we’d expect, most of the CPU time is consumed by sendmail and mail.local processes. Also as we’d expect, concurrent MTA processes outnumber LDA processes, even though the email is sent to this server over a low-latency local area network.

Most of the rest of the processes running on this server are either standard parts of the operating system or processes related to remote connections to the server.

7.3.2 `top`

Many UNIX operating systems include the venerable top utility, which is also one of the first Open Source programs installed on many other operating systems. The top utility lists the largest CPU resource consumers on a system and updates this list periodically, typically every few seconds. For understanding the general state of the system, some of the most valuable information appears in the first few lines of the program’s display. A system consistently showing a CPU idle state at or near 0% is almost certainly CPU bound. The caveat is that some systems list an iowait state indicating what percentage of processes are waiting for I/O. This number doesn’t represent CPU time being consumed, but rather consists of the system’s best guess as to the amount of CPU time that would be consumed if no processes were blocked waiting for I/O. If a significant percentage of processes are in the iowait state, then the system may show 0% idle while the CPU is barely being used.

In the upper-left corner is the last process identifier (PID) used by the system. From its rate of change, one can deduce how many new processes are spawned per second, giving some idea of how fast sessions are coming and going on the server. This method isn’t useful on those few operating systems, such as OpenBSD, that assign new PIDs randomly rather than sequentially.
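The arithmetic behind this estimate is simple enough to script. The sketch below assumes the two last-PID readings were noted by hand (or by some OS-specific means not shown here) a known number of seconds apart; the helper name spawn_rate is hypothetical, and the sketch ignores the case where the PID counter wraps around between readings.

```shell
#!/bin/sh
# spawn_rate PID_BEFORE PID_AFTER SECONDS
# Print the approximate number of new processes created per second,
# given two "last PID" readings taken SECONDS apart.
# Only meaningful on systems that assign PIDs sequentially, and only
# if the PID counter did not wrap between the two readings.
spawn_rate() {
    awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (b - a) / s }'
}

# Example: last PID went from 24000 to 24300 over one minute.
#   spawn_rate 24000 24300 60
```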

The memory information displayed isn’t as useful as one would first expect. On nearly any system that has been running for a few minutes, or even a few seconds if it’s busy, we should expect the amount of free memory listed to stay very near zero. On contemporary operating systems, any RAM that goes unused by processes will be allocated to caching some data. Thus, just because there is very little memory free, it doesn’t mean that the system is memory starved. On some operating systems, top will show more memory information, such as how much RAM is allocated to filesystem caches; if this number drops near zero, it would likely indicate that the server would use additional RAM effectively.

Even more so than with ps, the information displayed via top varies from operating system to operating system. A thorough reading of the utility’s man page should be performed before its results are interpreted.

7.3.3 `vmstat`

The vmstat utility explores the activity of the virtual memory system, which includes real memory used by processes, memory used for caching, and swap space. The first line of data produced summarizes the activity since the system was booted. Generally, this information should be ignored.

While it’s much less impressive than the output that one will find on a true high-performance email server, some example output from the CPU-bound server during one of the test cases discussed in this book can be instructive. This output appears in Table 7.2.

Table 7.2. Sample `vmstat 15` Output from the CPU-Bound Test Server

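`% vmstat 15`
`r`	`b`	`w`	`swap`	`free`	`re`	`mf`	`pi`	`po`	`fr`	`de`	`sr`	`in`	`sy`	`cs`	`us`	`sy`	`id`
`0`	`0`	`0`	`4692`	`1804`	`0`	`0`	`0`	`0`	`0`	`0`	`0`	`3`	`39`	`22`	`0`	`0`	`100`
`7`	`1`	`0`	`64440`	`1956`	`5`	`925`	`6`	`38`	`38`	`0`	`0`	`256`	`2150`	`328`	`31`	`68`	`2`
`8`	`0`	`0`	`64008`	`1800`	`8`	`923`	`0`	`40`	`44`	`0`	`1`	`249`	`2030`	`285`	`24`	`67`	`9`
`8`	`0`	`0`	`64612`	`2276`	`10`	`950`	`1`	`46`	`48`	`0`	`0`	`249`	`2079`	`283`	`27`	`66`	`7`
`7`	`1`	`0`	`64712`	`2956`	`4`	`954`	`6`	`23`	`94`	`0`	`22`	`262`	`2101`	`294`	`28`	`67`	`5`
`7`	`1`	`0`	`64320`	`2852`	`0`	`1024`	`2`	`0`	`0`	`0`	`0`	`271`	`2260`	`317`	`27`	`73`	`0`
`9`	`0`	`0`	`61684`	`1960`	`8`	`995`	`6`	`62`	`170`	`0`	`37`	`281`	`2186`	`329`	`29`	`71`	`0`
`7`	`0`	`0`	`62968`	`3836`	`0`	`1061`	`4`	`0`	`0`	`0`	`0`	`255`	`2209`	`315`	`29`	`71`	`0`
`15`	`1`	`0`	`58508`	`1800`	`12`	`956`	`7`	`71`	`138`	`0`	`27`	`288`	`2302`	`342`	`30`	`70`	`0`
`6`	`0`	`0`	`62936`	`4860`	`2`	`1035`	`1`	`10`	`10`	`0`	`0`	`252`	`2072`	`299`	`26`	`71`	`2`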

Excessive memory activity will cause heavy paging, which translates into relatively large numbers in the pi and po columns. Of course, what constitutes a large number depends heavily on the particular system. Interpreting these numbers without a baseline will be next to impossible. In the example case, these numbers are so small that we can safely conclude that the system is not memory bound.

On those systems whose vmstat provides this information, another column worth tracking is de. It gives a system’s expected short-term memory deficiency, for which memory space will have to be actively reclaimed. A nonzero entry will show up occasionally in this column on a healthy but busy system. The more often this result appears, though, the more likely the system could use more memory. Our sample data show no deficiencies, another indication that this system is not memory bound.

The first column, labeled r, indicates the number of runnable processes, which provides a snapshot of the system load average. In this example, a number of processes want to run but can’t because they have no CPU time slice available to them. The second column, labeled b, gives the number of processes that are blocked from proceeding because they are waiting for I/O. If a significant number of processes are listed in this column, the system is likely I/O bound. In our example, we occasionally see a blocked process, but this event is rare, giving us an indication that this system isn’t I/O bound. Yet one more variable worth tracking is the third column, labeled w. It represents the number of processes that are either runnable or have been idle for a short period of time and have now been swapped out. Frequent nonzero numbers in this column also indicate that the server may be desperately short of RAM. The example looks like it’s in good shape on that point.
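When vmstat output is captured to a file, the r and b columns can be averaged while discarding the since-boot first sample, as recommended earlier. The following is a minimal sketch rather than a finished tool: the helper name vmstat_avg is hypothetical, and it assumes plain-text vmstat output whose column header row begins with r.

```shell
#!/bin/sh
# vmstat_avg
# Read plain-text vmstat output on stdin and print the number of
# samples and the average of the r (runnable) and b (blocked)
# columns, ignoring the first data row (the since-boot summary).
vmstat_avg() {
    awk '
        BEGIN { col["r"] = 1; col["b"] = 2 }   # defaults if no header seen
        $1 == "r" {                            # column header row
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $1 !~ /^[0-9]+$/ { next }              # blank or other header lines
        !skipped { skipped = 1; next }         # first sample: since boot
        { r += $(col["r"]); b += $(col["b"]); n++ }
        END {
            if (n) printf "samples=%d avg_r=%.2f avg_b=%.2f\n", n, r / n, b / n
        }
    '
}

# Typical use on a saved capture (file name is an assumption):
#   vmstat_avg < vmstat.out
```

For the ten samples in Table 7.2, the first is skipped and the remaining nine average out to about 8.22 runnable and 0.44 blocked processes, which matches the reading above: many processes waiting for CPU, very few waiting for I/O.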

In the past, one could tell whether a system was memory starved just by looking for swapping activity, as opposed to the more healthy activity of paging. Paging is the process of writing parts of process data to swap space to make room for pages of other data in active memory. An operating system may “page out” part of a process if that page hasn’t been accessed in a while, even if the process is running. This efficient behavior allows new processes to start up more quickly because memory reclamations don’t need to occur first, and it leaves more room for caching data, leading to better performance. Some amount of paging will occur on all operating systems and is considered normal and healthy.

Swapping usually refers to taking a process and moving its entire memory image to disk. It might happen if the process has remained idle for a very long time (tens of seconds, which is a very long time in computer terms) or if the system desperately needs to make room for new processes. “Desperation swapping” and “thrashing” are terms used to describe a system that is so memory starved that nearly every time a process receives a CPU slice, it must be read in from swap to active memory before it can proceed. This horrible circumstance effectively slows memory access (typically measured in tens of nanoseconds) to disk speeds (measured in milliseconds to hundreds of milliseconds, a difference of at least four orders of magnitude). Once a system starts thrashing, it will not operate efficiently. One should aggressively avoid this situation.

Somewhat unfortunately, as virtual memory algorithms have become more complex and sophisticated over the years, it’s become more difficult to tell in a vacuum whether a system is thrashing. In fact, many operating systems don’t distinguish between paging and swapping, eliminating the latter behavior altogether. Here is where a baseline becomes crucial. One must understand what sort of paging statistics occur on a heavily loaded but properly operating server before one can determine whether a system is beginning to thrash. However, once the disks with swap on them begin to get loaded, it will be painfully obvious that the system has simply run out of memory. Of course, this behavior will occur beyond the point where a server starts to slow down noticeably.

Solaris 8 introduced a new system for managing the buffer cache. Now the page daemon is no longer needed to free up memory used to cache filesystem information. Consequently, the page daemon does not have to do any work to reclaim memory space for new processes. The upshot is that on Solaris 8, if the sr field of vmstat output is nonzero, running processes are being paged to disk to make room for new processes. On this operating system, it has now become more straightforward to identify significant memory deficiencies. Significant activity in the sr field on other operating systems can indicate that the machine is memory starved, but the demarcation point is not as obvious as it is on Solaris 8.
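A captured vmstat log can be checked for how often a column such as sr (or de) is nonzero. The following is a minimal sketch under stated assumptions: the helper name nonzero_samples is hypothetical, the input is plain-text vmstat output whose header row begins with r, and a column name that appears twice in the header (such as sy) resolves to its last occurrence.

```shell
#!/bin/sh
# nonzero_samples COLUMN
# Read plain-text vmstat output on stdin and report in how many
# samples the named column is nonzero. The first data row (the
# since-boot summary) is ignored.
nonzero_samples() {
    awk -v name="$1" '
        $1 == "r" {                          # header row: locate the column
            for (i = 1; i <= NF; i++) if ($i == name) c = i
            next
        }
        $1 !~ /^[0-9]+$/ { next }            # blank or other header lines
        !skipped { skipped = 1; next }       # first sample: since boot
        c { n++; if ($c + 0 != 0) hits++ }
        END { printf "%d of %d samples nonzero\n", hits + 0, n + 0 }
    '
}

# Typical use on a saved capture (file name is an assumption):
#   nonzero_samples sr < vmstat.out
```

How many nonzero samples count as a problem still depends on the operating system and on a baseline for the particular server.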

7.3.4 `iostat`

The iostat tool is similar to vmstat, except that it measures system I/O rather than virtual memory statistics. On many systems, it can measure not only disk-by-disk data transfers, but also I/O information to and from a wide variety of sources, including tape drives, printers, scanners, ttys, and so on. Like vmstat, this command displays CPU information in the last set of columns. On many systems, if one specifies no I/O devices, it can be a good mechanism to track CPU usage in scripts, such as running iostat -c 60 to get basic output of CPU information every minute on a Linux or Solaris system. As with vmstat, the first line of output by the iostat program is a summary since boot time and is effectively useless. Table 7.3 gives some data gathered with iostat while testing earlier examples in this book.