This chapter is from the book


Measuring CPU Load

Various tools are available for monitoring CPU load, but it's usually the combination of these tools that provides the most useful data.

Other than just showing how long the system has been up, the uptime command can be used to give you a rough estimate of the system load. The uptime command prints the current time, the length of time that the system has been up, and the average number of jobs in the run queue over the last 1, 5, and 15 minutes. When I type the uptime command, the system responds with the following:

11:23am up 2 day(s), 19:15, 1 user, load average: 0.01, 0.02, 0.04

Let's look at the load average numbers. The load average is the sum of the run queue length and the number of jobs currently running on CPUs. In short, it's a rough estimate of CPU usage. The three figures are averages over the last 1, 5, and 15 minutes. A high load average means that the system is being used heavily, and response time is likely to be sluggish. What is a high load average? It depends on your system. If you've been keeping an eye on the load average, you'll know from the history of the system what is a good average and what is a bad one. As a rule of thumb, I would say a load average of 3 or less is good, but I've seen systems with a load average of 5 in which performance is still good. Different system configurations behave differently under the same load averages.

Keep in mind that the load average is simply a starting point. A low load average does not guarantee that users are not experiencing slow response times.
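
As a rough illustration of reading these figures programmatically, the following Python sketch pulls the three load averages out of a line of uptime output and compares them with a threshold. The function names are hypothetical, the default threshold of 3 is only the rule of thumb mentioned above, and the pattern assumes the "load average:" layout shown in the sample output.

```python
import re

def parse_load_averages(uptime_line):
    """Return the 1-, 5-, and 15-minute load averages as a tuple of floats."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())

def is_heavily_loaded(uptime_line, threshold=3.0):
    """Return True if any of the three averages exceeds the threshold.

    The threshold is an assumption; tune it to the history of your system.
    """
    return max(parse_load_averages(uptime_line)) > threshold

sample = "11:23am up 2 day(s), 19:15, 1 user, load average: 0.01, 0.02, 0.04"
print(parse_load_averages(sample))
print(is_heavily_loaded(sample))
```

Comparing the 1-minute figure with the 15-minute figure in the returned tuple also gives a quick hint as to whether load is rising or falling.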

The ps command will give you more useful information regarding what is going on with your system. Use the following options with the ps command to get a complete picture of all the processes running on your system:

ps -elf

The system responds with the following:

F S   UID  PID PPID C PRI NI  ADDR  SZ  WCHAN  STIME TTY  TIME CMD
19 T   root   0   0 0  0 SY  ?   0      Apr 30 ?  0:18 sched
 8 S   root   1   0 0 40 20  ?  150    ?  Apr 30 ?  0:00 /etc/init -
19 S   root   2   0 0  0 SY  ?   0    ?  Apr 30 ?  0:00 pageout
19 S   root   3   0 0  0 SY  ?   0    ?  Apr 30 ?  1:01 fsflush
 8 S   root  333   1 0 40 20  ?  217    ?  Apr 30 ?  0:00 \
 /usr/lib/saf/sac -t 300
 8 S   root 2087   1 0 40 20  ?  239    ? 10:42:32 ?  \
 0:00 /bin/ksh /usr/dt/bin/sdtvolcheck -d
 8 S   root  144   1 0 40 20  ?  273    ?  Apr 30 ?  0:00 \
 /usr/sbin/rpcbind
 8 S   root  52   1 0 40 20  ?  268    ?  Apr 30 ?  0:00 \
 /usr/lib/sysevent/syseventd
 8 S   root  62   1 0 40 20  ?  343    ?  Apr 30 ?  0:01 \
 /usr/lib/picl/picld
 8 S   root  190   1 0 40 20  ?  562    ?  Apr 30 ?  0:00 \
 /usr/lib/autofs/automountd
 8 S   root  233   1 0 40 20  ?  173    ?  Apr 30 ?  0:00 \
 /usr/lib/power/powerd
 8 S   root  166   1 0 40 20  ?  292    ?  Apr 30 ?  0:00 \
 /usr/sbin/inetd -s
 8 S  daemon  183   1 0 40 20  ?  306    ?  Apr 30 ?  0:00 \
 /usr/lib/nfs/statd
 8 S   root  201   1 0 40 20  ?  410    ?  Apr 30 ?  0:00 \
 /usr/sbin/syslogd
 8 S   root  220   1 0 40 20  ?  394    ?  Apr 30 ?  0:00 \
 /usr/lib/lpsched
 8 S   root  180   1 0 40 20  ?  266    ?  Apr 30 ?  0:00 \
 /usr/lib/nfs/lockd
 8 S   root  215   1 0 40 20  ?  449    ?  Apr 30 ?  0:01 \
 /usr/openwin/bin/fbconsole -d :0

The ps command was covered in detail in Chapter 15, "Managing Processes," so I won't go into detail on this command again.

The prstat command is similar to the ps command, except (as shown in Chapter 15) it continually updates the display of information on your screen. Use this command to watch processes on your system that might be eating up system resources. The sdtprocess GUI, also described in Chapter 15, provides a friendlier graphical version of this command.

vmstat provides a convenient summary of system activity as well. Run with no arguments, vmstat prints a single line that summarizes activity since boot time. To obtain useful real-time statistics, run vmstat with an interval (in seconds) as follows:

vmstat 30

This tells vmstat to report every 30 seconds and to keep displaying results on the screen, as follows, until you press Ctrl+C to interrupt the command:

kthr   memory      page      disk     faults   cpu
 r b w  swap free re mf pi po fr de sr dd f0 s0 --  in  sy  cs us sy id
 0 0 0 596704 31592  0  1 0 0 0 0 0 0 0 0 0 403  96  61 2 0 98
 0 0 0 595040 24624  2 12 0 0 0 0 0 1 0 0 0 404 104  62 0 0 99
 0 0 0 595040 24624  2 11 0 0 0 0 0 1 0 0 0 413 147  79 0 1 99

NOTE

Disregard the first line of output. This is a summary of information since the system was booted.

The vmstat command outputs columns of information with a header across the top. Each field of output is described in Table 19.1.

Table 19.1 vmstat Fields

Field         Description
kthr/r        Run queue length.
kthr/b        Kernel threads blocked while waiting for I/O.
kthr/w        Idle processes that have been swapped.
memory/swap   Free, unreserved swap space (KB).
memory/free   Free memory (KB).
page/re       Pages reclaimed from the free list.
page/mf       Minor faults (page in memory but not mapped). If the page is still in memory, a minor fault remaps the page.
page/pi       Paged in from swap (KB/s). (When a page is brought back from the swap device, the process will stop execution and wait. This might affect performance.)
page/po       Paged out to swap (KB/s). The page has been written and freed.
page/fr       Freed or destroyed (KB/s). This column reports the activity of the page scanner.
page/de       Anticipated short-term memory shortfall (KB).
page/sr       Scan rate (pages). This number is not reported as a "rate" but as a total number of pages scanned.
disk/s#       Disk activity for disk # (disk operations per second).
faults/in     Interrupts per second.
faults/sy     System calls per second.
faults/cs     Context switches per second.
cpu/us        User CPU time (%).
cpu/sy        System (kernel) CPU time (%).
cpu/id        Idle + I/O wait CPU time (%).
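
Because the short column names repeat (sy appears under both faults and cpu), it can help to qualify each column with its group when processing vmstat output in a script. The following Python sketch is one way to do that; the function names are hypothetical, and it assumes the column layout shown above: three kthr columns, two memory columns, seven page columns, a variable number of disk columns, then three faults and three cpu columns.

```python
def qualify_columns(header_line):
    """Turn the short vmstat column names into group/name form (e.g. cpu/id)."""
    names = header_line.split()
    # Every group except disk has a fixed width in the layout assumed here.
    n_disks = len(names) - (3 + 2 + 7 + 3 + 3)
    if n_disks < 0:
        raise ValueError("unexpected vmstat header: %r" % header_line)
    groups = (["kthr"] * 3 + ["memory"] * 2 + ["page"] * 7 +
              ["disk"] * n_disks + ["faults"] * 3 + ["cpu"] * 3)
    return [group + "/" + name for group, name in zip(groups, names)]

def parse_vmstat(header_line, data_lines):
    """Return one dict per data line, keyed by qualified column name."""
    columns = qualify_columns(header_line)
    samples = []
    for line in data_lines:
        values = [int(field) for field in line.split()]
        if len(values) != len(columns):
            raise ValueError("column count mismatch in: %r" % line)
        samples.append(dict(zip(columns, values)))
    return samples

header = "r b w swap free re mf pi po fr de sr dd f0 s0 -- in sy cs us sy id"
rows = [
    "0 0 0 595040 24624 2 12 0 0 0 0 0 1 0 0 0 404 104 62 0 0 99",
    "0 0 0 595040 24624 2 11 0 0 0 0 0 1 0 0 0 413 147 79 0 1 99",
]
for sample in parse_vmstat(header, rows):
    print(sample["kthr/r"], sample["memory/free"], sample["cpu/id"])
```

When feeding real vmstat output to a parser like this, leave out the first data line, which reports the summary since boot rather than a current sample.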


NOTE

The free column in vmstat now really does mean memory that is free and not used by the page cache. In the past, it gave unreliable results.

The column labeled r under the kthr section is the run queue: the number of processes waiting to get on the CPU(s). The id column is CPU idle time. If the id column stays at 0 (zero) while the run queue remains long, the system lacks the CPU resources to keep up with the process demand. Here's an example of a system that lacks CPU resources:

kthr   memory      page        disk      faults    cpu
r b w  swap  free  re mf pi po fr de sr m0 m1 m2 m3  in  sy  cs us sy id
45 0 0 2887216 182104 3 707 449 6 455 0  80 2 6 1 0  1531 5797 983 61 30 9
58 0 0 2831312 46408 5 983 582 56 3211 0 492 0 0 0 0  1413 4797 1027 69 31 0
55 0 0 2830944 56064 2 649 656 3 806 0 121 0 0 0 0  1441 4627 989 69 31 0
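
The same check can be scripted. The Python sketch below scans vmstat data lines and flags samples where idle time is zero while threads are waiting in the run queue. The function name is hypothetical, and the code assumes that r is the first column and that us, sy, and id are the last three, as in the sample output above.

```python
def cpu_saturated_samples(data_lines):
    """Return the samples with zero idle time and a non-empty run queue."""
    flagged = []
    for line in data_lines:
        try:
            values = [int(field) for field in line.split()]
        except ValueError:
            continue  # skip header lines, which contain column names
        if len(values) < 4:
            continue  # too short to hold r plus the three cpu columns
        run_queue = values[0]
        user, system, idle = values[-3:]
        if idle == 0 and run_queue > 0:
            flagged.append({"r": run_queue, "us": user, "sy": system, "id": idle})
    return flagged

output = [
    "r b w  swap  free  re mf pi po fr de sr m0 m1 m2 m3  in  sy  cs us sy id",
    "45 0 0 2887216 182104 3 707 449 6 455 0  80 2 6 1 0  1531 5797 983 61 30 9",
    "58 0 0 2831312 46408 5 983 582 56 3211 0 492 0 0 0 0  1413 4797 1027 69 31 0",
    "55 0 0 2830944 56064 2 649 656 3 806 0 121 0 0 0 0  1441 4627 989 69 31 0",
]
for sample in cpu_saturated_samples(output):
    print(sample)
```

A single flagged sample is not conclusive; it is the pattern persisting across many intervals that suggests a CPU shortage.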

See that the CPU idle time is zero, and the CPU is spending the majority of CPU time in user space (see us column). Two approaches can be taken here: Add extra CPUs or look over the application code to determine if the application can be opti
