Home > Articles > Operating Systems, Server > Solaris

  • Print
  • + Share This
This chapter is from the book


Table 9-3 summarizes the scripts that follow and the providers they use.

Table 9-3. Application Script Summary





Snoop process execution



System call time statistics by process



Profile application on-CPU user stacks



Count application off-CPU user stacks by time



User-level mutex and read/write lock statistics



Snoop process signals



Signal distribution by source and destination processes



Sample multithreaded CPU usage



This is a script version of the "New Processes" one-liner shown earlier. Tracing the execution of new processes provides important visibility for applications that call the command line; some applications can call shell commands so frequently that it becomes a performance issue—one that is difficult to spot in traditional tools (such as prstat(1M) and top(1)) because the processes are so short-lived.


1   #!/usr/sbin/dtrace -s
3   #pragma D option quiet
4   #pragma D option switchrate=10hz
6   dtrace:::BEGIN
7   {
8           printf("%-8s %5s %6s %6s %s\n", "TIME(ms)", "UID", "PID", "PPID",
9               "COMMAND");
10          start = timestamp;
11  }
13  proc:::exec-success
14  {
15          printf("%-8d %5d %6d %6d %s\n", (timestamp - start) / 1000000,
16              uid, pid, ppid, curpsinfo->pr_psargs);
17  }

Script procsnoop.d



The following shows the Oracle Solaris commands executed as a consequence of restarting the cron daemon via svcadm(1M):

solaris# procsnoop.d
3227         0  13273  12224 svcadm restart cron
3709         0  13274    106 /sbin/sh -c exec /lib/svc/method/svc-cron
3763         0  13274    106 /sbin/sh /lib/svc/method/svc-cron
3773         0  13275  13274 /usr/bin/rm -f /var/run/cron_fifo
3782         0  13276  13274 /usr/sbin/cron

The TIME(ms) column is printed so that the output can be postsorted if desired (DTrace may shuffle the output slightly because it collects buffers from multiple CPUs).

See Also: execsnoop

A program called execsnoop exists from the DTraceToolkit, which has similar functionality to that of procsnoop. It was written originally for Oracle Solaris and is now shipped on Mac OS X by default. execsnoop wraps the D script in the shell so that command-line options are available:

macosx# execsnoop -h
USAGE: execsnoop [-a|-A|-ehjsvZ] [-c command]
       execsnoop                # default output
                -a              # print all data
                -A              # dump all data, space delimited
                -e              # safe output, parseable
                -j              # print project ID
                -s              # print start time, us
                -v              # print start time, string
                -Z              # print zonename
                -c command      # command name to snoop
        execsnoop -v            # human readable timestamps
        execsnoop –Z          # print zonename
        execsnoop -c ls         # snoop ls commands only

execsnoop traces process execution by tracing the exec() system call (and variants), which do differ slightly between operating systems. Unfortunately, system calls are not a stable interface, even across different versions of the same operating system. Small changes to execsnoop have been necessary to keep it working across different versions of Oracle Solaris, because of subtle changes with the names of the exec() system calls. The lesson here is to always prefer the stable providers, such as the proc provider (which is stable) instead of syscall (which isn't).


procsystime is a generic system call time reporter. It can count the execution of system calls, their elapsed time, and on-CPU time and can produce a report showing the system call type and process details. It is from the DTraceToolkit and shipped on Mac OS X by default in /usr/bin.


The essence of the script is explained here; the actual script is too long and too uninteresting (mostly dealing with command-line options) to list; see the DTraceToolkit for the full listing.

1      syscall:::entry
2      /self->ok/
3      {
4            @Counts[probefunc] = count();
5            self->start = timestamp;
6            self->vstart = vtimestamp;
7      }
9      syscall:::return
10      /self->start/
11      {
12            this->elapsed = timestamp - self->start;
13            this->oncpu = vtimestamp - self->vstart;
14            @Elapsed[probefunc] = sum(this->elapsed);
15            @CPU[probefunc] = sum(this->cpu);
16            self->start = 0;
17            self->vstart = 0;
18      }

A self->ok variable is set beforehand to true if the current process is supposed to be traced. The code is then straightforward: Time stamps are set on the entry to syscalls so that deltas can be calculated on the return.


Examples include usage and file system archive.


Command-line options can be listed using -h:

solaris# procsystime -h
lox# ./procsystime -h
USAGE: procsystime [-aceho] [ -p PID | -n name | command ]
                  -p PID          # examine this PID
                  -n name         # examine this process name
                  -a              # print all details
                  -e              # print elapsed times
                  -c               # print syscall counts
                  -o              # print CPU times
                  -T              # print totals
       procsystime -p 1871     # examine PID 1871
       procsystime -n tar      # examine processes called "tar"
       procsystime -aTn bash   # print all details for bash
       procsystime df -h        # run and examine "df -h"

File System Archive

The tar(1) command was used to archive a file system, with procsystime tracing elapsed times (which is the default) for processes named tar:

solaris# procsystime -n tar
Tracing... Hit Ctrl-C to end...

Elapsed Times for processes tar,

         SYSCALL           TIME (ns)
           fcntl               58138
         fstat64               96490
          openat              280246
           chdir             1444153
           write             8922505
          open64            15294117

        openat64            16804949
           close            17855422
      getdents64            46679462
       fstatat64            98011589
            read          1551039139

Most of the elapsed time for the tar(1) command was in the read() syscall, which is expected because tar(1) is reading files from disk (which is slow I/O). The total time spent waiting for read() syscalls during the procsystime trace was 1.55 seconds.


This is a script version of the DTrace one-liner to profile the user stack trace of a given application process name. As one of the most useful one-liners, it may save typing to provide it as a script, where it can also be more easily enhanced.


1      #!/usr/sbin/dtrace -s
3      profile:::profile-1001
4      /execname == $$1/
5      {
6            @["\n on-cpu (count @1001hz):", ustack()] = count();
7      }

Script uoncpu.d



Here the uoncpu.d script is used to frequency count the user stack trace of all currently running Perl programs. Note perl is passed as a command-line argument, evaluated in the predicate (line 4):

# uoncpu.d perl
dtrace: script 'uoncpu.d' matched 1 probe
[...output truncated...] 

  on-cpu (count @1001hz):
  on-cpu (count @1001hz):

The hottest stacks identified include the Perl_pp_multiply() function, suggesting that Perl is spending most of its time doing multiplications. Further analysis of those functions and using the perl provider, if available (see Chapter 8), could confirm.


As a companion to uoncpu.d, the uoffcpu.d script measures the time spent off-CPU by user stack trace. This time includes device I/O, lock wait, and dispatcher queue latency.


1      #!/usr/sbin/dtrace -s
3      sched:::off-cpu
4      /execname == $$1/
5      {
6            self->start = timestamp;
7      }
9      sched:::on-cpu
10      /self->start/
11      {
12            this->delta = (timestamp - self->start) / 1000;
13            @["off-cpu (us):", ustack()] = quantize(this->delta);
14            self->start = 0;
15      }

Script uoffcpu.d



Here the uoffcpu.d script was used to trace CPU time of bash shell processes:

# uoffcpu.d bash
dtrace: script 'uoffcpu.d' matched 6 probes
  off-cpu (us):

           value  ------------- Distribution ------------- count
          262144 |                                         0
          524288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
         1048576 |                                         0

  off-cpu (us):

           value  ------------- Distribution ------------- count
           32768 |                                         0
           65536 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            5
          131072 |@@@@@@@@@@@                              2
          262144 |                                         0

While tracing, in another bash shell, the command sleep 1 was typed and executed. The previous output shows the keystroke latency (mostly 65 ms to 131 ms) of the read commands, as well as the time spent waiting for the sleep(1) command to complete (in the 524 to 1048 ms range, which matches expectation: 1000 ms).

Note the user stack frame generated by the ustack() function contains a mix of symbol names and hex values (for example, bash`0x806dff4) in the output. This can happen for one of several reasons whenever ustack() is used. DTrace actually collects and stores the stack frames has hex values. User addresses are resolved to symbol names as a postprocessing step before the output is generated. It is possible DTrace will not be able to resolve a user address to a symbol name if any of the following is true:

  • The user process being traced has exited before the processing can be done.
  • The symbol table has been stripped, either from the user process binary or from the shared object libraries it has linked.
  • We are executing user code out of data via jump tables. 6


plockstat(1M) is a powerful tool to examine user-level lock events, providing details on contention and hold time. It uses the DTrace plockstat provider, which is available for developing custom user-land lock analysis scripts. The plockstat provider (and the plockstat(1M) tool) is available on Solaris and Mac OS X and is currently being developed for FreeBSD.


plockstat(1M) is a binary executable that dynamically produces a D script that is sent to libdtrace (instead of a static D script sent to libdtrace via dtrace(1M)). If it is of interest, this D script can be examined using the -V option:

solaris# plockstat -V -p 12219
plockstat: vvvv D program vvvv
        self->rwblock[arg0] = timestamp;
        self->mtxblock[arg0] = timestamp;
        self->mtxspin[arg0] = timestamp;
/self->rwblock[arg0] && arg1 == 1 && arg2 != 0/
        @rw_w_block[arg0, ustack(5)] =
            sum(timestamp - self->rwblock[arg0]);
        @rw_w_block_count[arg0, ustack(5)] = count(); 
        self->rwblock[arg0] = 0;
        rw_w_block_found = 1;
[...output truncated...]


Here the plockstat(1M) command traced all lock events (-A for both hold and contention) on the Name Service Cache Daemon (nscd) for 60 seconds:

solaris# plockstat -A -e 60 -p `pgrep nscd`
Mutex hold 

Count     nsec Lock                         Caller
   30  1302583 0x814c08c                    libnsl.so.1`rpc_fd_unlock+0x4d 
  326    15687 0x8089ab8                    nscd`_nscd_restart_if_cfgfile_changed+0x6c
    7   709342 libumem.so.1`umem_cache_lock libumem.so.1`umem_cache_applyall+0x5e
  112    16702 0x80b67b8                    nscd`lookup_int+0x611
    3   570898 0x81a0548                    libscf.so.1`scf_handle_bind+0x231
   60    24592 0x80b20e8                    nscd`_nscd_mutex_unlock+0x8d
   50    24306 0x80b2868                    nscd`_nscd_mutex_unlock+0x8d
   30    19839 libnsl.so.1`_ti_userlock     libnsl.so.1`sig_mutex_unlock+0x1e
    7    83100 libumem.so.1`umem_update_lock libumem.so.1`umem_update_thread+0x129
[...output truncated...] 

R/W reader hold 

Count     nsec Lock                         Caller
    30   95341 0x80c0e14                    nscd`_nscd_get+0xb8
   120   15586 nscd`nscd_nsw_state_base_lock nscd`_get_nsw_state_int+0x19c
    60   20256 0x80e0a7c                    nscd`_nscd_get+0xb8
   120    9806                              nscd`addrDB_rwlock   nscd`_nscd_is_int_addr+0xd1
    30   39155 0x8145944                    nscd`_nscd_get+0xb8
[...output truncated...]

R/W writer hold

Count     nsec Lock                         Caller
   30    16293 nscd`addrDB_rwlock           nscd`_nscd_del_int_addr+0xeb
   30    15440 nscd`addrDB_rwlock           nscd`_nscd_add_int_addr+0x9c
    3    14279 nscd`nscd_smf_service_state_lock nscd`query_smf_state+0x17b

Mutex block

Count     nsec Lock                         Caller
    2   119957 0x8089ab8                    nscd`_nscd_restart_if_cfgfile_changed+0x3e

Mutex spin

Count      nsec Lock                        Caller
  1       37959 0x8089ab8                   nscd`_nscd_restart_if_cfgfile_changed+0x3e

Mutex unsuccessful spin

Count      nsec Lock                        Caller
    2     42988 0x8089ab8                   nscd`_nscd_restart_if_cfgfile_changed+0x3e

While tracing, there were very few contention events and many hold events. Hold events are normal for software execution and are ideally as short as possible, while contention events can cause performance issues as threads are waiting for locks.

The output has caught a spin event for the lock at address 0x8089ab8 (no symbol name) from the code path location nscd`_nscd_restart_if_cfgfile_changed+0x3e, which was for 38 us. This means a thread span on-CPU for 38 us before being able to grab the lock. On two other occasions, the thread gave up spinning after an average of 43 us (unsuccessful spin) and was blocked for 120 us (block), both also shown in the output.


The kill.d script prints details of process signals as they are sent, such as the PID source and destination, signal number, and result. It's named kill.d after the kill() system call that it traces, which is used by processes to send signals.


This is based on the kill.d script from the DTraceToolkit, which uses the syscall provider to trace the kill() syscall. The proc provider could also be used via the signal-* probes, which will match other signals other than via kill() (see sigdist.d next).

1   #!/usr/sbin/dtrace -s
3   #pragma D option quiet
5   dtrace:::BEGIN
6   {
7          printf("%-6s %12s %6s %-8s %s\n",
8              "FROM", "COMMAND", "SIG", "TO", "RESULT");
9   }
11  syscall::kill:entry
12  {
13          self->target = (int)arg0;
14          self->signal = arg1;
15  }
17  syscall::kill:return
18  {
19          printf("%-6d %12s %6d %-8d %d\n",
20              pid, execname, self->signal, self->target, (int)arg0);
21          self->target = 0;
22          self->signal = 0;
23  }

Script kill.d


Note that the target PID is cast as a signed integer on line 13; this is because the kill() syscall can also send signals to process groups by providing the process group ID as a negative number, instead of the PID. By casting it, it will be correctly printed as a signed integer on line 19.


Here the kill.d script has traced the bash shell sending signal 9 (SIGKILL) to PID 12838 and sending signal 2 (SIGINT) to itself, which was a Ctrl-C. kill.d has also traced utmpd sending a 0 signal (the null signal) to various processes: This signal is used to check that PIDs are still valid, without signaling them to do anything (see kill(2)).

# kill.d
12224         bash      9 12838    0
3728         utmpd      0 4174     0
3728         utmpd      0 3949     0
3728         utmpd      0 10621    0
3728         utmpd      0 12221    0
12224         bash      2 12224    0


The sigdist.d script shows which processes are sending which signals to other processes, including the process names. This traces all signals: the kill() system call as well as kernel-based signals (for example, alarms).


This script is based on /usr/demo/dtrace/sig.d from Oracle Solaris and uses the proc provider signal-send probe.

1      #!/usr/sbin/dtrace -s
45      #pragma D option quiet
47      dtrace:::BEGIN
48      {
49            printf("Tracing... Hit Ctrl-C to end.\n");
50      }
52      proc:::signal-send
53      {
54            @Count[execname, stringof(args[1]->pr_fname), args[2]] = count();
55      }
57      dtrace:::END
58      {
59            printf("%16s %16s %6s %6s\n", "SENDER", "RECIPIENT", "SIG", "COUNT");
60            printa("%16s %16s %6d %6@d\n", @Count);
61      }

Script sigdist.d



The sigdist.d script has traced the bash shell sending signal 9 (SIGKILL) to a sleep process and also signal 2 (SIGINT, Ctrl-C) to itself. It's also picked up sshd sending bash the SIGINT, which happened via a syscall write() of the Ctrl-C to the ptm (STREAMS pseudo-tty master driver) device for bash, not via the kill() syscall.

# sigdist.d
Tracing... Hit Ctrl-C to end.
          SENDER        RECIPIENT    SIG  COUNT
            bash             bash      2      1
            bash            sleep      9      1
            sshd             bash      2      1
            sshd           dtrace      2      1
           sched             bash     18      2
            bash             bash     20      3
           sched         sendmail     14      3
           sched         sendmail     18      3
           sched          proftpd     14      7
           sched        in.mpathd     14     10


The threaded.d script provides data for quantifying how well multithreaded applications are performing, in terms of parallel execution across CPUs. If an application has sufficient CPU bound work and is running on a system with multiple CPUs, then ideally the application would have multiple threads running on those CPUs to process the work in parallel.


This is based on the threaded.d script from the DTraceToolkit.

1      #!/usr/sbin/dtrace -s
3      #pragma D option quiet
5      profile:::profile-101
6      /pid != 0/
7      {
8            @sample[pid, execname] = lquantize(tid, 0, 128, 1);
9      }
11      profile:::tick-1sec
12      {
13            printf("%Y,\n", walltimestamp);
14            printa("\n @101hz   PID: %-8d CMD: %s\n%@d", @sample);
15            printf("\n");
16            trunc(@sample);
17      }

Script threaded.d



To demonstrate threaded.d, two programs were written (called test0 and test1) that perform work on multiple threads in parallel. One of the programs was coded with a lock "serialization" issue, where only the thread holding the lock can really make forward progress. See whether you can tell which one:

# threaded.d
2010 Jul  4 05:17:09,

 @101hz   PID: 12974    CMD: test0

           value  ------------- Distribution ------------- count
               1 |                                         0
               2 |@@@@@@@@@                                28
               3 |@@                                       6
               4 |@@@@@@@@@@@                              32
               5 |@@@@@                                    14
               6 |@@@@@                                    15
               7 |@@@                                      8
               8 |@@                                       5
               9 |@@@                                      10
              10 |                                         0

 @101hz   PID: 12977    CMD: test1

           value  ------------- Distribution ------------- count
               1 |                                         0
               2 |@@@@                                     77
               3 |@@@@@@                                   97
               4 |@@@@                                     77
               5 |@@@@@                                    87
               6 |@@@@                                     76
               7 |@@@@@@                                   101
               8 |@@@@                                     76
               9 |@@@@@@                                   100
              10 |                                         0

threaded.d prints output every second, which shows a distribution plot where value is the thread ID and count is the number of samples during that second. By glancing at the output, both programs had every thread sampled on-CPU during the one second, so the issue may not be clear. The clue is in the counts: threaded.d is sampling at 101 Hertz (101 times per second), and the sample counts for test0 only add up to 118 (a little over one second worth of samples on one CPU), whereas test1 adds up to 691. The program with the issue is test0, which is using a fraction of the CPU cycles that test1 is able to consume in the same interval.

This was a simple way to analyze the CPU execution of a multithreaded application. A more sophisticated approach would be to trace kernel scheduling events (the sched provider) as the application threads stepped on- and off-CPU.

  • + Share This
  • 🔖 Save To Your Account