Mac OS X Unleashed

By John Ray and William C. Ray

Communication Between Processes: Redirection, Pipes

Building an operating system out of a multitude of small, cooperating processes would not give the user such flexibility and power were it not for a simple method of letting all of these processes speak to each other. At the heart of the Unix interprocess communication model is a simple but amazingly effective abstraction of input and output.

To paraphrase the model on which Unix bases input and output, imagine that Unix thinks of user input to a program as a stream of information. Output from the program back to the user can be thought of in the same way. A stream is simply a collection of information that flows into or out of the program in a serial (ordered) fashion. A user can't send two pieces of information to a program at the same time: two key presses, no matter how closely they occur, are ordered, one first and one second. A cursor moving across a screen reports serially where it is now and where it was a moment ago. Even if two events manage to occur simultaneously, the machine's electronics can't really handle simultaneous events, so they end up registered as separate events occurring very close together in time. Output is ordered in the same way. Whether drawing data to the screen or sending data over an Internet connection, no two data items leave a program at exactly the same time, so output, too, is a serial stream of information.

Because both input to and output from processes are streams of information, and every function of the system, from user programs to reading files to parts of the OS, is a running process, Unix models communication between processes as simply tying the output stream of one process to the input stream of another. Tying the output stream (named STDOUT) of one process to the input stream (named STDIN) of another is called creating a pipe between them. Once you view data moving into or out of a process as a stream, it becomes obvious that the system has no need to concern itself with the endpoints of the stream. At one end, the STDOUT might belong to a process taking input from a user; at the other, the STDIN might belong to a process that manipulates that input and writes it into a file. Alternatively, the same input could be placed in a file, and a process could read that file, produce the same output on its STDOUT, and send it via STDIN into the same manipulation program. From the OS's point of view, there would be absolutely no difference between these two situations.

In short, this abstraction means that as long as the input arriving at a process "looks like" the input the process expects, neither the process nor the OS cares where that input comes from. Likewise, as long as the destination of the process's output "acts as expected," it does not matter where the output is actually going.
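To see this indifference in practice, here is a minimal sketch written for POSIX sh rather than tcsh (the <, > and | characters it uses behave the same way in both). It feeds the same made-up data to sort twice, once from a file and once from another process through a pipe, and confirms that the two results are identical. The scratch directory and file names are assumptions for the illustration.

```shell
#!/bin/sh
# Work in a throwaway directory so no existing files are touched.
dir=$(mktemp -d) || exit 1

# Some made-up sample data, one item per line.
printf 'banana\napple\ncherry\n' > "$dir/fruit.txt"

# 1. sort's STDIN comes from a file (the < redirection).
sort < "$dir/fruit.txt" > "$dir/from_file.txt"

# 2. sort's STDIN is the STDOUT of another process (cat), joined by a pipe.
cat "$dir/fruit.txt" | sort > "$dir/from_pipe.txt"

# cmp -s prints nothing and succeeds when the two files are identical.
if cmp -s "$dir/from_file.txt" "$dir/from_pipe.txt"; then
    echo "sort produced identical output from both sources"
fi
```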

Redirection: STDIN, STDOUT, STDERR

Unix makes this input/output model available to the user through a concept known as redirection. This is implemented as a requirement that all processes adhere to certain conventions regarding input and output.

At its base is the notion that a program's input generally comes from, and its output generally goes to, a user typing information at the command line. Even programs that are not intended to be used by a person at a command line are expected to adhere to the model that input comes from a user, and output goes to a user.

This might seem restrictive, but further conventions loosen the restriction while generalizing the input/output model enough that it can be applied to almost any need. Two of these conventions are that input arrives in a program through a virtual interface known as STDIN (standard input), and that output leaves the program through a virtual interface known as STDOUT (standard output). A third convention provides a separate virtual interface, STDERR (standard error), through which error messages can be conveyed.

Redirection is accomplished by attaching these virtual interfaces to each other, or to files, in various combinations, so that a process's input or output comes from, or goes to, somewhere other than the user.

Standard In: STDIN

The virtual input interface to programs is called STDIN, for standard input. A program can expect the incoming data stream from the user (or any other source) to arrive at STDIN.

When you interact with a command-line program, the program is reading the data you are entering from STDIN. If you prefer not to enter the data by hand, you can put it in a file and redirect the contents of the file into the program's STDIN—the program will not know the difference.
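A quick way to see that a redirected program receives only a stream, and never the file's name, is to give wc -l the same file both ways. In this minimal POSIX sh sketch (the sample file is made up for the demonstration), wc prints the filename next to the count when it opens the file itself, but prints only the count when the file's contents arrive on its STDIN.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
printf 'one\ntwo\nthree\n' > "$dir/notes.txt"

# wc opens the file itself, so it knows the name and prints it.
wc -l "$dir/notes.txt"

# With <, the file's contents arrive on wc's STDIN; wc sees only a
# stream, so it has no name to print.
wc -l < "$dir/notes.txt"
```

The amount of space padding in front of the count varies between systems.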

A program that you can use for an example is the spell program. Apple hasn't distributed spell with OS X as of this writing, but we've provided instructions on how to install it in Chapter 15, "Command Line Applications and Application Suites." If you're using a system on which it's already been installed, follow along here. If not, spell still makes a good program for explanation because it has exactly the features we want to exhibit—just read along and imagine that it's really working until you get to Chapter 15.

The spell command finds misspellings. Given input from STDIN, spell parses through it, checks the input against a dictionary, and returns any misspellings it finds. Issued from the command line, you might type something like the following:

% spell
Now is the tyem for all good authors to come to thie ayde of some very
good Unix users
Ctrl+d

Pressing Ctrl+d at the beginning of a line finishes the input: the program sees end-of-file on STDIN, which tells it that no further information is coming. The spell program goes to work and returns the following:

tyem
thie
ayde

Each of the misspelled words (or at least words that aren't in the dictionary) is displayed, exactly as expected.
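Programs like spell typically read STDIN in a loop until they reach end-of-data. The count_lines function below is a hypothetical example program (a POSIX sh sketch, not part of spell) that shows the pattern: read fails at end-of-data, and that failure ends the loop. Typed at interactively, it would wait for lines until you pressed Ctrl+d; here its input is piped in so that it finishes on its own.

```shell
#!/bin/sh
# count_lines: a hypothetical program that reads STDIN until end-of-data.
count_lines() {
    n=0
    # read returns a failure status at end-of-data (Ctrl+d at a terminal,
    # or the end of a file or pipe), which ends the loop.
    while IFS= read -r line; do
        n=$((n + 1))
    done
    echo "$n lines read"
}

printf 'Now is the tyem for all good authors to come to thie ayde of some very\ngood Unix users\n' | count_lines
```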

This might not seem to be a particularly useful program at first glance—how often do you want to type a sentence, just to find out what words are misspelled in it? The key to its usefulness, however, is that the spell program does not care whether you typed the input, or whether the input came from a file.

Now to try it with data from a file. Fire up your favorite text editor, and create a file containing the same text you typed to spell previously. Then try spell by redirecting this file into its STDIN interface. If you named your file reallydumbfile, you can run spell on it by typing the following:

% spell < reallydumbfile

     tyem
     thie
     ayde

The < character redirects STDIN for the program to its left to come from the file named to its right. Here, it redirects STDIN for the spell program so that it comes from the file reallydumbfile, rather than from your keyboard.
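If spell isn't installed yet, the following crude stand-in lets you experiment with < in the meantime. It is a hypothetical POSIX sh sketch, not how the real spell works: mini_spell splits the text on its STDIN into one lowercase word per line and prints any word missing from a tiny made-up word list (words.txt). The function name and the word list are assumptions; fed reallydumbfile with <, it reports the same three words.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL          # predictable A-Z ranges for tr
dir=$(mktemp -d) || exit 1

# A tiny, made-up dictionary: every correctly spelled word in the sample.
printf '%s\n' now is the for all good authors to come of some very unix users > "$dir/words.txt"

# The sample text from the chapter, saved as a file.
printf 'Now is the tyem for all good authors to come to thie ayde of some very\ngood Unix users\n' > "$dir/reallydumbfile"

# mini_spell (hypothetical): reads STDIN, writes unknown words to STDOUT.
mini_spell() {
    tr -cs 'A-Za-z' '\n' |                 # one word per line
        tr 'A-Z' 'a-z' |                   # fold to lowercase
        grep -v -x -F -f "$dir/words.txt"  # keep words NOT in the list
}

mini_spell < "$dir/reallydumbfile"
```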

Standard Out: STDOUT

The virtual output interface that Unix provides to programs is called STDOUT, for standard output. Just as you can redirect STDIN from a file, if you want to store the output of a command in a file, you can redirect STDOUT from the program into the file. The > character directs the STDOUT of the program to its left into the file named to its right. For example, if you would like to collect the last few lines of /var/log/system.log into a file in your home directory, you could type

% tail -20 /var/log/system.log > ~/my-output

This command directs the shell to create the file my-output in your home directory, and to redirect STDOUT from the tail command into it. If my-output already exists in your home directory, it will be overwritten by the output from tail.

If you'd prefer to collect and archive the data by appending it to my-output instead of overwriting it, you can direct the shell to append rather than replace. In this case, redirect STDOUT with >> instead of the single >. The >> character pair appends the STDOUT of the program on its left to the file named on its right.
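The difference between > and >> is easy to check for yourself. This minimal POSIX sh sketch, where both operators behave as they do in tcsh, writes to the same file three times; the file lives in a scratch directory rather than your home directory.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
out="$dir/my-output"

echo "first run"  > "$out"    # > creates my-output
echo "second run" > "$out"    # > again: the file is truncated, "first run" is gone
echo "third run" >> "$out"    # >> appends to the end instead

cat "$out"
```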

You can also combine STDOUT and STDIN, like so:

[localhost:~/Documents] nermal% spell < reallydumbfile > reallydumbspelling
[localhost:~/Documents] nermal% ls

     get_termcap         lynx.cfg            reallydumbspelling  termcap-1.3.tar
     lynx                reallydumbfile      termcap-1.3         test

[localhost:~/Documents] nermal% cat reallydumbspelling

     tyem
     thie
     ayde

Standard Error: STDERR

To make your life easier, Unix actually defines two different output interfaces for programs. The first, STDOUT, has just been covered. The second, STDERR, allows the program to provide error and diagnostic information to the user. This separation exists for two reasons. First, it allows error information to be reported without interfering with data on the STDOUT interface. Second, if you are redirecting STDOUT from a program to another program or to a file, you would not see error messages if they were carried on STDOUT. Because there is a separate error channel, you can choose how and where error and diagnostic information is displayed, independent of the program's actual output data.

If you want to redirect STDERR into the same stream as STDOUT, combining the two, use the character pair >& instead of > in the command.

Again, if you've chosen to use a shell other than tcsh or csh, the redirection syntax is likely to vary considerably. See your online man pages to learn how the shell of your preference behaves.
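As an example of that variation, here is a sketch for POSIX sh, where 2> redirects STDERR alone and > file 2>&1 is the sh spelling of tcsh's >& file. The file names are made up for the example; ls is simply a convenient program that writes its listing to STDOUT and its complaint about a missing file to STDERR.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
echo "some data" > exists.txt

# Separate streams: the listing goes to out.txt, the error message to err.txt.
ls exists.txt no-such-file > out.txt 2> err.txt

# Merged streams: STDOUT goes to both.txt, then 2>&1 points STDERR at the
# same place (tcsh equivalent: ls exists.txt no-such-file >& both.txt).
ls exists.txt no-such-file > both.txt 2>&1

echo "out.txt:";  cat out.txt
echo "err.txt:";  cat err.txt
echo "both.txt:"; cat both.txt
```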

Pipes

Finally, there is nothing in the input/output model that restricts redirection to coming from or going into files. STDIN and STDOUT can just as easily be tied together instead of being tied into files or the command line.

Perhaps more correctly, a redirected program never deals with a file at all; it deals only with a stream. When you redirect output into a file, the shell opens (or creates) the file and attaches it to the program's STDOUT before the program starts, so everything the program writes to STDOUT lands in the file. Likewise, when you redirect a file into a program's STDIN, the shell opens the file and attaches it to the program's STDIN, and the program simply reads its input stream without knowing that a file is on the other end. For the user's convenience, these common actions are abbreviated into the < and > redirection characters.

Programs, on the other hand, are connected by tying one program's STDOUT directly to the next program's STDIN with a pipe. To create a pipe in Unix, you simply put a | character between the programs on the command line.

Again, an example is more illustrative than a considerable amount of explanation. Consider a situation in which you would like to examine the content of a file that is larger than will fit on one screen. You can accomplish this easily by piping the output from the cat command into a pager, such as the more command.

[localhost:~/Public/spell-1.0] nermal% cat /etc/magic | more

#! file
#       $OpenBSD: Header,v 1.2 1996/06/26 05:33:03 deraadt Exp $

# Magic data for file(1) command.
# Machine-genererated from src/cmd/file/magdir/*; edit there only!
# Format is described in magic(files), where:
# files is 4 on V7 and BSD, 4 on SV, and ?? in the SVID.
#------------------------------------------------------------------------------
# Localstuff:  file(1) magic for locally observed files
#
# $OpenBSD: Localstuff,v 1.3 1997/02/09 23:58:40 millert Exp $
# Add any locally observed files here.  Remember:
# text if readable, executable if runnable binary, data if unreadable.

#------------------------------------------------------------------------------
# OpenBSD:  file(1) magic for OpenBSD objects
#
# All new-style magic numbers are in network byte order.
#

0       lelong                  000000407      OpenBSD little-endian object file
>16     lelong                  >0             not stripped
0       belong                  000000407      OpenBSD big-endian object file
>16     belong                  >0             not stripped

0       belong&0377777777       041400413      OpenBSD/i386 demand paged
>0      byte                    &0x80
>>20    lelong                  <4096          shared library
>>20    lelong                  =4096          dynamically linked executable
>>20    lelong                  >4096          dynamically linked executable
>0      byte                    ^0x80          executable

more

Of course, you already know that you could have accomplished this by simply typing more /etc/magic. The point, though, is that although we showed you earlier how to use more to read a file, more is just as happy to take its input from STDIN: if you don't give it a filename as an argument, it reads STDIN instead.

With this knowledge, you can make the output of any other program viewable with the more pager. This lets you do things such as look at a complete listing of your file system without needing an immensely large scroll buffer in your terminal:

[localhost:~/Public/spell-1.0] nermal% ls -lRaF / | more

     ls: .Trashes: Permission denied
     total 13264
     drwxrwxr-t  39 root  admin     1282 Apr 20 15:00 ./
     drwxrwxr-t  39 root  admin     1282 Apr 20 15:00 ../
     -rwxrwxrwx   1 root  admin     8208 Apr 18 11:05 .DS_Store*
     d-wx-wx-wx   2 root  admin      264 Apr  4 12:20 .Trashes/
     -r--r--r--   1 root  wheel      142 Feb 25 03:05 .hidden
     dr--r--r--   2 root  wheel      224 Apr 20 15:00 .vol/
     -rwxrwxrwx   1 root  wheel   106496 Apr 20 14:59 AppleShare PDS*
     drwxrwxrwx  25 root  admin      806 Apr 18 11:05 Applications/
     drwxrwxrwx  18 root  wheel      568 Apr 20 14:54 Applications (Mac OS 9)/
     drwxrwxrwx   2 root  wheel      264 Apr  6 12:24 Cleanup At Startup/
     -rwxrwxrwx   1 root  wheel   212992 Apr 20 14:59 Desktop DB*
     -rwxrwxrwx   1 root  wheel  1432466 Apr 20 14:57 Desktop DF*
     drwxrwxrwx   6 root  staff      264 Apr  4 11:51 Desktop Folder/
     drwxrwxr-x  12 root  admin      364 Mar  1 20:29 Developer/
     more

These are, of course, simplistic examples of connecting programs, but keep an eye out for how pipes are used throughout the rest of the book. The ability to create small programs with small functions, and to tie them together into arbitrarily large programs with arbitrarily complex behaviors, is very powerful, and it is one of the biggest reasons that access to the BSD half of your new OS is such a valuable feature.

Think back to programs such as grep, and you can probably begin to see how you could apply this to creating custom solutions to problems that you might have encountered. You should also begin to see why this functionality cannot be conveniently duplicated with a GUI-only interface.
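As a small taste of that kind of custom solution, this minimal POSIX sh sketch joins five standard programs with pipes into a word-frequency counter, something none of them does alone. The top_words function name and the sample text are assumptions made up for the illustration.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
dir=$(mktemp -d) || exit 1
printf 'the cat sat on the mat\nthe dog sat too\n' > "$dir/sample.txt"

# top_words N (hypothetical): read text on STDIN, print the N most common words.
top_words() {
    tr -cs 'A-Za-z' '\n' |   # split the text into one word per line
        sort |               # bring identical words together
        uniq -c |            # collapse each run to "count word"
        sort -rn |           # highest counts first
        head -n "$1"         # keep only the first N lines
}

top_words 2 < "$dir/sample.txt"
```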

Joints in Pipes: tee

On occasion, you might want to redirect STDOUT to both a file and another program at the same time. In such a case, you can use the tee command. This command accepts data on STDIN, writes it to a file named on the command line, and also passes the data, unaltered, to STDOUT.

Consider an example in which you would like to search through your files, looking for files that match a particular name pattern. You would like to both browse the found names as they appear, and collect the names into a log file so that you can use the information again later. In this example, we will look in a rather inefficient fashion for files with names that contain java. Because there are probably lots of them on the system, we want the output piped through a pager (more). We also want to collect the filenames into a file in our home directory named my_output.

find / -name \*java\* -print | tee ~/my_output | more

     /Applications (Mac OS 9)/Apple...Applet Runner/Applets/Animator/Animator.java
     /Applications (Mac OS 9)/Apple...Applet Runner/Applets/ArcTest/ArcTest.java
     /Applications (Mac OS 9)/Apple...Applet Runner/Applets/BarChart/Chart.java
     /Applications (Mac OS 9)/Apple...Applet Runner/Applets/DrawTest/DrawTest.java
     .
     .
     .

If you let this run to completion, you can then look at the file my_output, and it will contain everything you just scrolled through with more. This isn't a very valuable listing, so you might not want to wait for it to finish. Be aware, though, that if you press Ctrl+C to quit, you'll kill the tee process along with the rest of the pipeline, and my_output will hold only whatever tee had written before it was interrupted.

The tee command is invaluable if you need to split one STDOUT stream to be used by multiple different processes, or if you need to collect logging or partial output from intermediate steps in a large, multiprogram piped command.
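The following minimal POSIX sh sketch uses tee both ways: it saves a copy of a stream while passing it along, and it captures the intermediate stages of a longer pipeline. The file names (my_output, unsorted.txt, sorted.txt) and the sample names fed into it are made up for the example, and cat stands in for the find command used earlier.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
dir=$(mktemp -d) || exit 1

# Pretend these names came from a find command.
printf 'Chart.java\nAnimator.java\nArcTest.java\nChart.java\n' > "$dir/found.txt"

# 1. Save a copy of the stream in my_output while wc -l counts it.
count=$(cat "$dir/found.txt" | tee "$dir/my_output" | wc -l | tr -d ' ')
echo "$count names passed through; copy saved in $dir/my_output"

# 2. Log the intermediate stages of a multiprogram pipeline.
cat "$dir/found.txt" | tee "$dir/unsorted.txt" |
    sort | tee "$dir/sorted.txt" |
    uniq
```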
