Pipes
Pipes are what Unix is really all about. I know I said that redirection and pipes were the heart of Unix, but effective use of pipes is what defines Unix. Other systems do all right with redirection, but only Unix does right by pipes. Several early releases of Unix appeared before pipes were added, but it was pipes that made Unix special.
The concept now seems absurdly simple: Rather than redirecting output of a program into a file, we can redirect it into another program. The implications of this concept are staggering, and they delineate what is thought of as the Unix philosophy.
The idea of Unix is like that of a set of good knives. Each has a single purpose. Each does well at its appointed task. Some might be used for another task, but only poorly. The modern GUI-based operating system, with its bloated, monolithic applications, might be thought of as a food processor: It can do all sorts of things, but it's big, complicated, and not as flexible as that drawer full of knives. The food processor certainly does some things better: When I want to make hummus, I run it through the food processor; if I want to carve a turkey, however, I look elsewhere.
There's another lesson here: A food processor permits a person with little experience to do a good job at slicing vegetables. An expert with a good set of knives, however, can probably do the job both faster and better. Some tools, such as knives, reward experience and practice. Some, such as food processors, do not. At least, the learning curve is much more limited with a food processor.
Many people refer to this as the toolbox philosophy. In order to effectively connect these individual tools to each other, we need the pipe.
A pipe is represented by the vertical bar (|), and that key is referred to by most Unix people simply as the pipe. The flow of data is from left to right: The program on the left of the pipe sends its output to the program on the right, which sees it as input. A program through which data is piped is known as a filter.
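To see a pipe in action, try a one-liner like this (the words here are arbitrary; any text will do):

```shell
# echo writes its arguments to STDOUT; the pipe hands that text
# to wc -w, which counts the words it receives on STDIN.
echo Unix is a set of good knives | wc -w
```

echo plays the program on the left, wc the filter on the right; the number printed is the word count of whatever flowed through the pipe.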
Some Common Filters
Now that you have a toolbox, let's fill it with some basic tools! This section is about the hammers, screwdrivers, and wrenches of the Unix toolbox: the ones you use every day. There are drill presses and circular saws as well, but before you can learn those tools, you need to learn the basics.
Pagers
The first filter most users try is cat piped into more, which is a simple program that stops after each screenful of text and waits for the user to press a key.5 If you're reading a long text file, more is your friend. Just to try it out, type cat /usr/dict/words | more. (When you're sick of seeing a long list of alphabetical words, simply press Ctrl+C to break out of more.) If your system doesn't have /usr/dict/words, try examining /etc/services or any other long file on your system.
5. You might remember our old friends more and less from our discussion of documentation in Chapter 1. Not only are more and less filters, they can also simply take a filename as an argument and display that file. Many Unix commands are flexible in the same way. Even cat can be used as a filter if you don't give it a filename on the command line. You've actually done this already, if you think about it, when you redirected a file into cat.
After you've tried more, try less instead, if it's installed on your system. If you're not sure whether less is installed on your system, just try these commands and see whether they work. If less isn't installed, you'll get an error message but nothing will break; you just won't get to see the file you tried to look at. less permits you to go backward as well as forward; if your terminal is properly configured, your PageUp and PageDown keys should enable you to scroll a page of text at a time. Because less doesn't quit when you reach the end of the file, you'll need to press q to exit. (Unlike most Unix programs, Ctrl+C doesn't exit less.) As with more, just cat /etc/syslog.conf | less to look at a really long file you don't particularly care about.
more and less are a special subcategory of filter known as pagers because they enable users to page through text. Most filters, however, transform the text that passes through them in some way.
Heads or Tails
head and tail transform text by displaying only part of it. head displays the top part of a file, whereas tail displays the bottom of the file:
[ elvis@frogbog elvis ]$ cat tao.txt|head -2
Lao Tzu
Chuang Tzu
[ elvis@frogbog elvis ]$ cat tao.txt|head -4
Lao Tzu
Chuang Tzu
K'ung Tzu
Meng Tzu
[ elvis@frogbog elvis ]$ cat tao.txt|tail -2
Meng Tzu
Wang Bi
[ elvis@frogbog elvis ]$ cat tao.txt|tail -4
Chuang Tzu
K'ung Tzu
Meng Tzu
Wang Bi
By default, head and tail print ten lines apiece. You can specify a measurement other than lines in some versions of these commands, but counting lines is most common.
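For instance, GNU and most modern BSD versions accept -n to give an explicit line count and -c to count bytes instead of lines. A quick sketch with made-up data:

```shell
# head -n 2 keeps the first two lines of its input;
# tail -n 2 keeps the last two.
printf 'one\ntwo\nthree\nfour\n' | head -n 2
printf 'one\ntwo\nthree\nfour\n' | tail -n 2
```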
sort and uniq
Another very popular filter is sort, which simply sorts the file passed to it alphabetically. As with head and tail, there's not a whole lot to say about sort, so I'll just run a few simple demonstrations:
[ elvis@frogbog elvis ]$ cat tao.txt|sort
Chuang Tzu
K'ung Tzu
Lao Tzu
Meng Tzu
Wang Bi
[ elvis@frogbog elvis ]$ cat west.txt|sort
Aristotle
Descartes
Heraclitus
Kierkegaard
Pascal
Plato
Plotinus
Sartre
Socrates
[ elvis@frogbog elvis ]$ cat tao.txt west.txt|sort
Aristotle
Chuang Tzu
Descartes
Heraclitus
Kierkegaard
K'ung Tzu
Lao Tzu
Meng Tzu
Pascal
Plato
Plotinus
Sartre
Socrates
Wang Bi
[ elvis@frogbog elvis ]$ cat west.txt tao.txt|sort
Aristotle
Chuang Tzu
Descartes
Heraclitus
Kierkegaard
K'ung Tzu
Lao Tzu
Meng Tzu
Pascal
Plato
Plotinus
Sartre
Socrates
Wang Bi
As you can see, sort takes all the input it receives via STDIN and sends the sorted data to STDOUT. sort also has a numerical mode, accessible via the -n option. With this switch, lines that begin with a number are sorted by numerical value rather than character by character. This might sound redundant, so let's look at an example:
[ jon@frogbog jon ]$ cat some-stuff.txt
99 Dead Baboons
99 Red Balloons
101 Dalmatians
16 Candles
24 Hours
9 Lords a Leaping
[ jon@frogbog jon ]$ cat some-stuff.txt|sort -n
9 Lords a Leaping
16 Candles
24 Hours
99 Dead Baboons
99 Red Balloons
101 Dalmatians
But what would happen without the -n switch? Let's try that, too:
[ jon@frogbog jon ]$ cat some-stuff.txt|sort
101 Dalmatians
16 Candles
24 Hours
99 Dead Baboons
99 Red Balloons
9 Lords a Leaping
Hardly what we might have in mind when sorting by quantity: Instead of sorting numerically, this list is sorted alphabetically. (Remember, in ASCII numbers come before letters.)
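You can watch the difference on a tiny made-up data set:

```shell
# In ASCII order, "101" sorts before "16" because the comparison is
# character by character and '0' < '6'; -n compares whole numbers.
printf '101\n16\n9\n' | sort
printf '101\n16\n9\n' | sort -n
```

The plain sort prints 101, 16, 9; the -n sort prints 9, 16, 101.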
A related tool of somewhat limited use is uniq, which strips out duplicate lines when they follow each other in the file. For example,
[ jon@frogbog jon ]$ cat other-stuff.txt
One
Two
One
One
Two
Two
Two
Three
Three
One
[ jon@frogbog jon ]$ cat other-stuff.txt|uniq
One
Two
One
Two
Three
One
This is much more useful following a sort:
[ jon@frogbog jon ]$ cat other-stuff.txt | sort | uniq
One
Three
Two
Not what you were expecting? How's the computer supposed to know that "Two" and "Three" are numbers? Still, you can see that it works: sorting brought all the repeated lines together, and uniq collapsed each group into a single line.
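One option worth knowing here is uniq -c, which prefixes each surviving line with the number of times it appeared; combined with sort, it becomes a quick frequency counter. Here's the same data fed in with printf rather than a file:

```shell
# sort groups identical lines together; uniq -c collapses each group
# and reports how many lines it contained.
printf 'One\nTwo\nOne\nOne\nTwo\nTwo\nTwo\nThree\nThree\nOne\n' | sort | uniq -c
```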
wc
By far my favorite filter is wc, which simply reports how many lines, words, and characters are present in a file.
[ jon@frogbog jon ]$ cat some-stuff.txt | wc
      6      16      85
[ jon@frogbog jon ]$ cat some-stuff.txt | wc -c
85
[ jon@frogbog jon ]$ cat some-stuff.txt | wc -w
16
[ jon@frogbog jon ]$ cat some-stuff.txt | wc -l
6
wc is particularly useful for counting anything stored one line per record, such as the output of a ps command:
[ jon@frogbog jon ]$ ps -ef|wc -l
63
Obviously, the precise number varies depending on your system and its use when you run the command. How many processes are running on this system right now? If you said 63, you're wrong: Remember that ps has a header line at the top of its output, so only 62 processes are running on the system.
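If your tail supports the -n +2 form (GNU and modern BSD versions do), you can drop the header before counting and skip the mental arithmetic:

```shell
# tail -n +2 prints everything from line 2 onward, so the ps header
# never reaches wc -l and the count is the number of processes itself.
ps -ef | tail -n +2 | wc -l
```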
Combining Filters into Longer Pipelines
A single filter by itself might be useful, but filters are most useful when combined with each other to produce a particular effect. The language metaphor for Unix is particularly apt here: Single pipes are like simple sentences, whereas longer pipelines are complex sentences. In this section, we endeavor to diagram some more complex pipelines to gain a better grasp of the language. Users who don't write complex pipelines can still get their work done, which is what computing is for, but they're only speaking pidgin Unix. Speaking fluent Unix means being able to get your work done more quickly and more elegantly than you otherwise could.
Remember that earlier in the chapter, I asked you to find out what shell you were using by typing
cat /etc/passwd|grep ^username:|cut -d : -f 7
Let's dissect that command and see what's going on with it. First, we're looking at /etc/passwd. Use man 5 passwd to find out what's going on with that file.6 Simply put, /etc/passwd contains a list of all the accounts that can log in to the system. (On systems with NIS, the YP map passwd contains the same information.) The information in this file is in several different fields, each separated by a colon. On my system, two such lines read as follows:
jon:x:500:500:Jon:/home/jon:/bin/bash
elvis:x:501:100:Elvis:/home/elvis:/bin/bash
6. On Solaris systems, you'll have to use man -s 5 passwd, for some silly reason. I don't know why they took a mind to breaking this, but they did.
The first field in /etc/passwd is the username. Second is the password, but on most modern systems the password is stored elsewhere, and x is the only thing that shows up in this field. Third is the UID for this account, and fourth is the primary GID. Fifth is the GECOS field, which contains what passes for a human-readable username; on some systems, it can also contain phone numbers, offices, and so on. Sixth is the user's home directory, and finally we have the user's default shell.
If I want to find my shell, as with the command I mentioned earlier, first I have to find my account. For this, we pipe /etc/passwd through grep, which is an advanced search program. The default is to return all lines that match the regular expression provided on the command line.
I'll talk about regular expressions in Chapter 5, but for now we only need to know that you can look for text with regular expressions and that some characters have special meanings. In the previous example, if I wanted to find my account, that part of the pipe would read grep ^jon:.
Why not search for just my account name? Because more than one account on the system might have jon somewhere in its name, I want to match the entire field. We're lucky, because the username is the first field. Regular expressions have a special character to mark the beginning of the line: ^. Marking the end of the field is easy: All we need to do is put a colon at the end of it. Therefore, the regular expression ^jon: finds jon, followed by a colon, at the beginning of the line. This should return precisely one account: mine.
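You can check the effect of the anchor on a couple of made-up passwd lines (both accounts here are invented):

```shell
# Without the ^ and the trailing colon, jonathan would match too;
# ^jon: matches only when jon is the entire first field.
printf 'jonathan:x:502:502::/home/jonathan:/bin/sh\njon:x:500:500:Jon:/home/jon:/bin/bash\n' | grep '^jon:'
```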
Finally, we can pass the output of grep to the cut command. cut lets you specify what part of a line you want to display and can work either with individual characters or fields. We want to work with fields, but by default cut expects that fields are delimited, or separated, by tabs. We want to use a colon instead, hence the -d : portion of the command line. We also want to specify that only the seventh field should be shown, and so we add a -f 7 to finish this command line.
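Here's the cut stage on its own, fed one of the sample lines from my system:

```shell
# -d : sets the field delimiter to a colon; -f 7 prints only the
# seventh field, which in /etc/passwd is the login shell.
echo 'jon:x:500:500:Jon:/home/jon:/bin/bash' | cut -d : -f 7
```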
Let's try another example to figure out how many different users are currently running processes on your machine. If you have a System V ps, it would look like this:
ps -ef|awk '{ print $1 }'|sort|uniq|wc -l|xargs expr -1 +
With a BSD ps, it would look like this:
ps aux|awk '{ print $1 }'|sort|uniq|wc -l|xargs expr -1 +
In either case, the part of the pipeline providing the data is a full listing of all processes on the machine. Although the two commands format their output differently, in both cases we happen to be interested in the first field, which is the username.
The username is passed on to awk. awk is, in fact, a full-fledged programming language, but it lends itself nicely to one-line commands like this. Like cut, awk can print an individual field. However, awk does an excellent job of figuring out where the field breaks are as long as its input comes in reasonably good columns. ps meets this criterion, and because its fields aren't delimited by a single character, cut wouldn't do a good job here.
Even though awk is a full-fledged computer language, many Unix users only use the single command I mentioned before. The single quotes around the curly braces are necessary, and to change the column number output, replace the 1 in the earlier example with the column of your choice.
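A quick illustration with fake columns (the data is invented, but it mimics the ragged spacing of ps output):

```shell
# awk splits each line on runs of whitespace, however uneven,
# and $1 names the first resulting field on each line.
printf 'jon     1234 cat\nelvis  5678 vi\n' | awk '{ print $1 }'
```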
The input stream for sort is now a single column listing the owner of each command. A uniq would not work in this case without a sort because processes might or might not be grouped by username. So we sort and uniq the output, producing a list of all unique usernames who are currently running processes.
We then pass this list to wc -l, which counts the number of lines in its input. Now we have a number, and a problem: The header for ps is counted in that number, unless a user currently running processes has a username the same as the header field! We have to get rid of that extra number. To do this, we just need to subtract one.
The program that can best do this is expr, which permits you to put in a simple math problem on the command line. So we want to expr <STDIN> - 1 to get our answer. Unfortunately, we have another problem: expr takes its input not from STDIN but from its command line. This means we have to turn our input stream into a command-line argument.
Fortunately, there's a program designed to do just that: xargs takes STDIN and appends it (each line separated by a space rather than a line break, if there are multiple lines in the input file) to its own command line. The first parameter on the xargs command line is required, and it tells the program which program gets the new command line. After that, you can put any number of options that get passed to that command before STDIN.
This is our last problem: We want to subtract one from STDIN, which would mean xargs expr <STDIN> -1, but xargs won't let us put text after STDIN. The simple answer is to add STDIN to -1, giving us a final pipe of xargs expr -1 +.
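You can test this final stage by itself, with echo standing in for everything before it in the pipeline (63 here is just a made-up count):

```shell
# xargs appends its STDIN to the command line it's given, so this
# runs: expr -1 + 63
echo 63 | xargs expr -1 +
```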
Right now, this sure looks like a lot of work to get a simple answer. In a way, it is. After some practice, however, command lines such as this will feel like second nature to you. If you use Unix enough, you'll even find that it's difficult to get along without this capability because you'll find that it is central to getting the computer to do what you want. You'll begin to wonder (I hope) why it's so difficult to send matching lines of your word-processor document through a filter that changes them in some consistent way. When you start to ask yourself questions like this, you begin to think Unix from the inside out.
Practice Problems
7. How many entries are in your system's /etc/passwd file?
8. Display the last five entries of your system's /etc/passwd file.
9. Sort the last five entries of your system's /etc/passwd file.
10. Sort your /etc/passwd file and display the last five lines, alphabetically speaking.
11. Display only the usernames of these last five entries.
12. Display only the usernames and UIDs of these entries. (Hint: Read the cut man page to find out how to do this.)
13. Redirect this list of usernames and UIDs to a file named last-users-on-system.txt.
14. Write a pipeline that will kill any of your processes whose names begin with cat and a space. (To create a test case, you can run cat &, which creates a process named cat.) Don't try to kill all processes named cat; instead, kill only those that belong to you.