
Redirection and Pipes

Pipes

Pipes are what Unix is really all about. I know I said that redirection and pipes were the heart of Unix, but effective use of pipes defines Unix. Although other systems do all right with redirection, only Unix does right by pipes. Although several early releases of Unix were made before pipes were added, pipes made Unix special.

The concept now seems absurdly simple: Rather than redirecting output of a program into a file, we can redirect it into another program. The implications of this concept are staggering, and they delineate what is thought of as the Unix philosophy.

The idea of Unix is like that of a set of good knives. Each has a single purpose. Each does well at its appointed task. Some might be used for another task, but only poorly. The modern GUI-based operating system, with its bloated, monolithic applications, might be thought of as a food processor: It can do all sorts of things, but it's big, complicated, and not as flexible as that drawer full of knives. The food processor certainly does some things better: When I want to make hummus, I run it through the food processor; if I want to carve a turkey, however, I look elsewhere.

There's another lesson here: A food processor permits a person with little experience to do a good job at slicing vegetables. An expert with a good set of knives, however, can probably do the job both faster and better. Some tools, such as knives, reward experience and practice. Some, such as food processors, do not. At least, the learning curve is much more limited with a food processor.

Many people refer to this as the toolbox philosophy. In order to effectively connect these individual tools to each other, we need the pipe.

A pipe is represented by the vertical bar (|), and that key is referred to by most Unix people simply as the pipe. The flow of data is from left to right: The program on the left of the pipe sends its output to the program on the right, which sees it as input. A program through which data is piped is known as a filter.
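To make that left-to-right flow concrete, here is a minimal sketch that gets the same result twice: once by saving output in a file and redirecting it back in, and once with a pipe. The file name names.tmp and the three sample names are made up for illustration.

```shell
# Work in a scratch directory so nothing is left behind.
workdir=$(mktemp -d)

# Without a pipe: save the first program's output, then redirect it
# into the second program.
printf 'Plato\nAristotle\nSocrates\n' > "$workdir/names.tmp"
via_file=$(sort < "$workdir/names.tmp")

# With a pipe: sort reads printf's output directly, acting as a filter.
via_pipe=$(printf 'Plato\nAristotle\nSocrates\n' | sort)

echo "$via_pipe"
rm -r "$workdir"
```

Both variables end up holding the same sorted list; the pipe simply skips the intermediate file.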

Some Common Filters

Now that you have a toolbox, let's fill it with some basic tools! This section is about the hammers, screwdrivers, and wrenches of the Unix toolbox—the ones you use every day. There are drill presses and circular saws, but before you can learn those tools you need to learn the basics.

Pagers

The first filter most users try is cat piped into more, which is a simple program that stops after each screenful of text and waits for the user to press a key. [5] If you're reading a long text file, more is your friend. Just to try it out, type cat /usr/dict/words | more. (When you're sick of seeing a long list of alphabetical words, simply press Ctrl+C to break out of more.) If your system doesn't have /usr/dict/words, try examining /etc/services or any other long file on your system.

5. You might remember our old friends more and less from our discussion of documentation in Chapter 1. Not only are more and less filters, they can also simply take a filename as an argument and display that file. Many Unix commands are flexible in the same way. Even cat can be used as a filter if you don't give it a filename on the command line. You've actually done this already, if you think about it, when you redirected a file into cat.

After you've tried more, try less instead. If you're not sure whether less is installed on your system, just try these commands and see whether they work. If less isn't installed, you'll get an error message but nothing will break; you just won't get to see the file you tried to look at. less permits you to go backward as well as forward; if your terminal is properly configured, your PageUp and PageDown keys should enable you to scroll a page of text at a time. Because less doesn't quit when you reach the end of the file, you'll need to press q to exit. (Unlike most Unix programs, less doesn't exit when you press Ctrl+C.) As with more, just type cat /etc/syslog.conf | less to look at a really long file you don't particularly care about.

more and less are a special subcategory of filter known as pagers because they enable users to page through text. Most filters, however, transform the text that passes through them in some way.

Heads or Tails

head and tail transform text by displaying only part of it. head displays the top part of a file, whereas tail displays the bottom of the file:

[ elvis@frogbog elvis ]$ cat tao.txt|head -2
Lao Tzu
Chuang Tzu
[ elvis@frogbog elvis ]$ cat tao.txt|head -4
Lao Tzu
Chuang Tzu
K'ung Tzu
Meng Tzu
[ elvis@frogbog elvis ]$ cat tao.txt|tail -2
Meng Tzu
Wang Bi
[ elvis@frogbog elvis ]$ cat tao.txt|tail -4
Chuang Tzu
K'ung Tzu
Meng Tzu
Wang Bi

By default, head and tail print ten lines apiece. You can specify a measurement other than lines in some versions of these commands, but counting lines is most common.
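The -2 form used above is a historical shorthand; the POSIX spelling is -n 2, and that is what the sketch below uses. It rebuilds the five-line tao.txt from the listing in a scratch directory (the directory is my own scaffolding) and shows that head and tail can be chained to pull lines out of the middle of a file.

```shell
# Recreate the five-line tao.txt used in the examples above.
workdir=$(mktemp -d)
printf '%s\n' "Lao Tzu" "Chuang Tzu" "K'ung Tzu" "Meng Tzu" "Wang Bi" \
    > "$workdir/tao.txt"

first_two=$(head -n 2 "$workdir/tao.txt")
last_two=$(tail -n 2 "$workdir/tao.txt")

# Keep the first four lines, then the last two of those: lines 3 and 4.
middle=$(head -n 4 "$workdir/tao.txt" | tail -n 2)

echo "$middle"
rm -r "$workdir"
```

Here middle holds K'ung Tzu and Meng Tzu, the third and fourth lines of the file.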

sort and uniq

Another very popular filter is sort, which simply sorts the file passed to it alphabetically. As with head and tail, there's not a whole lot to say about sort, so I'll just run a few simple demonstrations:

[ elvis@frogbog elvis ]$ cat tao.txt|sort
Chuang Tzu
K'ung Tzu
Lao Tzu
Meng Tzu
Wang Bi
[ elvis@frogbog elvis ]$ cat west.txt|sort
Aristotle
Descartes
Heraclitus
Kierkegaard
Pascal
Plato
Plotinus
Sartre
Socrates
[ elvis@frogbog elvis ]$ cat tao.txt west.txt|sort
Aristotle
Chuang Tzu
Descartes
Heraclitus
Kierkegaard
K'ung Tzu
Lao Tzu
Meng Tzu
Pascal
Plato
Plotinus
Sartre
Socrates
Wang Bi
[ elvis@frogbog elvis ]$ cat west.txt tao.txt|sort
Aristotle
Chuang Tzu
Descartes
Heraclitus
Kierkegaard
K'ung Tzu
Lao Tzu
Meng Tzu
Pascal
Plato
Plotinus
Sartre
Socrates
Wang Bi

As you can see, sort takes all the input it receives via STDIN and sends the sorted data to STDOUT. There is a special numerical mode, accessible via the -n option. With this switch, if the text at the beginning of the line is a number, numbers are sorted in numerical order. This seems redundant, but let's have an example:

[ jon@frogbog jon ]$ cat some-stuff.txt
99 Dead Baboons
99 Red Balloons
101 Dalmatians
16 Candles
24 Hours
9 Lords a Leaping
[ jon@frogbog jon ]$ cat some-stuff.txt|sort -n
9 Lords a Leaping
16 Candles
24 Hours
99 Dead Baboons
99 Red Balloons
101 Dalmatians

But what would happen without the -n switch? Let's try that, too:

[ jon@frogbog jon ]$ cat some-stuff.txt|sort
101 Dalmatians
16 Candles
24 Hours
99 Dead Baboons
99 Red Balloons
9 Lords a Leaping

Hardly what we might have in mind when sorting by quantity: Instead of sorting numerically, this list is sorted alphabetically, character by character, so 101 lands ahead of 16 because 0 sorts before 6. (Remember, in ASCII numbers come before letters.)
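If you want to compare the two modes side by side, the following sketch rebuilds some-stuff.txt in a scratch directory and runs both. One assumption to flag: I set LC_ALL=C for the plain sort so that it compares raw bytes. Exactly where a line such as 9 Lords a Leaping lands in a plain sort can vary with your locale's collation rules, so your output may not match the listing above line for line.

```shell
workdir=$(mktemp -d)
printf '%s\n' '99 Dead Baboons' '99 Red Balloons' '101 Dalmatians' \
    '16 Candles' '24 Hours' '9 Lords a Leaping' > "$workdir/some-stuff.txt"

# Numeric mode: the leading number on each line is compared as a number.
numeric=$(sort -n "$workdir/some-stuff.txt")

# Plain mode in the C locale: lines are compared byte by byte,
# so "101" sorts before "16" because "0" comes before "6".
bytewise=$(LC_ALL=C sort "$workdir/some-stuff.txt")

echo "$numeric"
echo "$bytewise"
rm -r "$workdir"
```

In the byte-by-byte ordering a space sorts before any digit, which is why 9 Lords a Leaping comes out ahead of both 99 lines in this run.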

A related tool of somewhat limited use is uniq, which strips out duplicate lines when they follow each other in the file. For example,

[ jon@frogbog jon ]$ cat other-stuff.txt
One
Two
One
One
Two
Two
Two
Three
Three
One
[ jon@frogbog jon ]$ cat other-stuff.txt|uniq
One
Two
One
Two
Three
One

This is much more useful following a sort:

[ jon@frogbog jon ]$ cat other-stuff.txt | sort | uniq
One
Three
Two

Not what you were expecting? How's the computer supposed to know that "Two" and "Three" are numbers? You can, however, see that it works.
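Two variations on this pairing come up often enough to be worth a sketch. uniq -c prefixes each line with the number of times it occurred, and sort -u does the job of sort | uniq in one step. The scratch directory is my own scaffolding; the file contents are the same as other-stuff.txt above.

```shell
workdir=$(mktemp -d)
printf '%s\n' One Two One One Two Two Two Three Three One \
    > "$workdir/other-stuff.txt"

# sort groups identical lines together, uniq -c prefixes each distinct
# line with its count, and sort -rn puts the largest counts first.
counts=$(sort "$workdir/other-stuff.txt" | uniq -c | sort -rn)
echo "$counts"

# sort -u is a common shorthand for sort | uniq.
distinct=$(sort -u "$workdir/other-stuff.txt")
echo "$distinct"

rm -r "$workdir"
```

One and Two each occur four times and Three occurs twice, so the Three line comes last in counts. The exact padding in front of each count differs between versions of uniq.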

wc

By far my favorite filter is wc, which simply reports how many characters, words, and lines are present in a file.

[ jon@frogbog jon ]$ cat some-stuff.txt | wc
   6   16   85
[ jon@frogbog jon ]$ cat some-stuff.txt | wc -c
  85
[ jon@frogbog jon ]$ cat some-stuff.txt | wc -w
  16
[ jon@frogbog jon ]$ cat some-stuff.txt | wc -l
   6

wc is particularly useful for counting anything stored one line per record, such as the output of a ps command:

[ jon@frogbog jon ]$ ps -ef|wc -l
  63

Obviously, the precise number varies depending on your system and its use when you run the command. How many processes are running on this system right now? If you said 63, you're wrong: Remember that ps has a header line at the top of its output, so only 62 processes are running on the system.
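If you would rather not do that subtraction in your head, one option is to drop the header before counting. The sketch below uses a made-up, four-line stand-in for ps output so that the result is the same on every machine; tail -n +2 prints its input starting at the second line.

```shell
# A stand-in for ps output: one header line plus three process lines.
# (Hypothetical data, not what your system will print.)
listing=$(printf '%s\n' 'UID PID CMD' 'root 1 init' 'jon 412 bash' 'elvis 977 vi')

with_header=$(echo "$listing" | wc -l)

# tail -n +2 starts printing at line 2, which drops the header.
without_header=$(echo "$listing" | tail -n +2 | wc -l)

# Some versions of wc pad their output with spaces; arithmetic
# expansion ignores the padding.
echo $(( without_header ))
```

With this sample listing, the first count is 4 and the second is 3.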

Combining Filters into Longer Pipelines

A single filter by itself might be useful, but filters are most useful when combined with each other to produce a particular effect. The language metaphor for Unix is particularly apt here: Single pipes are like simple sentences, whereas longer pipelines are complex sentences. In this section, we endeavor to diagram some more complex pipelines to gain a better grasp of the language. Users who don't write complex pipelines can get their work done, which is what computing is intended to do, but they're only speaking pidgin Unix. Being able to speak fluent Unix means being able to get your work done more quickly and more elegantly than you might otherwise be able to.

Remember that earlier in the chapter, I asked you to find out what shell you were using by typing

cat /etc/passwd|grep ^username:|cut -d : -f 7

Let's dissect that command and see what it does. First, we're looking at /etc/passwd. Use man 5 passwd to read about the format of that file. [6] Simply put, /etc/passwd contains a list of all the accounts that can log in to the system. (On systems with NIS, the YP map passwd contains the same information.) The information in this file is in several different fields, each separated by a colon. On my system, two such lines read as follows:

jon:x:500:500:Jon:/home/jon:/bin/bash
elvis:x:501:100:Elvis:/home/elvis:/bin/bash

6. On Solaris systems, you'll have to use man -s 5 passwd, for some silly reason. I don't know why they took a mind to breaking this, but they did.

The first field in /etc/passwd is the username. Second is the password, but on most modern systems the password is stored elsewhere, and x is the only thing that shows up in this field. Third is the UID for this account, fourth is the primary GID. Fifth is the GECOS field, which contains what passes for a human-readable username. On some systems, this can also contain phone numbers, offices, and so on. Sixth is the user's home directory, and finally we have the user's default shell.

If I want to find my shell, as with the command I mentioned earlier, first I have to find my account. For this, we pipe /etc/passwd through grep, which is an advanced search program. The default is to return all lines that match the regular expression provided on the command line.

I'll talk about regular expressions in Chapter 5, but for now we only need to know that you can look for text with regular expressions and that some characters have special meanings. In the previous example, if I wanted to find my account, that part of the pipe would read grep ^jon:.

Why would I want to look for more than my account name? Because more than one account on the system might have jon in its name, I want to search for the entire field. We're lucky, because the username is the first field. Regular expressions have a special character to mark the beginning of the line: ^. Marking the end of the field is easy: All we need to do is put a colon at the end of it. Therefore, the regular expression ^jon: finds the word jon followed by a colon at the beginning of the line. This should return precisely one account, mine.

Finally, we can pass the output of grep to the cut command. cut lets you specify what part of a line you want to display and can work either with individual characters or fields. We want to work with fields, but by default cut expects that fields are delimited, or separated, by tabs. We want to use a colon instead, hence the -d : portion of the command line. We also want to specify that only the seventh field should be shown, and so we add a -f 7 to finish this command line.
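Here is a small sketch of that whole pipeline run against a private copy of the data rather than the real /etc/passwd. The first two lines are the ones shown above; the jonas line is a hypothetical third account I added to show why the trailing colon in the pattern matters. I have also quoted the pattern, which keeps the shell from interpreting any of its characters.

```shell
workdir=$(mktemp -d)
printf '%s\n' \
    'jon:x:500:500:Jon:/home/jon:/bin/bash' \
    'elvis:x:501:100:Elvis:/home/elvis:/bin/bash' \
    'jonas:x:502:100:Jonas:/home/jonas:/bin/sh' > "$workdir/passwd"

# Without the colon, the pattern also matches jonas (grep -c counts matches).
loose_matches=$(grep -c '^jon' "$workdir/passwd")
exact_matches=$(grep -c '^jon:' "$workdir/passwd")

# Field 7 is the default shell; field 6 is the home directory.
shell_of_jon=$(grep '^jon:' "$workdir/passwd" | cut -d : -f 7)
home_of_elvis=$(grep '^elvis:' "$workdir/passwd" | cut -d : -f 6)

echo "$shell_of_jon"
rm -r "$workdir"
```

The loose pattern matches two accounts, the exact one matches only jon, and the final pipeline prints /bin/bash.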

Let's try another example to figure out how many different users are currently running processes on your machine. If you have a System V ps, it would look like this:

ps -ef|awk '{ print $1 }'|sort|uniq|wc -l|xargs expr -1 +

With a BSD ps, it would look like this:

ps aux|awk '{ print $1 }'|sort|uniq|wc -l|xargs expr -1 +

The part of the pipe that is providing the data is, in either case, a full listing of all processes on the machine. Although the two commands format their output differently, in both cases we happen to be interested in the first field, which is the username.

This listing is passed on to awk. awk is, in fact, a full-fledged programming language, but it lends itself nicely to one-line commands like this. Like cut, awk can print an individual field. However, awk does an excellent job of figuring out where field breaks are if its input comes in reasonably good columns. ps meets this criterion, and because the fields aren't delimited with a single character, cut doesn't do a good job here.

Even though awk is a full-fledged computer language, many Unix users only use the single command I mentioned before. The single quotes around the curly braces are necessary, and to change the column number output, replace the 1 in the earlier example with the column of your choice.

The input stream for sort is now a single column listing the owner of each command. A uniq would not work in this case without a sort because processes might or might not be grouped by username. So we sort and uniq the output, producing a list of all unique usernames who are currently running processes.
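To see why the sort matters, here is a sketch that runs the first half of the pipeline with and without it. The process listing is made-up sample data in roughly the shape of ps -ef output, so the counts are the same wherever you run it.

```shell
# Hypothetical process listing: a header plus five processes
# owned by three users, not grouped by owner.
listing=$(printf '%s\n' \
    'UID PID CMD' \
    'root 1 init' \
    'jon 412 bash' \
    'root 35 syslogd' \
    'jon 977 vi' \
    'elvis 1020 bash')

# uniq alone only removes adjacent duplicates, and here none are adjacent.
unsorted_count=$(echo "$listing" | awk '{ print $1 }' | uniq | wc -l)

# After sort, identical owners sit next to each other and uniq collapses them.
owners=$(echo "$listing" | awk '{ print $1 }' | sort | uniq)
sorted_count=$(echo "$owners" | wc -l)

echo "$owners"
```

Without the sort, all six lines survive; with it, four remain: the three usernames plus the UID header.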

We then pass this list to wc -l, which counts the number of lines in its input. Now we have a number, and a problem: The header for ps is counted in that number, unless a user currently running processes has a username the same as the header field! We have to get rid of that extra number. To do this, we just need to subtract one.
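Subtracting one is not the only way around the header. As an alternative sketch, awk can skip the first line itself: NR is awk's count of the lines read so far, so the condition NR > 1 leaves the header out before anything is counted. The listing is made-up sample data again.

```shell
# Hypothetical process listing: a header plus five processes
# owned by three users.
listing=$(printf '%s\n' \
    'UID PID CMD' \
    'root 1 init' \
    'jon 412 bash' \
    'root 35 syslogd' \
    'jon 977 vi' \
    'elvis 1020 bash')

# NR > 1 skips the header line; sort -u does the work of sort | uniq.
user_count=$(echo "$listing" | awk 'NR > 1 { print $1 }' | sort -u | wc -l)

echo $(( user_count ))
```

This prints 3 for the sample data. It also sidesteps the odd case of a user whose name is the same as the header field.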

The program that can best do this is expr, which permits you to put in a simple math problem on the command line. So we want to expr <STDIN> - 1 to get our answer. Unfortunately, we have another problem: expr takes its input not from STDIN but from its command line. This means we have to turn our input stream into a command-line argument.

Fortunately, there's a program designed to do just that: xargs takes STDIN and appends it (each line separated by a space rather than a line break, if there are multiple lines in the input file) to its own command line. The first parameter on the xargs command line tells the program which program gets the new command line; if you leave it out, xargs runs echo. After that, you can put any number of options that get passed to that command before STDIN.

This is our last problem: We want to subtract one from STDIN, which would mean xargs expr <STDIN> - 1, but used this way xargs can only append STDIN to the end of the command line. The simple answer is to add STDIN to -1, giving us a final pipe of xargs expr -1 +.
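For comparison, here are two other ways to do the same subtraction, sketched with a fixed number standing in for the output of wc -l. The first uses the -I option of xargs, which substitutes each input line wherever a placeholder appears instead of appending it. The second uses the shell's own arithmetic expansion and needs neither xargs nor expr. The variable names are mine.

```shell
# Pretend wc -l reported 4 lines: three users plus the ps header.
reported=4

# xargs -I{} replaces {} with the input line, so the number can come
# first and "- 1" can follow it.
via_xargs=$(echo "$reported" | xargs -I{} expr {} - 1)

# Arithmetic expansion does the subtraction inside the shell itself.
via_shell=$(( reported - 1 ))

echo "$via_xargs" "$via_shell"
```

Both come out to 3. Which form you reach for is mostly a matter of taste; the arithmetic expansion only helps once the number is in a shell variable, whereas the xargs forms work in the middle of a pipeline.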

Right now, this sure looks like a lot of work to get a simple answer. In a way, it is. After some practice, however, command lines such as this will feel like second nature to you. If you use Unix enough, you'll even find that it's difficult to get along without the capability to do this because you'll find that it is central to getting the computer to do what you want. You'll begin to wonder (I hope) why it's so difficult to send matching lines of your word document through a filter that changes them in some consistent way. When you start to ask yourself questions like this, you begin to think Unix from the inside out.

Practice Problems

7. How many entries are in your system's /etc/passwd file?

8. Display the last five entries of your system's /etc/passwd file.

9. Sort the last five entries of your system's /etc/passwd file.

10. Sort your /etc/passwd file and display the last five lines, alphabetically speaking.

11. Display only the usernames of these last five entries.

12. Display only the usernames and UIDs of these entries. (Hint: Read the cut man page to find out how to do this.)

13. Redirect this list of usernames and UIDs to a file named last-users-on-system.txt.

14. Write a pipeline that will kill any of your processes whose names begin with cat and a space. (To create a test case, you can run cat &, which creates a process named cat.) Don't try to kill all processes named cat; instead, kill only those that belong to you.
