UNIX Disk Usage

Keeping Track of Users: diskhogs

Let's put all the information in this hour together and create an administrative script called diskhogs. When run, this script will report the five users with the largest /home directories, and then report the largest files in each of those home directories.

Task 3.5: This Little Piggy Stayed Home?

This is the first shell script presented in the book, so a quick rule of thumb: Write your shell scripts in sh rather than csh. It's easier, it's more universally recognized, and most of the shell scripts you'll encounter are written in sh as well. Also, keep in mind that just about every shell script discussed in this book expects to be run as root, because it needs access to the entire file system to do anything meaningful for system administration.

In this book, all shell scripts will be written in sh, which is easily verified by the fact that they all have

#!/bin/sh

as their first line.
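
Since the scripts in this book also assume you're root, you might want to add a guard near the top of each one so it fails politely when run by a regular user. Here's a minimal sketch of such a check (the id -u test is my own addition, not something the book's scripts include):

#!/bin/sh
# Refuse to run unless we're the superuser; id -u prints 0 for root
if [ "`id -u`" -ne 0 ] ; then
  echo "Sorry, this script must be run as root" >&2
  exit 1
fi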

  1. Let's put all this together. To find the five largest home directories, you can use

    du -s /home/* | sort -rn | cut -f2 | head -5

    For each directory, you can find the largest files it contains by using find's handy -printf option (%k prints the disk space each file uses in 1KB blocks, and %p prints its pathname):

    find /home/loginID -type f -printf "%k %p\n" | sort -rn | head

    Therefore, we should be able to identify the top home directories, then step one-by-one into those directories to identify the largest files in each. Here's how that code should look:

    for dirname in `du -s /home/* | sort -rn | cut -f2- | head -5`
    do
     echo ""
     echo Big directory: $dirname
     echo Four largest files in that directory are:
     find $dirname -type f -printf "%k %p\n" | sort -rn | head -4
    done
    exit 0
  2. This is a good first stab at this shell script. Let's save it as diskhogs.sh, run it and see what we find:

    # sh diskhogs.sh
    Big directory: /home/staging
    Four largest files in that directory are:
    423 /home/staging/waldorf/big/DSCF0165.jpg
    410 /home/staging/waldorf/big/DSCF0176.jpg
    402 /home/staging/waldorf/big/DSCF0166.jpg
    395 /home/staging/waldorf/big/DSCF0161.jpg
    
    Big directory: /home/chatter
    Four largest files in that directory are:
    1076 /home/chatter/comics/lynx
    388 /home/chatter/logs/access_log
    90 /home/chatter/logs/error_log
    64 /home/chatter/responding.cgi
    
    Big directory: /home/cbo
    Four largest files in that directory are:
    568 /home/cbo/financing.pdf
    464 /home/cbo/investors/CBO-plan.pdf
    179 /home/cbo/Archive/cbofinancial-modified-files/CBO Website.zip
    77 /home/cbo/Archive/cbofinancial-modified-files/CBO Financial Incorporated.doc
    
    Big directory: /home/sherlockworld
    Four largest files in that directory are:
    565 /home/sherlockworld/originals-from gutenberg.txt
    56 /home/sherlockworld/speckled-band.html
    56 /home/sherlockworld/copper-beeches.html
    54 /home/sherlockworld/boscombe-valley.html
    
    Big directory: /home/launchline
    Four largest files in that directory are:
    151 /home/launchline/logs/access_log
    71 /home/launchline/x/submit.cgi
    71 /home/launchline/x/admin/managesubs.cgi
    64 /home/launchline/x/status.cgi

    As you can see, the results are good, but the organization of the output is perhaps less than we'd like. Ideally, I'd like to list all the disk hogs first, and then show the largest files in each. To do this, we'll either have to store all the directory names in a variable that we parse afterward, or write the information to a temporary file.

    Because it shouldn't be too much information (five directory names), we'll save the directory names in a variable. To do this, we'll use the nifty backquote notation.

    Here's how things will change. First off, let's load the directory names into the new variable:

    bigdirs="´du –s /home/* | sort –rn | cut –f2- | head –5´"

    Then we'll need to change the for loop to reflect this change, which is easy:

    for dirname in $bigdirs ; do

    Notice I've also pulled the do line up to shorten the script. Recall that a semicolon indicates the end of a command in a shell script, so we can then pull the next line up without any further ado.
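
    In other words, these two forms behave identically; the semicolon just lets do share a line with for (a throwaway sketch, not part of diskhogs itself):

    for dirname in $bigdirs
    do
     echo $dirname
    done

    # ...is exactly equivalent to:

    for dirname in $bigdirs ; do
     echo $dirname
    done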

TIP

Unix old-timers often refer to backquotes as backticks, so a wizened Unix admin might well say "stick the dee-ewe in backticks" at this juncture.
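
Incidentally, any POSIX-style sh also accepts the $( ... ) form of command substitution, which does the same job as backquotes and nests more cleanly; either notation would work in this script. For example, here is just an alternate spelling of the bigdirs line above, not a change to the script:

    bigdirs="$(du -s /home/* | sort -rn | cut -f2- | head -5)"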

  3. Now let's not forget to output the list of big directories before we list the big files per directory. In total, our script now looks like this:

    echo "Disk Hogs Report for System ´hostname´"
    
    bigdirs="´du -s /home/* | sort -rn | cut -f2- | head -5´"
    
    echo "The Five biggest home directories are:"
    echo $bigdirs
    
    for dirname in $bigdirs ; do
     echo ""
     echo Big directory: $dirname
     echo Four largest files in that directory are:
     find $dirname -type f -printf "%k %p\n" | sort -rn | head -4
    done
    
    exit 0

    This is quite a bit closer to the finished product, as you can see from its output:

    Disk Hogs Report for System staging.intuitive.com
    The Five biggest home directories are:
    /home/staging /home/chatter /home/cbo /home/sherlockworld /home/launchline
    
    Big directory: /home/staging
    Four largest files in that directory are:
    423 /home/staging/waldorf/big/DSCF0165.jpg
    410 /home/staging/waldorf/big/DSCF0176.jpg
    402 /home/staging/waldorf/big/DSCF0166.jpg
    395 /home/staging/waldorf/big/DSCF0161.jpg
    
    Big directory: /home/chatter
    Four largest files in that directory are:
    1076 /home/chatter/comics/lynx
    388 /home/chatter/logs/access_log
    90 /home/chatter/logs/error_log
    64 /home/chatter/responding.cgi
    
    Big directory: /home/cbo
    Four largest files in that directory are:
    568 /home/cbo/financing.pdf
    464 /home/cbo/investors/CBO-plan.pdf
    179 /home/cbo/Archive/cbofinancial-modified-files/CBO Website.zip
    77 /home/cbo/Archive/cbofinancial-modified-files/CBO Financial Incorporated.doc
    
    Big directory: /home/sherlockworld
    Four largest files in that directory are:
    565 /home/sherlockworld/originals-from gutenberg.txt
    56 /home/sherlockworld/speckled-band.html
    56 /home/sherlockworld/copper-beeches.html
    54 /home/sherlockworld/boscombe-valley.html
    
    Big directory: /home/launchline
    Four largest files in that directory are:
    151 /home/launchline/logs/access_log
    71 /home/launchline/x/submit.cgi
    71 /home/launchline/x/admin/managesubs.cgi
    64 /home/launchline/x/status.cgi

    This is a script you could easily run every morning in the wee hours with a line in cron (which we'll explore in great detail in Hour 15, "Running Jobs in the Future"), or you can even put it in your .profile to run automatically each time you log in.
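
    For instance, a crontab entry along these lines would generate the report at 4:00 a.m. every day (the path is just an example; point it at wherever you actually saved the script):

    # root's crontab: minute hour day month weekday command
    0 4 * * * sh /root/scripts/diskhogs.sh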

  4. One final nuance: To have the output e-mailed to you, simply append the following:

    | mail -s "Disk Hogs Report" your-mailaddr

    If you've named this script diskhogs.sh like I have, you could have the output e-mailed to you (as root) with

    sh diskhogs.sh | mail -s "Disk Hogs Report" root

    Try that, then check root's mailbox to see if the report made it.
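
    To peek at that mailbox, just run mail with no arguments while you're logged in as root; type q to quit when you're done (a quick check, assuming the standard Berkeley-style mail reader):

    # mail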

  5. For those of you using Solaris, Darwin, or another flavor of Unix, the nifty -printf option probably isn't available in your version of find. As a result, the more portable version of this script is rather more complex, because we not only have to sidestep the lack of -printf, but we also have to cope with directory names that contain spaces (common under /Library on Darwin). To handle the spaces, we use sed to turn them into underscores and then back again when we feed the argument to find, and awk to pull the file size and pathname out of the find -ls output:

    #!/bin/sh
    echo "Disk Hogs Report for System ´hostname´"
    
    bigdir2="´du -s /Library/* | sed 's/ /_/g' | sort -rn | cut -f2- | head -5´"
    
    echo "The Five biggest library directories are:"
    echo $bigdir2
    
    for dirname in $bigdir2 ; do
     echo ""
     echo Big directory: $dirname
     echo Four largest files in that directory are:
     find "´echo $dirname | sed 's/_/ /g'´" -type f -ls | \
      awk '{ print $7" "$11 }' | sort -rn | head -4
    done
    
    exit 0

    The good news is that the output ends up being almost identical, which you can verify if you have an OS X or other BSD system available.

    Of course, it would be smart to replace the native version of find with the more sophisticated GNU version, but swapping out essential system tools is a bigger step than most Unix users want to take!

TIP

If you want to explore upgrading some of the Unix tools in Darwin to take advantage of the sophisticated GNU enhancements, then you'd do well to start by looking at http://www.osxgnu.org/ for ported code. The site also includes download instructions.

If you're on Solaris or another flavor of Unix that isn't Mac OS X, check out the main GNU site for tool upgrades at http://www.gnu.org/.
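
One quick way to find out what you already have: GNU find understands a --version flag, while the stock BSD and Solaris versions don't, so a test along these lines (my own quick check, not from the book) reveals which find is first in your PATH:

    if find --version >/dev/null 2>&1 ; then
      find --version | head -1
    else
      echo "This doesn't look like GNU find"
    fi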

This shell script evolved in a manner that's quite common for Unix tools: It started life as a simple command line; then, as the tool grew more sophisticated, the command sequence became too tedious to type directly, so it was dropped into a shell script. Shell variables then offered a way to save interim output, fine-tune the presentation, and more, so we exploited them to build a more powerful tool. Finally, the tool itself was turned into an automated monitoring task by adding it to root's cron job.
