Mac OS X Unleashed

By John Ray and William C. Ray

File Compression and Archiving

As in the Macintosh world, a number of standards have arisen in the Unix world for compressing and archiving files. Unlike the Mac world, however, these programs don't tend to be do-all programs such as StuffIt that can archive, compress, password protect, and perform a wealth of other useful file archive functions. Following the Unix tradition, software that compresses files mostly just compresses files, and software that collects lots of files together into a single-file archive mostly just does that. The two are used together: one tool first collects the files into an uncompressed archive, and another then compresses that archive. Likewise, the analogous procedure to "UnStuffiting" a file takes two steps in Unix, because decompressing the archive and unpacking its contents are separate operations.

Compressing and Decompressing Files: compress, gzip, uncompress, gunzip

Unix has various tools available for compressing and decompressing files. Compressing files, of course, causes them to take up less space. As drive space becomes cheaper, this is perhaps not as great a concern. However, if you will be transferring files over the network, smaller files transfer faster. In addition, you might find it useful to compress files—especially archives of software packages you have installed—for writing to CD-ROM, where space is limited.

compress and gzip are the compression tools available on your system; uncompress and gunzip are the corresponding decompression tools. compress and uncompress are more widely available by default on Unix systems. The gzip tool, however, generally compresses files further than compress.

Software packages that you download are frequently distributed as files compressed by compress or gzip. Files that you download ending in .Z are files compressed with compress. Files ending in .gz are compressed with gzip. Decompress files ending in .Z with uncompress; decompress files ending in .gz with gunzip. You will also occasionally see files ending in .tgz, which is the result of shoehorning .tar.gz (for tar archive, compressed with gzip) into a three-letter file extension.
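The suffix conventions above can be sketched as a small shell function. This is only an illustration: tool_for is a hypothetical helper name, and it looks at the filename alone, not at the file's contents.

```shell
#!/bin/sh
# Sketch: map a filename suffix to the tool this section pairs with it.
# tool_for is a hypothetical helper; it inspects only the name, never the data.
tool_for() {
    case "$1" in
        *.tar.gz|*.tgz) echo "gunzip, then tar" ;;      # gzip-compressed tar archive
        *.tar.Z)        echo "uncompress, then tar" ;;  # compress-compressed tar archive
        *.gz)           echo "gunzip" ;;                # plain gzip file
        *.Z)            echo "uncompress" ;;            # plain compress file
        *)              echo "unknown" ;;               # no suffix we recognize
    esac
}

tool_for sendmail.8.10.2-src.tar.gz
tool_for notes.Z
```

The archive patterns are listed before the plain .gz and .Z patterns because case stops at the first match.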

Here is a sample of compressing a file using gzip:

Rosalyn source 19 >ls -l sendmail.8.10.2-src.tar

      -rw-r--r--   1 miwa  class  4454400 Jul  6  2000 sendmail.8.10.2-src.tar

Rosalyn source 20 >gzip -9 sendmail.8.10.2-src.tar

Rosalyn source 21 >ls -l sendmail.8.10.2-src.tar*

      -rw-r--r--   1 miwa  class  1250050 Jul  6  2000 sendmail.8.10.2-src.tar.gz

As we see from the ls listing, the size of the file has been reduced and .gz has been appended to the filename. The syntax and options for compress and uncompress are in the command documentation table, Table 13.16. The syntax and options for gzip and gunzip are in the command documentation table, Table 13.17.
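If you want to try the same round trip without touching a real download, the following is a minimal sketch that works in a scratch directory. The filenames sample.txt and sample.orig are made up for the illustration, and the sizes you see will depend on your data.

```shell
#!/bin/sh
# Sketch: compress a scratch file with gzip, then restore it with gunzip.
set -eu

workdir=$(mktemp -d)              # scratch directory so no real files are touched
trap 'rm -rf "$workdir"' EXIT     # clean up when the script exits
cd "$workdir"

# Build a repetitive sample file (hypothetical name) so it compresses well.
i=0
while [ "$i" -lt 2000 ]; do
    echo "the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > sample.txt

cp sample.txt sample.orig         # keep a copy to compare against later
size_before=$(wc -c < sample.txt | tr -d ' ')

gzip -9 sample.txt                # replaces sample.txt with sample.txt.gz
size_after=$(wc -c < sample.txt.gz | tr -d ' ')

gunzip sample.txt.gz              # replaces sample.txt.gz with sample.txt
if cmp -s sample.txt sample.orig; then
    roundtrip=ok
else
    roundtrip=failed
fi

echo "before: $size_before bytes, after: $size_after bytes, round trip: $roundtrip"
```

Note that gzip and gunzip each replace their input file, which is why the sketch keeps its own copy for the comparison.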

Table 13.16. The Command Documentation Table for compress and uncompress

compress Compresses data.
uncompress Expands data.
compress [-cfv] [-b <bits>] <file1> <file2> ...

uncompress [-cfv] <file1> <file2> ...

            
compress reduces the size of a file and renames the file by adding the .Z extension. It retains as many of the original file's characteristics (modification time, access time, file flags, file mode, user ID, and group ID) as permissions allow. If compression would not reduce a file's size, the file is ignored.
uncompress restores a file reduced by compress to its original form, and renames the file by removing the .Z extension.
-c Writes compressed or uncompressed output to standard output without modifying any files.
-f Forces compression of a file, even when compression would not reduce its size. Additionally, forces files to be overwritten without prompting for confirmation.
-v Prints the percentage reduction of each file.
-b <bits> Specifies the upper-bit code limit. Default is 16. Bits must be between 9 and 16. Lowering the limit results in larger, less compressed files.

Table 13.17. The Command Documentation Table for gzip, gunzip, and zcat

gzip Compresses or expands files.
gunzip
zcat
gzip [-acdfhlLnNrtvV19] [-S <suffix>] <file1> <file2> ...

gunzip [-acfhlLnNrtvV] [-S <suffix>] <file1> <file2> ...

zcat [-fhLV] <file1> <file2> ...
gzip reduces the size of a file and renames the file by adding the .gz extension. It keeps the same ownership modes, and access and modification times. If no files are specified, or if the filename - is specified, standard input is compressed to standard output. gzip compresses regular files, but ignores symbolic links.
Compressed files can be restored to their original form by using gunzip, gzip -d, or zcat.
gunzip takes a list of files from the command line, whose names end in .gz, -gz, .z, -z, _z, or .Z, and which also begin with the correct magic number, and replaces them with expanded files without the original extension. gunzip also recognizes the extensions .tgz and .taz as short versions of .tar.gz and .tar.Z, respectively. If necessary, gzip uses the .tgz extension to compress a .tar file.
zcat is equivalent to gunzip -c. It uncompresses either a list of files on the command line or from standard input and writes uncompressed data to standard output. zcat uncompresses files that have the right magic number, whether or not they end in .gz.
Compression is always performed, even if the compressed file is slightly larger than the original file.
-a ASCII text mode. Converts end-of-lines using local conventions. Supported only on some non-Unix systems.
--ascii Same as -a.
-c Writes output to standard output and keeps the original files unchanged.
--stdout Same as -c.
--to-stdout Same as -c.
-d Decompresses.
--decompress Same as -d.
--uncompress Same as -d.
-f Forces compression or decompression, even if the file has multiple links, or if the corresponding file already exists, or if the compressed data is read from or written to a terminal. If -f is not used, and gzip is not working in the background, the user is prompted before a file is overwritten.
-h Displays a help screen and quits.
--help Same as -h.
-l

Lists the following fields for each compressed file:

compressed (compressed size)

uncompressed (uncompressed size)

ratio (compression ratio; 0.0% if unknown)

uncompressed_name (name of uncompressed file)

Uncompressed size is -1 for files not in gzip format. To get an uncompressed size for such files, use

zcat <file1.Z> | wc -c

Combined with --verbose, it also displays

method (compression method)

crc (32-bit CRC of the uncompressed data)

date and time (time stamp of the uncompressed file)

Compression methods supported are deflate, compress, lzh, and pack. crc is listed as ffffffff when the file is not in gzip format.

--list Same as -l.
-L Displays the gzip license and quits.
--license Same as -L.
-n

When compressing, it does not save the original filename and time stamp by default. (Always saves the original name if it has to be truncated.)

When decompressing, it does not restore the original name (removes only .gz) and time stamp (only copies it from compressed file), if present. This is the default.

--no-name Same as -n.
-N

When compressing, it always saves the original filename and time stamp. This is the default.

When decompressing, it restores the original time stamp and filename, if present.

--name Same as -N.
-q Suppresses all warnings.
--quiet Same as -q.
-r

Traverses the directory structure recursively.

If a filename specified on the command line is a directory, gzip/gunzip descends into the directory and compresses/decompresses the files in that directory.

--recursive Same as -r.
-S <suffix>

Uses <suffix> instead of .gz. Any suffix can be used, but we recommend sticking to .z and .gz to prevent confusion when the file is transferred to another system.

A null suffix (-S "") forces gunzip to try decompression on all listed files, regardless of suffix.

--suffix <suffix> Same as -S <suffix>.
-t Test. Checks the integrity of the compressed file.
--test Same as -t.
-v Verbose. Displays the name and percentage reduction for each file compressed or decompressed.
--verbose Same as -v.
-V Version. Displays the version number and compilation options and quits.
--version Same as -V.
-<n>
--fast
--best Regulates the speed of compression as specified by -<n>, where -1 (or --fast) is the fastest compression method (least compression) and -9 (or --best) is the slowest compression method (most compression). The default compression level is -6.
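Several of the options in Table 13.17 are easier to follow when seen together. The sketch below is an illustration under assumptions, not a benchmark: notes.txt and the .gz names are invented, the data is synthetic, and exact sizes will vary with the gzip version installed.

```shell
#!/bin/sh
# Sketch: exercise gzip -c, -1/-9, -t, and -l, plus gunzip -c, on scratch data.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Synthetic sample data (hypothetical filename).
i=0
while [ "$i" -lt 5000 ]; do
    echo "line $i: some moderately repetitive sample text"
    i=$((i + 1))
done > notes.txt
orig_size=$(wc -c < notes.txt | tr -d ' ')

# -c writes to standard output, so notes.txt itself is left unchanged.
gzip -1 -c notes.txt > notes.fast.gz    # fastest, least compression
gzip -9 -c notes.txt > notes.best.gz    # slowest, most compression
fast_size=$(wc -c < notes.fast.gz | tr -d ' ')
best_size=$(wc -c < notes.best.gz | tr -d ' ')

# -t checks the integrity of a compressed file without expanding it to disk.
if gzip -t notes.best.gz; then integrity=ok; else integrity=bad; fi

# -l lists compressed size, uncompressed size, ratio, and uncompressed name.
listing=$(gzip -l notes.best.gz)
echo "$listing"

# gunzip -c (what zcat is equivalent to) expands to standard output only.
restored_size=$(gunzip -c notes.best.gz | wc -c | tr -d ' ')

echo "original: $orig_size  -1: $fast_size  -9: $best_size  restored: $restored_size"
```

Because every compression step uses -c, the original file survives the whole run, which is convenient when you only want to compare levels.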

Archiving Files: tar

tar is a useful tool for archiving files. Although originally intended for archiving to tape, tar is commonly used for archiving files or directories of files to a single file. After you have the archive file, it is common to compress it for further storage or distribution.

The most common options that you will probably use with tar are -c for creating an archive, -t for listing the contents of an archive, -x for extracting files from an archive, -f for specifying the archive file to create or act on, and -v for verbose output.
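As a minimal sketch of those options working together, the following script archives a small directory, lists the archive, extracts it elsewhere, and then compresses the archive as a separate step. The directory and file names (project, restore, and so on) are hypothetical.

```shell
#!/bin/sh
# Sketch: create, list, and extract a tar archive, then compress it with gzip.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# A small hypothetical directory tree to archive.
mkdir -p project/docs
echo "hello" > project/README
echo "notes" > project/docs/notes.txt

tar -cf project.tar project         # -c creates, -f names the archive file
listing=$(tar -tf project.tar)      # -t lists the contents
echo "$listing"

mkdir restore
(cd restore && tar -xf ../project.tar)   # -x extracts into the current directory

# Compare the extracted tree with the original.
if diff -r project restore/project > /dev/null; then
    extracted=identical
else
    extracted=differs
fi
echo "extracted copy: $extracted"

gzip project.tar                    # separate step: yields project.tar.gz
```

Adding -v to any of the tar commands prints each file as it is processed.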

Here is an example of looking at the contents of a tar file. It is often useful to look at the contents of a tar file before extracting it. Because a tar file can be an archive of files rather than an archive of a directory of files, it is helpful to see the contents. That way, you know whether you should create a separate directory for extracting the file so that you have its contents in one place, or whether it will create a directory into which the files will be extracted.

Although not all the output is shown in this example, we can see nonetheless that the archive will create a directory into which the files will be extracted:

Rosalyn source 18 >tar -tvf sendmail.8.10.2-src.tar

     drwxr-xr-x 103/700       0 2000-06-07 13:01 sendmail-8.10.2/
     -rw-r--r-- 103/700     795 1999-09-27 17:39 sendmail-8.10.2/Makefile
     -rwxr-xr-x 103/700     327 1999-09-23 17:31 sendmail-8.10.2/Build
     -rw-r--r-- 103/700     321 1999-02-06 22:21 sendmail-8.10.2/FAQ
     -rw-r--r-- 103/700    1396 1999-04-04 03:01 sendmail-8.10.2/INSTALL
     -rw-r--r-- 103/700    8923 1999-11-17 13:56 sendmail-8.10.2/KNOWNBUGS
     -rw-r--r-- 103/700    4116 2000-03-03 14:24 sendmail-8.10.2/LICENSE
     -rw-r--r-- 103/700   23017 1999-11-23 14:08 sendmail-8.10.2/PGPKEYS
     -rw-r--r-- 103/700   13703 2000-03-16 18:46 sendmail-8.10.2/README
     -rw-r--r-- 103/700  348392 2000-06-07 03:39 sendmail-8.10.2/RELEASE_NOTES
     drwxr-xr-x 103/700       0 2000-06-07 13:00 sendmail-8.10.2/devtools/
...
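The check described above, whether an archive unpacks into its own directory or scatters files into the current one, can be roughly automated. In this sketch count_top_level is a hypothetical helper and the two archives are built only for the demonstration. A count of 1 usually means the archive wraps a single directory, although it could also be a single loose file.

```shell
#!/bin/sh
# Sketch: count the distinct top-level names in a tar listing.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Demonstration archives (hypothetical names): one wraps a directory,
# the other holds loose files.
mkdir pkg
echo a > pkg/a.txt
echo b > pkg/b.txt
tar -cf wrapped.tar pkg

echo a > a.txt
echo b > b.txt
tar -cf loose.tar a.txt b.txt

# Strip any leading "./", keep the first path component, and count unique names.
count_top_level() {
    tar -tf "$1" | sed 's|^\./||' | cut -d/ -f1 | sort -u | wc -l | tr -d ' '
}

wrapped_tops=$(count_top_level wrapped.tar)
loose_tops=$(count_top_level loose.tar)

echo "wrapped.tar top-level entries: $wrapped_tops"
echo "loose.tar top-level entries: $loose_tops"
```

When the count is greater than 1, creating a fresh directory and extracting inside it keeps the contents in one place.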

The syntax and options for tar are in the command documentation table, Table 13.18.

Table 13.18. The Command Documentation Table for tar

tar Creates, extracts, or appends to tape archives.
tar [-] <c | t | x | r | u> [fbemopvwzZhHLPX014578] [<archive>]

[<blocksize>] [-C <directory>] [-s <replstr>] 
               <file1> <file2> ...

            
tar saves files to and restores files from a single file. Although that single file might have originally been intended to be magnetic tape, magnetic tape is not required.
One of the following flags is required:
-c Creates a new archive or overwrites an existing one.
-t Lists the contents of an archive. If any files are listed on the command line, only those files are listed.
-x Extracts files from an archive. If any files are listed on the command line, only those files are extracted. If more than one copy of a file exists in an archive, earlier copies are overwritten by later copies.
-r Appends the specified files to an archive. This works only on media on which an end-of-file mark can be overwritten.
-u Alias to -r.
In addition to the required flags, any of these options may be used:
-f <archive> Filename where the archive is stored. Default is /dev/rmt8.
-b <blocksize> Sets the blocksize to be used in the archive. Any multiple of 512 between 10240 and 32256 may be used.
-e Stops after the first error.
-m Does not preserve modification time.
-o Does not create directories.
-p Preserves user ID, group ID, file mode, and access and modification times.
-v Verbose mode.
-w Interactively renames files.
-z Compresses the archive using gzip.
-Z Compresses the archive using compress.
-h Follows symbolic links as if they were normal files or directories.
-H Follows symbolic links given on the command line only.
-L Follows all symbolic links.
-P Does not follow any symbolic links.
-X Does not cross mount points in the file system.
[-014578] Selects a backup device, /dev/rmtN.
-C <directory> Sets the working directory for the files. When extracting, files are extracted into the specified directory. When creating, specified files are matched from the directory.
-s <replstr>

Modifies the filenames or archive member names specified by the pattern or file operands according to the substitution expression <replstr>, using the syntax of ed(1) in this format:

/old/new/[gp]

old is the old expression. new is the new expression.

The optional trailing g applies the substitution globally. That is, after each successful substitution it continues to apply the expression to the rest of the name, and it stops at the first unsuccessful substitution.

The optional trailing p causes the final result of a successful substitution to be written to standard error in this format:

<original pathname> >> <new pathname>

Multiple -s <replstr> options can be specified. They are applied in the order listed.
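The -z and -C options from Table 13.18 are often used together. The following sketch assumes a tar that supports both, as the table describes; the names data, bundle.tgz, and out are invented for the example.

```shell
#!/bin/sh
# Sketch: build a gzip-compressed archive in one step, then extract it elsewhere.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir data
echo "one" > data/one.txt
echo "two" > data/two.txt

# -z runs the archive through gzip, so archiving and compressing are one command.
tar -czf bundle.tgz data

# The result is an ordinary gzip file, which gzip -t can verify.
if gzip -t bundle.tgz; then gz_check=ok; else gz_check=bad; fi

# -C changes to the named directory before extracting; it must already exist.
mkdir out
tar -xzf bundle.tgz -C out

restored=$(cat out/data/one.txt)
echo "gzip check: $gz_check, restored contents: $restored"
```

This is the one-command counterpart of running tar and gzip separately; the two-step form remains useful on systems whose tar lacks -z.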
