
.NET Reference Guide

Which Compression Method to Use

Last updated Mar 14, 2003.

When you make the decision to use compression in your program, the next thing you have to decide is which compression method you're going to use. This can be a difficult decision that depends on several factors: the cost of the compression package, the time to implement it, the compression ratio, and processor usage.

Obviously what we’d like is that mythical free package that’s simple to implement and provides the best compression ratio while taking very little in the way of processor resources. Absent that, we have to make tradeoffs.

As you saw in the previous sections, the .NET Framework includes two classes in the System.IO.Compression namespace, DeflateStream and GZipStream, that provide simple compression capabilities. These classes satisfy the first two of our four criteria quite well: they're free and painless to implement. The question, then, is which provides the best compression ratio for the processor resources used.
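
To give a sense of how little code is involved, here's a minimal sketch (mine, not part of the test program below; the CompressBytes name is purely illustrative) of the basic pattern: write your data through a GZipStream that wraps a destination stream, then dispose the GZipStream to flush the final compressed block.

using System.IO;
using System.IO.Compression;

static class CompressionSketch
{
  // Compress a byte array in memory and return the compressed bytes.
  public static byte[] CompressBytes(byte[] data)
  {
    using (MemoryStream output = new MemoryStream())
    {
      // leaveOpen = true so the MemoryStream survives disposal of the GZipStream.
      using (GZipStream gz = new GZipStream(output, CompressionMode.Compress, true))
      {
        gz.Write(data, 0, data.Length);
      } // disposing the GZipStream flushes the last compressed block
      return output.ToArray();
    }
  }
}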

I created a very simple program to test the performance of DeflateStream and GZipStream. The program, shown below, loads a file into memory and then calls individual methods to compress the data with the different classes. At the end of each compression and decompression pass, the program prints the time required and the compression ratio.

using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

namespace gzip_cs
{
  class Program
  {
    static string ifname = "msenv.dll";

    static void Main(string[] args)
    {
      byte[] inputBuffer;
      // First, read the entire file into memory.
      using (FileStream fs = new FileStream(ifname, FileMode.Open, FileAccess.Read))
      {
        int len = (int)fs.Length;
        Console.WriteLine("Reading {0:N0} bytes from {1}", len, ifname);
        inputBuffer = new byte[len];
        int bytesRead = fs.Read(inputBuffer, 0, len);
        Console.WriteLine("{0:N0} bytes read", bytesRead);
      }

      // create input memory stream
      using (MemoryStream inputMem = new MemoryStream(inputBuffer))
      {
        // allocate a memory stream for output
        // It’s beautiful having all this ram.
        byte[] outputBuffer = new byte[1024 * 1024 * 1024];
        using (MemoryStream outputMem = new MemoryStream(outputBuffer, true))
        {
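          // Run the GZip and Deflate compress/decompress tests twice each.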
          for (int i = 0; i < 2; i++)
          {
            gzCompress(inputMem, outputMem);
            GC.Collect();
            dflCompress(inputMem, outputMem);
            GC.Collect();
          }
        }
      }
    }

    private static void dflCompress(MemoryStream inputMem, MemoryStream outputMem)
    {
      inputMem.Position = 0;
      outputMem.Position = 0;
      Console.WriteLine("Compressing {0:N0} bytes using Deflate...", inputMem.Length);
      Stopwatch sw = new Stopwatch();
      sw.Start();
      using (DeflateStream dfs = new DeflateStream(outputMem, CompressionMode.Compress, true))
      {
        CopyFile(inputMem, dfs);
      }
      sw.Stop();
      Console.WriteLine("Compressed size = {0:N0} ({1:P2})", outputMem.Position, (double)outputMem.Position / inputMem.Length);
      Console.WriteLine("Elapsed time = {0} seconds", sw.Elapsed.TotalSeconds);
      Console.WriteLine();
      Console.WriteLine("Decompressing {0:N0} bytes using Deflate...", outputMem.Position);
      using (MemoryStream decompMem = new MemoryStream())
      {
        sw.Reset();
        sw.Start();
        using (DeflateStream dfs = new DeflateStream(outputMem, CompressionMode.Decompress, true))
        {
          outputMem.Position = 0;
          CopyFile(dfs, decompMem);
        }
        sw.Stop();
        Console.WriteLine("Elapsed time = {0} seconds", sw.Elapsed.TotalSeconds);
        Console.WriteLine();
      }
    }

    private static void gzCompress(MemoryStream inputMem, MemoryStream outputMem)
    {
      inputMem.Position = 0;
      outputMem.Position = 0;
      Console.WriteLine("Compressing {0:N0} bytes using GZip...", inputMem.Length);
      Stopwatch sw = new Stopwatch();
      sw.Start();
      // Create the GZipStream
      using (GZipStream gzs = new GZipStream(outputMem, CompressionMode.Compress, true))
      {
        CopyFile(inputMem, gzs);
      }
      sw.Stop();
      Console.WriteLine("Compressed size = {0:N0} ({1:P2})", outputMem.Position, (double)outputMem.Position / inputMem.Length);
      Console.WriteLine("Elapsed time = {0} seconds", sw.Elapsed.TotalSeconds);
      Console.WriteLine();
      Console.WriteLine("Decompressing {0:N0} bytes using GZip...", outputMem.Position);
      using (MemoryStream decompMem = new MemoryStream())
      {
        sw.Reset();
        sw.Start();
        using (GZipStream gzs = new GZipStream(outputMem, CompressionMode.Decompress, true))
        {
          outputMem.Position = 0;
          CopyFile(gzs, decompMem);
        }
        sw.Stop();
        Console.WriteLine("Elapsed time = {0} seconds", sw.Elapsed.TotalSeconds);
      }
      Console.WriteLine();
    }

    const int BUFFER_SIZE = 1024 * 1024;

    // Copy everything from the input stream to the output stream in 1 MB chunks.
    static void CopyFile(Stream ins, Stream outs)
    {
      int bytesRead;
      byte[] buff = new byte[BUFFER_SIZE];
      while ((bytesRead = ins.Read(buff, 0, BUFFER_SIZE)) != 0)
      {
        outs.Write(buff, 0, bytesRead);
      }
    }
  }
}

It’s nice to have a lot of memory. I load the entire file into RAM and do the compression in RAM in order to eliminate all disk I/O time. The program tests just the time to compress and decompress the data.

The results for different types of files are shown below. The four files I tested with are:

File 1 is a 650 megabyte XML log file. This data is highly redundant and very compressible.

File 2 is the HTML from my blog at http://blog.mischel.com, and is about 89 kilobytes in size.

File 3 is the complete works of William Shakespeare in plain text format from http://www.gutenberg.org/dirs/etext94/shaks12.txt, with a size of about 5.3 megabytes.

File 4 is msenv.dll from my Visual Studio 8 directory. This file is 8.7 megabytes.

All times are in seconds.

File   Ratio    GZ Compress   GZ Decomp   Dfl Compress   Dfl Decomp
1      9.73:1   15.74         6.63        13.58          4.57
2      3.36:1   0.0045        0.0018      0.0040         0.0015
3      2.34:1   0.3887        0.1551      0.3733         0.1384
4      1.47:1   0.6399        0.3389      0.6109         0.3076

The table shows only one compression ratio because GZipStream and DeflateStream give almost exactly the same compression ratios on the four files tested. In all cases, the output from DeflateStream was slightly smaller (by a few dozen bytes) than the output from GZipStream. That's to be expected: a gzip stream is just deflate-compressed data wrapped in a small header and an eight-byte trailer containing a CRC-32 checksum and the original length.
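
If you want to see that fixed overhead for yourself, a quick sketch along these lines (mine, not part of the article's test program; the OverheadCheck and CompressedSize names are illustrative) compresses the same buffer with each class and prints the two output sizes. The difference is the gzip header and trailer.

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

class OverheadCheck
{
  static void Main()
  {
    // Any sample data will do; here, 10,000 repeated characters.
    byte[] data = Encoding.ASCII.GetBytes(new string('x', 10000));
    Console.WriteLine("Deflate output: {0:N0} bytes", CompressedSize(data, true));
    Console.WriteLine("GZip output:    {0:N0} bytes", CompressedSize(data, false));
  }

  // Compress the buffer with the requested stream type and return the output size.
  static long CompressedSize(byte[] data, bool useDeflate)
  {
    using (MemoryStream output = new MemoryStream())
    {
      Stream cs = useDeflate
        ? (Stream)new DeflateStream(output, CompressionMode.Compress, true)
        : new GZipStream(output, CompressionMode.Compress, true);
      using (cs)
      {
        cs.Write(data, 0, data.Length);
      }
      return output.Length;
    }
  }
}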

The interesting part is that DeflateStream is noticeably faster than GZipStream. In the case of the large XML file, DeflateStream is 14% faster than GZipStream in compression, and 31% faster in decompression. The difference isn't quite as dramatic for the binary msenv.dll file: 4.5% faster in compression and almost 10% faster in decompression. Note that the speed difference roughly tracks the compression ratio: the more compressible the file, the larger GZipStream's penalty. That leads me to believe the differences lie not so much in the algorithm itself as in the way the two classes were implemented.

From my testing, it looks like DeflateStream is the way to go. It provides the same compression ratio as GZipStream while using noticeably fewer processor cycles for both compression and decompression.

For general-purpose compression algorithms, compression ratio is usually proportional to processor resources used. That is, getting a 3:1 compression ratio will typically take more processor time than getting a 2:1 compression ratio. This isn’t always true, and the difference isn’t necessarily linear, but in general the conventional wisdom holds: it takes more processor cycles to make things smaller.
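
As an aside, later versions of the Framework (4.5 and up) make this tradeoff explicit: the DeflateStream and GZipStream constructors accept a CompressionLevel value, so you can choose Fastest when processor cycles are scarce or Optimal when size matters more. The program above targets the original constructors and doesn't use it, but a sketch (the CompressTo helper name is illustrative only) would look like this:

using System.IO;
using System.IO.Compression;

class LevelSketch
{
  // Requires .NET Framework 4.5 or later, where the CompressionLevel overloads appeared.
  // Fastest usually gives a poorer ratio but uses fewer processor cycles; Optimal is the reverse.
  static void CompressTo(string inputPath, string outputPath, CompressionLevel level)
  {
    using (FileStream input = File.OpenRead(inputPath))
    using (FileStream output = File.Create(outputPath))
    using (DeflateStream dfs = new DeflateStream(output, level))
    {
      input.CopyTo(dfs);
    }
  }

  static void Main()
  {
    CompressTo("msenv.dll", "msenv.fast.dfl", CompressionLevel.Fastest);
    CompressTo("msenv.dll", "msenv.best.dfl", CompressionLevel.Optimal);
  }
}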

The trick is finding a compression algorithm that provides you the best tradeoff between compression ratio and processor usage. "Best," of course, is highly subjective. If you’re short on disk space but have plenty of processor time, then you can afford to squeeze every last bit out of your compressor. On the other hand, if you’re short on processor cycles, you’ll have to tolerate something less than optimum compression ratios. Usually your needs will fall somewhere in between.

The DeflateStream and GZipStream classes provide a good tradeoff between processor usage and compression ratio, and they are very effective at reducing the disk space or memory used when compared to working with uncompressed data. In addition, as you saw in the previous section, reading and writing compressed data can actually be faster overall than reading and writing uncompressed data, simply because the processor can compress and decompress faster than the hardware can read from or write to disk.