
Written by: Timothy
11/6/2008 4:58 PM

Compression in the .NET Framework is, surprisingly enough, very easy to accomplish.  In this article, I am going to go through the details of how to compress streams, byte data, and strings using the gzip and deflate algorithms.  I have also created a relatively simple CompressionHelper class, available for download from the Code Library, that handles compression and decompression in single-line calls and chooses the algorithm based on a configuration setting.

To understand how to perform compression, you first need to understand streams.  The easiest way to explain a Stream is that it is a class that controls access to a block of data going to or from some resource, such as memory, disk space, or a network connection.  Streams provide sequential read or write operations on the resource, and the underlying data in a stream is generally not directly accessible.  Streams can also be stacked on top of each other, with each stream applying a different manipulation to the data before passing it to the next stream in the pipeline.  This is how the compression streams work.  Let's start with an example:

protected static Stream GetCompressionStream(Stream baseStream, CompressionMode mode)
{
    // Create the stream variable that will perform compression/decompression
    Stream zip;

    // Use a GZip stream
    if (CompressionType == "gzip")
        zip = new GZipStream(baseStream, mode, true);
    // Use a Deflate stream
    else if (CompressionType == "deflate")
        zip = new DeflateStream(baseStream, mode, true);
    // Otherwise, use a pass-through stream and data will simply be filtered through
    else
        zip = new ZeroCompressionStream(baseStream, true);

    // Return the compression stream
    return zip;
}

The GetCompressionStream function (listed above) is a helper method that chooses the appropriate type of compression stream based on an application configuration setting.  You will notice a reference to the CompressionType property, which simply loads the configuration setting and makes sure it is a valid value.  The first two branches of the if statement wrap the provided base stream in a GZipStream or a DeflateStream.  The final argument, true, is the leaveOpen flag, which tells the compression stream not to close the base stream when it is itself closed.  Otherwise, a ZeroCompressionStream object is returned.
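The CompressionType property itself is not listed in this article, so here is a minimal sketch of the kind of validation it might perform.  The class name, the "none" fallback, and the Normalize method are my own placeholders, and the raw value is assumed to come from wherever your application keeps its settings.

```csharp
using System;

public static class CompressionSettings
{
    // Hypothetical fallback used when the setting is missing or unrecognized.
    public const string None = "none";

    // Normalizes a raw configuration value to "gzip", "deflate", or "none".
    public static string Normalize(string rawValue)
    {
        if (string.IsNullOrEmpty(rawValue))
            return None;

        // Ignore surrounding whitespace and casing so " GZip " still matches.
        string value = rawValue.Trim().ToLowerInvariant();
        return (value == "gzip" || value == "deflate") ? value : None;
    }
}
```

Falling back to "none" rather than throwing means a misspelled setting silently disables compression, which may or may not be what you want.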

ZeroCompressionStream is a custom class that was made as an alternative to just returning the base stream itself.  Because it does nothing more than pass data straight through to the underlying stream, it can be closed without closing the base stream, just like the real compression streams.  This allows the caller to write additional data to the underlying stream after the compression is completed.
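To give an idea of what such a pass-through stream involves, here is a rough sketch.  It is not the ZeroCompressionStream from the download; the name PassThroughStream and the decision to disallow seeking are assumptions made for illustration.

```csharp
using System;
using System.IO;

// Hypothetical pass-through stream: forwards reads and writes unchanged.
public class PassThroughStream : Stream
{
    private readonly Stream _baseStream;
    private readonly bool _leaveOpen;

    public PassThroughStream(Stream baseStream, bool leaveOpen)
    {
        if (baseStream == null) throw new ArgumentNullException("baseStream");
        _baseStream = baseStream;
        _leaveOpen = leaveOpen;
    }

    public override bool CanRead { get { return _baseStream.CanRead; } }
    public override bool CanWrite { get { return _baseStream.CanWrite; } }

    // Like the built-in compression streams, this sketch is sequential only.
    public override bool CanSeek { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return _baseStream.Read(buffer, offset, count);
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _baseStream.Write(buffer, offset, count);
    }

    public override void Flush() { _baseStream.Flush(); }

    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }

    protected override void Dispose(bool disposing)
    {
        // Only close the wrapped stream when the caller asked this stream to own it.
        if (disposing && !_leaveOpen)
            _baseStream.Dispose();
        base.Dispose(disposing);
    }
}
```

With leaveOpen set to true, disposing the wrapper leaves the base stream open and writable, which is the behavior the helper relies on.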

Now, we're ready to see some actual compression.

public static long Compress(Stream input, Stream output)
{
    // Create a buffer to transfer the data
    byte[] buffer = new byte[1024];
    int bytesRead = 0;
    long totalBytes = 0;

    // Get the stream that will perform the compression
    using (Stream zip = GetCompressionStream(output, CompressionMode.Compress))
    {
        // Use the buffer to move data to the compression stream until complete
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            totalBytes += bytesRead;
            zip.Write(buffer, 0, bytesRead);
        }

        // Close the compression stream so any buffered data is flushed to the output
        // (the using block would also do this on exit)
        zip.Close();
    }

    // Return the total number of bytes read from the input (the uncompressed size)
    return totalBytes;
}

As you will notice, this function takes two parameters.  The first is an input stream that contains the data to be compressed.  This can be a FileStream, MemoryStream, or any other type that inherits from System.IO.Stream.  If you have already written to or read from this stream, be sure to seek back to the start of your data.  The second parameter is the output stream that the compressed data will be written to.  The first three lines create a 1KB byte array that is used to transfer the data from the input stream to the compression stream, and initialize two counters that track the number of bytes read.  Note that the return value is the number of bytes read from the input, not the size of the compressed output.

Next, we create the stream that will perform the compression.  I have broken out the stream creation into a separate function so that the appropriate stream is created based on the application configuration: a GZipStream, a DeflateStream, or a ZeroCompressionStream.  Because the streams implement the IDisposable interface, I put the stream in a "using" block to ensure its resources are cleaned up once the compression is finished.  The data is then transferred one kilobyte at a time using a while loop until no more bytes are read from the input stream.  Finally, the compression stream is closed, which flushes any remaining buffered data to the output stream.
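Two details here are easy to trip over: the compressed data is only complete once the compression stream has been closed, and the output stream is left positioned at the end of what was written.  The following sketch shows both using GZipStream directly rather than the helper; the StackedStreamDemo class and CompressToMemory method are names made up for this example.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class StackedStreamDemo
{
    // Compresses input into a MemoryStream and returns it rewound to position 0.
    public static MemoryStream CompressToMemory(byte[] input)
    {
        MemoryStream output = new MemoryStream();

        // leaveOpen = true keeps 'output' usable after the GZipStream is disposed.
        using (GZipStream zip = new GZipStream(output, CompressionMode.Compress, true))
        {
            zip.Write(input, 0, input.Length);
        } // Disposing flushes the remaining compressed data and the gzip footer.

        // The position now sits at the end of the compressed data; rewind before reading.
        output.Seek(0, SeekOrigin.Begin);
        return output;
    }
}
```

Reading from the returned stream without the Seek call would yield nothing, which is the same reason you need to seek back to the start of an input stream you have already written to.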

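The decompression process is very similar.  Here is the decompression function.  The only changes are that the compression stream now wraps the input stream in Decompress mode, and the loop reads from the compression stream and writes to the output stream: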

public static long Decompress(Stream input, Stream output)
{
    // Create a buffer to transfer the data
    byte[] buffer = new byte[1024];
    int bytesRead = 0;
    long totalBytes = 0;

    // Get the stream that will perform the decompression
    using (Stream zip = GetCompressionStream(input, CompressionMode.Decompress))
    {
        // Use the buffer to move data from the decompression stream to the output until complete
        while ((bytesRead = zip.Read(buffer, 0, buffer.Length)) > 0)
        {
            totalBytes += bytesRead;
            output.Write(buffer, 0, bytesRead);
        }

        // Close the decompression stream (the using block would also do this on exit)
        zip.Close();
    }

    // Return the total number of bytes of decompressed data
    return totalBytes;
}

Those are the most important functions.  Compressing and decompressing data in other formats, such as byte arrays or strings, is simply a matter of wrapping the data in a Stream and then calling these functions.

Compressing/Decompressing Byte Data

public static byte[] Compress(byte[] data)
{
    byte[] final;

    // Create a stream to hold the output and a stream over the input data
    using (MemoryStream output = new MemoryStream(), input = new MemoryStream(data))
    {
        // Process the compression
        Compress(input, output);

        // Convert the output stream to a byte array
        final = output.ToArray();

        // Both memory streams are closed when the using block exits
    }

    // Return the compressed data
    return final;
}

 

public static byte[] Decompress(byte[] gzipData)
{
    byte[] final;

    // Create a stream to hold the output and a stream over the compressed data
    using (MemoryStream output = new MemoryStream(), input = new MemoryStream(gzipData))
    {
        // Process the decompression
        Decompress(input, output);

        // Convert the output stream to a byte array
        final = output.ToArray();

        // Both memory streams are closed when the using block exits
    }

    // Return the decompressed data
    return final;
}

Compressing/Decompressing Strings

The string versions convert the string to a byte array, call the byte array method, and Base64-encode the compressed bytes so that the result can be stored as text.

public static string Compress(string data)
{
    // Encode the string as UTF-8 bytes, compress them, and return the result as Base64 text.
    // UTF-8 is used instead of Encoding.Default so the result does not depend on the machine's code page.
    return Convert.ToBase64String(Compress(Encoding.UTF8.GetBytes(data)));
}

public static string Decompress(string gzipString)
{
    // Decode the Base64 text, decompress the bytes, and convert them back to a string
    return Encoding.UTF8.GetString(Decompress(Convert.FromBase64String(gzipString)));
}
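One thing worth keeping in mind with strings is that Base64 adds roughly a third to the size of the compressed bytes, and the gzip format carries its own fixed header and footer, so very short strings can come out longer than they went in.  Here is a self-contained sketch of the string round trip that you can use to experiment with this.  It uses GZipStream directly rather than the helper, and the UTF-8 encoding and the StringCompressionDemo name are assumptions of the example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class StringCompressionDemo
{
    // Encodes the text as UTF-8, gzips it, and returns the result as Base64 text.
    public static string CompressToBase64(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (MemoryStream output = new MemoryStream())
        {
            using (GZipStream zip = new GZipStream(output, CompressionMode.Compress, true))
            {
                zip.Write(raw, 0, raw.Length);
            } // The gzip data is only complete once the GZipStream is disposed.

            return Convert.ToBase64String(output.ToArray());
        }
    }

    // Reverses CompressToBase64; the same encoding must be used on both sides.
    public static string DecompressFromBase64(string base64)
    {
        byte[] compressed = Convert.FromBase64String(base64);
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream zip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = zip.Read(buffer, 0, buffer.Length)) > 0)
                output.Write(buffer, 0, bytesRead);

            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
}
```

Comparing the Length of the input string with the Length of the returned Base64 string for a few sample inputs is a quick way to see where compression starts to pay off for your data.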

Putting It All Together

Compressing a string with the completed CompressionHelper class is a simple matter of calling the static Compress function.  The same goes for byte data and streams.  It is important to note, however, that the stream overloads leave both the input and output streams open after compression, so remember to close your streams when you are done with them.

One more point to note: GZipStream and DeflateStream do not support compressing multiple files into a single archive, as can be done with many third-party compression tools.  Such an archive requires a number of additional headers that these classes do not handle.  If you need to compress multiple files together, consider SharpZipLib, a free, open source compression library that is widely used in the .NET community.  Or, if you are interested in creating your own library, you will need to learn about the headers required for the archive format.  If I can find a good article on writing your own zip headers, I will post it in the comments.
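If you are able to target .NET Framework 4.5 or later, the System.IO.Compression namespace also includes a ZipArchive class that writes the archive headers for you.  The sketch below is a minimal illustration of that class rather than part of the CompressionHelper download; the MultiFileArchiveDemo name and the dictionary of entry names to contents are assumptions of the example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class MultiFileArchiveDemo
{
    // Writes each (entry name, content) pair into one zip archive held in memory.
    public static byte[] CreateArchive(IDictionary<string, byte[]> files)
    {
        using (MemoryStream output = new MemoryStream())
        {
            // leaveOpen = true so 'output' can still be read after the archive is disposed.
            using (ZipArchive archive = new ZipArchive(output, ZipArchiveMode.Create, true))
            {
                foreach (KeyValuePair<string, byte[]> file in files)
                {
                    ZipArchiveEntry entry = archive.CreateEntry(file.Key);
                    using (Stream entryStream = entry.Open())
                    {
                        entryStream.Write(file.Value, 0, file.Value.Length);
                    }
                }
            } // Disposing the archive writes the central directory.

            return output.ToArray();
        }
    }
}
```

On the full .NET Framework this needs a reference to the System.IO.Compression assembly.  As with the compression streams, the archive is only complete once the ZipArchive has been disposed.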

The CompressionHelper class has been made available for download in the Code Library.
