Home > Articles > Programming > C#

.NET Reference Guide

Hosted by

Toggle Open Guide Table of ContentsGuide Contents

Close Table of ContentsGuide Contents

Close Table of Contents

The HashSet Collection

Last updated Mar 14, 2003.

If you’re looking for more up-to-date information on this topic, please visit our .NET and Windows Programming article, podcast, and store pages.

Imagine that you need to keep track of whether you’ve seen a particular item key before. For purposes of this exercise, assume that the item key is a person’s name. All you need to know is if you’ve seen that person. You don’t need to store any other information. There are several ways you can do this.

You can use a simple array of strings and do a sequential search of that array every time you encounter a name. If the name is not in the list, you add it at the end. You could do the same thing with an ArrayList or a generic List of strings. The sequential organization has the benefit of simplicity, but has very poor runtime performance for moderately sized lists.

A more efficient means would be to use a HashTable or generic Dictionary collection. These offer much better performance at the cost of slightly more memory, but they have the annoying disadvantage of requiring that you supply a value for each key, as shown in the example below that uses a Dictionary.

[C#]

Dictionary<string, object> names = new Dictionary<string, object>();
names.Add("Jim", null);
names.Add("Debra", null);
names.Add("Joe", null);
names.Add("Jess", null);

[Visual Basic]

Dim names As New Dictionary(Of String, Object)
names.Add("Jim", Nothing)
names.Add("Debra", Nothing)
names.Add("Joe", Nothing)
names.Add("Jess", Nothing)

To determine if an item is in the collection, you call the dictionary object’s ContainsKey method, which will return True if the key is in the collection, or False if the key is not there.

Using Dictionary in this manner to simulate simple set inclusion operations is easy, and the only thing that’s even the slightest bit cumbersome is including that null or Nothing value when you add an item. But very often that’s not all you want to do with sets of items.

Imagine you have two Dictionary objects that hold sets of names that you’ve seen and you want to merge those two into a single set--perform what is in essence the mathematical union of two sets. You’ll have to write code that goes through each of the dictionaries and conditionally adds items to the result set. The C# code to do that is:

private static void Union(Dictionary<string, object> nameSet, Dictionary<string, object> resultSet)
{
    foreach (string key in nameSet.Keys)
    {
        if (!resultSet.ContainsKey(key))
        {
            resultSet.Add(key, null);
        }
    }
}

You would have to write similar code for other common set operations: intersection, difference, and so on. The result would work, no doubt, and possibly quite well. But it’s less than an ideal solution.

.NET 3.5 introduces the HashSet generic collection type that is based on the model of mathematical sets and provides high performance set operations. Conceptually, you can think of the HashSet as a generic Dictionary collection without the values. However, HashSet adds a number of set-specific operations.

The C# code to create and populate a HashSet is very similar to the Dictionary-based code example shown earlier:

HashSet<string> names = new HashSet<string>();
names.Add("Jim");
names.Add("Debra");
if (names.Contains("Jim"))
{
    // item in in the set
}
else
{
    // item is not in the set
}

Determining if an item is in the set is a simple matter of calling Contains. As far as adding items to the set and determining if an item is in the set are concerned, HashSet works in much the same way as Dictionary. But HashSet adds a large number of set-specific operations that would be very cumbersome to implement using a Dictionary.

The ExceptWith method removes all elements in a specified collection from the current HashSet. So if you have a set called AllNames, and a set called NamesSeen, you can remove the "seen" names from the AllNames collection by calling AllNames.ExceptWith(NamesSeen). This can be thought of as a logical "NOT" operation.

UnionWith combines two sets, resulting in a set that contains all of the items from both sets. This can be thought of as a logical "OR" operation.

IntersectWith creates a set that contains a list of items that exist in both the current set and the specified set. Intersection is a logical "AND" operation.

SymmetricExceptWith results in a set that contains items that are in one of the two sets, but not both. It is a logical exclusive "OR" operation.

There are methods to determine if a set is a subset, proper subset, superset, or proper superset of a given set. These methods use the standard mathematical definitions of subset and superset.

The Overlaps method will return True if two sets overlap--that is, if at least one item is present in each of the sets.

I’ve confined the discussion of the HashSet methods above to use with other sets, but those methods actually take an IEnumerable(T) parameter, which means that any collection that implements IEnumerable on the same type as the HashSet that you’re working with can be used. So you can union or intersect an array of names with a HashSet of strings, for example.

HashSet provides a comprehensive treatment of sets that is much more powerful and easier to use than Dictionary when performing set operations. I, for one, will be making good use of it.