-
Table of Contents
- .NET Book Recommendations
- Getting Started with .NET
- The Microsoft .NET Framework
- The Common Language Runtime (CLR), the Common Type System (CTS), and the Common Language Specification (CLS)
- .NET Framework Class Library
- Visual Studio .NET
- .NET Enterprise Servers and .NET My Services
- .NET Compliant Languages
- C#
- Visual Basic .NET (VB .NET)
- ASP.NET
- XML Web Services
- ADO.NET
- XML.NET
- Windows Forms
- Why .NET?
- Displaying Errors with the Error Provider
- COM Interoperability
- Comparing Java and .NET
- Calling Unmanaged Code
- .NET Application Security
- Code Access Security
- .NET Standards Support
- Numeric Types in the .NET Framework
- Working with Strings
- Formatting Strings
- Trimming Character Strings
- Comparing Strings in .NET 2.0
- Arrays and Collections
- Arrays as Class Members
- Sorting a Multi-Dimensional Array
- Sorting a Multi-Dimensional Array with LINQ
- File I/O (System.IO)
- Working with File Names
- Using the File System
- Working with Files and Directories
- Monitoring the File System
- Working with Streams
- Working with Text Encodings
- Working with Date and Time
- Extending the DateTime Class
- Using DateTimeOffset
- Fun with Dates
- Exceptions
- Delegates
- Events
- Asynchronous Programming
- Asynchronous File I/O
- Timers
- Random Numbers
- Cryptographically Secure Random Numbers
- Serialization
- MultiThreading (System.Threading)
- Multi-Threading Overview
- The Managed Thread Pool
- Managed Threading
- Thread Synchronization
- Synchronizing Data Access
- Trace Debugging
- Tracing in .NET 2.0
- ASP.NET Trace
- Validating User Input in ASP.NET Web Pages
- Event Logging
- Monitoring Application Performance
- Accessing the Registry
- Accessing Environment Information
- Environment Variables in .NET 2.0
- Managing Windows Forms Applications
- Working with Email
- Working with Graphics
- Animating a Background
- Working with Images
- Drawing Cycloid Curves
- Simulating the Spirograph
- Building International Web Applications
- .NET Compact Framework
- Mobile Web Development with ASP.NET
- Speech Technologies
- Microsoft MapPoint Web Service
- Working with Typed DataSets
- Using Relationships in DataSets
- DataColumn Expressions
- Playing Simple Sounds
- Playing Sounds with .NET 2.0
- Returning an Image in a Web Page
- RSS
- Best Practices — Project Structure
- Best Practices — Application Blocks
- The Data Access Application Block
- The Exception Management Application Block
- Best Practices — Performance
- Best Practices — Performance and Scalability
- Best Practices - Testing
- Reading the Tea Leaves, 2005
- Predictions: A Look Back at 2005, and a Look Ahead to 2006
- .NET Downloads
- Application Deployment Overview
- Application Deployment — Versioning
- Application Deployment — Version Policy
- Application Deployment — Packaging and Distribution
- .NET Remoting Overview
- A Remoting Demonstration
- Remoting Configuration
- Remoting: Lifetimes and Leases
- Remoting: Other Issues
- Attributes
- Writing Custom Attributes
- Accessing Attributes in Code
- Reflection
- Class Design: Inheritance, Interface, or Composition?
- The TriTryst Game
- Console Applications in .NET 2.0
- New File I/O Methods in .NET 2.0
- Building Projects with MSBuild
- Unmanaged Callbacks in .NET 2.0
- Timer Troubles
- Non-Rectangular Windows Forms
- Windows Forms Transparency
- 10 Things I Hate About Visual Basic
- 10 Things I Hate About C#
- Background Processing with Idle Time
- Scaling Windows Forms
- Reading and Writing Binary Data
- New Memory Management Functions in .NET 2.0
- Compatibility Between .NET 1.1 and .NET 2.0
- Managed Debugging Assistants in .NET 2.0
- XDir: A Program for Viewing Directory Sizes
- The Microsoft.VisualBasic Namespace
- Operator Overloading
- Working with GPS Data
- Hidden Visual Studio Tools
- .NET 3.0
- The .NET 2.0 Stopwatch Class
- Nullable Types
- Drawing Rotated Text
- Unsafe Code
- Other .NET Languages
- Compiler Directives
- Safe Handles
- Predictions, 2007 Edition
- New Features in C# 3.0
- Generics
- Network Client Programming
- On the Misuse of Exceptions
- Maximum Object Size in .NET
- More on Maximum Object Sizes
- Keyed Collection Memory Limitations
- Matching String Endings
- Allocating Small Data Structures
- Grumbling About Limitations
- Some Thoughts on the Nature of What We Do
- Working with Predicates in Collections
- Working with DataReaders
- Outputting XML with XmlWriter
- Writing XML Data
- Working with Compression
- Another Look at Compressed Streams
- Compressing a Very Large File
- Canonical URIs
- Constructing URIs
- Using OneWayAttribute for Remote Calls
- Selecting a Garbage Collector
- Linked List
- Linked List Application - The MRU List
- Auto-implemented Properties in C#
- The HashSet Collection
- Looking Ahead: 2018
- An Experiment in Optimization
- A Larger Integer
- Extension Methods
- Language Integrated Query (LINQ)
- Variable Length Parameter Lists
- The ReaderWriterLockSlim Synchronization Primitive
- Sorting a Text File
- Sorting a Large Text File
- Using ListView with Large Data Sets
- LINQ One-Liners
- Regular Expression Optimization
- Random File I/O
- Computing the Size of a Structure
- More on Computing Structure Sizes
- UnmanagedMemoryStream
- Dynamically Loading Code
- Building a String Table
- Delegates Versus Function Pointers
- Visual Studio Editor Features
- A Simple Profile Timer
- New Features in C# 4.0
- IEnumerator or IList?
- New Features in .NET 4.0
- Set Operations with IEnumerable and HashSet
- Using File Locks
- Extending Object Functionality
- Clearing a HashSet
- When Hash Codes Matter
- Parsing Command Line Options
- Creating a Single-Instance Program
- Asynchronous Windows Forms Events
- The BackgroundWorker Component
- Fixing a Dumb Mistake
- Thinking About Multi-Threaded Programs
- JavaScript Object Notation
- Better JSON Processing with JSON.Net
- Useful .NET-related Sites
- Markov Models
- Building an Order 0 Markov Model
- Higher Order Markov Models
- Webmaster's Guide to robots.txt
- An Overview of the Parallel Extensions to .NET
- Parallel Extensions Synchronization Objects
- Thread Safe Collections
- A Bug and a Conundrum
- Another Bug and an Answer
- Task Parallel Library
- Good and Bad Ideas in C#
- Parallel LINQ
- Copying Large Files
- Replacing File.Copy
- Learning from Our Mistakes
- Symbolic Links
- There Is No Easy Fix
- Tracking Hurricanes
- Examining Hurricane Data
- Searching for Multiple Strings
- Simple JSON Processing
- Aho-Corasick String Searching
- Writing a Web Crawler
- Web Crawler Politeness
- Source Control Management
- Subversion
- Communicating with Datagrams
- Fun with Actions and Funcs
- The Future of Media
- The Importance of Metadata
- Of Comparison and IComparer
- IComparer, Comparer, IComparable, Oh My!
- Comparing Generic Types
- A Simple HTTP Server
- Quantizing DateTime Fields
- More Fun with the Garbage Collector
- Refactor, Don't Rewrite
- A Generic BinaryHeap Class
- A Generic File Sorter
- Birthdays, Random Numbers, and Hash Keys
- Random Selection from Large Groups
- Command Line Tools for Windows
- Reading and Writing, Bit by Bit
- Selecting the Top N Items from a Group
- Determining Website Content Encoding
- Benefits and Drawbacks of Syndication
- Pubsubhubbub
- Memory Use Misconceptions
- Risk, Lost Opportunity, and Other Hidden Upgrade Costs
- Culture Shock: from .NET to JavaScript
- Using .NET for a Startup
- Tracking Wikipedia Changes with IRC
- Browser Applications and the Same Origin Policy
- Handling the Unexpected
- Dealing with Growth
- Deleting the Oldest File
- Where Do I Put Stuff?
- .NET Timer Resolution
- Exploring Options for Better Timers
- Using the Windows Timer Queue API
- Locks Aren't Slow
- Alternatives to Locks
- Lock Free Concurrent Collections
- The BlockingCollection Class
- Customizing BlockingCollection
- What Time Is It? Daylight Saving Time and Computers
- Using enums to Save Memory
- New File Operations in .NET 4.0
- Building a Hierarchy of Rectangles
- A Faster File Copy
- Constants Are Forever
- The Dangers of Floating Point
- Goto is Not Inherently Evil
- The Weakest Link
- Reducing Memory Required for Strings
- Grouping with LINQ
- HttpListener "Gotchas"
- Extension Methods Are Evil
- Finding the Registered Domain in a URL
- Drawing Text
- Obfuscating Sequential Keys
- Properties of Obfuscated Keys
- Finding Changes Between Two Lists
- Using the ConcurrentBag Collection
- Never Sleep!
- Shuffling and Sorting
- Viewing Large Text Files
- Use the Right Tool
- Why GetHashCode Matters
- Optimization Guidelines
- Timer Differences
- The Mutex
- Modifying a Working System
- Building a New Type of Stream
- More Large File Problems
- A Better File.Copy Replacement
- Throwing the Wrong Exception
- Approximate Counters
- Monitoring a Timer
- Combining Consoles and Forms
- Embedding a Text Resource
- Handling Concurrent Downloads
- The Importance of Domain Knowledge
- Stupid Programmer Tricks
- Aho-Corasick Revisited
- Expressiveness is the Soul of Brevity
- Fun with Anonymous Types
- Simplifying a Multi-Threaded Application
- Work Smarter
- The Skip List Data Structure
- A More Memory-Efficient Skip List
- Selection Revisited
- Why Async?
- What the Future Holds
- The "Roslyn" CTP
- Where We've Been
- Informit Reference Library
Set Operations with IEnumerable and HashSet
Last updated Mar 14, 2003.
The HashSet collection, according to the .NET documentation, "represents a set of values." That sounds like a pretty vague description until you realize that "set" implies that it's a representation of a mathematical set. That implies a considerable amount of functionality. HashSet can make short work of many tasks that, without it, are difficult or tedious to write using other means.
In addition to the HashSet collection type, the generic IEnumerable<T> interface includes set operations. Both groups of set operations make a number of things much easier.
Set Union
The union of two sets results in a set that contains all elements of both sets, although no item is duplicated. That is, if two sets each contain an item with the key "Jim", the resulting set will have only one item with that key.
The IEnumerable interface has a Union method that produces a new collection that contains all the elements of both sets.
The HashSet UnionWith method modifies the collection so that it contains the items from both sets.
A union operation is useful when you want to combine lists and remove duplicates. Suppose, for example, you are running an election and you want to combine lists of voters from multiple precincts:
var AllWhoVoted = Precinct1Voters.Union(Precinct2Voters);
HashSet has its own UnionWith method that modifies the collection. That is, assuming that your collections are both HashSets, then:
AllWhoVoted.UnionWith(Precinct1Voters);
Will add the voters from precinct 1 to the list of all voters.
For me, HashSet is more convenient in this circumstance because I'm building a list that I want modified. Imagine you're combining results from many precincts. With the IEnumerable implementation, you end up building a new list every time:
var AllWhoVoted = Precinct1Voters.Union(Precinct2Voters); AllWhoVoted = AllWhoVoted.Union(Precinct3Voters); AllWhoVoted = AllWhoVoted.Union(Precinct4Voters); // etc.
HashSet, on the other hand, just adds items:
HashSet AllWhoVoted = new HashSet(Precinct1Voters); AllWhoVoted.UnionWith(Precinct2Voters); AllWhoVoted.UnionWith(Precinct3Voters); AllWhoVoted.UnionWith(Precinct4Voters);
The results are the same (they better be!), and they might even be arrived at the same way under the hood. But the code that uses HashSet seems more clear to me.
Set Intersection
Set intersection produces the set of items that exist in both sets. That is, if set A contains {Jim, David, Ron, Joe}, and set B contains {Jim, David, George}, then A.Intersect(B) would return {Jim, David}.
The IEnumerable interface has an Intersect method that will generate the intersection of any two collections that implement IEnumerable.
HashSet has its own IntersectWith method, which operates slightly differently than IEnumerable.Intersect. IEnumerable.Intersect creates a new collection that contains the intersection of the two collections that you passed to it. For example, assuming you have two Lists, list1 and list2:
var listIntersection = list1.Intersect(list2);
HashSet.IntersectWith, on the other hand, modifies the original collection. So if you have two HashSets called hash1 and hash2, then after executing this line of code:
hash1.IntersectWith(hash2);
hash1 contains only those items that existed in both collections.
So what good is Intersect or IntersectWith? Any time you want to see how many items in one list also exist in another, you are doing an intersection. For example, you might have a list of eligible voters and a separate list of people who voted. Assuming you have two collections that implement the generic IEnumerable interface, you could write:
var EligibleVotersWhoVoted = EligibleVoters.Intersect(PeopleWhoVoted);
One would hope that the following would produce the empty set:
var VotedTwice = Precinct1Voters.Intersect(Precinct2Voters);
Of course, if you have multiple precincts and you want to make sure no voter cast a ballot twice, things get a bit more complicated:
HashSet<Voter> VotedTwice = new HashSet<Voter>(); var AllWhoVoted = Precinct1Voters; VotedTwice.UnionWith(AllWhoVoted.Intersect (Precinct2Voters)); AllWhoVoted = AllWhoVoted.Union(Precinct2Voters); // etc
Set Difference
Another common operation is to determine which items from one set are not in another. In .NET, the methods that perform this operation are called "Except": IEnumerable<T>.Except and HashSet.ExceptWith.
Unlike union and intersection, with this operation you have to be careful about the order. That is, given two collections, A and B:
A.Union(B) is the same as B.Union(A)
A.Intersect(B) is the same as B.Intersect(A)
A.Except(B) is not the same as B.Except(A)
If A contains the values {Jim, Joe, Ron, David}, and B contains the values {Jim, Sam, George}, then:
A.Except(B) will return {Joe, Ron, David}
B.Except(A) will return {Sam, George}
In the voting analogy, set difference can be pretty interesting. How about a list of eligible voters who did not vote?
var DidNotVote = EligibleVoters.Except(PeopleWhoVoted);
Or, a better one:
var IneligibleVoters = PeopleWhoVoted.Except(EligibleVoters);
One would hope that the last would produce no results.
Perhaps it's just the kind of work that I do, but I find that I use set difference a lot. Very often, I'm trying to determine what things that were supposed to happen didn't actually take place. Typically, I'll have a log file of queued actions, a list of pending actions, and a log file of completed actions. A queued action that's not in the pending list of in the log of completed actions got lost somewhere. Either it wasn't performed or it didn't get logged.
In that case, the union of pending and logged should be equal to queued. Assuming we have three collections, Queued, Pending, and Logged, then the following will tell me which actions got lost:
var LostActions = Queued.Except(Pending.Union(Logged));
Subsets and Supersets
If you recall from your high school math, a set is a subset of another if all of its elements are also contained in the other set. That is, B is a subset of A if every element in B is also in A. B is a proper subset of A if B is a subset of A and it's not equal to A.
For example, say that the set A contains the elements {1,2,3}. The set B, containing {1,3} is a proper subset of A. The set C, containing {0,2,1} is not a subset, because the item '0' does not appear in set A. Set D, containing {1,2,3} is a subset of A, but not a proper subset of A.
Supersets are the inverse of subsets. If B is a subset of A, then A is a superset of B.
HashSet contains methods that determine if one set is a subset or superset of the other. The four methods are: IsProperSubsetOf, IsProperSuperSetOf, IsSubsetOf, and IsSupersetOf. The IEnumerable interface does not have corresponding methods.
These methods are handy when you want to know if the relationship exists, but you don't want to generate a new list. For example, imagine that ElibigleVoters and PeopleWhoVoted are both HashSet collections. You already saw that we could get a list of ineligible voters (i.e. people who voted but weren't supposed to) by writing:
var IneligibleVoters = PeopleWhoVoted.Except(EligibleVoters);
If the resulting collection has any elements, then you know that there's a problem.
But suppose we don't want the list, but just want to know if there are any ineligible voters. If there are no ineligible voters, then PeopleWhoVoted should be a subset of EligibleVoters:
bool votersOk = PeopleWhoVoted.IsSubsetOf(EligibleVoters);
Using the IEnumerable interface only, you could get the same result by writing:
bool votersOK = (PeopleWhoVoted.Except(EligibleVoters).Count() == 0);
SetEquals and SequenceEqual
HashSet.SetEquals determines if the HashSet and the specified collection are the same length and contain all the same elements. So, given these two collections:
var A = new HashSet<int>() {1, 2, 3, 4};
var B = new HashSet<int>() {4, 3, 2, 1};
A.SetEquals(B) returns true, as does B.SetEquals(A). Both collections contain the same items.
The IEnumerable interface has a method called SequenceEqual, which does much the same thing, except that the comparison is order dependent. Given the two collections above, neither A.SequenceEqual(B) nor B.SequenceEqual(A) will return true.
That leads to the rather curious result that A.SetEquals(B) == true and A.SequenceEqual(B) == false.
It looks to me like, if you want to determine if two IEnumerable collections have the same items, you first have to sort the items before calling SequenceEqual.
Or, you can use union or intersection to determine if the sets are equal. If two sequences are the same length, and the union or the intersection of the two sequences also has the same length, then the sets are equal. That is:
bool SetsAreEqual = (A.Count() == B.Count() && A.Intersect(B).Count() == A.Count())
You could replace Intersect with Union in the above code and achieve the same result.
