-
Table of Contents
- .NET Book Recommendations
- Getting Started with .NET
- The Microsoft .NET Framework
- The Common Language Runtime (CLR), the Common Type System (CTS), and the Common Language Specification (CLS)
- .NET Framework Class Library
- Visual Studio .NET
- .NET Enterprise Servers and .NET My Services
- .NET Compliant Languages
- C#
- Visual Basic .NET (VB .NET)
- ASP.NET
- XML Web Services
- ADO.NET
- XML.NET
- Windows Forms
- Why .NET?
- Displaying Errors with the Error Provider
- COM Interoperability
- Comparing Java and .NET
- Calling Unmanaged Code
- .NET Application Security
- Code Access Security
- .NET Standards Support
- Numeric Types in the .NET Framework
- Working with Strings
- Formatting Strings
- Trimming Character Strings
- Comparing Strings in .NET 2.0
- Arrays and Collections
- Arrays as Class Members
- Sorting a Multi-Dimensional Array
- Sorting a Multi-Dimensional Array with LINQ
- File I/O (System.IO)
- Working with File Names
- Using the File System
- Working with Files and Directories
- Monitoring the File System
- Working with Streams
- Working with Text Encodings
- Working with Date and Time
- Extending the DateTime Class
- Using DateTimeOffset
- Fun with Dates
- Exceptions
- Delegates
- Events
- Asynchronous Programming
- Asynchronous File I/O
- Timers
- Random Numbers
- Cryptographically Secure Random Numbers
- Serialization
- MultiThreading (System.Threading)
- Multi-Threading Overview
- The Managed Thread Pool
- Managed Threading
- Thread Synchronization
- Synchronizing Data Access
- Trace Debugging
- Tracing in .NET 2.0
- ASP.NET Trace
- Validating User Input in ASP.NET Web Pages
- Event Logging
- Monitoring Application Performance
- Accessing the Registry
- Accessing Environment Information
- Environment Variables in .NET 2.0
- Managing Windows Forms Applications
- Working with Email
- Working with Graphics
- Animating a Background
- Working with Images
- Drawing Cycloid Curves
- Simulating the Spirograph
- Building International Web Applications
- .NET Compact Framework
- Mobile Web Development with ASP.NET
- Speech Technologies
- Microsoft MapPoint Web Service
- Working with Typed DataSets
- Using Relationships in DataSets
- DataColumn Expressions
- Playing Simple Sounds
- Playing Sounds with .NET 2.0
- Returning an Image in a Web Page
- RSS
- Best Practices — Project Structure
- Best Practices — Application Blocks
- The Data Access Application Block
- The Exception Management Application Block
- Best Practices — Performance
- Best Practices — Performance and Scalability
- Best Practices - Testing
- Reading the Tea Leaves, 2005
- Predictions: A Look Back at 2005, and a Look Ahead to 2006
- .NET Downloads
- Application Deployment Overview
- Application Deployment — Versioning
- Application Deployment — Version Policy
- Application Deployment — Packaging and Distribution
- .NET Remoting Overview
- A Remoting Demonstration
- Remoting Configuration
- Remoting: Lifetimes and Leases
- Remoting: Other Issues
- Attributes
- Writing Custom Attributes
- Accessing Attributes in Code
- Reflection
- Class Design: Inheritance, Interface, or Composition?
- The TriTryst Game
- Console Applications in .NET 2.0
- New File I/O Methods in .NET 2.0
- Building Projects with MSBuild
- Unmanaged Callbacks in .NET 2.0
- Timer Troubles
- Non-Rectangular Windows Forms
- Windows Forms Transparency
- 10 Things I Hate About Visual Basic
- 10 Things I Hate About C#
- Background Processing with Idle Time
- Scaling Windows Forms
- Reading and Writing Binary Data
- New Memory Management Functions in .NET 2.0
- Compatibility Between .NET 1.1 and .NET 2.0
- Managed Debugging Assistants in .NET 2.0
- XDir: A Program for Viewing Directory Sizes
- The Microsoft.VisualBasic Namespace
- Operator Overloading
- Working with GPS Data
- Hidden Visual Studio Tools
- .NET 3.0
- The .NET 2.0 Stopwatch Class
- Nullable Types
- Drawing Rotated Text
- Unsafe Code
- Other .NET Languages
- Compiler Directives
- Safe Handles
- Predictions, 2007 Edition
- New Features in C# 3.0
- Generics
- Network Client Programming
- On the Misuse of Exceptions
- Maximum Object Size in .NET
- More on Maximum Object Sizes
- Keyed Collection Memory Limitations
- Matching String Endings
- Allocating Small Data Structures
- Grumbling About Limitations
- Some Thoughts on the Nature of What We Do
- Working with Predicates in Collections
- Working with DataReaders
- Outputting XML with XmlWriter
- Writing XML Data
- Working with Compression
- Another Look at Compressed Streams
- Compressing a Very Large File
- Canonical URIs
- Constructing URIs
- Using OneWayAttribute for Remote Calls
- Selecting a Garbage Collector
- Linked List
- Linked List Application - The MRU List
- Auto-implemented Properties in C#
- The HashSet Collection
- Looking Ahead: 2018
- An Experiment in Optimization
- A Larger Integer
- Extension Methods
- Language Integrated Query (LINQ)
- Variable Length Parameter Lists
- The ReaderWriterLockSlim Synchronization Primitive
- Sorting a Text File
- Sorting a Large Text File
- Using ListView with Large Data Sets
- LINQ One-Liners
- Regular Expression Optimization
- Random File I/O
- Computing the Size of a Structure
- More on Computing Structure Sizes
- UnmanagedMemoryStream
- Dynamically Loading Code
- Building a String Table
- Delegates Versus Function Pointers
- Visual Studio Editor Features
- A Simple Profile Timer
- New Features in C# 4.0
- IEnumerator or IList?
- New Features in .NET 4.0
- Set Operations with IEnumerable and HashSet
- Using File Locks
- Extending Object Functionality
- Clearing a HashSet
- When Hash Codes Matter
- Parsing Command Line Options
- Creating a Single-Instance Program
- Asynchronous Windows Forms Events
- The BackgroundWorker Component
- Fixing a Dumb Mistake
- Thinking About Multi-Threaded Programs
- JavaScript Object Notation
- Better JSON Processing with JSON.Net
- Useful .NET-related Sites
- Markov Models
- Building an Order 0 Markov Model
- Higher Order Markov Models
- Webmaster's Guide to robots.txt
- An Overview of the Parallel Extensions to .NET
- Parallel Extensions Synchronization Objects
- Thread Safe Collections
- A Bug and a Conundrum
- Another Bug and an Answer
- Task Parallel Library
- Good and Bad Ideas in C#
- Parallel LINQ
- Copying Large Files
- Replacing File.Copy
- Learning from Our Mistakes
- Symbolic Links
- There Is No Easy Fix
- Tracking Hurricanes
- Examining Hurricane Data
- Searching for Multiple Strings
- Simple JSON Processing
- Aho-Corasick String Searching
- Writing a Web Crawler
- Web Crawler Politeness
- Source Control Management
- Subversion
- Communicating with Datagrams
- Fun with Actions and Funcs
- The Future of Media
- The Importance of Metadata
- Of Comparison and IComparer
- IComparer, Comparer, IComparable, Oh My!
- Comparing Generic Types
- A Simple HTTP Server
- Quantizing DateTime Fields
- More Fun with the Garbage Collector
- Refactor, Don't Rewrite
- A Generic BinaryHeap Class
- A Generic File Sorter
- Birthdays, Random Numbers, and Hash Keys
- Random Selection from Large Groups
- Command Line Tools for Windows
- Reading and Writing, Bit by Bit
- Selecting the Top N Items from a Group
- Determining Website Content Encoding
- Benefits and Drawbacks of Syndication
- Pubsubhubbub
- Memory Use Misconceptions
- Risk, Lost Opportunity, and Other Hidden Upgrade Costs
- Culture Shock: from .NET to JavaScript
- Using .NET for a Startup
- Tracking Wikipedia Changes with IRC
- Browser Applications and the Same Origin Policy
- Handling the Unexpected
- Dealing with Growth
- Deleting the Oldest File
- Where Do I Put Stuff?
- .NET Timer Resolution
- Exploring Options for Better Timers
- Using the Windows Timer Queue API
- Locks Aren't Slow
- Alternatives to Locks
- Lock Free Concurrent Collections
- The BlockingCollection Class
- Customizing BlockingCollection
- What Time Is It? Daylight Saving Time and Computers
- Using enums to Save Memory
- New File Operations in .NET 4.0
- Building a Hierarchy of Rectangles
- A Faster File Copy
- Constants Are Forever
- The Dangers of Floating Point
- Goto is Not Inherently Evil
- The Weakest Link
- Reducing Memory Required for Strings
- Grouping with LINQ
- HttpListener "Gotchas"
- Extension Methods Are Evil
- Finding the Registered Domain in a URL
- Drawing Text
- Obfuscating Sequential Keys
- Properties of Obfuscated Keys
- Finding Changes Between Two Lists
- Using the ConcurrentBag Collection
- Never Sleep!
- Shuffling and Sorting
- Viewing Large Text Files
- Use the Right Tool
- Why GetHashCode Matters
- Optimization Guidelines
- Timer Differences
- The Mutex
- Modifying a Working System
- Building a New Type of Stream
- More Large File Problems
- A Better File.Copy Replacement
- Throwing the Wrong Exception
- Approximate Counters
- Monitoring a Timer
- Combining Consoles and Forms
- Embedding a Text Resource
- Handling Concurrent Downloads
- The Importance of Domain Knowledge
- Stupid Programmer Tricks
- Aho-Corasick Revisited
- Expressiveness is the Soul of Brevity
- Fun with Anonymous Types
- Simplifying a Multi-Threaded Application
- Work Smarter
- The Skip List Data Structure
- A More Memory-Efficient Skip List
- Selection Revisited
- Why Async?
- What the Future Holds
- The "Roslyn" CTP
- Where We've Been
- Informit Reference Library
A Generic BinaryHeap Class
Last updated Mar 14, 2003.
In Sorting a Large Text File, I implemented a binary heap to speed the process of selecting the smallest item from a group of items. In the article, I implemented a binary heap of integers and, later, a binary heap of strings. A binary heap, however, is a handy data structure to have, though, for a number of different applications. It should be easily instantiated to work with any data type. That is, it's a perfect candidate for a generic implementation.
You may want to re-familiarize yourself with the heap data structure and binary heap in particular before continuing.
The goal of this project is to create a generic BinaryHeap class that implements minimal useful functionality.
The interface
In order to be useful a heap has to support two operations: insert an item, and remove the root item. The addition of a constructor or two will give us a minimal, but functional, binary heap. The minimal class interface would be:
public class BinaryHeap<T>
{
public BinaryHeap(IComparer<T> comp);
public Insert(T item);
public T RemoveRoot();
}
The IComparer interface passed to the constructor contains the comparison function that's used to order items in the heap. The only difference between a min-heap (where items are sorted in ascending order and the root item is the minimum item on the heap) and a max-heap (descending order, maximum item at the root) is the comparison function.
I've elected to add a default constructor for the common case of wanting a min-heap that uses the default comparison for a given type. That is:
public BinaryHeap()
: this(Comparer<T>.Default)
{
}
That allows you to write:
var heap = new BinaryHeap<int>();
Rather than having to write:
var heap = new BinaryHeap<int>(Comparer<int>.Default);
I've also implemented the IEnumerable<T> interface, and the following methods and properties, which make the BinaryHeap a bit easier to work with:
- The Count property simply returns the number of items currently in the collection.
- The Clear method removes all items from the collection.
- The TrimExcess method helps reduce memory usage by decreasing the capacity of the collection to the current Count, if that value is less than 90% of the current capacity. This can save a lot of memory if you have a mostly empty collection. It's common to follow a call to Clear with a call to TrimExcess, which will reset the collection to the default number of items.
- The Peek method lets you examine the root item, without actually removing it from the heap.
The code
Implementation of the generic BinaryHeap class turns out to be a relatively small amount of code: about 120 lines. A little less than half of that code manages the heap itself (the Insert and RemoveRoot methods). The rest of the code is comments, the constructors, and the convenience methods mentioned above.
public class BinaryHeap<T> : IEnumerable<T>
{
private IComparer<T> Comparer;
private List<T> Items = new List<T>();
public BinaryHeap()
: this(Comparer<T>.Default)
{
}
public BinaryHeap(IComparer<T> comp)
{
Comparer = comp;
}
/// <summary>
/// Get a count of the number of items in the collection.
/// </summary>
public int Count
{
get { return Items.Count; }
}
/// <summary>
/// Removes all items from the collection.
/// </summary>
public void Clear()
{
Items.Clear();
}
/// <summary>
/// Sets the capacity to the actual number of elements in the BinaryHeap,
/// if that number is less than a threshold value.
/// </summary>
/// <remarks>
/// The current threshold value is 90% (.NET 3.5), but might change in a future release.
/// </remarks>
public void TrimExcess()
{
Items.TrimExcess();
}
/// <summary>
/// Inserts an item onto the heap.
/// </summary>
/// <param name="newItem">The item to be inserted.</param>
public void Insert(T newItem)
{
int i = Count;
Items.Add(newItem);
while (i > 0 && Comparer.Compare(Items[(i - 1) / 2], newItem) > 0)
{
Items[i] = Items[(i - 1) / 2];
i = (i - 1) / 2;
}
Items[i] = newItem;
}
/// <summary>
/// Return the root item from the collection, without removing it.
/// </summary>
/// <returns>Returns the item at the root of the heap.</returns>
public T Peek()
{
if (Items.Count == 0)
{
throw new InvalidOperationException("The heap is empty.");
}
return Items[0];
}
/// <summary>
/// Removes and returns the root item from the collection.
/// </summary>
/// <returns>Returns the item at the root of the heap.</returns>
public T RemoveRoot()
{
if (Items.Count == 0)
{
throw new InvalidOperationException("The heap is empty.");
}
// Get the first item
T rslt = Items[0];
// Get the last item and bubble it down.
T tmp = Items[Items.Count - 1];
Items.RemoveAt(Items.Count - 1);
if (Items.Count > 0)
{
int i = 0;
while (i < Items.Count / 2)
{
int j = (2 * i) + 1;
if ((j < Items.Count - 1) && (Comparer.Compare(Items[j], Items[j + 1]) > 0))
{
++j;
}
if (Comparer.Compare(Items[j], tmp) >= 0)
{
break;
}
Items[i] = Items[j];
i = j;
}
Items[i] = tmp;
}
return rslt;
}
IEnumerator<T> IEnumerable<T>.GetEnumerator()
{
foreach (var i in Items)
{
yield return i;
}
}
public IEnumerator GetEnumerator()
{
return GetEnumerator();
}
}
Final thoughts
Minimal as it is, this binary heap implementation is quite useful for a number of different things. It can serve as the intermediate data structure for a large file sort, and you can also use it to implement a priority queue. You can also use a heap to select, in linear time, the top N elements in an unordered list. Also, many graph algorithms rely on heaps to provide better performance.
This implementation could benefit from a few improvements. As written, the is no way to remove an arbitrary item from the heap. If the item isn't the root, you can't access it. It would be useful to have a Remove(T item) function that will find and remove an item, while maintaining the heap property. Also, it would be useful to implement a ChangeKey method that allows you to change an item's key and re-order the heap appropriately. It might also be useful to have a Merge method that combines two heaps into one. Merge could be implemented as an extension method.
The implementation above is a classic min-heap. There is also something called a min-max heap, which isn't actually a heap in that the tree doesn't meet the requirements of the heap property, but it acts like a binary heap that gives you very fast access to the lowest and highest items, and provides performance that is very close to that of min-heap.
Finally, there are many different ways to implement a heap. A binary heap is convenient because it provides reasonably good performance overall while being small and easy to implement. Other implementations have their strengths and weaknesses: faster for some operations and slower for others, and are variously more difficult to implement. If your application is highly dependent on the performance of a particular heap operation, you might be interested in investigating one of the many heap variants.
