Home > Articles > Open Source > Python

  • Print
  • + Share This
This chapter is from the book 2.3 heapq—Heap Sort Algorithm Purpose The heapq module implements a min-heap sort algorithm suitable for use with Python's lists. Python Version New in 2.3 with additions in 2.5 A heap is a tree-like data structure where the child nodes have a sort-order relationship with the parents. Binary heaps can be represented using a list or an array organized so that the children of element N are at positions 2*N+1 and 2*N+2 (for zero-based indexes). This layout makes it possible to rearrange heaps in place, so it is not necessary to reallocate as much memory when adding or removing items. A max-heap ensures that the parent is larger than or equal to both of its children. A min-heap requires that the parent be less than or equal to its children. Python's heapq module implements a min-heap. 2.3.1 Example Data The examples in this section use the data in heapq_heapdata.py. # This data was generated with the random module. data = [19, 9, 4, 10, 11] The heap output is printed using heapq_showtree.py. import math from cStringIO import StringIO def show_tree(tree, total_width=36, fill=' '): """Pretty-print a tree.""" output = StringIO() last_row = -1 for i, n in enumerate(tree): if i: row = int(math.floor(math.log(i+1, 2))) else: row = 0 if row != last_row: output.write('\n') columns = 2**row col_width = int(math.floor((total_width * 1.0) / columns)) output.write(str(n).center(col_width, fill)) last_row = row print output.getvalue() print '-' * total_width print return 2.3.2 Creating a Heap There are two basic ways to create a heap: heappush() and heapify(). import heapq from heapq_showtree import show_tree from heapq_heapdata import data heap = [] print 'random :', data print for n in data: print 'add %3d:' % n heapq.heappush(heap, n) show_tree(heap) Using heappush(), the heap sort order of the elements is maintained as new items are added from a data source. $ python heapq_heappush.py random : [19, 9, 4, 10, 11] add 19: 19 ------------------------------------ add 9: 9 19 ------------------------------------ add 4: 4 19 9 ------------------------------------ add 10: 4 10 9 19 ------------------------------------ add 11: 4 10 9 19 11 ------------------------------------ If the data is already in memory, it is more efficient to use heapify() to rearrange the items of the list in place. import heapq from heapq_showtree import show_tree from heapq_heapdata import data print 'random :', data heapq.heapify(data) print 'heapified :' show_tree(data) The result of building a list in heap order one item at a time is the same as building it unordered and then calling heapify(). $ python heapq_heapify.py random : [19, 9, 4, 10, 11] heapified : 4 9 19 10 11 ------------------------------------ 2.3.3 Accessing Contents of a Heap Once the heap is organized correctly, use heappop() to remove the element with the lowest value. import heapq from heapq_showtree import show_tree from heapq_heapdata import data print 'random :', data heapq.heapify(data) print 'heapified :' show_tree(data) print for i in xrange(2): smallest = heapq.heappop(data) print 'pop %3d:' % smallest show_tree(data) In this example, adapted from the stdlib documentation, heapify() and heappop() are used to sort a list of numbers. $ python heapq_heappop.py random : [19, 9, 4, 10, 11] heapified : 4 9 19 10 11 ------------------------------------ pop 4: 9 10 19 11 ------------------------------------ pop 9: 10 11 19 ------------------------------------ To remove existing elements and replace them with new values in a single operation, use heapreplace(). import heapq from heapq_showtree import show_tree from heapq_heapdata import data heapq.heapify(data) print 'start:' show_tree(data) for n in [0, 13]: smallest = heapq.heapreplace(data, n) print 'replace %2d with %2d:' % (smallest, n) show_tree(data) Replacing elements in place makes it possible to maintain a fixed-size heap, such as a queue of jobs ordered by priority. $ python heapq_heapreplace.py start: 4 9 19 10 11 ------------------------------------ replace 4 with 0: 0 9 19 10 11 ------------------------------------ replace 0 with 13: 9 10 19 13 11 ------------------------------------ 2.3.4 Data Extremes from a Heap heapq also includes two functions to examine an iterable to find a range of the largest or smallest values it contains. import heapq from heapq_heapdata import data print 'all :', data print '3 largest :', heapq.nlargest(3, data) print 'from sort :', list(reversed(sorted(data)[-3:])) print '3 smallest:', heapq.nsmallest(3, data) print 'from sort :', sorted(data)[:3] Using nlargest() and nsmallest() is only efficient for relatively small values of n > 1, but can still come in handy in a few cases. $ python heapq_extremes.py all : [19, 9, 4, 10, 11] 3 largest : [19, 11, 10] from sort : [19, 11, 10] 3 smallest: [4, 9, 10] from sort : [4, 9, 10] See Also: heapq (http://docs.python.org/library/heapq.html) The standard library documentation for this module. Heap (data structure) (http://en.wikipedia.org/wiki/Heap_(data_structure)) Wikipedia article that provides a general description of heap data structures. Priority Queue (page 98) A priority queue implementation from Queue (page 96 in the standard library.
  • + Share This
  • 🔖 Save To Your Account