Home > Articles

  • Print
  • + Share This
This chapter is from the book

2.4 heapq: Heap Sort Algorithm

A heap is a tree-like data structure in which the child nodes have a sort-order relationship with the parents. Binary heaps can be represented using a list or array organized so that the children of element N are at positions 2*N+1 and 2*N+2 (for zero-based indexes). This layout makes it possible to rearrange heaps in place, so it is not necessary to reallocate as much memory when adding or removing items.

A max-heap ensures that the parent is larger than or equal to both of its children. A min-heap requires that the parent be less than or equal to its children. Python’s heapq module implements a min-heap.

2.4.1 Example Data

The examples in this section use the data in heapq_heapdata.py.

Listing 2.46: heapq_heapdata.py

      # This data was generated with the random module.

      data = [19, 9, 4, 10, 11]

The heap output is printed using heapq_showtree.py.

Listing 2.47: heapq_showtree.py

      import math
      from io import StringIO


      def show_tree(tree, total_width=36, fill=' '):
          """Pretty-print a tree."""
          output = StringIO()
          last_row = -1
          for i, n in enumerate(tree):
              if i:
                  row = int(math.floor(math.log(i + 1, 2)))
              else:
                  row = 0
              if row != last_row:
                  output.write('\n')
              columns = 2 ** row
              col_width = int(math.floor(total_width / columns))
              output.write(str(n).center(col_width, fill))
              last_row = row
          print(output.getvalue())
          print('-' * total_width)
          print()

2.4.2 Creating a Heap

There are two basic ways to create a heap: heappush() and heapify().

Listing 2.48: heapq_heappush.py

     import heapq
     from heapq_showtree import show_tree
     from heapq_heapdata import data

     heap = []
     print('random :', data)
     print()

     for n in data:
         print('add {:>3}:'.format(n))
         heapq.heappush(heap, n)
         show_tree(heap)

When heappush() is used, the heap sort order of the elements is maintained as new items are added from a data source.

     $ python3 heapq_heappush.py

     random : [19, 9, 4, 10, 11]

     add   19:

                      19
     ------------------------------------

     add    9:

                      9
             19
     ------------------------------------

     add    4:

                      4
             19                9
     ------------------------------------

     add   10:

                          4
               10              9
         19
     ------------------------------------
     add    11:

                           4
                10             9
         19       11
     ------------------------------------

If the data is already in memory, it is more efficient to use heapify() to rearrange the items of the list in place.

Listing 2.49: heapq_heapify.py

     import heapq
     from heapq_showtree import show_tree
     from heapq_heapdata import data

     print('random    :', data)
     heapq.heapify(data)
     print('heapified :')
     show_tree(data)

The result of building a list in heap order one item at a time is the same as building an unordered list and then calling heapify().

     $ python3 heapq_heapify.py

     random    : [19, 9, 4, 10, 11]
     heapified :

                           4
                9              19
         10       11
     ------------------------------------

2.4.3 Accessing the Contents of a Heap

Once the heap is organized correctly, use heappop() to remove the element with the lowest value.

Listing 2.50: heapq_heappop.py

     import heapq
     from heapq_showtree import show_tree
     from heapq_heapdata import data

     print('random    :', data)
     heapq.heapify(data)
     print('heapified :')
     show_tree(data)
     print

     for i in range(2):
         smallest = heapq.heappop(data)
         print('pop    {:>3}:'.format(smallest))
         show_tree(data)

In this example, adapted from the standard library documentation, heapify() and heappop() are used to sort a list of numbers.

     $ python3 heapq_heappop.py

     random    : [19, 9, 4, 10, 11]
     heapified :

                          4
                 9             19
         10       11
     ------------------------------------

     pop         4:

                          9
                 10            19
         11
     ------------------------------------

     pop         9:

                      10
             11                19
     ------------------------------------

To remove existing elements and replace them with new values in a single operation, use heapreplace().

Listing 2.51: heapq_heapreplace.py

     import heapq
     from heapq_showtree import show_tree
     from heapq_heapdata import data

     heapq.heapify(data)
     print('start:')
     show_tree(data)

     for n in [0, 13]:
           smallest = heapq.heapreplace(data, n)
           print('replace {:>2} with {:>2}:'.format(smallest, n))
           show_tree(data)

Replacing elements in place makes it possible to maintain a fixed-size heap, such as a queue of jobs ordered by priority.

        $ python3 heapq_heapreplace.py

        start:

                           4
                  9               19
            10       11
        ------------------------------------

        replace   4 with   0:

                           0
                  9               19
            10       11
        ------------------------------------

        replace   0 with 13:

                           9
                  10              19
            13       11
        ------------------------------------

2.4.4 Data Extremes from a Heap

heapq also includes two functions to examine an iterable and find a range of the largest or smallest values it contains.

Listing 2.52: heapq_extremes.py

        import heapq
        from heapq_heapdata import data

        print('all       :', data)
        print('3 largest :', heapq.nlargest(3, data))
        print('from sort :', list(reversed(sorted(data)[-3:])))
        print('3 smallest:', heapq.nsmallest(3, data))
        print('from sort :', sorted(data)[:3])

Using nlargest() and nsmallest() is efficient only for relatively small values of n > 1, but can still come in handy in a few cases.

     $ python3 heapq_extremes.py

     all       : [19, 9, 4, 10, 11]
     3 largest : [19, 11, 10]
     from sort : [19, 11, 10]
     3 smallest: [4, 9, 10]
     from sort : [4, 9, 10]

2.4.5 Efficiently Merging Sorted Sequences

Combining several sorted sequences into one new sequence is easy for small data sets.

     list(sorted(itertools.chain(*data)))

For larger data sets, this technique can use a considerable amount of memory. Instead of sorting the entire combined sequence, merge() uses a heap to generate a new sequence one item at a time, determining the next item using a fixed amount of memory.

Listing 2.53: heapq_merge.py

     import heapq
     import random


     random.seed(2016)

     data = []
     for i in range(4):
         new_data = list(random.sample(range(1, 101), 5))
         new_data.sort()
         data.append(new_data)

     for i, d in enumerate(data):
         print('{}: {}'.format(i, d))

     print('\nMerged:')
     for i in heapq.merge(*data):
         print(i, end=' ')
     print()

Because the implementation of merge() uses a heap, it consumes memory based on the number of sequences being merged, rather than the number of items in those sequences.

     $ python3 heapq_merge.py

     0: [33, 58, 71, 88, 95]
     1: [10, 11, 17, 38, 91]
     2: [13, 18, 39, 61, 63]
     3: [20, 27, 31, 42, 45]
     
     Merged:
     10 11 13 17 18 20 27 31 33 38 39 42 45 58 61 63 71 88 91 95
  • + Share This
  • 🔖 Save To Your Account