- 2.1 enum: Enumeration Type
- 2.2 collections: Container Data Types
- 2.3 array: Sequence of Fixed-Type Data
- 2.4 heapq: Heap Sort Algorithm
- 2.5 bisect: Maintain Lists in Sorted Order
- 2.6 queue: Thread-Safe FIFO Implementation
- 2.7 struct: Binary Data Structures
- 2.8 weakref: Impermanent References to Objects
- 2.9 copy: Duplicate Objects
- 2.10 pprint: Pretty-Print Data Structures

## 2.4 heapq: Heap Sort Algorithm

A *heap* is a tree-like data structure in which the child nodes have a sort-order relationship with the parents. *Binary heaps* can be represented using a list or array organized so that the children of element *N* are at positions 2**N*+1 and 2**N*+2 (for zero-based indexes). This layout makes it possible to rearrange heaps in place, so it is not necessary to reallocate as much memory when adding or removing items.

A max-heap ensures that the parent is larger than or equal to both of its children. A min-heap requires that the parent be less than or equal to its children. Python’s `heapq` module implements a min-heap.

### 2.4.1 Example Data

The examples in this section use the data in `heapq_heapdata.py`.

#### Listing 2.46: `heapq_heapdata.py`

# This data was generated with the random module. data = [19, 9, 4, 10, 11]

The heap output is printed using `heapq_showtree.py`.

#### Listing 2.47: `heapq_showtree.py`

import math from io import StringIO def show_tree(tree, total_width=36, fill=' '): """Pretty-print a tree.""" output = StringIO() last_row = -1 for i, n in enumerate(tree): if i: row = int(math.floor(math.log(i + 1, 2))) else: row = 0 if row != last_row: output.write('\n') columns = 2 ** row col_width = int(math.floor(total_width / columns)) output.write(str(n).center(col_width, fill)) last_row = row print(output.getvalue()) print('-' * total_width) print()

### 2.4.2 Creating a Heap

There are two basic ways to create a heap: `heappush()` and `heapify()`.

#### Listing 2.48: `heapq_heappush.py`

import heapq from heapq_showtree import show_tree from heapq_heapdata import data heap = [] print('random :', data) print() for n in data: print('add {:>3}:'.format(n)) heapq.heappush(heap, n) show_tree(heap)

When `heappush()` is used, the heap sort order of the elements is maintained as new items are added from a data source.

$ python3 heapq_heappush.py random : [19, 9, 4, 10, 11] add 19: 19 ------------------------------------ add 9: 9 19 ------------------------------------ add 4: 4 19 9 ------------------------------------ add 10: 4 10 9 19 ------------------------------------ add 11: 4 10 9 19 11 ------------------------------------

If the data is already in memory, it is more efficient to use `heapify()` to rearrange the items of the list in place.

#### Listing 2.49: `heapq_heapify.py`

import heapq from heapq_showtree import show_tree from heapq_heapdata import data print('random :', data) heapq.heapify(data) print('heapified :') show_tree(data)

The result of building a list in heap order one item at a time is the same as building an unordered list and then calling `heapify()`.

$ python3 heapq_heapify.py random : [19, 9, 4, 10, 11] heapified : 4 9 19 10 11 ------------------------------------

### 2.4.3 Accessing the Contents of a Heap

Once the heap is organized correctly, use `heappop()` to remove the element with the lowest value.

#### Listing 2.50: `heapq_heappop.py`

import heapq from heapq_showtree import show_tree from heapq_heapdata import data print('random :', data) heapq.heapify(data) print('heapified :') show_tree(data) print for i in range(2): smallest = heapq.heappop(data) print('pop {:>3}:'.format(smallest)) show_tree(data)

In this example, adapted from the standard library documentation, `heapify()` and `heappop()` are used to sort a list of numbers.

$ python3 heapq_heappop.py random : [19, 9, 4, 10, 11] heapified : 4 9 19 10 11 ------------------------------------ pop 4: 9 10 19 11 ------------------------------------ pop 9: 10 11 19 ------------------------------------

To remove existing elements and replace them with new values in a single operation, use `heapreplace()`.

#### Listing 2.51: `heapq_heapreplace.py`

import heapq from heapq_showtree import show_tree from heapq_heapdata import data heapq.heapify(data) print('start:') show_tree(data) for n in [0, 13]: smallest = heapq.heapreplace(data, n) print('replace {:>2} with {:>2}:'.format(smallest, n)) show_tree(data)

Replacing elements in place makes it possible to maintain a fixed-size heap, such as a queue of jobs ordered by priority.

$ python3 heapq_heapreplace.py start: 4 9 19 10 11 ------------------------------------ replace 4 with 0: 0 9 19 10 11 ------------------------------------ replace 0 with 13: 9 10 19 13 11 ------------------------------------

### 2.4.4 Data Extremes from a Heap

`heapq` also includes two functions to examine an iterable and find a range of the largest or smallest values it contains.

#### Listing 2.52: `heapq_extremes.py`

import heapq from heapq_heapdata import data print('all :', data) print('3 largest :', heapq.nlargest(3, data)) print('from sort :', list(reversed(sorted(data)[-3:]))) print('3 smallest:', heapq.nsmallest(3, data)) print('from sort :', sorted(data)[:3])

Using `nlargest()` and `nsmallest()` is efficient only for relatively small values of *n* > 1, but can still come in handy in a few cases.

$ python3 heapq_extremes.py all : [19, 9, 4, 10, 11] 3 largest : [19, 11, 10] from sort : [19, 11, 10] 3 smallest: [4, 9, 10] from sort : [4, 9, 10]

### 2.4.5 Efficiently Merging Sorted Sequences

Combining several sorted sequences into one new sequence is easy for small data sets.

list(sorted(itertools.chain(*data)))

For larger data sets, this technique can use a considerable amount of memory. Instead of sorting the entire combined sequence, `merge()` uses a heap to generate a new sequence one item at a time, determining the next item using a fixed amount of memory.

#### Listing 2.53: `heapq_merge.py`

import heapq import random random.seed(2016) data = [] for i in range(4): new_data = list(random.sample(range(1, 101), 5)) new_data.sort() data.append(new_data) for i, d in enumerate(data): print('{}: {}'.format(i, d)) print('\nMerged:') for i in heapq.merge(*data): print(i, end=' ') print()

Because the implementation of `merge()` uses a heap, it consumes memory based on the number of sequences being merged, rather than the number of items in those sequences.

$ python3 heapq_merge.py 0: [33, 58, 71, 88, 95] 1: [10, 11, 17, 38, 91] 2: [13, 18, 39, 61, 63] 3: [20, 27, 31, 42, 45] Merged: 10 11 13 17 18 20 27 31 33 38 39 42 45 58 61 63 71 88 91 95