Home > Articles > Programming > C/C++

  • Print
  • + Share This
From the author of Higher-Order Ranges

Higher-Order Ranges

In keeping with higher-order functions that take and return other functions, a higher-order range is one that aggregates and manipulates other ranges, while itself still offering a range interface. Building ranges that decorate other ranges is easy, useful, and fun. In fact, higher-order ranges fulfill the promise that many saw in the early days of the STL. Back then, people thought custom iterators would solve many problems, and consequently defined quite a few of them. The idea, however, has had limited success. I'm not sure why, but I hypothesize that the difficulties of defining iterators and the verboseness of using them were possible factors.

Ranges, by contrast, are very easy to define and use. The result of many algorithms are, in fact, custom ranges. Consider, for example, the classic higher-order function filter, which takes a range r plus a predicate, and returns a range that only offers the elements of r satisfying the predicate. The work that filter itself does is minimal—it merely constructs and returns a range type, call it Filter, that does the filtration in its iteration primitives.

  • Laziness. Higher-order ranges offer the opportunity of doing computation lazily, in the style preferred by functional languages, instead of eagerly. Consider, for example, the STL algorithm set_union that takes two sorted ranges and yields one sorted range containing the elements of both ranges, in linear time. set_union is eager—when it returns, it has finished the job. This approach has two problems. First, you need to create (and possibly allocate memory for) the target range. This is wasteful of both memory and time if all you want to do is peek at each element of set_union in turn and maybe finish the loop without inspecting all elements. Second, eager set_union needs to read all of its input before finishing, which simply does not work with infinite ranges.
  • A classic argument in favor of lazy evaluation is that it leads to better modular composition. This is because lazy evaluation allows for much more involved compositions, with powerful generators that construct a large data space, out of which a selector chooses the items of interest. This advantage has been beautifully argued by Hughes [9] and is famously used by Google in its implementation of the MapReduce algorithm [6]. D's standard library uses lazy evaluation wherever possible, to great effect.

  • Preserving Range Categories. Recall Retro, a range that traverses a given range backwards. Clearly the original range, call it r, must offer double-ended iteration. Question: If the original range offered random access, should Retro also offer random access? The answer is a resounding yes. As a rule, a higher-order range must offer the highest capability that it can, subject to what the original range can offer. So Retro should do something like this:
  • struct Retro<DoubleEndedRange> {
       ... as before ...
       static if (isRandomAccess(DoubleEndedRange)
             && hasLength(DoubleEndedRange)) {
          Ref<T> at(unsigned i) {
             return r.at(r.length() - 1 - i);
          }
          DoubleEndedRange slice(int i, int j) {
             return r.slice(r.length() - j,
                r.length() - i);
          }
       }
       static if (hasLength(DoubleEndedRange)) {
          unsigned length() {
             return r.length();
          }
       }
    }

    I used the CDJ#++ construct static if as an optional declaration: If the tested condition is true, then the guarded code is compiled in; otherwise, it just vanishes. The predicates hasLength and isRandomAccess use introspection to figure out during compilation whether the original range offers length and random access, respectively. Note how DoubleEndedRange also may or may not define length, depending on whether r does.

    This kind of optionally enriched interface depending on the type of the input puts a great strain on the language's static introspection mechanisms. I don't know of a way to do that in Java or C#, and in C++ things can be done, albeit with difficulty. In D, the static if construct exists and makes it easy to define isRandomAccess and hasLength. If your target language is dynamic, there should be no problem to use dynamic reflection to allow clients to discover the capabilities of a range object.

    If static introspection is available, a host of really cool stuff can be done. For example, if Retro is composed with itself (e.g., Retro<Retro<SomeRange>>), why do all the busywork? The entire construct should statically boil down to SomeRange at exactly zero computational cost.

  • Chain. One particularly interesting higher-order range is Chain, which takes two or more ranges, possibly of different categories, and offers a logically contiguous view of those ranges. Then, Chain can be used whenever one single range is expected; the user of Chain has no idea that iteration segues from one range to another. The capabilities of Chain are naturally the intersection of all capabilities of its inputs. For example, for Chain to define length, all of its contained ranges must also define length. If all contained ranges offer random access and length, then Chain offers random access as well. In that case, accessing the nth element of a Chain is proportional to the number of ranges that the Chain iterates (which could become a complexity threat if the number of ranges is large). With Chain, quite interesting operations become possible. For example, sort(Chain(a, b, c)) sorts a logical array that has three physical arrays as support. Although iteration over Chain is in a sense lazy because it doesn't create a contiguous copy of the three arrays, sort itself is not lazy—after it returns, all elements of the three arrays are arranged in sorted order.
  • + Share This
  • 🔖 Save To Your Account