Home > Articles

Streams

This chapter is from the book

1.9. Collectors

For collecting stream elements to another target, there is a convenient collect method that takes an instance of the Collector interface. A collector is an object that accumulates elements and produces a result. The Collectors class provides a large number of factory methods for common collectors.

Suppose you want to collect all strings in a stream by concatenating them. You can call

String result = stream.collect(Collectors.joining());

If you want a delimiter between elements, pass it to the joining method:

String result = stream.collect(Collectors.joining(", "));

If your stream contains objects other than strings, you need to first convert them to strings, like this:

String result = stream.map(Object::toString).collect(Collectors.joining(", "));

If you want to reduce the stream results to a sum, count, average, maximum, or minimum, use one of the summarizing(Int|Long|Double) methods. These methods take a function that maps the stream objects to numbers and yield a result of type (Int|Long|Double)SummaryStatistics, simultaneously computing the sum, count, average, maximum, and minimum.

IntSummaryStatistics summary = stream.collect(
    Collectors.summarizingInt(String::length));
double averageWordLength = summary.getAverage();
double maxWordLength = summary.getMax();

Note that you are not calling the joining and summarizingInt methods on a stream. These are static methods of the Collectors class. They yield instances of Collector which you pass to the collect method of the Stream interface.

The joining collector yields a string, and the summarizing collectors yield four numbers. However, most collectors collect values in data structures, as you will see in the following sections.

The example program in Listing 1.5 shows how to collect elements from a stream.

Listing 1.4 v2ch01/collecting/CollectingResults.java

1.9.1. Collecting into Collections

Before the toList method was added in Java 16, you had to use the collector produced by Collectors.toList():

List<String> result = stream.collect(Collectors.toList());

Similarly, here is how you can collect stream elements into a set:

Set<String> result = stream.collect(Collectors.toSet());

These calls give you a list or set, but you cannot make any further assumptions. The collection might not be mutable, serializable, or thread-safe. If you want to control which kind of collection you get, use the following call instead:

TreeSet<String> result = stream.collect(Collectors.toCollection(TreeSet::new));

The example program in Listing 1.5 shows how to place stream elements into a collection.

Listing 1.5 v2ch01/collecting/CollectingResults.java

1.9.2. Collecting into Maps

Suppose you have a Stream<Person> and want to collect the elements into a map so that later you can look up people by their ID. Call Collectors.toMap with two functions that produce the map's keys and values. For example,

public record Person(int id, String name) {}
. . .
Map<Integer, String> idToName = people.collect(
    Collectors.toMap(Person::id, Person::name));

In the common case when the values should be the actual elements, use Function.identity() for the second function.

Map<Integer, Person> idToPerson = people.collect(
    Collectors.toMap(Person::id, Function.identity()));

If there is more than one element with the same key, there is a conflict, and the collector will throw an IllegalStateException. You can override that behavior by supplying a third function that resolves the conflict and determines the value for the key, given the existing and the new value. Your function could return the existing value, the new value, or a combination of them.

Here, we construct a map that contains, for each language in the available locales, as key its name in your default locale (such as "German"), and as value its localized name (such as "Deutsch").

Map<String, String> languageNames = Locale.availableLocales().collect(
    Collectors.toMap(
        Locale::getDisplayLanguage,
        loc -> loc.getDisplayLanguage(loc),
        (existingValue, newValue) -> existingValue));

We don't care that the same language might occur twice (for example, German in Germany and in Switzerland), so we just keep the first entry.

Now suppose we want to know all languages in a given country. Then we need a Map<String, Set<String>>. For example, the value for "Switzerland" is the set [French, German, Italian]. At first, we store a singleton set for each language. Whenever a new language is found for a given country, we form the union of the existing and the new set.

Map<String, Set<String>> countryLanguageSets = Locale.availableLocales().collect(
    Collectors.toMap(
        Locale::getDisplayCountry,
        l -> Collections.singleton(l.getDisplayLanguage()),
        (a, b) -> { // Union of a and b
            var union = new HashSet<String>(a);
            union.addAll(b);
            return union;
        }));

You will see a simpler way of obtaining this map in the next section.

If you want a TreeMap, supply the constructor as the fourth argument. You must provide a merge function. Here is one of the examples from the beginning of the section, now yielding a TreeMap:

Map<Integer, Person> idToPerson = people.collect(
    Collectors.toMap(
        Person::id,
        Function.identity(),
        (existingValue, newValue) -> { throw new IllegalStateException(); },
        TreeMap::new));

The program in Listing 1.6 gives examples of collecting stream results into maps.

Listing 1.6 v2ch01/collecting/CollectingIntoMaps.java

1.9.3. Grouping and Partitioning

In the preceding section, you saw how to collect all languages in a given country. But the process was a bit tedious. You had to generate a singleton set for each map value and then specify how to merge the existing and new values. Forming groups of values with the same characteristic is very common, so the groupingBy method supports it directly.

Let’s look at the problem of grouping locales by country. First, form this map:

Map<String, List<Locale>> countryToLocales = Locale.availableLocales().collect(
    Collectors.groupingBy(Locale::getCountry));

The function Locale::getCountry is the classifier function of the grouping. You can now look up all locales for a given country code, for example

List<Locale> swissLocales = countryToLocales.get("CH");
    // Yields locales de_CH, fr_CH, it_CH, and maybe more

When the classifier function is a predicate function (that is, a function returning a boolean value), the stream elements are partitioned into two lists: those where the function returns true and the complement. In this case, it is more efficient to use partitioningBy instead of groupingBy. For example, here we split all locales into those that use English and all others:

Map<Boolean, List<Locale>> englishAndOtherLocales = Locale.availableLocales().collect(
    Collectors.partitioningBy(l -> l.getLanguage().equals("en")));
List<Locale> englishLocales = englishAndOtherLocales.get(true);

1.9.4. Downstream Collectors

The groupingBy method yields a map whose values are lists. If you want to process those lists in some way, supply a downstream collector. For example, if you want sets instead of lists, you can use the Collectors.toSet collector that you saw in the preceding section:

Map<String, Set<Locale>> countryToLocaleSet = Locale.availableLocales().collect(
    groupingBy(Locale::getCountry, toSet()));

You can also apply groupingBy twice:

Map<String, Map<String, List<Locale>>> countryAndLanguageToLocale =
    Locale.availableLocales().collect(
        groupingBy(Locale::getCountry,
            groupingBy(Locale::getLanguage)));

Then countryAndLanguageToLocale.get("IN").get("hi") is a list of the Hindi locales in India. (There are several variants.)

Several collectors are provided for reducing collected elements to numbers:

  • counting produces a count of the collected elements. For example,

    Map<String, Long> countryToLocaleCounts = Locale.availableLocales().collect(
        groupingBy(Locale::getCountry, counting()));

    counts how many locales there are for each country.

  • summing(Int|Long|Double) and averaging(Int|Long|Double) apply a provided function to the downstream elements and produce the sum or average of the function's results. For example,

    public record City(String name, String state, int population) {}
    . . .
    Map<String, Integer> stateToCityPopulation = cities.collect(
        groupingBy(City::state, averagingInt(City::population)));

    computes the average of populations per state in a stream of cities.

  • maxBy and minBy take a comparator and produce maximum and minimum of the downstream elements. For example,

    Map<String, Optional<City>> stateToLargestCity = cities.collect(
        groupingBy(City::state,
            maxBy(Comparator.comparing(City::population))));

    produces the largest city per state.

The collectingAndThen collector adds a final processing step behind a collector. For example, if you want to know how many distinct results there are, collect them into a set and then compute the size:

Map<Character, Integer> stringCountsByStartingLetter = strings.collect(
    groupingBy(s -> s.charAt(0),
        collectingAndThen(toSet(), Set::size)));

The mapping collector does the opposite. It applies a function to each collected element and passes the results to a downstream collector.

Map<Character, Set<Integer>> stringLengthsByStartingLetter = strings.collect(
    groupingBy(s -> s.charAt(0),
        mapping(String::length, toSet())));

Here, we group strings by their first character. Within each group, we produce the lengths and collect them in a set.

The mapping method also yields a nicer solution to a problem from the preceding section—gathering a set of all languages in a country.

Map<String, Set<String>> countryToLanguages = Locale.availableLocales().collect(
    groupingBy(Locale::getDisplayCountry,
        mapping(Locale::getDisplayLanguage,
            toSet())));

There is a flatMapping method as well, for use with functions that return streams.

If the grouping or mapping function has return type int, long, or double, you can collect elements into a summary statistics object, as discussed in Section 1.8. For example,

Map<String, IntSummaryStatistics> stateToCityPopulationSummary = cities.collect(
    groupingBy(City::state,
        summarizingInt(City::population)));

Then you can get the sum, count, average, minimum, and maximum of the function values from the summary statistics objects of each group.

The filtering collector applies a filter to each group, for example:

Map<String, Set<City>> largeCitiesByState
    = cities.collect(
        groupingBy(City::state,
            filtering(c -> c.population() > 500000,
                toSet()))); // States without large cities have empty sets

Finally, you can use the teeing collector to branch into two downstream collections. This is useful whenever you need to compute more than one result from a stream. Suppose you want to collect city names and also compute their average population. You can't read a stream twice, but teeing lets you carry out two computations. Specify two downstream collectors and a function that combines the results.

record Pair<S, T>(S first, T second) {}
Pair<List<String>, Double> result = cities.filter(c -> c.state().equals("NV"))
    .collect(teeing(
        mapping(City::name, toList()), // First downstream collector
        averagingDouble(City::population), // Second downstream collector
        (list, avg) -> new Pair(list,  avg))); // Combining function

Composing collectors is powerful, but it can lead to very convoluted expressions. The best use is with groupingBy or partitioningBy to process the “downstream” map values. Otherwise, simply apply methods such as map, reduce, count, max, or min directly on streams.

The example program in Listing 1.7 demonstrates downstream collectors.

Listing 1.7 v2ch01/collecting/DownstreamCollectors.java

1.9.5. Implementing Collectors

In the collection process, a collector accumulates incoming stream elements in an internal data structure called a “result container.” Each collector can choose a suitable result container type. If the stream is parallel, multiple result containers are filled concurrently and then merged. After accumulation and merging, the collector can optionally transform the final result container into another object that is the collection result.

Specifically, a Collector must implement four methods, each of which yields a function object:

  • Supplier<A> supplier() supplies a result container of type A.
  • BiConsumer<A,T> accumulator() accumulates a stream element of type T into a result container.
  • BinaryOperator<A> combiner() combines two result containers into one.
  • Function<A, R> finisher() transforms the final result container into the result of type R.

Let’s look at the function objects of the Collectors.toList() collector:

  • The supplier is a function yielding an empty ArrayList<T> or simply ArrayList<T>::new.
  • The accumulator is (rc, t) -> rc.add(t) or List::add. (That method expression is why the result container is the first parameter.)
  • The combiner concatenates two lists: (rc1, rc2) -> { rc1.addAll(rc2); return rc1; }.
  • The finisher does nothing (which is the most common case).

The characteristics method of a collector returns a set of Collector.Characteristics flags that can help the stream with optimizing the collection process. There are three of them:

  • IDENTITY_FINISH: the finisher need not be called.
  • UNORDERED: it is ok not to call the accumulator in encounter order.
  • CONCURRENT: it is safe to call the accumulator from multiple threads on the same result container.

Collectors.joining returns a collector with none of these characteristics. The result container is a StringBuilder. The finisher must convert it into a string. Accumulation order matters. And the result container is not threadsafe.

In contrast, the toConcurrentMap collector has all three characteristics. The result container is a threadsafe ConcurrentHashMap. Insertion order does not matter, and no finishing is needed.

Let’s build a collector that statistically samples elements, accepting them with a given probability. For now, we’ll collect the accepted elements in a list.

Following the Collectors API, let’s make a method that yields the collector:

 <T> Collector<T, List<T>, List<T>> randomSampling(double p) {
    return Collector.of(ArrayList<T>::new,
            (resultContainer, e) -> {
                if (Math.random() < p)
                    resultContainer.add(e);
                },
            (rc1, rc2) -> { rc1.addAll(rc2); return rc1; },
            Function.identity(),
            Collector.Characteristics.IDENTITY_FINISH);
    }

The Collector.of method has five parameters: the four functions and a varargs parameter for the characteristics.

This collector is almost the same as the toList collector, except that the accumulator only stores some of the elements.

Here is how to use it:

Stream<Integer> numbers = . . .;
List<Integer> sampled = numbers.collect(randomSampling(0.05)); // About every 20th element

What if we don't want the accepted elements in a list? Our API should support downstream collectors. Here is how:

<T, A, R> Collector<T, A, R> randomSampling(double p, Collector<T, A, R> downstream) {
        BiConsumer<A, T> downstreamAccumulator = downstream.accumulator();
        return Collector.of(downstream.supplier(),
                (resultContainer, e) -> {
                    if (Math.random() < p)
                        downstreamAccumulator.accept(resultContainer, e);
                    },
                downstream.combiner(),
                downstream.finisher(),
                downstream.characteristics().toArray(Collector.Characteristics[]::new));
    }

Here, downstream is a collector that accepts elements of type T, with a result container of type A, and result type R.

We can reuse the downstream’s supplier, combiner, and finisher, but we must adapt its accumulator, so that only a fraction of the incoming elements are inserted.

Unfortunately, the API leaves something to be desired for this particular use case. I have to turn the downstream’s set of characteristics into an array for the varargs parameter.

The program in Listing 1.8 has the complete example.

Listing 1.8 v2ch01/collecting/ImplementingCollectors.java

InformIT Promotional Mailings & Special Offers

I would like to receive exclusive offers and hear about products from InformIT and its family of brands. I can unsubscribe at any time.