Home > Articles > Programming > Java

  • Print
  • + Share This
This chapter is from the book

Concrete Collections

Rather than getting into more details about all the interfaces, we thought it would be helpful to first discuss the concrete data structures that the Java library supplies. Once we have thoroughly described the classes you might want to use, we will return to abstract considerations and see how the collections framework organizes these classes. Table 2-1 shows the collections in the Java library and briefly describes the purpose of each collection class. (For simplicity, we omit the thread-safe collections that were discussed in Chapter 1.) All classes in Table 2-1 implement the Collection interface, with the exception of the classes with names ending in Map. Those classes implement the Map interface instead. We will discuss the Map interface on page 110.

Table 2-1. Concrete Collections in the Java Library

Collection Type

Description

See Page

ArrayList

An indexed sequence that grows and shrinks dynamically

101

LinkedList

An ordered sequence that allows efficient insertions and removal at any location

93

HashSet

An unordered collection that rejects duplicates

101

TreeSet

A sorted set

104

EnumSet

A set of enumerated type values

116

LinkedHashSet

A set that remembers the order in which elements were inserted

115

PriorityQueue

A collection that allows efficient removal of the smallest element

109

HashMap

A data structure that stores key/value associations

110

TreeMap

A map in which the keys are sorted

110

EnumMap

A map in which the keys belong to an enumerated type

116

LinkedHashMap

A map that remembers the order in which entries were added

115

WeakHashMap

A map with values that can be reclaimed by the garbage collector if they are not used elsewhere

114

IdentityHashMap

A map with keys that are compared by ==, not equals

117

Linked Lists

We used arrays and their dynamic cousin, the ArrayList class, for many examples in Volume 1. However, arrays and array lists suffer from a major drawback. Removing an element from the middle of an array is expensive since all array elements beyond the removed one must be moved toward the beginning of the array (see Figure 2-4). The same is true for inserting elements in the middle.

02fig04.gif

Figure 2-4 Removing an element from an array

Another well-known data structure, the linked list, solves this problem. Whereas an array stores object references in consecutive memory locations, a linked list stores each object in a separate link. Each link also stores a reference to the next link in the sequence. In the Java programming language, all linked lists are actually doubly linked; that is, each link also stores a reference to its predecessor (see Figure 2-5).

02fig05.gif

Figure 2-5 A doubly linked list

Removing an element from the middle of a linked list is an inexpensive operation—only the links around the element to be removed need to be updated (see Figure 2-6).

02fig06.gif

Figure 2-6 Removing an element from a linked list

Perhaps you once took a data structures course in which you learned how to implement linked lists. You may have bad memories of tangling up the links when removing or adding elements in the linked list. If so, you will be pleased to learn that the Java collections library supplies a class LinkedList ready for you to use.

The following code example adds three elements and and then removes the second one.

List<String> staff = new LinkedList<String>(); // LinkedList implements List
staff.add("Amy");
staff.add("Bob");
staff.add("Carl");
Iterator iter = staff.iterator();
String first = iter.next(); // visit first element
String second = iter.next(); // visit second element
iter.remove(); // remove last visited element

There is, however, an important difference between linked lists and generic collections. A linked list is an ordered collection in which the position of the objects matters. The LinkedList.add method adds the object to the end of the list. But you often want to add objects somewhere in the middle of a list. This position-dependent add method is the responsibility of an iterator, since iterators describe positions in collections. Using iterators to add elements makes sense only for collections that have a natural ordering. For example, the set data type that we discuss in the next section does not impose any ordering on its elements. Therefore, there is no add method in the Iterator interface. Instead, the collections library supplies a subinterface ListIterator that contains an add method:

interface ListIterator<E> extends Iterator<E>
{  
   void add(E element);
   . . .
}

Unlike Collection.add, this method does not return a boolean—it is assumed that the add operation always modifies the list.

In addition, the ListIterator interface has two methods that you can use for traversing a list backwards.

E previous()
boolean hasPrevious()

Like the next method, the previous method returns the object that it skipped over.

The listIterator method of the LinkedList class returns an iterator object that implements the ListIterator interface.

ListIterator<String> iter = staff.listIterator();

The add method adds the new element before the iterator position. For example, the following code skips past the first element in the linked list and adds "Juliet" before the second element (see Figure 2-7).

List<String> staff = new LinkedList<String>(); 
staff.add("Amy");
staff.add("Bob");
staff.add("Carl");
ListIterator<String> iter = staff.listIterator();
iter.next(); // skip past first element
iter.add("Juliet");  
02fig07.gif

Figure 2-7 Adding an element to a linked list

If you call the add method multiple times, the elements are simply added in the order in which you supplied them. They are all added in turn before the current iterator position.

When you use the add operation with an iterator that was freshly returned from the listIterator method and that points to the beginning of the linked list, the newly added element becomes the new head of the list. When the iterator has passed the last element of the list (that is, when hasNext returns false), the added element becomes the new tail of the list. If the linked list has n elements, there are n+1 spots for adding a new element. These spots correspond to the n+1 possible positions of the iterator. For example, if a linked list contains three elements, A, B, and C, then there are four possible positions (marked as |) for inserting a new element:

|ABC
A|BC
AB|C
ABC|

You have to be careful with the “cursor” analogy. The remove operation does not quite work like the BACKSPACE key. Immediately after a call to next, the remove method indeed removes the element to the left of the iterator, just like the BACKSPACE key would. However, if you just called previous, the element to the right is removed. And you can't call remove twice in a row.

Unlike the add method, which depends only on the iterator position, the remove method depends on the iterator state.

Finally, a set method replaces the last element returned by a call to next or previous with a new element. For example, the following code replaces the first element of a list with a new value:

ListIterator<String> iter = list.listIterator();
String oldValue = iter.next(); // returns first element
iter.set(newValue); // sets first element to newValue

As you might imagine, if an iterator traverses a collection while another iterator is modifying it, confusing situations can occur. For example, suppose an iterator points before an element that another iterator has just removed. The iterator is now invalid and should no longer be used. The linked list iterators have been designed to detect such modifications. If an iterator finds that its collection has been modified by another iterator or by a method of the collection itself, then it throws a ConcurrentModificationException. For example, consider the following code:

List<String> list = . . .;
ListIterator<String> iter1 = list.listIterator();
ListIterator<String> iter2 = list.listIterator();
iter1.next();
iter1.remove();
iter2.next(); // throws ConcurrentModificationException

The call to iter2.next throws a ConcurrentModificationException since iter2 detects that the list was modified externally.

To avoid concurrent modification exceptions, follow this simple rule: You can attach as many iterators to a collection as you like, provided that all of them are only readers. Alternatively, you can attach a single iterator that can both read and write.

Concurrent modification detection is achieved in a simple way. The collection keeps track of the number of mutating operations (such as adding and removing elements). Each iterator keeps a separate count of the number of mutating operations that it was responsible for. At the beginning of each iterator method, the iterator simply checks whether its own mutation count equals that of the collection. If not, it throws a ConcurrentModificationException. This is an excellent check and a great improvement over the fundamentally unsafe iterators in the C++ STL framework.

There is, however, a curious exception to the detection of concurrent modifications. The linked list only keeps track of structural modifications to the list, such as adding and removing links. The set method does not count as a structural modification. You can attach multiple iterators to a linked list, all of which call set to change the contents of existing links. This capability is required for a number of algorithms in the Collections class that we discuss later in this chapter.

Now you have seen the fundamental methods of the LinkedList class. You use a ListIterator to traverse the elements of the linked list in either direction and to add and remove elements.

As you saw in the preceding section, many other useful methods for operating on linked lists are declared in the Collection interface. These are, for the most part, implemented in the AbstractCollection superclass of the LinkedList class. For example, the toString method invokes toString on all elements and produces one long string of the format [A, B, C]. This is handy for debugging. Use the contains method to check whether an element is present in a linked list. For example, the call staff.contains("Harry") returns true if the linked list already contains a string that is equal to the string "Harry".

The library also supplies a number of methods that are, from a theoretical perspective, somewhat dubious. Linked lists do not support fast random access. If you want to see the nth element of a linked list, you have to start at the beginning and skip past the first n − 1 elements first. There is no shortcut. For that reason, programmers don't usually use linked lists in programming situations in which elements need to be accessed by an integer index.

Nevertheless, the LinkedList class supplies a get method that lets you access a particular element:

LinkedList<String> list = . . .;
String obj = list.get(n);

Of course, this method is not very efficient. If you find yourself using it, you are probably using the wrong data structure for your problem.

You should never use this illusory random access method to step through a linked list. The code


for (int i = 0; i < list.size(); i++)
   do something with list.get(i);

is staggeringly inefficient. Each time you look up another element, the search starts again from the beginning of the list. The LinkedList object makes no effort to cache the position information.

The get method has one slight optimization: If the index is at least size() / 2, then the search for the element starts at the end of the list.

The list iterator interface also has a method to tell you the index of the current position. In fact, because Java iterators conceptually point between elements, it has two of them: The nextIndex method returns the integer index of the element that would be returned by the next call to next; the previousIndex method returns the index of the element that would be returned by the next call to previous. Of course, that is simply one less than nextIndex. These methods are efficient—the iterators keep a count of the current position. Finally, if you have an integer index n, then list.listIterator(n) returns an iterator that points just before the element with index n. That is, calling next yields the same element as list.get(n); obtaining that iterator is inefficient.

If you have a linked list with only a handful of elements, then you don't have to be overly paranoid about the cost of the get and set methods. But then why use a linked list in the first place? The only reason to use a linked list is to minimize the cost of insertion and removal in the middle of the list. If you have only a few elements, you can just use an ArrayList.

We recommend that you simply stay away from all methods that use an integer index to denote a position in a linked list. If you want random access into a collection, use an array or ArrayList, not a linked list.

The program in Example 2-1 puts linked lists to work. It simply creates two lists, merges them, then removes every second element from the second list, and finally tests the removeAll method. We recommend that you trace the program flow and pay special attention to the iterators. You may find it helpful to draw diagrams of the iterator positions, like this:

|ACE   |BDFG
A|CE   |BDFG
AB|CE  B|DFG
. . .

Note that the call

System.out.println(a);

prints all elements in the linked list a by invoking the toString method in AbstractCollection.

Example 2-1. LinkedListTest.java

 1. import java.util.*;
 2. 
 3. /**
 4.    This program demonstrates operations on linked lists.
 5. */
 6. public class LinkedListTest
 7. {  
 8.    public static void main(String[] args)
 9.    {  
10.       List<String> a = new LinkedList<String>();
11.       a.add("Amy");
12.       a.add("Carl");
13.       a.add("Erica");
14. 
15.       List<String> b = new LinkedList<String>();
16.       b.add("Bob");
17.       b.add("Doug");
18.       b.add("Frances");
19.       b.add("Gloria");
20. 
21.       // merge the words from b into a
22. 
23.       ListIterator<String> aIter = a.listIterator();
24.       Iterator<String> bIter = b.iterator();
25. 
26.       while (bIter.hasNext())
27.       {  
28.          if (aIter.hasNext()) aIter.next();
29.          aIter.add(bIter.next());
30.       }
31. 
32.       System.out.println(a);
33. 
34.       // remove every second word from b
35. 
36.       bIter = b.iterator();
37.       while (bIter.hasNext())
38.       {  
39.          bIter.next(); // skip one element
40.          if (bIter.hasNext())
41.          {  
42.             bIter.next(); // skip next element
43.             bIter.remove(); // remove that element
44.          }
45.       }
46. 
47.       System.out.println(b);
48. 
49.       // bulk operation: remove all words in b from a
50. 
51.       a.removeAll(b);
52. 
53.       System.out.println(a);
54.    }
55. }
api_icon.gif
   java.util.List<E> 1.2
   
  • ListIterator<E> listIterator()

    returns a list iterator for visiting the elements of the list.

  • ListIterator<E> listIterator(int index)

    returns a list iterator for visiting the elements of the list whose first call to next will return the element with the given index.

  • void add(int i, E element)

    adds an element at the specified position.

  • void addAll(int i, Collection<? extends E> elements)

    adds all elements from a collection to the specified position.

  • E remove(int i)

    removes and returns an element at the specified position.

  • E set(int i, E element)

    replaces the element at the specified position with a new element and returns the old element.

  • int indexOf(Object element)

    returns the position of the first occurrence of an element equal to the specified element, or −1 if no matching element is found.

  • int lastIndexOf(Object element)

    returns the position of the last occurrence of an element equal to the specified element, or −1 if no matching element is found.

api_icon.gif
   java.util.ListIterator<E> 1.2
   
  • void add(E newElement)

    adds an element before the current position.

  • void set(E newElement)

    replaces the last element visited by next or previous with a new element. Throws an IllegalStateException if the list structure was modified since the last call to next or previous.

  • boolean hasPrevious()

    returns true if there is another element to visit when iterating backwards through the list.

  • E previous()

    returns the previous object. Throws a NoSuchElementException if the beginning of the list has been reached.

  • int nextIndex()

    returns the index of the element that would be returned by the next call to next.

  • int previousIndex()

    returns the index of the element that would be returned by the next call to previous.

api_icon.gif
   java.util.LinkedList<E> 1.2
   
  • LinkedList()

    constructs an empty linked list.

  • LinkedList(Collection<? extends E> elements)

    constructs a linked list and adds all elements from a collection.

  • void addFirst(E element)
  • void addLast(E element)

    add an element to the beginning or the end of the list.

  • E getFirst()
  • E getLast()

    return the element at the beginning or the end of the list.

  • E removeFirst()
  • E removeLast()

    remove and return the element at the beginning or the end of the list.

Array Lists

In the preceding section, you saw the List interface and the LinkedList class that implements it. The List interface describes an ordered collection in which the position of elements matters. There are two protocols for visiting the elements: through an iterator and by random access with methods get and set. The latter is not appropriate for linked lists, but of course get and set make a lot of sense for arrays. The collections library supplies the familiar ArrayList class that also implements the List interface. An ArrayList encapsulates a dynamically reallocated array of objects.

If you are a veteran Java programmer, you may have used the Vector class whenever you needed a dynamic array. Why use an ArrayList instead of a Vector? For one simple reason: All methods of the Vector class are synchronized. It is safe to access a Vector object from two threads. But if you access a vector from only a single thread—by far the more common case—your code wastes quite a bit of time with synchronization. In contrast, the ArrayList methods are not synchronized. We recommend that you use an ArrayList instead of a Vector whenever you don't need synchronization.

Hash Sets

Linked lists and arrays let you specify the order in which you want to arrange the elements. However, if you are looking for a particular element and you don't remember its position, then you need to visit all elements until you find a match. That can be time consuming if the collection contains many elements. If you don't care about the ordering of the elements, then there are data structures that let you find elements much faster. The drawback is that those data structures give you no control over the order in which the elements appear. The data structures organize the elements in an order that is convenient for their own purposes.

A well-known data structure for finding objects quickly is the hash table. A hash table computes an integer, called the hash code, for each object. A hash code is an integer that is somehow derived from the instance fields of an object, preferably such that objects with different data yield different codes. Table 2-2 lists a few examples of hash codes that result from the hashCode method of the String class.

Table 2-2. Hash Codes Resulting from the hashCode Function

String

Hash Code

"Lee"

76268

"lee"

107020

"eel"

100300

If you define your own classes, you are responsible for implementing your own hashCode method—see Volume 1, Chapter 5 for more information. Your implementation needs to be compatible with the equals method: If a.equals(b), then a and b must have the same hash code.

What's important for now is that hash codes can be computed quickly and that the computation depends only on the state of the object that needs to be hashed, and not on the other objects in the hash table.

A hash table is an array of linked lists. Each list is called a bucket (see Figure 2-8). To find the place of an object in the table, compute its hash code and reduce it modulo the total number of buckets. The resulting number is the index of the bucket that holds the element. For example, if an object has hash code 76268 and there are 128 buckets, then the object is placed in bucket 108 (because the remainder 76268 % 128 is 108). Perhaps you are lucky and there is no other element in that bucket. Then, you simply insert the element into that bucket. Of course, it is inevitable that you sometimes hit a bucket that is already filled. This is called a hash collision. Then, you compare the new object with all objects in that bucket to see if it is already present. Provided that the hash codes are reasonably randomly distributed and the number of buckets is large enough, only a few comparisons should be necessary.

02fig08.gif

Figure 2-8 A hash table

If you want more control over the performance of the hash table, you can specify the initial bucket count. The bucket count gives the number of buckets that are used to collect objects with identical hash values. If too many elements are inserted into a hash table, the number of collisions increases and retrieval performance suffers.

If you know approximately how many elements will eventually be in the table, then you can set the bucket count. Typically, you set it to somewhere between 75% and 150% of the expected element count. Some researchers believe that it is a good idea to make the bucket count a prime number to prevent a clustering of keys. The evidence for this isn't conclusive, however. The standard library uses bucket counts that are a power of 2, with a default of 16. (Any value you supply for the table size is automatically rounded to the next power of 2.)

Of course, you do not always know how many elements you need to store, or your initial guess may be too low. If the hash table gets too full, it needs to be rehashed. To rehash the table, a table with more buckets is created, all elements are inserted into the new table, and the original table is discarded. The load factor determines when a hash table is rehashed. For example, if the load factor is 0.75 (which is the default) and the table is more than 75% full, then it is automatically rehashed, with twice as many buckets. For most applications, it is reasonable to leave the load factor at 0.75.

Hash tables can be used to implement several important data structures. The simplest among them is the set type. A set is a collection of elements without duplicates. The add method of a set first tries to find the object to be added, and adds it only if it is not yet present.

The Java collections library supplies a HashSet class that implements a set based on a hash table. You add elements with the add method. The contains method is redefined to make a fast lookup to find if an element is already present in the set. It checks only the elements in one bucket and not all elements in the collection.

The hash set iterator visits all buckets in turn. Because the hashing scatters the elements around in the table, they are visited in seemingly random order. You would only use a HashSet if you don't care about the ordering of the elements in the collection.

The sample program at the end of this section (Example 2-2) reads words from System.in, adds them to a set, and finally prints out all words in the set. For example, you can feed the program the text from Alice in Wonderland (which you can obtain from http://www.gutenberg.net) by launching it from a command shell as

java SetTest < alice30.txt

The program reads all words from the input and adds them to the hash set. It then iterates through the unique words in the set and finally prints out a count. (Alice in Wonderland has 5,909 unique words, including the copyright notice at the beginning.) The words appear in random order.

Be careful when you mutate set elements. If the hash code of an element were to change, then the element would no longer be in the correct position in the data structure.

Example 2-2. SetTest.java

 1. import java.util.*;
 2. 
 3. /**
 4.    This program uses a set to print all unique words in 
 5.    System.in.
 6. */
 7. public class SetTest
 8. { 
 9.    public static void main(String[] args)
10.    {  
11.       Set<String> words = new HashSet<String>(); // HashSet implements Set
12.       long totalTime = 0;
13. 
14.       Scanner in = new Scanner(System.in);
15.       while (in.hasNext())
16.       {  
17.          String word = in.next();
18.          long callTime = System.currentTimeMillis();
19.          words.add(word);
20.          callTime = System.currentTimeMillis() - callTime;
21.          totalTime += callTime;
22.       }
23. 
24.       Iterator<String> iter = words.iterator();
25.       for (int i = 1; i <= 20; i++)
26.          System.out.println(iter.next());
27.       System.out.println(". . .");
28.       System.out.println(words.size() + " distinct words. " + totalTime + "  milliseconds.");
29.    }
30. }
   
api_icon.gif
   java.util.HashSet<E> 1.2
   
  • HashSet()

    constructs an empty hash set.

  • HashSet(Collection<? extends E> elements)

    constructs a hash set and adds all elements from a collection.

  • HashSet(int initialCapacity)

    constructs an empty hash set with the specified capacity.

  • HashSet(int initialCapacity, float loadFactor)

    constructs an empty hash set with the specified capacity and load factor (a number between 0.0 and 1.0 that determines at what percentage of fullness the hash table will be rehashed into a larger one).

api_icon.gif
   java.lang.Object 1.0
   
  • int hashCode()

    returns a hash code for this object. A hash code can be any integer, positive or negative. The definitions of equals and hashCode must be compatible: If x.equals(y) is true, then x.hashCode() must be the same value as y.hashCode().

Tree Sets

The TreeSet class is similar to the hash set, with one added improvement. A tree set is a sorted collection. You insert elements into the collection in any order. When you iterate through the collection, the values are automatically presented in sorted order. For example, suppose you insert three strings and then visit all elements that you added.

SortedSet<String> sorter = new TreeSet<String>(); // TreeSet implements SortedSet
sorter.add("Bob");
sorter.add("Amy");
sorter.add("Carl");
for (String s : sorter) System.println(s);

Then, the values are printed in sorted order: Amy Bob Carl. As the name of the class suggests, the sorting is accomplished by a tree data structure. (The current implementation uses a red-black tree. For a detailed description of red-black trees, see, for example, Introduction to Algorithms by Thomas Cormen, Charles Leiserson, Ronald Rivest, and Clifford Stein [The MIT Press 2001].) Every time an element is added to a tree, it is placed into its proper sorting position. Therefore, the iterator always visits the elements in sorted order.

Adding an element to a tree is slower than adding it to a hash table, but it is still much faster than adding it into the right place in an array or linked list. If the tree contains n elements, then an average of log2 n comparisons are required to find the correct position for the new element. For example, if the tree already contains 1,000 elements, then adding a new element requires about 10 comparisons.

Thus, adding elements into a TreeSet is somewhat slower than adding into a HashSet—see Table 2-3 for a comparison—but the TreeSet automatically sorts the elements.

api_icon.gif
   java.util.TreeSet<E> 1.2
   

Table 2-3. Adding Elements into Hash and Tree Sets

Document

Total Number of Words

Number of Distinct Words

HashSet

TreeSet

Alice in Wonderland

28195

5909

5 sec

7 sec

The Count of Monte Cristo

466300

37545

75 sec

98 sec

  • TreeSet()

    constructs an empty tree set.

  • TreeSet(Collection<? extends E> elements)

    constructs a tree set and adds all elements from a collection.

Object Comparison

How does the TreeSet know how you want the elements sorted? By default, the tree set assumes that you insert elements that implement the Comparable interface. That interface defines a single method:

public interface Comparable<T>
{
   int compareTo(T other);
}

The call a.compareTo(b) must return 0 if a and b are equal, a negative integer if a comes before b in the sort order, and a positive integer if a comes after b. The exact value does not matter; only its sign (>0, 0, or < 0) matters. Several standard Java platform classes implement the Comparable interface. One example is the String class. Its compareTo method compares strings in dictionary order (sometimes called lexicographic order).

If you insert your own objects, you must define a sort order yourself by implementing the Comparable interface. There is no default implementation of compareTo in the Object class.

For example, here is how you can sort Item objects by part number.

   class Item implements Comparable<Item>
   {  
   public int compareTo(Item other)
   {   
   return partNumber - other.partNumber;
   }
   . . .
   }
   

If you compare two positive integers, such as part numbers in our example, then you can simply return their difference—it will be negative if the first item should come before the second item, zero if the part numbers are identical, and positive otherwise.

This trick only works if the integers are from a small enough range. If x is a large positive integer and y is a large negative integer, then the difference xy can overflow.

However, using the Comparable interface for defining the sort order has obvious limitations. A given class can implement the interface only once. But what can you do if you need to sort a bunch of items by part number in one collection and by description in another? Furthermore, what can you do if you need to sort objects of a class whose creator didn't bother to implement the Comparable interface?

In those situations, you tell the tree set to use a different comparison method, by passing a Comparator object into the TreeSet constructor. The Comparator interface declares a compare method with two explicit parameters:

public interface Comparator<T>
{
   int compare(T a, T b);
}

Just like the compareTo method, the compare method returns a negative integer if a comes before b, zero if they are identical, or a positive integer otherwise.

To sort items by their description, simply define a class that implements the Comparator interface:

class ItemComparator implements Comparator<Item>
{  
   public int compare(Item a, Item b)
   {  
      String descrA = a.getDescription();
      String descrB = b.getDescription();
      return descrA.compareTo(descrB);
   }
}

You then pass an object of this class to the tree set constructor:

ItemComparator comp = new ItemComparator();
SortedSet<Item> sortByDescription = new TreeSet<Item>(comp);

If you construct a tree with a comparator, it uses this object whenever it needs to compare two elements.

Note that this item comparator has no data. It is just a holder for the comparison method. Such an object is sometimes called a function object.

Function objects are commonly defined “on the fly,” as instances of anonymous inner classes:

SortedSet<Item> sortByDescription = new TreeSet<Item>(new
   Comparator<Item>()
   {  
   public int compare(Item a, Item b)
   {  
   String descrA = a.getDescription();
   String descrB = b.getDescription();
   return descrA.compareTo(descrB);
   }
   });
   

Actually, the Comparator<T> interface is declared to have two methods: compare and equals. Of course, every class has an equals method; thus, there seems little benefit in adding the method to the interface declaration. The API documentation explains that you need not override the equals method but that doing so may yield improved performance in some cases. For example, the addAll method of the TreeSet class can work more effectively if you add elements from another set that uses the same comparator.

If you look back at Table 2-3, you may well wonder if you should always use a tree set instead of a hash set. After all, adding elements does not seem to take much longer, and the elements are automatically sorted. The answer depends on the data that you are collecting. If you don't need the data sorted, there is no reason to pay for the sorting overhead. More important, with some data it is very difficult to come up with a sort order. Suppose you collect a bunch of rectangles. How do you sort them? By area? You can have two different rectangles with different positions but the same area. If you sort by area, the second one is not inserted into the set. The sort order for a tree must be a total ordering: Any two elements must be comparable, and the comparison can only be zero if the elements are equal. There is such a sort order for rectangles (the lexicographic ordering on its coordinates), but it is unnatural and cumbersome to compute. In contrast, hash functions are usually easier to define. They only need to do a reasonably good job of scrambling the objects, whereas comparison functions must tell objects apart with complete precision.

The program in Example 2-3 builds two tree sets of Item objects. The first one is sorted by part number, the default sort order of Item objects. The second set is sorted by description, by means of a custom comparator.

Example 2-3. TreeSetTest.java

 1. import java.util.*;
 2. 
 3. /**
 4.    This program sorts a set of items by comparing
 5.    their descriptions.
 6. */
 7. public class TreeSetTest
 8. {  
 9.    public static void main(String[] args)
10.    {  
11.       SortedSet<Item> parts = new TreeSet<Item>();
12.       parts.add(new Item("Toaster", 1234));
13.       parts.add(new Item("Widget", 4562));
14.       parts.add(new Item("Modem", 9912));
15.       System.out.println(parts);
16. 
17.       SortedSet<Item> sortByDescription = new TreeSet<Item>(new
18.          Comparator<Item>()
19.          {  
20.             public int compare(Item a, Item b)
21.             {  
22.                String descrA = a.getDescription();
23.                String descrB = b.getDescription();
24.                return descrA.compareTo(descrB);
25.             }
26.          });
27. 
28.       sortByDescription.addAll(parts);
29.       System.out.println(sortByDescription);
30.    }
31. }
32. 
33. /**
34.    An item with a description and a part number.
35. */
36. class Item implements Comparable<Item>
37. { 
38.    /**
39.       Constructs an item.
40.       @param aDescription the item's description
41.       @param aPartNumber the item's part number
42.    */
43.    public Item(String aDescription, int aPartNumber)
44.    {  
45.       description = aDescription;
46.       partNumber = aPartNumber;
47.    }
48. 
49.    /**
50.       Gets the description of this item.
51.       @return the description
52.    */
53.    public String getDescription()
54.    {  
55.       return description;
56.    }
57. 
58.    public String toString()
59.    {  
60.       return "[description=" + description
61.          + ", partNumber=" + partNumber + "]";
62.    }
63. 
64.    public boolean equals(Object otherObject)
65.    {  
66.       if (this == otherObject) return true;
67.       if (otherObject == null) return false;
68.       if (getClass() != otherObject.getClass()) return false;
69.       Item other = (Item) otherObject;
70.       return description.equals(other.description)
71.          && partNumber == other.partNumber;
72.    }
73. 
74.    public int hashCode()
75.    {  
76.       return 13 * description.hashCode() + 17 * partNumber;
77.    }
78. 
79.    public int compareTo(Item other)
80.    {  
81.       return partNumber - other.partNumber;
82.    }
83. 
84.    private String description;
85.    private int partNumber;
86. }
api_icon.gif
   java.lang.Comparable<T> 1.2
   
  • int compareTo(T other)

    compares this object with another object and returns a negative value if this comes before other, zero if they are considered identical in the sort order, and a positive value if this comes after other.

api_icon.gif
   java.util.Comparator<T> 1.2
   
  • int compare(T a, T b)

    compares two objects and returns a negative value if a comes before b, zero if they are considered identical in the sort order, and a positive value if a comes after b.

api_icon.gif
   java.util.SortedSet<E> 1.2
   
  • Comparator<? super E> comparator()

    returns the comparator used for sorting the elements, or null if the elements are compared with the compareTo method of the Comparable interface.

  • E first()
  • E last()

    return the smallest or largest element in the sorted set.

api_icon.gif
   java.util.TreeSet<E> 1.2
   
  • TreeSet()

    constructs a tree set for storing Comparable objects.

  • TreeSet(Comparator<? super E> c)

    constructs a tree set and uses the specified comparator for sorting its elements.

  • TreeSet(SortedSet<? extends E> elements)

    constructs a tree set, adds all elements from a sorted set, and uses the same element comparator as the given sorted set.

Priority Queues

A priority queue retrieves elements in sorted order after they were inserted in arbitrary order. That is, whenever you call the remove method, you get the smallest element currently in the priority queue. However, the priority queue does not sort all its elements. If you iterate over the elements, they are not necessarily sorted. The priority queue makes use of an elegant and efficient data structure, called a heap. A heap is a self-organizing binary tree in which the add and remove operations cause the smallest element to gravitate to the root, without wasting time on sorting all elements.

Just like a TreeSet, a priority queue can either hold elements of a class that implements the Comparable interface or a Comparator object you supply in the constructor.

A typical use for a priority queue is job scheduling. Each job has a priority. Jobs are added in random order. Whenever a new job can be started, the highest-priority job is removed from the queue. (Since it is traditional for priority 1 to be the “highest” priority, the remove operation yields the minimum element.)

Example 2-4 shows a priority queue in action. Unlike iteration in a TreeSet, the iteration here does not visit the elements in sorted order. However, removal always yields the smallest remaining element.

Example 2-4. PriorityQueueTest.java

 1. import java.util.*;
 2. 
 3. /**
 4.    This program demonstrates the use of a priority queue.
 5. */
 6. public class PriorityQueueTest
 7. {  
 8.    public static void main(String[] args)
 9.    {  
10.       PriorityQueue<GregorianCalendar> pq = new PriorityQueue<GregorianCalendar>();
11.       pq.add(new GregorianCalendar(1906, Calendar.DECEMBER, 9)); // G. Hopper
12.       pq.add(new GregorianCalendar(1815, Calendar.DECEMBER, 10)); // A. Lovelace
13.       pq.add(new GregorianCalendar(1903, Calendar.DECEMBER, 3)); // J. von Neumann
14.       pq.add(new GregorianCalendar(1910, Calendar.JUNE, 22)); // K. Zuse
15. 
16.       System.out.println("Iterating over elements...");
17.       for (GregorianCalendar date : pq)
18.          System.out.println(date.get(Calendar.YEAR));
19.       System.out.println("Removing elements...");
20.       while (!pq.isEmpty())
21.          System.out.println(pq.remove().get(Calendar.YEAR));
22.    }
23. }
api_icon.gif
   java.util.PriorityQueue 5.0
   
  • PriorityQueue()
  • PriorityQueue(int initialCapacity)

    construct a tree set for storing Comparable objects.

  • PriorityQueue(int initialCapacity, Comparator<? super E> c)

    constructs a tree set and uses the specified comparator for sorting its elements.

Maps

A set is a collection that lets you quickly find an existing element. However, to look up an element, you need to have an exact copy of the element to find. That isn't a very common lookup—usually, you have some key information, and you want to look up the associated element. The map data structure serves that purpose. A map stores key/value pairs. You can find a value if you provide the key. For example, you may store a table of employee records, where the keys are the employee IDs and the values are Employee objects.

The Java library supplies two general-purpose implementations for maps: HashMap and TreeMap. Both classes implement the Map interface.

A hash map hashes the keys, and a tree map uses a total ordering on the keys to organize them in a search tree. The hash or comparison function is applied only to the keys. The values associated with the keys are not hashed or compared.

Should you choose a hash map or a tree map? As with sets, hashing is a bit faster, and it is the preferred choice if you don't need to visit the keys in sorted order.

Here is how you set up a hash map for storing employees.

Map<String, Employee> staff = new HashMap<String, Employee>(); // HashMap implements Map
Employee harry = new Employee("Harry Hacker");
staff.put("987-98-9996", harry);
. . .

Whenever you add an object to a map, you must supply a key as well. In our case, the key is a string, and the corresponding value is an Employee object.

To retrieve an object, you must use (and, therefore, remember) the key.

String s = "987-98-9996";
e = staff.get(s); // gets harry

If no information is stored in the map with the particular key specified, then get returns null.

Keys must be unique. You cannot store two values with the same key. If you call the put method twice with the same key, then the second value replaces the first one. In fact, put returns the previous value stored with the key parameter.

The remove method removes an element with a given key from the map. The size method returns the number of entries in the map.

The collections framework does not consider a map itself as a collection. (Other frameworks for data structures consider a map as a collection of pairs, or as a collection of values that is indexed by the keys.) However, you can obtain views of the map, objects that implement the Collection interface, or one of its subinterfaces.

There are three views: the set of keys, the collection of values (which is not a set), and the set of key/value pairs. The keys and key/value pairs form a set because there can be only one copy of a key in a map. The methods

Set<K> keySet()
Collection<K> values()
Set<Map.Entry<K, V>> entrySet()

return these three views. (The elements of the entry set are objects of the static inner class Map.Entry.)

Note that the keySet is not a HashSet or TreeSet, but an object of some other class that implements the Set interface. The Set interface extends the Collection interface. Therefore, you can use a keySet as you would use any collection.

For example, you can enumerate all keys of a map:


Set<String> keys = map.keySet();
for (String key : keys)
{  
       do something with key
} 

If you want to look at both keys and values, then you can avoid value lookups by enumerating the entries. Use the following code skeleton:


for (Map.Entry<String, Employee> entry : staff.entrySet())
{  
   String key = entry.getKey();
   Employee value = entry.getValue();
   do something with keyvalue
}

If you invoke the remove method of the iterator, you actually remove the key and its associated value from the map. However, you cannot add an element to the key set view. It makes no sense to add a key without also adding a value. If you try to invoke the add method, it throws an UnsupportedOperationException. The entry set view has the same restriction, even though it would make conceptual sense to add a new key/value pair.

Example 2-5 illustrates a map at work. We first add key/value pairs to a map. Then, we remove one key from the map, which removes its associated value as well. Next, we change the value that is associated with a key and call the get method to look up a value. Finally, we iterate through the entry set.

Example 2-5. MapTest.java

 1. import java.util.*;
 2. 
 3. /**
 4.    This program demonstrates the use of a map with key type
 5.    String and value type Employee.
 6. */
 7. public class MapTest
 8. {  
 9.    public static void main(String[] args)
10.    {  
11.       Map<String, Employee> staff = new HashMap<String, Employee>();
12.       staff.put("144-25-5464", new Employee("Amy Lee"));
13.       staff.put("567-24-2546", new Employee("Harry Hacker"));
14.       staff.put("157-62-7935", new Employee("Gary Cooper"));
15.       staff.put("456-62-5527", new Employee("Francesca Cruz"));
16. 
17.       // print all entries
18. 
19.       System.out.println(staff);
20. 
21.       // remove an entry
22. 
23.       staff.remove("567-24-2546");
24. 
25.       // replace an entry
26. 
27.       staff.put("456-62-5527", new Employee("Francesca Miller"));
28. 
29.       // look up a value
30. 
31.       System.out.println(staff.get("157-62-7935"));
32. 
33.       // iterate through all entries
34. 
35.       for (Map.Entry<String, Employee> entry : staff.entrySet())
36.       {  
37.          String key = entry.getKey();
38.          Employee value = entry.getValue();
39.          System.out.println("key=" + key + ", value=" + value);
40.       }
41.    }
42. }
43. 
44. /**
45.    A minimalist employee class for testing purposes.
46. */
47. class Employee
48. { 
49.    /**
50.       Constructs an employee with $0 salary.
51.       @param n the employee name
52.    */
53.    public Employee(String n)
54.    {  
55.       name = n;
56.       salary = 0;
57.    }
58. 
59.    public String toString()
60.    {  
61.       return "[name=" + name + ", salary=" + salary + "]";
62.    }
63. 
64.    private String name;
65.    private double salary;
66. }
api_icon.gif
   java.util.Map<K, V> 1.2
   
  • V get(K key)

    gets the value associated with the key; returns the object associated with the key, or null if the key is not found in the map. The key may be null.

  • V put(K key, V value)

    puts the association of a key and a value into the map. If the key is already present, the new object replaces the old one previously associated with the key. This method returns the old value of the key, or null if the key was not previously present. The key may be null, but the value must not be null.

  • void putAll(Map<? extends K, ? extends V> entries)

    adds all entries from the specified map to this map.

  • boolean containsKey(Object key)

    returns true if the key is present in the map.

  • boolean containsValue(Object value)

    returns true if the value is present in the map.

  • Set<Map.Entry<K, V>> entrySet()

    returns a set view of Map.Entry objects, the key/value pairs in the map. You can remove elements from this set and they are removed from the map, but you cannot add any elements.

  • Set<K> keySet()

    returns a set view of all keys in the map. You can remove elements from this set and the keys and associated values are removed from the map, but you cannot add any elements.

  • Collection<V> values()

    returns a collection view of all values in the map. You can remove elements from this set and the removed value and its key are removed from the map, but you cannot add any elements.

api_icon.gif
   java.util.Map.Entry<K, V> 1.2
   
  • K getKey()
  • V getValue()

    return the key or value of this entry.

  • V setValue(V newValue)

    changes the value in the associated map to the new value and returns the old value.

api_icon.gif
   java.util.HashMap<K, V> 1.2
   
  • HashMap()
  • HashMap(int initialCapacity)
  • HashMap(int initialCapacity, float loadFactor)

    construct an empty hash map with the specified capacity and load factor (a number between 0.0 and 1.0 that determines at what percentage of fullness the hash table will be rehashed into a larger one). The default load factor is 0.75.

api_icon.gif
   java.util.TreeMap<K,V> 1.2
   
  • TreeMap(Comparator<? super K> c)

    constructs a tree map and uses the specified comparator for sorting its keys.

  • TreeMap(Map<? extends K, ? extends V> entries)

    constructs a tree map and adds all entries from a map.

  • TreeMap(SortedMap<? extends K, ? extends V> entries)

    constructs a tree map, adds all entries from a sorted map, and uses the same element comparator as the given sorted map.

api_icon.gif
   java.util.SortedMap<K, V> 1.2
   
  • Comparator<? super K> comparator()

    returns the comparator used for sorting the keys, or null if the keys are compared with the compareTo method of the Comparable interface.

  • K firstKey()
  • K lastKey()

    return the smallest or largest key in the map.

Specialized Set and Map Classes

The collection class library has several map classes for specialized needs that we briefly discuss in this section.

Weak Hash Maps

The WeakHashMap class was designed to solve an interesting problem. What happens with a value whose key is no longer used anywhere in your program? Suppose the last reference to a key has gone away. Then, there is no longer any way to refer to the value object. But because no part of the program has the key any more, the key/value pair cannot be removed from the map. Why can't the garbage collector remove it? Isn't it the job of the garbage collector to remove unused objects?

Unfortunately, it isn't quite so simple. The garbage collector traces live objects. As long as the map object is live, then all buckets in it are live and they won't be reclaimed. Thus, your program should take care to remove unused values from long-lived maps. Or, you can use a WeakHashMap instead. This data structure cooperates with the garbage collector to remove key/value pairs when the only reference to the key is the one from the hash table entry.

Here are the inner workings of this mechanism. The WeakHashMap uses weak references to hold keys. A WeakReference object holds a reference to another object, in our case, a hash table key. Objects of this type are treated in a special way by the garbage collector. Normally, if the garbage collector finds that a particular object has no references to it, it simply reclaims the object. However, if the object is reachable only by a WeakReference, the garbage collector still reclaims the object, but it places the weak reference that led to it into a queue. The operations of the WeakHashMap periodically check that queue for newly arrived weak references. The arrival of a weak reference in the queue signifies that the key was no longer used by anyone and that it has been collected. The WeakHashMap then removes the associated entry.

Linked Hash Sets and Maps

JDK 1.4 adds classes LinkedHashSet and LinkedHashMap that remember in which order you inserted items. That way, you avoid the seemingly random order of items in a hash table. As entries are inserted into the table, they are joined in a doubly linked list (see Figure 2-9).

02fig09.gif

Figure 2-9 A linked hash table

For example, consider the following map insertions from Example 2-5:

Map staff = new LinkedHashMap();
staff.put("144-25-5464", new Employee("Amy Lee"));
staff.put("567-24-2546", new Employee("Harry Hacker"));
staff.put("157-62-7935", new Employee("Gary Cooper"));
staff.put("456-62-5527", new Employee("Francesca Cruz"));

Then staff.keySet().iterator() enumerates the keys in the order:

144-25-5464
567-24-2546
157-62-7935
456-62-5527

and staff.values().iterator() enumerates the values in the order:

Amy Lee
Harry Hacker
Gary Cooper
Francesca Cruz

A linked hash map can alternatively use access order, not insertion order, to iterate through the map entries. Every time you call get or put, the affected entry is removed from its current position and placed at the end of the linked list of entries. (Only the position in the linked list of entries is affected, not the hash table bucket. An entry always stays in the bucket that corresponds to the hash code of the key.) To construct such a hash map, call

LinkedHashMap<K, V>(initialCapacity, loadFactor, true)

Access order is useful for implementing a “least recently used” discipline for a cache. For example, you may want to keep frequently accessed entries in memory and read less frequently accessed objects from a database. When you don't find an entry in the table, and the table is already pretty full, then you can get an iterator into the table and remove the first few elements that it enumerates. Those entries were the least recently used ones.

You can even automate that process. Form a subclass of LinkedHashMap and override the method

protected boolean removeEldestEntry(Map.Entry<K, V> eldest)

Adding a new entry then causes the eldest entry to be removed whenever your method returns true. For example, the following cache is kept at a size of at most 100 elements:

Map<K, V> cache = new 
   LinkedHashMap<K, V>(128, 0.75F, true)
   {
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest)
      {
         return size() > 100;
      }
   };

Alternatively, you can consider the eldest entry to decide whether to remove it. For example, you may want to check a time stamp stored with the entry.

Enumeration Sets and Maps

The EnumSet is an efficient set implementation with elements that belong to an enumerated type. Because an enumerated type has a finite number of instances, the EnumSet is internally implemented simply as a sequence of bits. A bit is turned on if the corresponding value is present in the set.

The EnumSet class has no public constructors. You use a static factory method to construct the set:

enum Weekday { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY };
EnumSet<Weekday> always = EnumSet.allOf(Weekday.class);
EnumSet<Weekday> never = EnumSet.noneOf(Weekday.class);
EnumSet<Weekday> workday = EnumSet.range(Weekday.MONDAY, Weekday.FRIDAY);
EnumSet<Weekday> mwf = EnumSet.of(Weekday.MONDAY, Weekday.WEDNESDAY, Weekday.FRIDAY); 

You can use the usual methods of the Set interface to modify an EnumSet.

An EnumMap is a map with keys that belong to an enumerated type. It is simply and efficiently implemented as an array of values. You need to specify the key type in the constructor:

EnumMap<Weekday, Employee> personInCharge = new EnumMap<Weekday, Employee>(Weekday.class);

In the API documentation for EnumSet, you will see odd-looking type parameters of the form E extends Enum<E>. This simply means “E is an enumerated type.” All enumerated types extend the generic Enum class. For example, Weekday extends Enum<Weekday>.

Identity Hash Maps

JDK 1.4 adds another class IdentityHashMap for another quite specialized purpose, where the hash values for the keys should not be computed by the hashCode method but by the System.identityHashCode method. That's the method that Object.hashCode uses to compute a hash code from the object's memory address. Also, for comparison of objects, the IdentityHashMap uses ==, not equals.

In other words, different key objects are considered distinct even if they have equal contents. This class is useful for implementing object traversal algorithms (such as object serialization), in which you want to keep track of which objects have already been traversed.

api_icon.gif
   java.util.WeakHashMap<K, V> 1.2
   
  • WeakHashMap()
  • WeakHashMap(int initialCapacity)
  • WeakHashMap(int initialCapacity, float loadFactor)

    construct an empty hash map with the specified capacity and load factor.

api_icon.gif
   java.util.LinkedHashSet<E> 1.4
   
  • LinkedHashSet()
  • LinkedHashSet(int initialCapacity)
  • LinkedHashSet(int initialCapacity, float loadFactor)

    construct an empty linked hash set with the specified capacity and load factor.

api_icon.gif
   java.util.LinkedHashMap<K, V> 1.4
   
  • LinkedHashMap()
  • LinkedHashMap(int initialCapacity)
  • LinkedHashMap(int initialCapacity, float loadFactor)
  • LinkedHashMap(int initialCapacity, float loadFactor, boolean accessOrder)

    construct an empty linked hash map with the specified capacity, load factor, and ordering. The accessOrder parameter is true for access order, false for insertion order.

  • protected boolean removeEldestEntry(Map.Entry<K, V> eldest)

    should be overridden to return true if you want the eldest entry to be removed. The eldest parameter is the entry whose removal is being contemplated. This method is called after an entry has been added to the map. The default implementation returns false—old elements are not removed by default. However, you can redefine this method to selectively return true; for example, if the eldest entry fits a certain condition or the map exceeds a certain size.

api_icon.gif
   java.util.EnumSet<E extends Enum<E>> 5.0
   
  • static <E extends Enum<E>> EnumSet<E> allOf(Class<E> enumType)

    returns a set that contains all values of the given enumerated type.

  • static <E extends Enum<E>> EnumSet<E> noneOf(Class<E> enumType)

    returns an empty set, capable of holding values of the given enumerated type.

  • static <E extends Enum<E>> EnumSet<E> range(E from, E to)

    returns a set that contains all values between from and to (inclusive).

  • static <E extends Enum<E>> EnumSet<E> of(E value)
  • static <E extends Enum<E>> EnumSet<E> of(E value, E... values)

    return a set that contains the given values.

api_icon.gif
   java.util.EnumMap<K extends Enum<K>, V> 5.0
   
  • EnumMap(Class<K> keyType)

    constructs an empty map whose keys have the given type.

api_icon.gif
   java.util.IdentityHashMap<K, V> 1.4
   
  • IdentityHashMap()
  • IdentityHashMap(int expectedMaxSize)

    construct an empty identity hash map whose capacity is the smallest power of 2 exceeding 1.5 * expectedMaxSize. (The default for expectedMaxSize is 21.)

api_icon.gif
   java.lang.System 1.0
   
  • static int identityHashCode(Object obj) 1.1

    returns the same hash code (derived from the object's memory address) that Object.hashCode computes, even if the class to which obj belongs has redefined the hashCode method.

  • + Share This
  • 🔖 Save To Your Account