Home > Guides > Programming > Java

Toggle Open Guide Table of ContentsGuide Contents

Close Table of ContentsGuide Contents

Close Table of Contents

Technology Performance Assessment

Last updated Feb 17, 2006.

Part of my "day job" is to assess the performance impact of specific design and implementation decisions on the performance of an application. My primary focus is on enterprise applications and one of the many disparate moving parts that significantly impacts the performance of an application is the application itself. This might sound silly at first, but in reality, significant problems can arise from the configuration of the environment in which your application runs, and many tuners focus solely on the configuration of the environment. While an environment can be tuned to work with the idiosyncrasies of an application, it can never be optimal if the application has inherent architectural flaws.

As such, I started this section in the Java Reference Guide where I will periodically review technologies that you might choose to integrate into your applications, but from a performance analysis focus. I will try to present objective analyses of technologies providing you with the context from which you can make your most educated decision.

ConcurrentHashMap

With the introduction of Java 5 almost 18 months ago, Sun released a set of concurrency classes and that regiment included two new concurrency collection classes:

  • ConcurrentHashMap
  • ConcurrentLinkedQueue

The purpose of concurrency in collection classes it to permit more than one thread to concurrently access the contents of the collection. When a thread accesses a collection in a multithreaded application, there is the danger that one thread can modify, add to, or delete from the contents of the collection while another thread is accessing the collection. This can lead to false results, such as the scenario when one thread obtains an object from the collection, another thread deletes the object, and then the original thread returns the object to the caller. In this scenario, the client has an out-of-synch object and he doesn't know it.

In this section we focus on the ConcurrentHashMap class and compare and contrast that with the traditional HashMap implementation. A map is a collection that maintains a set of keys and for each key there is an associated value. For example, you may map names to records about a person, including their social security number, driver's license number, and contact information. When it comes time to locate a specific person's information, you can locate the person by their name.

There are two common underlying implementations for organizing map keys:

  • Hash Table
  • Tree

A hash table is an array type structure and an item is inserted into the table by computing a numerical equivalent to the key: the hash value. The benefits to using a hash table are defined as follows:

  • Constant time insertion
  • Constant time search
  • Constant time deletion

But it has the following drawbacks:

  • You need to maintain at least 20% more memory than you are using
  • If two keys compute the same hash value then you must implement a strategy for resolving these "collisions" (and several very strong collision resolution algorithms currently exist)
  • There is not intrinsic ordering of keys so there is no way to extract objects in any recognizable order

A tree solves all of the drawbacks of a hash table by organizing its keys in a binary search tree. This yields a natural ordering of keys, such as alphabetical ordering of people's names, which enables keys to be extracted in order. There is no such thing as resolving collisions with a tree, you need to potentially "rebalance" the tree and add the new node. Finally, a tree fits perfectly in memory — it does not need to maintain any additional space.

But it resolves the hash table drawbacks at a price, and that price is performance. Insertion, searching, and deleting objects in a tree takes O(log N) time to complete, meaning that the maximum time it could take to locate an object upon which to complete one of these operations is the logarithm of the number of items in the tree. In practice this is a very fast implementation, but if you do not need to extract items in a specific order and can spare the extra memory, then a hash table is a superior implementation.

The choice of whether or not to use a hash table or tree as the organization mechanism for your map's keys is dependent on your business requirements. But if you select a hash table, then the next question that Sun has made us ask is: will this class be accessed by multiple threads? If so, then is it better to use a HashMap or a ConcurrentHashMap? Furthermore, if you want to write generic code, can you just use a ConcurrentHashMap all of the time in case you need the functionality later? What is the impact of the thread management code?

In an attempt to answer these questions, I wrote a small test program, shown in listing 1.

Listing 1. ConcurrentHashMapTest.java

package com.informit.concurrent;

import java.util.*;
import java.util.concurrent.*;

public class ConcurrentHashMapTest
{

  public static void buildMap( Map map, int size )
  {
    map.clear();
    for( int i=0; i<size; i++ )
    {
      Integer ii = new Integer( i );
      map.put( ii, ii.toString() );
    }
  }

  public static void searchMap( Map map, int size )
  {
    for( int i=0; i<size; i++ )
    {
      Integer ii = new Integer( i );
      String value = ( String )map.get( ii );
    }
  }

  public static void testMap( Map m, int size, String label )
  {
    // build
    long before = System.currentTimeMillis();
    buildMap( m, size );
    long after = System.currentTimeMillis();
    long buildTime = after - before;
    System.out.println( label + " Build time (" + size + "): " + buildTime );

    // search
    before = System.currentTimeMillis();
    searchMap( m, size );
    after = System.currentTimeMillis();
    long searchTime = after - before;
    System.out.println( label + " Search time (" + size + "): " + searchTime );
  }

  public static void threadTestMap( Map m, int size, String label, int threadCount )
  {
    // build
    long before = System.currentTimeMillis();
    buildMap( m, size );
    long after = System.currentTimeMillis();
    long buildTime = after - before;
    System.out.println( label + " Build time (" + size + "): " + buildTime );

    for( int i=0; i<threadCount; i++ )
    {
      TestThread t = new TestThread( m, size, label );
      t.start();
    }
  }



  public static void main( String[] args )
  {
    Map hashMap = new HashMap();
    Map concurrentMap = new ConcurrentHashMap<Integer, String>();

    System.out.println( "SINGLE THREADED TEST" );
    System.out.println( "====================" );
    testMap( hashMap, 1000, "HashMap" );
    testMap( hashMap, 10000, "HashMap" );
    testMap( hashMap, 100000, "HashMap" );
    testMap( hashMap, 1000000, "HashMap" );
    hashMap.clear();
    hashMap = null;

    System.out.println();
    testMap( concurrentMap, 1000, "ConcurrentMap" );
    testMap( concurrentMap, 10000, "ConcurrentMap" );
    testMap( concurrentMap, 100000, "ConcurrentMap" );
    testMap( concurrentMap, 1000000, "ConcurrentMap" );
    concurrentMap.clear();
    concurrentMap = null;

    System.out.println( "MULTI THREADED TEST" );
    System.out.println( "===================" );
    hashMap = new HashMap();
    threadTestMap( hashMap, 1000000, "HashMap", 5 );
    threadTestMap( hashMap, 1000000, "HashMap", 15 );
    hashMap.clear();
    hashMap = null;

    hashMap = Collections.synchronizedMap( new HashMap() );
    threadTestMap( hashMap, 1000000, "Synchronized HashMap", 5 );
    threadTestMap( hashMap, 1000000, "Synchronized HashMap", 15 );
    hashMap.clear();
    hashMap = null;

    System.out.println();
    concurrentMap = new ConcurrentHashMap();
    threadTestMap( concurrentMap, 1000000, "ConcurrentMap", 5 );
    threadTestMap( concurrentMap, 1000000, "ConcurrentMap", 15 );
    concurrentMap.clear();
    concurrentMap = null;
  }

}
  
class TestThread extends Thread
{
  Map m;
  int size;
  String label;

  public TestThread( Map m, int size, String label )
  {
    this.m = m;
    this.size = size;
    this.label = label;
  }
  public static void searchMap( Map map, int size )
  {
    for( int i=0; i<size; i++ )
    {
      Integer ii = new Integer( i );
      String value = ( String )map.get( ii );
    }
  }
  public void run()
  {
    // search
    long before = System.currentTimeMillis();
    searchMap( m, size );
    long after = System.currentTimeMillis();
    long searchTime = after - before;
    System.out.println( label + " Search time (" + size + "): " + searchTime );
  }
}

Admittedly, listing 1 is not the cleanest code, but then again I just wanted the results and didn't care about the scalability or maintainability of this code. The purpose was to test two things:

  1. What is the performance impact between using a ConcurrentHashMap and a HashMap in a single threaded environment?
  2. What is the performance benefit for using a ConcurrentHashMap over a HashMap in a multi-threaded environment?

In order to facilitate this test, listing 1 generates a set of predefined key and value pairs: the key is an Integer and the value is its String equivalent. It generates these in order, which is not a good test when comparing a hash table to a tree, but for the purposes of this test, it should not matter (they are both hash tables.) Listing 2 shows the results of executing listing 1 on my notebook.

Listing 2. Test Results

SINGLE THREADED TEST
====================
HashMap Build time (1000): 15
HashMap Search time (1000): 0
HashMap Build time (10000): 32
HashMap Search time (10000): 15
HashMap Build time (100000): 1281
HashMap Search time (100000): 63
HashMap Build time (1000000): 5484
HashMap Search time (1000000): 1313

ConcurrentMap Build time (1000): 0
ConcurrentMap Search time (1000): 0
ConcurrentMap Build time (10000): 15
ConcurrentMap Search time (10000): 0
ConcurrentMap Build time (100000): 266
ConcurrentMap Search time (100000): 94
ConcurrentMap Build time (1000000): 14203
ConcurrentMap Search time (1000000): 1515

MULTI THREADED TEST
===================
HashMap Build time (1000000): 4094
HashMap Search time (1000000): 2906
HashMap Search time (1000000): 3171
HashMap Search time (1000000): 3281
HashMap Search time (1000000): 3469
HashMap Search time (1000000): 3750

HashMap Build time (1000000): 4547
HashMap Search time (1000000): 2922
HashMap Search time (1000000): 3360
HashMap Search time (1000000): 3781
HashMap Search time (1000000): 3906
HashMap Search time (1000000): 2953
HashMap Search time (1000000): 5188
HashMap Search time (1000000): 4328
HashMap Search time (1000000): 5000
HashMap Search time (1000000): 4797
HashMap Search time (1000000): 4969
HashMap Search time (1000000): 5234
HashMap Search time (1000000): 6078
HashMap Search time (1000000): 6359
HashMap Search time (1000000): 6406
HashMap Search time (1000000): 6562

Synchronized HashMap Build time (1000000): 3594
Synchronized HashMap Search time (1000000): 27563
Synchronized HashMap Search time (1000000): 35641
Synchronized HashMap Search time (1000000): 35906
Synchronized HashMap Search time (1000000): 35938
Synchronized HashMap Search time (1000000): 36781

Synchronized HashMap Build time (1000000): 29485
Synchronized HashMap Search time (1000000): 37375
Synchronized HashMap Search time (1000000): 59531
Synchronized HashMap Search time (1000000): 59859
Synchronized HashMap Search time (1000000): 60265
Synchronized HashMap Search time (1000000): 60328
Synchronized HashMap Search time (1000000): 61172
Synchronized HashMap Search time (1000000): 69937
Synchronized HashMap Search time (1000000): 69937
Synchronized HashMap Search time (1000000): 69937
Synchronized HashMap Search time (1000000): 69937
Synchronized HashMap Search time (1000000): 69937
Synchronized HashMap Search time (1000000): 69937
Synchronized HashMap Search time (1000000): 69937
Synchronized HashMap Search time (1000000): 69937
Synchronized HashMap Search time (1000000): 69937


ConcurrentMap Build time (1000000): 8407
ConcurrentMap Search time (1000000): 797
ConcurrentMap Search time (1000000): 2172
ConcurrentMap Search time (1000000): 2094
ConcurrentMap Search time (1000000): 3125
ConcurrentMap Search time (1000000): 4250

ConcurrentMap Build time (1000000): 4625
ConcurrentMap Search time (1000000): 1250
ConcurrentMap Search time (1000000): 1735
ConcurrentMap Search time (1000000): 2531
ConcurrentMap Search time (1000000): 1844
ConcurrentMap Search time (1000000): 2687
ConcurrentMap Search time (1000000): 2703
ConcurrentMap Search time (1000000): 2718
ConcurrentMap Search time (1000000): 2765
ConcurrentMap Search time (1000000): 2188
ConcurrentMap Search time (1000000): 2750
ConcurrentMap Search time (1000000): 2156
ConcurrentMap Search time (1000000): 1734
ConcurrentMap Search time (1000000): 1750
ConcurrentMap Search time (1000000): 3047
ConcurrentMap Search time (1000000): 1797

When using a ConcurrentHashMap in a single threaded environment, the performance is relatively similar to its HashMap equivalent for a relatively small number of objects. It actually built the 100,000 item map faster than the HashMap but had a 30ms longer search time. But as soon as we increase the map size to include 1 million objects, the build time for the ConcurrentHashMap falls apart: it took 14.2 seconds to build its map as compared to the HashMap that only took 5.5 seconds. The search time was relatively close: it only took an extra 200ms to search through 1 million records. The conclusion is that for a small number of objects, a ConcurrentHashMap is reasonable to use over a HashMap as there is only a small performance degradation. But as soon as the number of objects increases substantially then the thread management code becomes problematic to performance.

The next test subjected the maps to 5 and then 15 threads accessing their 1 million record collections. As expected, the HashMap built faster than the ConcurrentHashMap, but then search times for concurrent access are dramatically improved when using the ConcurrentHashMap; response times for searching through all 1 millions records was between 2 and 3 seconds when using a ConcurrentHashMap but between 4 and 6.5 seconds when using a HashMap.

Furthermore, HashMaps are inherently unsafe when used with multiple threads so the preferred mechanism to protect concurrent conflicts is to synchronize the map. The general practice when using a map in a multithreaded environment is to synchronize it by calling Collections.synchronizeMap() to allow only one thread to access it at any given time. When updating or modifying records, this is required because you do not want multiple threads to "clobber" each other: write over each other's work. But when reading objects from the map, synchronization is not necessary. This is one of the key benefits the ConcurrentHashMap provides to you over simply synchronizing a HashMap: it does not synchronize reads. This benefit is evident from listing 2: searches in the ConcurrentHashMap took between 2 and 3 seconds while some of the synchronized HashMap queries took more than a minute!

Summary

When writing single threaded application components, there is a performance benefit to using a HashMap over a ConcurrentHashMap, but that benefit does not become obviously apparent until you have a measurable number of items in the map. When moving to a multithreaded environment, the ConcurrentHashMap provides performance benefits over a normal HashMap and significant benefits over a synchronized HashMap. In general if you are going to have a small number of objects, such as less than 100,000, then you may consider using a ConcurrentHashMap even if your component is not currently multithreaded; it will save you time later if your component has to exist in a multithreaded environment. But when your environment is multithreaded, the ConcurrentHashMap is an absolute must! Its search abilities completely destroy the performance of a synchronized HashMap.

Some choices of data structures can see arbitrary until you examine their performance in isolation and under significant load (whether that load be threads or size of data.) In this case such an exercise revealed a real benefit that we can take back to our multithreaded applications!

Related Resources

Rachel BaylessDay 9 of #17DaysofGiveaways - Software Engineering Week
By Rachel BaylessJuly 19, 2012Comments

The Day 9 giveaway will add a nice touch to your bookshelf. We’re giving away sets of “GEEK” bookends from Etsy seller Knob Creek Metal Arts. They are perfect for keeping your software engineering books together!

Rachel BaylessDay 7 of #17DaysofGiveaways - Software Engineering Week
By Rachel BaylessJuly 17, 2012Comments
Java Application Architecture: Modularity Patterns with Examples Using OSGi by Kirk Knoernschild is today’s eBook giveaway. Knoernschild brings a one-of-a-kind perspective to Java application architecture, focusing on the way modularity will change the development of Java applications.
Rachel BaylessDay 4 of #17DaysofGiveaways - Web Development Week
By Rachel BaylessJuly 12, 2012Comments

It’s Day 4 of InformIT’s 17 Days of Giveaways. Visit yesterday’s blog for a recap of the first three day’s prizes. Today’s prize is an eBook copy of Programming in CoffeeScript by Mark Bates. Bates begins with the basics, delivers tips and techniques throughout, and finishes off with the practical applications of CoffeeScript. He teaches you how to use the robust toolset to create code, free of the bugs of JavaScript. Comment on this post for your chance to win your own copy.

See More Blogs