
Going Atomic

By  Aug 31, 2008

Topics: Programming, Java

The performance benefits of the non-blocking, thread-safe classes in Java 5 seem almost mystical. It was great to finally get a deep understanding of how this works thanks to JCIP. It also provided a nice opportunity to test this performance benefit.
The following is taken from the Javadoc for ConcurrentHashMap:

"A hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access."

This is also described in Java Concurrency in Practice. Neither the Javadoc nor JCIP goes into much detail about how such remarkable behavior is accomplished. And it really is remarkable if you think about it. It is a thread-safe hashtable with no blocking on reads. None. JCIP tells you that it will talk about this later in the book, and indeed it does in Chapter 15, Nonblocking Synchronization.
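As a quick illustration of what that thread safety buys you (my own example, not from the Javadoc), ConcurrentHashMap's atomic check-then-act methods like putIfAbsent and replace let multiple threads update a shared map without any client-side locking:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConcurrentMapDemo {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<String, Integer>();
        // putIfAbsent is an atomic check-then-act; no external synchronization needed
        hits.putIfAbsent("home", 0);
        // replace(key, oldValue, newValue) only succeeds if the current mapping
        // still equals oldValue, so we retry if another thread got there first
        Integer old;
        do {
            old = hits.get("home");
        } while (!hits.replace("home", old, old + 1));
        System.out.println(hits.get("home"));
    }
}

With a plain Hashtable you would need to hold a lock across the get-then-put sequence; here the retry loop does the same job without ever blocking readers.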

I expected some clever little algorithm. Actually, the very existence of such a thing made me glad that I am not a young developer. Why? Because such a clever algorithm would inevitably become something that oh-so-clever senior developers would start using as interview questions for young developers. I can just imagine being asked, "Now how can we make this hashtable thread safe but without any potential blocking on reads?" Responding that it took the Java creators a dozen years to figure this out would not matter; any developer would be expected to "figure it out" in thirty minutes or less.

So if I were a young developer, I would have been relieved that there is no purely algorithmic answer. You have to rely on low-level machine instructions, in particular the now much-esteemed CAS (compare-and-swap) instruction. Of course this really makes sense, but it never occurred to me before reading about it in JCIP.
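The CAS idiom is easy to see in a small example. Here is a minimal sketch (my own, modeled on the retry loop JCIP describes) of a nonblocking counter built on AtomicInteger, whose compareAndSet method maps down to the hardware CAS instruction where available:

import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Classic CAS retry loop: read the current value, compute the new one,
    // and attempt to swap. If another thread changed the value in between,
    // compareAndSet fails and we simply try again -- no thread ever blocks.
    public int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    public int get() {
        return value.get();
    }

    public static void main(String[] args) {
        CasCounter counter = new CasCounter();
        for (int i = 0; i < 5; i++) {
            counter.increment();
        }
        System.out.println(counter.get());
    }
}

Contended threads waste a few retries instead of parking and waking up, which is exactly why the atomic classes win under contention.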

JCIP also has some nice charts showing the performance advantage of the atomic classes in Java that use CAS if it is available on your system. Of course I had to give this a try. So I downloaded the sample code with the intent of running it on my desktop computer. I was surprised to see that while the two pseudo random number generators are included, the driver program is not included. Luckily I had learned enough from JCIP to write my own:

import java.math.BigInteger;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PseudoRandomDriver {

    private static final int NUM = 100;
    private static final int NUM_THREADS = 20;
    private static final CountDownLatch latch = new CountDownLatch(NUM);

    public static void main(String[] args) throws InterruptedException {
        final int seed = (int) System.nanoTime();
        final PseudoRandom locker = new ReentrantLockPseudoRandom(seed);
        //final PseudoRandom locker = new AtomicPseudoRandom(seed);
        ExecutorService exec = Executors.newFixedThreadPool(NUM_THREADS);
        long startTime = System.nanoTime();
        for (int j = 0; j < NUM; j++) {
            Runnable task = new Runnable() {
                int start = seed;
                public void run() {
                    for (int i = 0; i < 100; i++) {
                        start = locker.calculateNext(start);
                        // do some local work that takes a while
                        BigInteger big = new BigInteger(String.valueOf(start));
                        big = big.abs();
                        big = big.nextProbablePrime();
                    }
                    latch.countDown();
                }
            };
            exec.execute(task);
        }
        // wait until all tasks are complete before we measure duration
        latch.await();
        long duration = System.nanoTime() - startTime;
        System.out.println("Duration = " + duration);
        exec.shutdown();
    }
}

This let me vary the number of threads (NUM_THREADS) while keeping the number of tasks (NUM) constant and thus compare the performance. Theoretically, performance should degrade as the thread count exceeds the number of processors on the system. In this case that number is two, as my desktop machine has a dual-core AMD processor. Here is the chart that I generated.

Everything is pretty much as you would expect. Under low contention (a low number of threads) the performance is similar. As the threads increase, and thus contention increases, the non-blocking Atomic-based implementation is consistently faster than the ReentrantLock-based one, which has to block during contention.