 Overview

Table of Contents
 J2SE: Standard Java
 Java Windows NT Services
 Apache Velocity
 Advanced J2SE
 Bytecode Instrumentation
 Dynamic Languages and the JVM
 J2SE 1.5.0: "Tiger"
 Java SE 6
 Java 7

Core Computer Science Principles in Java (Data Structures)
 The Difference Between Trees and Hash Tables
 The Difference Between Arrays and Linked Lists
 The Difference Between Stacks and Queues
 Binary Search Trees
 AVL Trees
 Red-Black Trees
 Hash Tables
 Linked Lists
 Summary
 Annotations
 Java Generics
 Java New I/O
 Java Sound
 Java Applets
 JavaFX
 Java SE Threading
 Resource Management Using Semaphores
 Java Atomic Operations
 JavaTemplate Pages
 Executing Templates with the JtpExecutor
 Java Cryptography Extensions (JCE)
 Java Database Connectivity (JDBC) API
 Jakarta Commons Net Class Library
 Jakarta Commons HttpClient
 Apache POI
 Regular Expressions
 JavaMail
 Cool Tools
 Building a Really Simple Syndication (RSS) Java App
 Embedding JavaScript in Java with Rhino
 Logging with Log4J
 Inside Swing
 Swing Components
 SwingX
 Swing Styled Documents
 Web Rendering in Java Swing Applications
 Java Look-and-Feel Graphics Repository
 Java Media Framework
 Quicktime for Java
 Media in Java Review 2008
 External Multimedia in Java
 Graphs and Charts
 Holiday Special: Electronic Greeting Card
 Media Framework: Presenter Application
 Standard Widget Toolkit
 JFace
 Java Performance Tuning
 J2EE Performance Tuning
 Caches and Pools
 Java Caching System
 EHCache
 Java Compression and Decompression
 Obfuscating Java Applications
 Continuous Integration
 Load Testing
 Tomcat Clustering
 High Scalability with Terracotta
 Troubleshooting Production Performance Issues
 Enterprise Java Testing
 Automated Unit Testing with JUnit and Ant
 Unit Testing: Tips From The Trenches
 Custom Ant Tasks
 Extensible Markup Language (XML)
 Java Web Technologies
 Web Frameworks
 Struts 2
 Wicket
 JavaServer Faces
 Distributed Programming / RMI
 Behavior Tracking Servlet Filter
 Servlet Filters
 Building a Robust Java Server
 J2EE: Enterprise Java
 Spring
 Spring 3
 Java Design Patterns
 Model-Driven Architecture
 Enterprise Messaging with ActiveMQ
 Event-Driven Architecture
 XDoclet
 Hibernate
 Developing Standalone Database Applications with Hypersonic DB
 Project Backup
 J2EE Project: Hands-On
 Enterprise Java Beans (EJB) 3.0
 Disaster Recovery
 Java Management Extensions (JMX)
 Service-Oriented Architecture
 Web Services
 RESTful Web Services
 Web Services with Apache CXF
 Atom Syndication
 Project: Building a Web Photo Gallery
 J2ME: Micro Java
 Specialized J2ME
 Optional Packages
 Other Java Technologies
 Derivatives and Competitors
 Java, Engineered for Integration

Additional Resources
 The World of Java Tools
 Building Java Applications with Ant
 Managing Java Build Lifecycles with Maven
 Acceptance Testing with FitNesse
 Source Control with Subversion
 Inversion of Control and Dependency Injection
 Certification
 Roadmap: Becoming an Enterprise Java Developer
 Roadmap: Becoming an Enterprise Java Developer in 2007
 The Business of Enterprise Software
 JavaOne 2006
 JavaOne 2007
 JavaOne 2008 Wrap-Up
 JavaOne 2009 Wrap-Up
 JavaOne 2010
 JavaOne 2011
 How to Survive in a Turbulent Job Market
 How to Hire the Best Talent
 Unified Modeling Language (UML)
 Cloud Computing
 Amazon EC2 and Java
 MongoDB
 Enterprise Java in 2008 and Beyond
 Predictions for 2018
The Difference Between Trees and Hash Tables
Last updated Mar 14, 2003.
Java provides two primary Set implementations: the TreeSet and the HashSet. Furthermore, it provides two primary Map implementations: the TreeMap and the HashMap. When would you want to use one over the other? What’s the difference?
The short answer is that the TreeMap and TreeSet classes store their data (or keys, in the case of the TreeMap) in a tree, whereas the HashMap and HashSet store their data in a hash table. The longer, and more important, answer lies in the difference between a tree and a hash table. The difference can be described in two ways: (1) implementation details and (2) behavior, where behavior describes both the functionality that is available and the performance under different conditions.
I’ll start with the hash table. A hash table is a contiguous region of memory, similar to an array, in which objects are hashed (a numerical value is computed for the object) to an index in that memory. For example, consider a hash table that holds 100 elements. A hash function would compute a number between 0 and 99 and then insert the item at that location. When that item is requested, the same hash function computes the same number and knows exactly where to go to retrieve the object. There are some details that I’ve overlooked, such as sizing a hash table appropriately, the effect of resizing a hash table, and how to handle collisions (two objects hashing to the same value), but I’ll address those as they come up. The key takeaways for using a hash table are:
 Inserting an object into a hash table is a constant time operation
 Retrieving an object from a hash table is a constant time operation
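That insert-and-retrieve cycle can be sketched in a few lines of Java. This is a deliberately simplified illustration: collision handling is omitted, and SimpleHashTable is a made-up name, not java.util’s actual implementation.

```java
public class SimpleHashTable {
    private final Object[] buckets;

    public SimpleHashTable(int capacity) {
        buckets = new Object[capacity];
    }

    // Compute a bucket index from the object's hash code.
    // Masking off the sign bit keeps the index non-negative.
    private int indexFor(Object item) {
        return (item.hashCode() & 0x7fffffff) % buckets.length;
    }

    // Constant-time insert: one hash computation, one array write
    public void put(Object item) {
        buckets[indexFor(item)] = item;
    }

    // Constant-time lookup: recompute the same index and check it
    public boolean contains(Object item) {
        return item.equals(buckets[indexFor(item)]);
    }
}
```

With a capacity of 100, indexFor always yields a value between 0 and 99, exactly as in the example above.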
Trees, or more specifically binary search trees, store their data in a hierarchical tree. Figure 1 shows an example of a tree.
Figure 1. The Structure of a Tree
In Figure 1, the top node is the “root” of the tree. The root has one child to its left and one child to its right: the child to the left has a lower value and the child to the right has a higher value. This requirement is maintained throughout the entirety of the tree. In this example, the root node has a value of 7: all of the nodes in its left subtree are less than 7 and all of the nodes in its right subtree are greater than 7. To add a new item to the tree, such as 3, we would examine the root node 7, see that 3 is less than 7, and traverse to its left child. The left child is 5; 3 is also less than 5, so we traverse to 5’s left child, which is 4. Finally, because 3 is less than 4, the new node is inserted as the left child of 4. A tree supports inserting new elements at a speed on the order of log(n), meaning that as long as the tree maintains its balance (each node has approximately the same number of children in its left subtree as in its right subtree), you will need to examine roughly the logarithm (base 2) of the number of elements (n) contained within the tree.
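The traversal just described can be sketched as a small binary search tree in Java. This is an illustrative sketch, not TreeMap’s actual (self-balancing) internals; SimpleBst and its Node class are names invented for the example.

```java
public class SimpleBst {
    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    private Node root;

    public void insert(int value) {
        root = insert(root, value);
    }

    // Walk left for smaller values, right for larger,
    // until an empty slot is found
    private Node insert(Node node, int value) {
        if (node == null) {
            return new Node(value);            // found the insertion point
        }
        if (value < node.value) {
            node.left = insert(node.left, value);   // e.g. 3 < 7: go left
        } else if (value > node.value) {
            node.right = insert(node.right, value); // larger values go right
        }
        return node;                           // duplicates are ignored
    }

    public boolean contains(int value) {
        Node n = root;
        while (n != null && n.value != value) {
            n = value < n.value ? n.left : n.right;
        }
        return n != null;
    }
}
```

Inserting 7, 5, and 4 and then 3 reproduces the walk in the text: 3 descends past 7 and 5 and lands as the left child of 4.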
Recall that log base 2 answers the question: to what power must 2 be raised to produce the number of nodes?
2^h = number of nodes, so h = log2(number of nodes)
So in this case we have 7 nodes:
2^h = 7, which makes h just a little under 3 (because 2^3 = 8)
This means that in order to find an object in a tree with 7 (or 8) nodes, you would need to examine at most 3 nodes. And the capacity of the tree grows rapidly with this exponent: 2^4 = 16, 2^5 = 32, 2^6 = 64, 2^7 = 128, and so on. If your tree has 128 elements, it will take at most 7 operations to find a location to insert a new item and at most 7 operations to find an object. That is not as fast as a hash table (constant time means one operation), but it is still rather fast.
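This arithmetic is easy to check directly. Java’s Math class has no base-2 logarithm, but dividing a natural log by log(2) gives one; LogDemo is a throwaway name for the example.

```java
public class LogDemo {
    // log base 2 of n: the approximate number of nodes
    // examined in a balanced tree of n elements
    static double log2(double n) {
        return Math.log(n) / Math.log(2);
    }

    public static void main(String[] args) {
        System.out.println(log2(7));    // a little under 3
        System.out.println(log2(128));  // approximately 7
    }
}
```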
And, as with hash tables, trees are not a panacea because you might have noticed that I added the caveat “as long as the tree maintains its balance.” Unfortunately great effort must be expended in order to keep a tree balanced, which complicates its implementation, but there are several well established algorithms to keep a tree balanced.
Given this background, that hash tables store their data in an array-like structure with constant-time inserts, searches, and deletes, while trees store their data in a hierarchical structure with log(n) inserts, log(n) searches, and log(n) deletes, when would you use one over the other?
Looking at performance alone, the hash table is the preferred data structure to hold your set and map data, but it has some significant drawbacks:
 It requires more memory than is needed to hold its data. This is true because as a hash table becomes more and more populated, collisions (two objects hashing to the same index) become more and more likely. In general, a hash table should not be more than 75% to 80% full. This means that if you want to hold 100 items, you need a hash table that can hold 125 to 133 elements.
 You need to know approximately how many elements you want to hold in the hash table to avoid having to resize it. For example, consider a hash table that can hold 125 elements, currently holds 100 elements, has a load factor (how full it can become before it is resized) of 80%, and the 101st element is added. The hash table has crossed that 80% threshold and must be resized: a simple operation, right? Wrong! The hash function computes the index for an item in the table based on the size of the table. Therefore, when the table is resized, to say 150, all 100 objects must be rehashed to an index between 0 and 149 (instead of the previous values between 0 and 124). Rehashing is a very expensive operation that requires “n” operations, so when it occurs, your constant-time insert becomes an O(n) operation, which is far worse than the performance of a tree.
 Because hash tables compute seemingly arbitrary values for objects, there is no way to extract objects in any natural order. For example, the numbers 1, 2, 3, and 4 might come out as 2, 1, 3, 4 or 4, 1, 3, 2, and so forth. The order is determined by where the hashing algorithm inserts the objects, not by any natural order.
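The second drawback above can be softened if you do know roughly how many elements you will store: java.util.HashMap exposes both the initial capacity and the load factor through its constructor, so the table can be presized to avoid rehashing. The sizing arithmetic below is a sketch of the idea, not a hard rule.

```java
import java.util.HashMap;
import java.util.Map;

public class PresizedMap {
    public static void main(String[] args) {
        int expectedEntries = 100;
        float loadFactor = 0.75f;   // HashMap's default threshold

        // Choose a capacity large enough that expectedEntries stays
        // below capacity * loadFactor, so no rehash occurs while loading
        int capacity = (int) Math.ceil(expectedEntries / loadFactor);
        Map<Integer, String> map = new HashMap<>(capacity, loadFactor);

        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, "value-" + i);
        }
        System.out.println(map.size()); // prints 100
    }
}
```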
Contrasting hash tables with trees yields the following benefits that trees provide:
 Trees use only the amount of memory needed to hold their items
 Because trees can grow and shrink as needed, they never suffer from needing to be rehashed – the balancing operation may become complex, but at no point does the insert operation approach O(n) (it never needs to examine every object in the tree during an insert)
 Because trees store objects in a hierarchical structure, it is simple to extract items in their natural order (as long as that natural order defines the condition by which the tree established its hierarchical structure)
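The ordering difference is easy to observe with the two standard Set implementations. Note that a HashSet’s exact iteration order is an implementation detail, so no particular order should be relied upon.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class OrderingDemo {
    public static void main(String[] args) {
        Set<Integer> treeSet = new TreeSet<>();
        Set<Integer> hashSet = new HashSet<>();

        for (int value : new int[] {4, 1, 3, 2}) {
            treeSet.add(value);
            hashSet.add(value);
        }

        // A TreeSet always iterates in natural (sorted) order
        System.out.println(treeSet);  // [1, 2, 3, 4]

        // A HashSet's order depends on hashing, not on natural order
        System.out.println(hashSet);
    }
}
```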
In short, if you know approximately how many items you need to maintain in your collection, you can afford the additional unused memory required to avoid hash collisions, and you do not need to extract items in a natural order, then hash tables are your best choice because they offer constant-time insertion, search, and deletion. On the other hand, if memory is tight, you do not know how many items you may need to store in memory, and/or you need to be able to extract objects in a natural order, then a tree is a better choice.
Another condition you might want to consider when making your decision is the balance of operations that will be performed on your collection. For example, you may not know how many items you need to store in your collection (so you wouldn’t know how to best size a hash table), but you know that data will be loaded and inserted into your collection on startup and then all future operations will be searches. In this situation, you might be willing to accept rehashing your table (possibly multiple times) during startup to gain constant time search. On the other hand, if items will be consistently added and removed and you do not know how many items will be stored in your collection, then you might be willing to accept O(log n) search time to avoid potentially rehashing a table at runtime.
The point is that there is no general correct answer to whether a hash table or a tree is better: the nature of the data and the nature of the operations you will be performing on that data determine the correct solution. The key is for you to understand how both solutions work, along with the benefits and tradeoffs of each, so that given the constraints of the problem, you are equipped to make the best decision.