Home > Articles > Programming > Java

This chapter is from the book

This chapter is from the book

Data Structure Resizing

Java applications tend to make high use of Java SE's StringBuilder or StringBuffer for assembling Strings and also make high use of Java objects that act as containers of data such as the Java SE Collections classes. Both StringBuilder and StringBuffer use an underlying char[] for their data storage. As elements are added to a StringBuilder or StringBuffer, the underlying char[] data storage, may be subject to resizing. As a result of resizing, a new larger char[] array is allocated, the char elements in the old char[] are copied into the new larger char[] array, and the old char[] discarded, that is, available for garbage collection. Similar resizing can also occur in Java SE Collections classes that use an array for their underlying data store.

This section explores ways to identify data structure resizing, in particular StringBuilder, StringBuffer, and Java SE Collections classes resizing.

StringBuilder/StringBuffer Resizing

When a StringBuilder or StringBuffer becomes large enough to exceed the underlying data storage capacity, a new char array of a larger size, 2x larger in the OpenJDK StringBuilder and StringBuffer implementation (used by Java HotSpot Java 6 JDK/JRE), is allocated, the old char array elements are copied into the new char array, and the old char array is discarded. A version of the implementation used by StringBuilder and StringBuffer follows:

char[] value;
int count;

public AbstractStringBuilder append(String str) {
  if (str == null) str = "null";
    int len = str.length();
  if (len == 0) return this;
  int newCount = count + len;
  if (newCount > value.length)
      expandCapacity(newCount);
  str.getChars(0, len, value, count);
  count = newCount;
  return this;
}

void expandCapacity(int minimumCapacity) {
    int newCapacity = (value.length + 1) * 2;
    if (newCapacity < 0) {
        newCapacity = Integer.MAX_VALUE;
    } else if (minimumCapacity > newCapacity) {
      newCapacity = minimumCapacity;
  }
    value = Arrays.copyOf(value, newCapacity);
}

Continuing with the fictitious tax payer program example from the previous section (full listing of the source code used in this section can be found in Appendix B in the section "First Resizing Variant"), StringBuilder objects are used to assemble random Strings representing tax payer names, addresses, cities, states, social security numbers, and a tax payer id. It also uses the no argument StringBuilder constructor. Hence, the program is likely to be subject to StringBuilder's underlying char[] being resized. A capture of a memory or heap profile with a profiler such as NetBeans Profiler confirms that is the case. Figure 6-18 shows a heap profile from NetBeans Profiler.

Figure 6-18

Figure 6-18 Heap profile

In Figure 6-18, you can see that char[], StringBuilder, and String are the most highly allocated objects and also have the largest amount of live objects. In the NetBeans Profiler, selecting and right-clicking on the char[] class name in the far left column as shown in Figure 6-19 shows the allocation stack traces for all char[] objects.

Figure 6-19

Figure 6-19 Showing allocation stack traces

In the char[] stack allocation traces, shown in Figure 6-20, you can see an entry for java.lang.AbstractStringBuilder.expandCapacity(int), which is called from AbstractStringBuilder.append(char) and AbstractStringBuilder.append(String) methods. The expandCapacity(int) method calls java.util.Arrays.copyOf(char[], int). Looking back at the previous source code listing, you can see where AbstractStringBuilder.append(String str) calls expandCapacity(int) and calls Arrays.copyOf(char[] int).

Figure 6-20

Figure 6-20 char[] allocations from expanding StringBuilders

You can also see from Figure 6-20, over 11% of the current live char[] objects are from resized StringBuilder char[]. In addition, there are a total of 2,926,048 char[] objects that have been allocated, and of those, 390,988 char[] allocations occurred as a result of StringBuilder char[] resizing. In other words, about 13% (390,988/2,926,048) of all char[] allocations are coming from resized StringBuilder char[]s. Eliminating these char[] allocations from resizing improves the performance of this program by saving the CPU instructions needed to perform the new char[] allocation, copying the characters from the old char[] into the new char[], and the CPU instructions required to garbage collect the old discarded char[].

In the Java HotSpot JDK/JRE distributions, both the StringBuilder and StringBuffer offer no argument constructors that use a default size of 16 for their underlying char array data storage. These no argument constructors are being used in this program. This can be seen in the profile by expanding the java.lang.AbstractStringBuilder.<init>(int) entry seen in Figure 6-20. The expansion of the java.lang.AbstractStringBuilder.<init>(int) entry, shown in Figure 6-21, shows it is called by a no argument StringBuilder constructor.

Figure 6-21

Figure 6-21 Uses of StringBuilder default constructor

In practice, few StringBuilder or StringBuffer object instances result in having consumed 16 or fewer char array elements; 16 is the default size used with the no argument StringBuilder or StringBuffer constructor. To avoid StringBuilder and StringBuffer resizing, use the explicit size StringBuilder or StringBuffer constructor.

A modification to the example program follows, which now uses explicit sizes for constructing StringBuilder objects. A full listing of the modified version can be found in Appendix B in the section "Second Resizing Variant."

public static String getRandomTaxPayerId()      {
    StringBuilder sb = new StringBuilder(20);
    for (int i = 0; i < 20; i++) {
        int index =
            threadLocalRandom.get().nextInt(alphabet.length);
        sb.append(alphabet[index]);
    }
    return sb.toString();
}

public static String getRandomAddress()  {
StringBuilder sb = new StringBuilder(24);
    int size = threadLocalRandom.get().nextInt(14) + 10;
    for (int i = 0; i < size; i++) {
        if (i < 5) {
            int x = threadLocalRandom.get().nextInt(8);
            sb.append(x + 1);
        }
        int index =
             threadLocalRandom.get().nextInt(alphabet.length);
        char c = alphabet[index];
        if (i == 5) {
            c = Character.toUpperCase(c);
        }
        sb.append(c);
    }
    return sb.toString();
}

Recent optimizations in Java 6 update releases of the Java HotSpot VM analyze the usage of StringBuilder and StringBuffer and attempt to determine the optimal char array size to use for a given StringBuilder or StringBuffer object allocation as means to reduce unnecessary char[] object allocations resulting from StringBuilder or StringBuffer expansion.

Measuring the performance impact after addressing StringBuilder and StringBuffer resizing will be done in combination with addressing any Java Collection classes resizing, the topic of the next section.

Java Collections Resizing

The addition of the Java Collections to Java SE offered an enormous boost to developer productivity by providing containers with interfaces allowing the ability to easily switch between alternative concrete implementations. For example, the List interface offers an ArrayList and LinkedList concrete implementation.

Some of the Collections' concrete implementations are subject to potential expensive resizing as the number of elements added to the Collection grows such as ArrayList, Vector, HashMap, and ConcurrentHashMap since their underlying data store is an array. Other Collections such as LinkedList or TreeMap often use one or more object references between the elements stored to chain together the elements managed by the Collection. The former of these, those that use an array for the Collection's underlying data store, can be subject to performance issues when the underlying data store is resized due to the Collection growing in the number of elements it holds. Although these Collections classes have constructors that take an optional size argument, these constructors are often not used, or the size provided in an application program is not optimal for the Collection's use.

As is the case with StringBuilder or StringBuffer, resizing of a Java Collections class that uses an array as its data storage requires additional CPU cycles to allocate a new array, copy the old elements from the old array, and at some point in the future garbage collect the old array. In addition, the resizing can also impact Collection's field access time, the time it takes to dereference a field, because a new underlying data store, again typically an array, for the Collection's underlying data store may be allocated in a location in the JVM heap away from the object references stored within the data store and the other fields of the Collection. After a Collection resize occurs, it is possible an access to its resized field can result in CPU cache misses due to the way a modern JVM allocates objects in memory, in particular how those objects are laid out in memory. The way objects and their fields are laid out in memory can vary between JVM implementations. Generally, however, since an object and its fields tend to be referenced frequently together, an object and its fields laid out in memory within close proximity generally reduce CPU cache misses. Hence, the impact of Collections resizing (this also applies to StringBuffer and StringBuilder resizing) may extend beyond the additional CPU instructions spent to do the resizing and the additional overhead put on the JVM's memory manager to having a lingering higher field access time due to a change in the layout of the Collection's fields in memory relative the Collection object instance.

The approach to identifying Java Collections resizing is similar to what was described earlier for identifying StringBuilder and StringBuffer resizing, collecting heap or memory profile with a profiler such as NetBeans Profiler. Looking at the source code for the Java Collection classes helps identify the method names that perform the resizing.

Continuing with the fictitious tax payer program, the program variant in which tax payer records were populated into multiple HashMaps using a tax payer's state of residence as a key into a second HashMap where a tax payer's id is used as an index is a good example of where Collections resizing can occur. A full source code listing from this variant can be found in Appendix B in the section "First Resizing Variant." The source code, found in TaxPayerBailoutDbImpl.java, that allocates the HashMaps follows:

private final Map<String, Map<String,TaxPayerRecord>> db;

public TaxPayerBailoutDbImpl(int numberOfStates)  {
    db = new HashMap<String,Map<String,TaxPayerRecord>>();
    for (int i = 0; i < numberOfStates; i++)  {
        Map<String,TaxPayerRecord> map =
                Collections.synchronizedMap(
                    new HashMap<String,TaxPayerRecord>());
        db.put(BailoutMain.states[i], map);
    }
}

Here you can see the HashMaps are using a HashMap constructor that takes no arguments. As a result, the HashMap relies on a default size for its underlying mapping array. The following is a portion of OpenJDK's HashMap.java source code that shows the default size chosen for a HashMap's underlying data storage.

static final int DEFAULT_INITIAL_CAPACITY = 16;
static final float DEFAULT_LOAD_FACTOR = 0.75f;

    public HashMap() {
        this.loadFactor = DEFAULT_LOAD_FACTOR;
        threshold =
               (int)(DEFAULT_INITIAL_CAPACITY * DEFAULT_LOAD_FACTOR);
        table = new Entry[DEFAULT_INITIAL_CAPACITY];
        init();
    }
    void init() {
    }

Two factors decide when the data storage for a HashMap is resized: the capacity of the data storage and the load factor. The capacity is the size of the underlying data storage. That's the HashMap.Entry[]'s size. And the load factor is a measure of how full the HashMap is allowed to reach before the HashMap's data storage, the Entry[], is resized. A HashMap resize results in a new Entry[] being allocated, twice as large as the previous Entry[], the entries in the Entry[] are rehashed and put in the Entry[]. The CPU instructions required to resize a HashMap are greater than what is required by StringBuilder or StringBuffer resizing due to the rehashing of the Entry[] elements.

In Figure 6-18, you can see a row for java.util.HashMap$Entry[]. For this entry you can see there are 67 allocated objects, and 37 of them are live at the time of the profile snapshot. This suggests that 37/67, about 55%, are still live. That also suggests 45% of those Entry[] objects that had been allocated have been garbage collected. In other words, the HashMaps are experiencing resizing. Notice that the total bytes consumed by HashMap.Entry[] objects is much less than those consumed by char[] objects. This suggests the impact of eliding the HashMap resizing is likely to be less than the impact realized from eliding the StringBuilder resizing.

Figure 6-22 shows the allocation stack traces for HashMap.Entry[]. Here you can see some of those HashMap.Entry[] allocations result from a HashMap.resize(int) method call. In addition, you can see the no argument HashMap constructor is being used, which also allocates a HashMap.Entry[].

Figure 6-22

Figure 6-22 HashMap.Entry[] allocation stack traces

Since this example program populates 50 different HashMaps with a total of 2,000,000 fictitious records, each of those 50 HashMaps hold about 2,000,000 / 50 = 40,000 records. Obviously, 40,000 is much greater than the default size of 16 used by the no argument HashMap constructor. Using the default load factor of .75, and the fact that each of the 50 HashMap holds 40,000 records, you can determine a size for the HashMaps so they will not resize (40,000 / .75 = ~ 53,334). Or simply passing the total number of records to store divided by the number of states, divided by the default load factor, i.e., (2,000,000 / 50) / .75, to the HashMap constructor that holds the records. Following is the modified source code for TaxPayerBailoutDbImpl.java that elides HashMap resizing:

private final Map<String, Map<String,TaxPayerRecord>> db;
private final int dbSize = 2000000;

public TaxPayerBailoutDbImpl(int dbSize, int numberOfStates) {
    final int outerMapSize = (int) Math.ceil(numberOfStates / .75);
    final int innerMapSize =
             (int) (Math.ceil((dbSize / numberOfStates) / .75));
    db =
       new HashMap<String,Map<String,TaxPayerRecord>>(outerMapSize);
    for (int i = 0; i < numberOfStates; i++) {
        Map<String,TaxPayerRecord> map =
             Collections.synchronizedMap(
                 new HashMap<String,TaxPayerRecord>(innerMapSize));
        db.put(BailoutMain.states[i], map);
    }
}

In this example program, both StringBuilder and HashMap resizing occur during the initialization phase of the program, the phase of the program that populates a Map of Maps with fictitious, randomly generated tax payer records. Hence, to measure the performance impact of eliding the StringBuilder and HashMap resizing, the initialization phase of this program has been instrumented with a time stamp at the beginning of the program and after the Map of Maps has been populated. A modified version of this example program, one that uses the no argument HashMap constructor, calculates and reports the time it takes to populate the HashMaps with 2,000,000 records, can be found in Appendix B in the section "First Resizing Variant."

When this variant of the program is run on a Sun SPARC Enterprise T5120 Server configured with 64 virtual processors (the same value as that returned by the Java API Runtime.availableProcessors()), the amount of time it takes to complete the initialization phase is 48.286 seconds.

Updating this program variant with the changes described in this section to address both StringBuilder and HashMap resizing and running on the same UltraSPARC T5120 system with the same JVM command line options reports it takes 46.019 seconds to complete its initialization phase. That's about a 5% improvement in elapsed time. The source code for this variant can be found in Appendix B in the section "Second Resizing Variant."

Applying the data resizing strategy reduces the application's path length, the total number of CPU instructions required to execute the program, and potentially more efficient use of CPU cycles due to fewer possibilities of CPU cache misses as a result of frequently accessed data structure fields being laid out in memory next to each other.

You may have noticed that the initialization phase in this program is single threaded. But the system it is being executed on has a CPU that is multicore and multithreaded per core. The Sun SPARC Enterprise T5120 Server this program is executing on has 8 cores, and 8 hardware threads per core. It is a chip multithreading type of CPU chip, CMT for short. In other words, 8 cores and 8 hardware threads per core means it has 64 virtual processors. That also means the Java API, System.availableProcessors(), returns a value of 64. A next step to improve the performance of the initialization phase of this program is to refactor it to utilize all of those 64 virtual processors. This is the topic of the next section.

InformIT Promotional Mailings & Special Offers

I would like to receive exclusive offers and hear about products from InformIT and its family of brands. I can unsubscribe at any time.

Overview


Pearson Education, Inc., 221 River Street, Hoboken, New Jersey 07030, (Pearson) presents this site to provide information about products and services that can be purchased through this site.

This privacy notice provides an overview of our commitment to privacy and describes how we collect, protect, use and share personal information collected through this site. Please note that other Pearson websites and online products and services have their own separate privacy policies.

Collection and Use of Information


To conduct business and deliver products and services, Pearson collects and uses personal information in several ways in connection with this site, including:

Questions and Inquiries

For inquiries and questions, we collect the inquiry or question, together with name, contact details (email address, phone number and mailing address) and any other additional information voluntarily submitted to us through a Contact Us form or an email. We use this information to address the inquiry and respond to the question.

Online Store

For orders and purchases placed through our online store on this site, we collect order details, name, institution name and address (if applicable), email address, phone number, shipping and billing addresses, credit/debit card information, shipping options and any instructions. We use this information to complete transactions, fulfill orders, communicate with individuals placing orders or visiting the online store, and for related purposes.

Surveys

Pearson may offer opportunities to provide feedback or participate in surveys, including surveys evaluating Pearson products, services or sites. Participation is voluntary. Pearson collects information requested in the survey questions and uses the information to evaluate, support, maintain and improve products, services or sites, develop new products and services, conduct educational research and for other purposes specified in the survey.

Contests and Drawings

Occasionally, we may sponsor a contest or drawing. Participation is optional. Pearson collects name, contact information and other information specified on the entry form for the contest or drawing to conduct the contest or drawing. Pearson may collect additional personal information from the winners of a contest or drawing in order to award the prize and for tax reporting purposes, as required by law.

Newsletters

If you have elected to receive email newsletters or promotional mailings and special offers but want to unsubscribe, simply email information@informit.com.

Service Announcements

On rare occasions it is necessary to send out a strictly service related announcement. For instance, if our service is temporarily suspended for maintenance we might send users an email. Generally, users may not opt-out of these communications, though they can deactivate their account information. However, these communications are not promotional in nature.

Customer Service

We communicate with users on a regular basis to provide requested services and in regard to issues relating to their account we reply via email or phone in accordance with the users' wishes when a user submits their information through our Contact Us form.

Other Collection and Use of Information


Application and System Logs

Pearson automatically collects log data to help ensure the delivery, availability and security of this site. Log data may include technical information about how a user or visitor connected to this site, such as browser type, type of computer/device, operating system, internet service provider and IP address. We use this information for support purposes and to monitor the health of the site, identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents and appropriately scale computing resources.

Web Analytics

Pearson may use third party web trend analytical services, including Google Analytics, to collect visitor information, such as IP addresses, browser types, referring pages, pages visited and time spent on a particular site. While these analytical services collect and report information on an anonymous basis, they may use cookies to gather web trend information. The information gathered may enable Pearson (but not the third party web trend services) to link information with application and system log data. Pearson uses this information for system administration and to identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents, appropriately scale computing resources and otherwise support and deliver this site and its services.

Cookies and Related Technologies

This site uses cookies and similar technologies to personalize content, measure traffic patterns, control security, track use and access of information on this site, and provide interest-based messages and advertising. Users can manage and block the use of cookies through their browser. Disabling or blocking certain cookies may limit the functionality of this site.

Do Not Track

This site currently does not respond to Do Not Track signals.

Security


Pearson uses appropriate physical, administrative and technical security measures to protect personal information from unauthorized access, use and disclosure.

Children


This site is not directed to children under the age of 13.

Marketing


Pearson may send or direct marketing communications to users, provided that

  • Pearson will not use personal information collected or processed as a K-12 school service provider for the purpose of directed or targeted advertising.
  • Such marketing is consistent with applicable law and Pearson's legal obligations.
  • Pearson will not knowingly direct or send marketing communications to an individual who has expressed a preference not to receive marketing.
  • Where required by applicable law, express or implied consent to marketing exists and has not been withdrawn.

Pearson may provide personal information to a third party service provider on a restricted basis to provide marketing solely on behalf of Pearson or an affiliate or customer for whom Pearson is a service provider. Marketing preferences may be changed at any time.

Correcting/Updating Personal Information


If a user's personally identifiable information changes (such as your postal address or email address), we provide a way to correct or update that user's personal data provided to us. This can be done on the Account page. If a user no longer desires our service and desires to delete his or her account, please contact us at customer-service@informit.com and we will process the deletion of a user's account.

Choice/Opt-out


Users can always make an informed choice as to whether they should proceed with certain services offered by InformIT. If you choose to remove yourself from our mailing list(s) simply visit the following page and uncheck any communication you no longer want to receive: www.informit.com/u.aspx.

Sale of Personal Information


Pearson does not rent or sell personal information in exchange for any payment of money.

While Pearson does not sell personal information, as defined in Nevada law, Nevada residents may email a request for no sale of their personal information to NevadaDesignatedRequest@pearson.com.

Supplemental Privacy Statement for California Residents


California residents should read our Supplemental privacy statement for California residents in conjunction with this Privacy Notice. The Supplemental privacy statement for California residents explains Pearson's commitment to comply with California law and applies to personal information of California residents collected in connection with this site and the Services.

Sharing and Disclosure


Pearson may disclose personal information, as follows:

  • As required by law.
  • With the consent of the individual (or their parent, if the individual is a minor)
  • In response to a subpoena, court order or legal process, to the extent permitted or required by law
  • To protect the security and safety of individuals, data, assets and systems, consistent with applicable law
  • In connection the sale, joint venture or other transfer of some or all of its company or assets, subject to the provisions of this Privacy Notice
  • To investigate or address actual or suspected fraud or other illegal activities
  • To exercise its legal rights, including enforcement of the Terms of Use for this site or another contract
  • To affiliated Pearson companies and other companies and organizations who perform work for Pearson and are obligated to protect the privacy of personal information consistent with this Privacy Notice
  • To a school, organization, company or government agency, where Pearson collects or processes the personal information in a school setting or on behalf of such organization, company or government agency.

Links


This web site contains links to other sites. Please be aware that we are not responsible for the privacy practices of such other sites. We encourage our users to be aware when they leave our site and to read the privacy statements of each and every web site that collects Personal Information. This privacy statement applies solely to information collected by this web site.

Requests and Contact


Please contact us about this Privacy Notice or if you have any requests or questions relating to the privacy of your personal information.

Changes to this Privacy Notice


We may revise this Privacy Notice through an updated posting. We will identify the effective date of the revision in the posting. Often, updates are made to provide greater clarity or to comply with changes in regulatory requirements. If the updates involve material changes to the collection, protection, use or disclosure of Personal Information, Pearson will provide notice of the change through a conspicuous notice on this site or other appropriate way. Continued use of the site after the effective date of a posted revision evidences acceptance. Please contact us if you have questions or concerns about the Privacy Notice or any objection to any revisions.

Last Update: November 17, 2020