Home > Articles > Programming > C/C++

C++ Reference Guide

Hosted by

Toggle Open Guide Table of ContentsGuide Contents

Close Table of ContentsGuide Contents

Close Table of Contents

Choosing the Right Container II

Last updated Jan 1, 2003.

Last week, I outlined the major features of STL's sequence containers. Today I continue this discussion and explain the effects of reallocation and its consequences with regards to every sequence container type.

Container Reallocation

All containers manage their storage automatically. If a container has to accommodate more elements than its current capacity , it allocates more storage as needed. Likewise, certain operations, say calling vector::erase() or list::merge(), cause a container to shrink.

As much as you might like to consider this automatic storage management an "implementation detail" that shouldn't concern users, it does raise a problem: what happens to an iterator that is pointing to an element when the container reallocates? In certain cases, such an iterator becomes invalid, as it is now pointing to a piece of memory that is no longer owned by the container or it is pointing to a piece of memory that contains a different object.

Let's see how this can happen. Suppose you have a vector that has 256 elements:

vector <int> vi(256); //vi contains 256 int's

At this point, you define an iterator that is pointing to the first element like this:

for (vector<int>::iterator it= vi.begin();...) 

Then, you call push_back() to insert a new element:

vi.push_back(99);

From this moment, there's no guarantee that it is valid because the push_back() call may have triggered reallocation. Of course, not every push_back() call does that, but you can't tell with certainty whether reallocation has taken place without poking into the intimate implementation details of your STL distribution.

To explain the effects of reallocation, I'll use an analogy. Shrimp have a hard shell that doesn't grow. When a shrimp's body grows, it has to shed its shell. It has to grow another, larger shell that fits its new size, discarding the old shell. Vectors use a similar technique when they grow: they copy their contents into a new memory block large enough to store their elements, discarding the previous memory block on which the elements were stored originally.

A reallocation process is actually more complicated than simply switching to a new memory block: because vectors store objects, not raw data, they can't just copy raw memory from one location to another. Therefore, a complete reallocation cycle consists of the following steps:

  • The vector allocates a new chunk of contiguous memory that is large enough for the current number of elements plus some room for extra elements. Thus, if a vector has 100 elements, it may allocate room for 128 or 256 elements, for example.

  • The vector copy-constructs the current elements into the new storage (this is why STL elements must be copy-constructible).

  • Once the copying operation has been completed successfully, the vector destroys the old elements by invoking their destructors (again, this is why destructors of STL elements must be accessible).

  • The memory on which the destroyed elements reside is released, very much like a shrimp's old shell.

These four stages make a single atomic operation. Therefore, a multithreaded application must lock the entire sequence of reallocation stages so that other threads don't access a vector's elements before the reallocation has been completed.

Here's the problem: because elements may have been copied to a new memory address, iterators that were assigned before the reallocation might become dangling iterators. They are still pointing to the old memory block. Using such invalid iterators would cause undefined behavior. To avoid this, the Standard specifies for every container type which operations invalidate iterators. The actual effect of each operation depends on the container type. For example, calling push_back() on a vector might causes reallocation and consequently -- iterator invalidation, whereas calling push_back() for a list object doesn't.

One way to avoid reallocation with vectors is to ensure that they have sufficient capacity in advance. This can be done by specifying the desired capacity by calling reserve():

vi.reserve(1000); //make room for at least 1000 elements

If vi never exceeds this capacity, calling push_back() won't invalidate iterators. However, notice that mid-position insertion of elements could cause old iterator to point to the wrong elements, even if the vector hasn't reallocated. We can conclude that with respect to vectors, any operation that inserts new elements might invalidate iterators. By contrast, vector's const member functions don't invalidate iterators.

Capacity vs. Size

The distinction between a container's capacity and its size is crucial: capacity refers to the actual storage that the container owns, which is equal to or larger than its size. A container's size is the actual number of elements it's currently holding. These values are always expressed as the number of elements, not bytes. Reallocation takes place only if the container's size exceeds its capacity. To see the difference between the two, let's look at a concrete example:

vector<int> vi; //size==0
cout<<"size: "<<vi.size()<<" capacity: "<<vi.capacity()<<endl;

My STL implementation produces the following output:

Figure 10Figure 10

Now let's change this program a bit:

vector<int> vi; //size==0
vi.push_back(1); //size==1
cout<<"size: "<<vi.size()<<" capacity: "<<vi.capacity()<<endl;

This time, the output is:

Figure 11Figure 11

We can conclude that this particular implementation uses a "lazy allocation" strategy; it allocates storage only when necessary. In addition, the initial minimal allocation block consists of 256 elements. This means that you can insert 255 more elements to vi without causing reallocation.

Notice, however, that allocation policies are implementation-dependent. Portable code should never rely on specific allocation policies, because another implementation might work differently.

Invalidation of list Iterators

Lists, unlike vectors, don't use contiguous memory. Therefore, inserting new elements at any position doesn't rigger reallocation. In other words, calling push_back(), push_front(), and insert() don't invalidate list iterators. However, there are operations that invalidate list iterators. The splice() operation, which is unique to list, splices a sequence into an existing list. For example, you can splice one list into another like this:

list<int> li, li2;
//populate the lists
li.splice(li.begin(), li2);

This causes the elements in li2 to be spliced before li.begin(). While iterators to li aren't affected by this operation, li2 becomes empty after the operation. Consequently, all references and iterators to its elements become invalid. The pop_front(), pop_back(), clear(), and erase() operations invalidate iterators and references to the erased elements.

Invalidation of deque Iterators

Calling insert() in the middle of the deque invalidates all the iterators and references to its elements. Calling insert() at either end of the deque invalidates all its iterators but has no effect on the validity of references to its elements. The last sentence means that if you have a reference or a pointer to a certain element, say:

deque<int> di;
int p* = &di[5];
deque<int>::iterator it=di.begin()+5;
di.push_front(8);

After the push_front() call, the iterator it becomes invalid whereas the pointer p remains valid.

Similarly, calling erase() in the middle of the deque invalidates all the iterators and references to its elements. Calling erase() at either end of the deque invalidates only the iterators and the references to the erased elements.

Summary

Iterator invalidation is just as bad as plain old dangling pointers. Therefore, when using STL containers, it's important to know which operations affect the validity of its iterators and references to its elements. The most invalidation-prone container is vector because every insertion operation, regardless of its position, might trigger reallocation. However, if you have a rough estimate of the actual number of elements that a vector will use, you can preallocate sufficient storage by calling reseve().