- Overview
- Table of Contents
- Special Member Functions: Constructors, Destructors, and the Assignment Operator
- Operator Overloading
- Memory Management
- Templates
- Namespaces
- Time and Date Library
- Streams
- Object-Oriented Programming and Design Principles
- The Standard Template Library (STL) and Generic Programming
- Exception Handling
- Runtime Type Information (RTTI)
- Signal Processing
- Creating Persistent Objects
- Bit Fields
- New Cast Operators
- Environment Variables
- Variadic Functions
- Pointers to Functions
- Function Objects
- Pointers to Members
- Lock Files
- Design Patterns
- Dynamic Linking
- Tips and Techniques
- Using the Swap() Algorithm
- Using class stopwatch for Performance Measurements
- Extending <iostream> to Support User-Defined Types
- Using auto_ptr To Automate Memory Management
- Using auto_ptr To Automate Memory Management, Part II
- Using auto_ptr To Automate Memory Management, Part III
- Using enums as Mnemonic Indexes
- Create Objects on Pre-Allocated Memory Using Placement-new
- Online Books: Placement-new
- Bitwise Operators
- Bitwise Operators II
- Who's this?
- A Reference Guide
- The Virtues of Multiple Inheritance
- Interfaces
- Multiple Inheritance: Construction and Destruction Order
- nothrow new
- POD Initialization
- Object Initialization
- const Declarations
- The Semantics of volatile
- inline Functions
- Project Organization Guidelines
- All About bool
- typedef Declarations
- State of the union
- Dynamic Cast Uses
- Integrating C and C++
- const-Correctness
- const-Correctness: Advanced Issues
- Sprucing Up Legacy Code
- Virtual Constructors
- Naming Names
- Function Calls
- Speaking Standardese (updated)
- Speaking Standardese: the One Definition Rule
- Declarations and Definitions
- More on Declarations and Definitions
- The Most Vexing Parse
- Finally, At Last
- Sound Bytes (Admittedly Off Topic)
- Local Classes
- Complex Arithmetic
- Floating Point Woes
- String Manipulation
- The Object Model
- The Object Model II
- The Object Model III
- Temporary Objects
- Temporary Objects: Advanced Techniques
- Over-Engineering
- Security Enhancements
- Drop the (automatic) Pilot
- Choosing the Right Container
- Choosing the Right Container II
- Choosing the Right Container, Part III
- Arrays and Pointers
- Low-Level File I/O
- Low-Level File I/O Part II
- static Declarations, Part I
- static Declarations, Part II
staticInitialization Order- Revisiting the Deprecation of File-Scope Static
- Virtual Memory and Memory Mapping
- Cellular Phone Programming Guidelines
- The Handle/Body Idiom
- Whole Program Optimization, Part I
- Whole Program Optimization, Part II
- Manipulating Directories
- Window Dressing
friendDeclarationsfriendPart II: the Interaction of Friendship and Template Classes- Forcing Object Allocation on Specific Storage Types
- Lazy Evaluation
- Cache and Carry
- Controlling a Container’s Capacity
- Non-Blocking I/O, Part I
- Non-Blocking I/O, Part II
- Using Unions for Automatic Conversion
- Launching a Child Process
- switch Statements
- Introducing the "struct Hack"
- Scoped Enumerators
- Doing Statistics with STL
- Fixing the "Unresolved External" Linkage Error
- Understanding Calling Conventions
- Understanding the Empty Base Optimization
- Implementing RPC with the door Library, Part 1
- Implementing RPC with the door Library, Part 2
- Eliminating Two Common Pointer and sizeof Bugs
- Command Line Arguments
- Performance Myths Busting
- Tag Names And Types Part I
- Tag Names And Types Part II
- The Infamous goto
- Trimming Strings
- Can Objects Live Forever? Part I
- Can Objects Live Forever? Part II
- Five Ways to Improve Your Functions
- Member Aggregate Initialization
- Five Futile Coding-Style Debates
- The Good Parasite Idiom: An Exercise in OOD
- The Good Parasite Idiom: An Exercise in OOD, Part II
- The Good Parasite Idiom: An Exercise in OOD, Part III
- Ten Techniques to Reduce the Size of Your Classes, Part I
- Ten Techniques to Reduce the Size of Your Classes, Part II: Inheritance Issues
- Ten Techniques to Reduce the Size of Your Classes, Part III
- Ten Techniques to Reduce the Size of Your Classes IV
- Taking the Address of an Object with an Overloaded Operator &
- strcpy() -- How and Why Does It "Just Work"?
- Anonymous Structs
- Five Easy Ways to Reduce The Size of your Executables
- Standard Layout Classes and Trivially Copyable Types, Part I
- Standard Layout Classes and Trivially Copyable Types, Part II
- Five Simple Code Sanity Checks
- Five Things You Need to Know About C++11 Unions
- A Tour of C99
- A Tour of C1X
- C++0X: The New Face of Standard C++
- C++0x Concurrency
- The Reflecting Circle
- We Have Mail
- The Soapbox
- Numeric Types and Arithmetic
- Careers
- Locales and Internationalization
Choosing the Right Container II
Last updated Oct 8, 2004.
Last week, I outlined the major features of STL's sequence containers. Today I continue this discussion and explain the effects of reallocation and its consequences with regards to every sequence container type.
Container Reallocation
All containers manage their storage automatically. If a container has to accommodate more elements than its current capacity , it allocates more storage as needed. Likewise, certain operations, say calling vector::erase() or list::merge(), cause a container to shrink.
As much as you might like to consider this automatic storage management an "implementation detail" that shouldn't concern users, it does raise a problem: what happens to an iterator that is pointing to an element when the container reallocates? In certain cases, such an iterator becomes invalid, as it is now pointing to a piece of memory that is no longer owned by the container or it is pointing to a piece of memory that contains a different object.
Let's see how this can happen. Suppose you have a vector that has 256 elements:
vector <int> vi(256); //vi contains 256 int's
At this point, you define an iterator that is pointing to the first element like this:
for (vector<int>::iterator it= vi.begin();...)
Then, you call push_back() to insert a new element:
vi.push_back(99);
From this moment, there's no guarantee that it is valid because the push_back() call may have triggered reallocation. Of course, not every push_back() call does that, but you can't tell with certainty whether reallocation has taken place without poking into the intimate implementation details of your STL distribution.
To explain the effects of reallocation, I'll use an analogy. Shrimp have a hard shell that doesn't grow. When a shrimp's body grows, it has to shed its shell. It has to grow another, larger shell that fits its new size, discarding the old shell. Vectors use a similar technique when they grow: they copy their contents into a new memory block large enough to store their elements, discarding the previous memory block on which the elements were stored originally.
A reallocation process is actually more complicated than simply switching to a new memory block: because vectors store objects, not raw data, they can't just copy raw memory from one location to another. Therefore, a complete reallocation cycle consists of the following steps:
The vector allocates a new chunk of contiguous memory that is large enough for the current number of elements plus some room for extra elements. Thus, if a vector has 100 elements, it may allocate room for 128 or 256 elements, for example.
The vector copy-constructs the current elements into the new storage (this is why STL elements must be copy-constructible).
Once the copying operation has been completed successfully, the vector destroys the old elements by invoking their destructors (again, this is why destructors of STL elements must be accessible).
The memory on which the destroyed elements reside is released, very much like a shrimp's old shell.
These four stages make a single atomic operation. Therefore, a multithreaded application must lock the entire sequence of reallocation stages so that other threads don't access a vector's elements before the reallocation has been completed.
Here's the problem: because elements may have been copied to a new memory address, iterators that were assigned before the reallocation might become dangling iterators. They are still pointing to the old memory block. Using such invalid iterators would cause undefined behavior. To avoid this, the Standard specifies for every container type which operations invalidate iterators. The actual effect of each operation depends on the container type. For example, calling push_back() on a vector might causes reallocation and consequently -- iterator invalidation, whereas calling push_back() for a list object doesn't.
One way to avoid reallocation with vectors is to ensure that they have sufficient capacity in advance. This can be done by specifying the desired capacity by calling reserve():
vi.reserve(1000); //make room for at least 1000 elements
If vi never exceeds this capacity, calling push_back() won't invalidate iterators. However, notice that mid-position insertion of elements could cause old iterator to point to the wrong elements, even if the vector hasn't reallocated. We can conclude that with respect to vectors, any operation that inserts new elements might invalidate iterators. By contrast, vector's const member functions don't invalidate iterators.
Capacity vs. Size
The distinction between a container's capacity and its size is crucial: capacity refers to the actual storage that the container owns, which is equal to or larger than its size. A container's size is the actual number of elements it's currently holding. These values are always expressed as the number of elements, not bytes. Reallocation takes place only if the container's size exceeds its capacity. To see the difference between the two, let's look at a concrete example:
vector<int> vi; //size==0 cout<<"size: "<<vi.size()<<" capacity: "<<vi.capacity()<<endl;
My STL implementation produces the following output:
Now let's change this program a bit:
vector<int> vi; //size==0 vi.push_back(1); //size==1 cout<<"size: "<<vi.size()<<" capacity: "<<vi.capacity()<<endl;
This time, the output is:
We can conclude that this particular implementation uses a "lazy allocation" strategy; it allocates storage only when necessary. In addition, the initial minimal allocation block consists of 256 elements. This means that you can insert 255 more elements to vi without causing reallocation.
Notice, however, that allocation policies are implementation-dependent. Portable code should never rely on specific allocation policies, because another implementation might work differently.
Invalidation of list Iterators
Lists, unlike vectors, don't use contiguous memory. Therefore, inserting new elements at any position doesn't rigger reallocation. In other words, calling push_back(), push_front(), and insert() don't invalidate list iterators. However, there are operations that invalidate list iterators. The splice() operation, which is unique to list, splices a sequence into an existing list. For example, you can splice one list into another like this:
list<int> li, li2; //populate the lists li.splice(li.begin(), li2);
This causes the elements in li2 to be spliced before li.begin(). While iterators to li aren't affected by this operation, li2 becomes empty after the operation. Consequently, all references and iterators to its elements become invalid. The pop_front(), pop_back(), clear(), and erase() operations invalidate iterators and references to the erased elements.
Invalidation of deque Iterators
Calling insert() in the middle of the deque invalidates all the iterators and references to its elements. Calling insert() at either end of the deque invalidates all its iterators but has no effect on the validity of references to its elements. The last sentence means that if you have a reference or a pointer to a certain element, say:
deque<int> di; int p* = &di[5]; deque<int>::iterator it=di.begin()+5; di.push_front(8);
After the push_front() call, the iterator it becomes invalid whereas the pointer p remains valid.
Similarly, calling erase() in the middle of the deque invalidates all the iterators and references to its elements. Calling erase() at either end of the deque invalidates only the iterators and the references to the erased elements.
Summary
Iterator invalidation is just as bad as plain old dangling pointers. Therefore, when using STL containers, it's important to know which operations affect the validity of its iterators and references to its elements. The most invalidation-prone container is vector because every insertion operation, regardless of its position, might trigger reallocation. However, if you have a rough estimate of the actual number of elements that a vector will use, you can preallocate sufficient storage by calling reseve().

Figure
10
Figure
11

