State of the <tt>union</tt>
Last updated Jan 1, 2003.
Unions are one of the C relics that C++ has retained. On the one hand, they exemplify an intrusive, highly implementation-dependent programming style that is anathema to object-oriented programming. On the other hand, even in C++ programs they have certain useful applications, as I will show in the following passages.
What's in a union?
In the olden days, when memory was scarce and static type-checking was often overlooked ("We're serious programmers and we know what we are doing!"), programming languages such as FORTRAN, PL/1 and C offered a means of storing multiple objects on the same chunk of memory. Of course, one could only use a single object at a time, but this technique could save memory because the decision regarding which object was needed was often delayed to runtime, whereas the objects themselves needed to be declared at compile time.
Think, for example, of a database query that retrieves an employee's record. The record in question can be retrieved using various criteria: the employee's name, his or her ID, telephone number, and so on. Obviously, you don't need all these keys at once, but you can't decide at compile time which one the user will decide to use when querying the database. To solve this problem, it was customary to pack all the keys within a single data structure called a union:
union Key
{
    int ID;
    char * name;
    char phone[8];
};
The size of a union is sufficient to contain the largest of its data members. In the case of Key, it's typically 8 bytes -- the size of phone. By contrast, a struct containing the same data members occupies the cumulative size of its members, i.e.,
sizeof(ID) + sizeof(name) + sizeof (phone)
which is 16 bytes on most 32-bit systems (with the possible addition of padding bytes). These savings might not impress you, but in those days, when a system's RAM consisted of a few kilobytes, every byte counted, especially when a program used arrays of unions. A typical program for accessing a database would determine the actual key at runtime using a type-encoding enumeration:
/*C style example of using type-coding enum + union*/
enum KeyType
{
    by_id,
    by_name,
    by_phone
};
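To make the size comparison concrete, here is a minimal sketch that declares the same three members once as a union and once as a struct. The names KeyUnion, KeyStruct, largest_member_size, sum_of_member_sizes and check_layout are hypothetical and exist only for this illustration; the exact byte counts depend on the platform, so the code checks relations rather than fixed numbers.

```cpp
#include <cassert>
#include <cstddef>

// Same three members, once overlapped (union) and once laid out in sequence (struct).
union KeyUnion
{
    int ID;
    char * name;
    char phone[8];
};

struct KeyStruct
{
    int ID;
    char * name;
    char phone[8];
};

// Size of the largest member, computed by hand.
std::size_t largest_member_size()
{
    std::size_t n = sizeof(int);
    if (sizeof(char *) > n) n = sizeof(char *);
    if (sizeof(char[8]) > n) n = sizeof(char[8]);
    return n;
}

// Sum of the member sizes, ignoring any padding the compiler may add.
std::size_t sum_of_member_sizes()
{
    return sizeof(int) + sizeof(char *) + sizeof(char[8]);
}

// The union must hold its largest member; the struct must hold all of them.
void check_layout()
{
    assert(sizeof(KeyUnion) >= largest_member_size());
    assert(sizeof(KeyStruct) >= sum_of_member_sizes());
}
```

On a typical 32-bit system this gives 8 bytes for the union and 16 for the struct; on a 64-bit system the struct usually grows further because of the wider pointer and alignment padding.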
Accessing a union's member is similar to accessing a member of a struct or a class. The crucial difference is that while objects and structs store each data member at a distinct memory address, all members of a union are stored at the same address. Therefore, the programmer must be careful to access the correct member:
Employee * retrieve(union Key * thekey, enum KeyType type)
{
    switch (type)
    {
    case by_id:
        access_by_id(thekey->ID); /* thekey is a pointer, hence -> */
        break;
    case by_name:
        access_by_name(thekey->name);
        break;
    //..
    }
}
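The tag-plus-union technique can be sketched as a self-contained unit. Everything beyond Key and KeyType is an assumption made for illustration: the TaggedKey wrapper, the make_* helpers and describe() are hypothetical names, and name is declared as const char * here so that string literals can be stored.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>

enum KeyType { by_id, by_name, by_phone };

union Key
{
    int ID;
    const char * name;
    char phone[8];
};

// Hypothetical wrapper: keeps the tag next to the union so the two travel together.
struct TaggedKey
{
    KeyType type;
    Key key;
};

TaggedKey make_id_key(int id)
{
    TaggedKey t = TaggedKey();   // value-initialize before setting one member
    t.type = by_id;
    t.key.ID = id;
    return t;
}

TaggedKey make_name_key(const char * name)
{
    assert(name != 0);
    TaggedKey t = TaggedKey();
    t.type = by_name;
    t.key.name = name;           // stores the pointer only, not a copy of the text
    return t;
}

TaggedKey make_phone_key(const char * digits)
{
    assert(digits != 0);
    TaggedKey t = TaggedKey();
    t.type = by_phone;
    std::strncpy(t.key.phone, digits, sizeof(t.key.phone) - 1);
    t.key.phone[sizeof(t.key.phone) - 1] = '\0';   // at most 7 characters are kept
    return t;
}

// Reads only the member that the tag says was written.
std::string describe(const TaggedKey & t)
{
    std::ostringstream out;
    switch (t.type)
    {
    case by_id:    out << "id:" << t.key.ID;       break;
    case by_name:  out << "name:" << t.key.name;   break;
    case by_phone: out << "phone:" << t.key.phone; break;
    }
    return out.str();
}
```

Bundling the tag with the union does not make a mismatch impossible, but it does confine the decision about which member is valid to one place, the switch statement.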
This programming style has gone out of favor with the advent of object-oriented programming. Not only does it rely heavily on implementation details, it's also error-prone. If the user accesses the wrong data member of the union, the results will be meaningless, just like accessing a random piece of memory. Yet this dangerous characteristic was also an advantage in some systems that didn't support typecasting.
union-based Typecasting
In C++, operator reinterpret_cast performs low-level typecasting between pointers and references that preserves the original binary layout of the source data. For example, in order to examine the bytes of an int, you could do something like this:
int num=2000;
unsigned char * p = reinterpret_cast<unsigned char *> (&num);
for (int i=0; i<sizeof (num); i++) //display the decimal value of every byte of num
    cout<<"byte "<<i<<": " << (int) p[i] <<endl;
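The same byte inspection can be wrapped in small functions, which makes it easy to check that no information is lost along the way. This is only a sketch: bytes_of and int_from are hypothetical helper names, and the order of the bytes depends on the machine's endianness, so the code makes no assumption about it.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Returns the bytes of value in memory order (lowest address first).
std::vector<unsigned char> bytes_of(int value)
{
    const unsigned char * p = reinterpret_cast<const unsigned char *>(&value);
    return std::vector<unsigned char>(p, p + sizeof(value));
}

// Rebuilds an int from bytes previously produced by bytes_of().
int int_from(const std::vector<unsigned char> & bytes)
{
    int value = 0;
    assert(bytes.size() == sizeof(value));
    std::memcpy(&value, &bytes[0], sizeof(value));
    return value;
}
```

Because the bytes are copied back unchanged, int_from(bytes_of(n)) yields n again for any int n.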
Before the days of reinterpret_cast, programmers would use a union to achieve the same effect (union initialization rules are explained in this article):
union Cast
{
    int n;
    char str[sizeof (int) ];
};
Cast c = {2000}; //brace initialization sets the first member, n
for (int i=0; i<sizeof (int); i++)
    printf("%d\n", c.str[i]);
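A compilable variant of the union trick might look as follows. It is a sketch under a stated assumption: the C++ standard does not formally guarantee the result of reading a union member other than the one most recently written, although mainstream compilers support this usage in practice. The function names are hypothetical, and the memcpy version is included only as a portable point of comparison.

```cpp
#include <cassert>
#include <cstring>

union Cast
{
    int n;
    unsigned char bytes[sizeof(int)];
};

// Writes through n and reads through bytes (assumes the compiler supports union punning).
unsigned char byte_via_union(int value, unsigned index)
{
    assert(index < sizeof(int));
    Cast c = { value };   // brace initialization sets the first member, n
    return c.bytes[index];
}

// Portable comparison: copies the object representation into a byte array.
unsigned char byte_via_memcpy(int value, unsigned index)
{
    assert(index < sizeof(int));
    unsigned char raw[sizeof(int)];
    std::memcpy(raw, &value, sizeof(raw));
    return raw[index];
}
```

Using unsigned char rather than plain char avoids negative values when a byte has its high bit set.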
Anonymous unions
C++ introduced a special union type called an anonymous union. Unlike an ordinary union, it has neither a tag name nor a named instance. As such, it's mostly used as a data member of a class. For example:
class Employee
{
private:
    union //anonymous
    {
        int key_ID;
        char * key_name;
        char key_phone[8];
    };
    double salary;
    string name;
    int rank;
    //...
public:
    Employee();
};
The advantage of using an anonymous union is that you access its members directly, as if they were ordinary data members of the class:
Employee::Employee() : key_ID(0),//member of an anon. union
salary(0.0), rank(0) //ordinary data members
{}
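The following sketch trims Employee down to the parts that matter here and adds a few accessors so the direct member access can be exercised. The accessor names (set_id, id, set_phone, phone, pay, keys_share_storage) are assumptions added for illustration, and key_name is declared const char * to keep the example simple.

```cpp
#include <cassert>
#include <cstring>

class Employee
{
private:
    union // anonymous: no tag name, no instance name
    {
        int key_ID;
        const char * key_name;
        char key_phone[8];
    };
    double salary;
public:
    Employee() : key_ID(0), salary(0.0) {}

    // The union's members are named directly, as if they belonged to Employee.
    void set_id(int id) { key_ID = id; }
    int id() const { return key_ID; }

    void set_phone(const char * digits)
    {
        assert(digits != 0);
        std::strncpy(key_phone, digits, sizeof(key_phone) - 1);
        key_phone[sizeof(key_phone) - 1] = '\0';   // at most 7 characters are kept
    }
    const char * phone() const { return key_phone; }

    double pay() const { return salary; }

    // All three keys start at the same address inside the object.
    bool keys_share_storage() const
    {
        const void * a = &key_ID;
        const void * b = &key_name;
        const void * c = key_phone;
        return a == b && a == c;
    }
};
```

Note that calling set_phone() overwrites whatever set_id() stored earlier, since both keys occupy the same bytes.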
Anonymous unions aren't confined to classes; you can also declare an anonymous union at file scope or at namespace scope. In these cases, however, it must be declared static, and its members have internal linkage:
static union //declared globally, has internal linkage
{
    int x;
    void *y;
};
namespace NS
{
    static union //a namespace's scope, has internal linkage
    {
        int w;
        char s[4];
    };
}
int main()
{
    x=0;
    NS::s[0]='a';
}
A union Facelift
C++ introduced another enhancement, namely the ability to declare member functions in a union, including constructors, destructors etc. Note, however, that virtual member functions (including a virtual destructor) are not allowed:
union Key
{
private:
    int ID;
    char * name;
    char phone[8];
public:
    //ctor, dtor, copy ctor and assignment op
    Key();
    ~Key();
    Key(const Key & ref);
    Key& operator=(const Key & ref);
    //ordinary member functions are also allowed
    int Assign(int n)
    {
        ID=n;
        return ID;
    }
};
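As a rough, compilable illustration of a union with private data and member functions, consider the reduced version of Key below. The member function names and the choice of which members to keep are assumptions made for this sketch; the copy constructor is the one the compiler generates.

```cpp
#include <cassert>
#include <cstring>

union Key
{
private:
    int ID;
    char phone[8];
public:
    Key() : ID(0) {}                 // a constructor initializes one member only
    explicit Key(int n) : ID(n) {}

    void assign(int n) { ID = n; }
    int id() const { return ID; }

    void assign_phone(const char * digits)
    {
        assert(digits != 0);
        std::strncpy(phone, digits, sizeof(phone) - 1);
        phone[sizeof(phone) - 1] = '\0';   // at most 7 characters are kept
    }
    const char * phone_number() const { return phone; }
};
```

Hiding the data members behind functions does not remove the need to track which member is currently meaningful; it only restricts who can get it wrong.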
That said, a union can neither serve as a base class nor be derived from another class. Note also that only ordinary unions may contain member functions; anonymous unions can't have member functions of any kind, nor can they contain static, private, or protected data members.
Summary
In high-level applications, unions have limited usage nowadays, if any. Yet it's important to know how to use them because they are still widely used in legacy code and in low-level APIs. The C++ creators attempted to upgrade unions into an object-oriented entity by adding the ability to declare member functions and private and protected data members in a union. An anonymous union is a special type of union that has neither a tag name nor an instance name.
