C++ Reference Guide

Unrestricted Unions, Part I

Last updated Jan 1, 2003.

At first, unions seem like a stale relic from the K&R C days. However, if you examine state-of-the-art libraries and frameworks, you will notice that unions play an important role in their design. This widespread use of unions has led C++ standards committee members to propose the removal of most of the C++98 restrictions on unions. In the following sections I will explain what makes unions so popular among C++ programmers today and which C++98 restrictions on unions are about to be removed in C++0x.

Union Usage

Historically, unions were added to C as a memory-saving mechanism. Saving memory is still important today, but the main impetus for using unions in C++ libraries and frameworks comes from two properties of unions that have nothing to do with saving memory:

  • Automatic type casting.
  • Enforcing stricter alignment of data objects.

Let's look at some examples more closely.

Automatic Type Casting

In a union, at most one of the data members can be active at any time. That is, the union can store the value of at most one of its data members at any given moment. Library designers take advantage of this by packing multiple objects of different types into a single union. Such a union is called a variant. Variants are used to create a kind of Any type: a heterogeneous object. Here's one such example:

//libgcj-2.95.1/libjava/include/jni.h
typedef union jvalue
{
 jboolean z;
 jbyte  b;
 jchar  c;
 jshort s;
 jint  i;
 jlong  j;
 jfloat f;
 jdouble d;
 jobject l;
} jvalue;

By using a type field or some other means of encoding the actual type of the union's sole active member, programs can store objects of different types in a jvalue at different times. This compaction reduces the number of overloaded functions in a library and facilitates the implementation of heterogeneous arrays and heterogeneous containers.
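The type-field technique can be sketched as follows. This is only a minimal illustration, not code taken from libgcj: the names ValueKind, TaggedValue, make_int, make_double and to_double are hypothetical, and the variant is limited to two member types to keep the example short.

```cpp
#include <cassert>

// Hypothetical type field: records which union member is currently active.
enum ValueKind { KIND_INT, KIND_DOUBLE };

// Hypothetical variant: a union paired with its type field.
struct TaggedValue {
    ValueKind kind;
    union {
        int    i;
        double d;
    } as;
};

// Each factory stores one member and records its type in the type field.
inline TaggedValue make_int(int v) {
    TaggedValue t;
    t.kind = KIND_INT;
    t.as.i = v;
    return t;
}

inline TaggedValue make_double(double v) {
    TaggedValue t;
    t.kind = KIND_DOUBLE;
    t.as.d = v;
    return t;
}

// Reads only the member that the type field says is active.
inline double to_double(const TaggedValue& t) {
    switch (t.kind) {
    case KIND_INT:    return static_cast<double>(t.as.i);
    case KIND_DOUBLE: return t.as.d;
    }
    assert(false && "unknown ValueKind");
    return 0.0;
}
```

A single function such as to_double() can then serve every type stored in the variant, and an ordinary array of TaggedValue behaves like a heterogeneous array.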

Enforcing Stricter Memory Alignment

Each fundamental type has a natural alignment value. For example, char and char arrays have an alignment value of one byte, whereas double usually has an eight-byte alignment value in implementations that support double precision. In standard C and C++, a union is automatically aligned according to the member with the strictest (highest) alignment requirement. Consider:

union U 
{
 char c; // 1-byte aligned
 double d; // usually 8-byte aligned
} ;

All U instances are allocated on a memory address that is properly aligned for type double, which is the type with the strictest alignment value in the union U. Implementers often take advantage of this guarantee and pack types with different alignment values into a union, thus enforcing a stricter alignment for a union member whose default alignment value is much lower. Let's look at a concrete example:

union VariantType 
{
 char text [1000];
 long long align_;
};

A VariantType has the same alignment value as long long (typically eight bytes). As a result, the array text is always aligned as strictly as a long long, even though char arrays have an alignment value of one byte by default. With the addition of alignment support to C++0x, this hack will become less necessary. In C++0x you can declare a char array with a stricter alignment like this:

alignas (8) char text [1000];

However, unions will remain in use as a way of overriding a member's default alignment, especially in legacy code and in code that is shared across a mixed-language project.
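The alignment claims above can be checked at compile time and at runtime. The following is a sketch that assumes a compiler with C++0x support for alignof, alignas and static_assert. The wrapper AlignedText and the helpers is_aligned and check_alignment are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// The two unions discussed in the text.
union U {
    char   c;
    double d;
};

union VariantType {
    char      text[1000];
    long long align_;
};

// Hypothetical wrapper around the alignas declaration shown in the text.
struct AlignedText {
    alignas(8) char text[1000];
};

// A union is at least as strictly aligned as its strictest member.
static_assert(alignof(U) >= alignof(double), "U must be aligned for double");
static_assert(alignof(VariantType) >= alignof(long long),
              "VariantType must be aligned for long long");
// The alignas request on the member propagates to the enclosing struct.
static_assert(alignof(AlignedText) >= 8, "AlignedText must be 8-byte aligned");

// Hypothetical helper: true if the address p is a multiple of n bytes.
inline bool is_aligned(const void* p, std::uintptr_t n) {
    return reinterpret_cast<std::uintptr_t>(p) % n == 0;
}

// Checks both techniques on objects with automatic storage.
inline void check_alignment() {
    VariantType v;
    AlignedText a;
    assert(is_aligned(v.text, alignof(long long)));  // union trick
    assert(is_aligned(a.text, 8));                   // alignas
}
```

Both techniques give the char array a stricter alignment. The union states it indirectly through a dummy member, whereas alignas states it directly.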

Current Restrictions on Unions

The C++98 and C++03 standards impose draconian restrictions on unions that limit their use, and in some cases even compromise code safety. Such restrictions include:

  • A union shall not have static or reference members
  • A union shall not have members of non-POD types

The ban on static members seems strange, since there is no real reason for it. The recent proposal to remove many of the restrictions on unions addresses this: "...there appears to be no reason for this restriction, and static members are as useful for unions as for other class types." The restriction on reference members, however, remains in force: according to the new proposal, unions shall not contain reference members. The proposal doesn't explain the reason for this restriction, but it seems obvious. If a reference member were allowed, you could easily create dangling references simply by not initializing a union that contains a reference as a data member, or by initializing that union with a value of another member.

The more interesting restriction applies to the PODness of union members. In C++98 and C++03, unions cannot contain data members that have at least one non-trivial special member function. The following unions are all ill-formed:

struct Point {
 Point(int x=0, int y=0) : x_(x), y_(y) {}
 int x_, y_;
};
union T {
 Point p; //error, non-POD type
 int twoints[2];
};

The problem is that Point has a non-trivial constructor. Similarly, you cannot create unions with std::complex<T> members:

union W {
 double d;
 std::complex<double> cd; //error, std::complex isn't a POD type
};

Paradoxically, unions containing complex numbers are legal in C99, which has built-in complex types!
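For contrast, here is a sketch of what the C++98 and C++03 rules do accept: a point type with no user-declared constructor. The names PodPoint, CtorPoint, PointOrInts, make_point and demo are hypothetical. The static_assert checks use the C++0x type_traits header purely to illustrate the difference between the two types.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical POD counterpart of Point: no user-declared constructor.
struct PodPoint {
    int x_, y_;
};

// Hypothetical counterpart of Point with a user-provided constructor.
struct CtorPoint {
    CtorPoint(int x = 0, int y = 0) : x_(x), y_(y) {}
    int x_, y_;
};

static_assert(std::is_trivially_default_constructible<PodPoint>::value,
              "PodPoint has a trivial default constructor");
static_assert(!std::is_trivially_default_constructible<CtorPoint>::value,
              "CtorPoint has a non-trivial default constructor");

// Well-formed under C++98/C++03: every member is a POD type.
union PointOrInts {
    PodPoint p;
    int twoints[2];
};

// Without a constructor, the members must be assigned one by one.
inline PointOrInts make_point(int x, int y) {
    PointOrInts u;
    u.p.x_ = x;
    u.p.y_ = y;
    return u;
}

inline void demo() {
    PointOrInts u = make_point(3, 4);
    assert(u.p.x_ == 3 && u.p.y_ == 4);  // reads the active member only
}
```

The price of PODness is visible in make_point(): since PodPoint has no constructor, nothing guarantees that its members are initialized, which is one way the restriction can compromise code safety.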

Removing the PODness Requirement

According to the new proposal, C++0x unions may have data members with non-trivial member functions. For example, the following union is valid in C++0x:

union StringVariant {
 char *p;
 std::string s;
};

Obviously, the presence of a data member with non-trivial special member functions raises some questions. How does the programmer guarantee that s isn't accessed before it has been initialized? Similarly, how does one guarantee that s is destroyed only if it has been constructed, and at the right time?
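One possible answer to these questions is to track the active member by hand, as in the following sketch. It assumes a compiler that implements unrestricted unions. The wrapper class OwnedStringVariant and its member functions are hypothetical names, and this is not necessarily the approach the standard library or the proposal itself recommends.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

union StringVariant {
    char*       p;
    std::string s;

    StringVariant() : p(nullptr) {}  // start with p as the active member
    ~StringVariant() {}              // the union cannot know whether s is alive
};

// Hypothetical wrapper that remembers whether s is currently alive.
class OwnedStringVariant {
public:
    OwnedStringVariant() : holds_string_(false) {}
    ~OwnedStringVariant() { reset(); }

    // Copying is left out to keep the sketch short.
    OwnedStringVariant(const OwnedStringVariant&) = delete;
    OwnedStringVariant& operator=(const OwnedStringVariant&) = delete;

    void set_string(std::string value) {
        reset();
        new (&v_.s) std::string(std::move(value));  // begin the lifetime of s
        holds_string_ = true;
    }

    void set_pointer(char* ptr) {
        reset();
        v_.p = ptr;
    }

    bool holds_string() const { return holds_string_; }

    const std::string& string() const {
        assert(holds_string_);   // s may be read only while it is alive
        return v_.s;
    }

    char* pointer() const {
        assert(!holds_string_);
        return v_.p;
    }

private:
    // Destroys s only if it was constructed, then makes p active again.
    void reset() {
        if (holds_string_) {
            v_.s.~basic_string();
            holds_string_ = false;
        }
        v_.p = nullptr;
    }

    StringVariant v_;
    bool holds_string_;
};
```

Placement new starts the lifetime of s and the explicit destructor call ends it. The bool flag plays the same role as the type field in a variant.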

In the next part I will show how C++0x's unrestricted unions cope with members that have non-trivial member functions and how the rules of anonymous unions are also relaxed in the new C++ standard.