
C++ Reference Guide


State of the union

Last updated Jan 1, 2003.

Unions are one of the C relics that C++ has retained. On the one hand, they exemplify an intrusive, highly implementation-dependent programming style that is anathema to object-oriented programming. On the other hand, they still have certain useful applications even in C++ programs, as I will show you in the following passages.

What's in a union?

In the olden days, when memory was scarce and static type-checking was often overlooked ("We're serious programmers and we know what we are doing!"), programming languages such as FORTRAN, PL/1 and C offered a means of storing multiple objects in the same chunk of memory. Of course, only one of these objects could be used at a time, but the technique saved memory: the decision about which object was needed was often made only at runtime, whereas every candidate object had to be declared at compile time.

Think, for example, of a database query that retrieves an employee's record. The record can be retrieved using various criteria: the employee's name, his or her ID, telephone number, and so on. Obviously, you don't need all these keys at once, but you can't know at compile time which one the user will choose when querying the database. To solve this problem, it was customary to pack all the keys into a single data structure called a union:

union Key
{
 int ID;
 char * name; 
 char phone[8];
};

The size of a union is large enough to hold its largest data member. In the case of Key, it's typically 8 bytes, the size of phone. By contrast, a struct containing the same data members occupies at least the cumulative size of its members, i.e.,

sizeof(ID) + sizeof(name) + sizeof(phone)

which is 16 bytes on most 32-bit systems (possibly more, because of padding bytes). These savings might not impress you, but in those days, when a system's RAM consisted of a few kilobytes, every byte counted, especially when a program used arrays of unions. A typical program for accessing a database would determine the actual key at runtime using a type-encoding enumeration:

/*C style example of using type-coding enum + union*/
enum KeyType
{
 by_id,
 by_name,
 by_phone
};
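The byte counts quoted above depend on the platform. The following minimal sketch lets you check them with your own compiler. KeyAsStruct is a hypothetical struct with the same members as Key, included only for comparison, and union_size()/struct_size() are hypothetical helpers that report the two sizes.

```cpp
#include <cassert>
#include <cstddef>

// Same members as the union Key shown earlier
union Key
{
 int ID;
 char * name;
 char phone[8];
};

// Hypothetical struct with identical members, for comparison only
struct KeyAsStruct
{
 int ID;
 char * name;
 char phone[8];
};

// A union only has to be large enough for its largest member...
static_assert(sizeof(Key) >= sizeof(char[8]), "union holds phone");
static_assert(sizeof(Key) >= sizeof(char *), "union holds name");
// ...whereas a struct needs room for all of its members at once
static_assert(sizeof(KeyAsStruct) >= sizeof(int) + sizeof(char *) + sizeof(char[8]),
              "struct holds every member");

std::size_t union_size() { return sizeof(Key); }
std::size_t struct_size() { return sizeof(KeyAsStruct); }
```

On a typical 32-bit system the two functions return 8 and 16. On a 64-bit system the struct usually grows to 24 bytes because of the larger pointer and alignment padding, while the union stays at 8.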

Accessing a union's member is similar to accessing a member of a struct or a class. The crucial difference is that each data member of a struct or class occupies its own address, whereas all members of a union are stored at the same address. Therefore, the programmer must be careful to access the correct member:

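Employee * retrieve(union Key * thekey, enum KeyType type)
{
 Employee * result = NULL; /* access_by_id() etc. are assumed to return Employee * */
 switch (type)
 {
 case by_id:
  result = access_by_id(thekey->ID); /* read only the member that type names */
  break;
 case by_name:
  result = access_by_name(thekey->name);
  break;
//..
 }
 return result;
}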

This programming style has gone out of favor with the advent of object-oriented programming. Not only does it rely heavily on implementation details, it's also error-prone: if the user accesses the wrong data member of the union, the results are meaningless, just like reading a random piece of memory. Yet this dangerous characteristic was also an advantage in some systems that didn't support typecasting.
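One common way to reduce this risk is to keep the type-encoding enum and the union together in a single struct, and to check the tag before every read. The sketch below illustrates that idea and is not code from the original article. The names TaggedKey, make_id_key, make_name_key, get_id and get_name are hypothetical, and name is declared const char * here, unlike in the original Key, so that string literals can be stored.

```cpp
#include <cassert>

enum KeyType { by_id, by_name, by_phone };

union Key
{
 int ID;
 const char * name; // const so that string literals can be stored
 char phone[8];
};

// The tag travels with the data it describes
struct TaggedKey
{
 KeyType type; // records which member of key was written last
 Key key;
};

TaggedKey make_id_key(int id)
{
 TaggedKey k;
 k.type = by_id;
 k.key.ID = id;
 return k;
}

TaggedKey make_name_key(const char * name)
{
 TaggedKey k;
 k.type = by_name;
 k.key.name = name;
 return k;
}

// Accessors verify the tag before reading (assert fires in debug builds only)
int get_id(const TaggedKey & k)
{
 assert(k.type == by_id);
 return k.key.ID;
}

const char * get_name(const TaggedKey & k)
{
 assert(k.type == by_name);
 return k.key.name;
}
```

Because callers go through get_id() and get_name() instead of touching the union directly, reading the wrong member is caught at the point of access instead of silently producing garbage.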

union-based Typecasting

In C++, operator reinterpret_cast performs low-level conversions between pointer and reference types, reinterpreting the source object's bits without changing its binary layout. For example, to examine the individual bytes of an int, you could do something like this:

#include <iostream>
using namespace std;
int main()
{
 int num=2000;
 unsigned char * p = reinterpret_cast<unsigned char *> (&num);
 for (unsigned i=0; i<sizeof (num); i++) //display the decimal value of every byte of num
  cout<<"byte "<<i<<": " << (int) p[i] <<endl;
}

Before the days of reinterpret_cast, programmers would use a union to achieve the same effect (union initialization rules are explained in this article):

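#include <cstdio>
union Cast
{
 int n;
 unsigned char str[sizeof (int)]; //one element per byte of n
};
int main()
{
 Cast c = {2000}; //brace initialization sets the first member, n
 for (unsigned i=0; i<sizeof (int); i++)
  printf("%d\n", c.str[i]); //read the bytes of n through str
}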
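Strictly speaking, C++ only guarantees that you can read the union member that was most recently written. Many compilers accept the union trick above as an extension, but the standard leaves it undefined. A portable way to obtain the same bytes is to copy the object's representation with std::memcpy. Here is a minimal sketch; the function name int_bytes is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Copies the object representation of n into an array of bytes.
// Unlike reading an inactive union member, std::memcpy into
// unsigned char storage is well defined in C++.
std::array<unsigned char, sizeof(int)> int_bytes(int n)
{
 std::array<unsigned char, sizeof(int)> bytes;
 std::memcpy(bytes.data(), &n, sizeof n);
 return bytes;
}
```

On a little-endian machine with a 32-bit int, int_bytes(2000) yields 208, 7, 0, 0, because 2000 is 0x07D0. The union and reinterpret_cast loops above print the same values on such a machine.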

Anonymous unions

C++ introduced a special union type called an anonymous union. Unlike an ordinary union, it has neither a tag name nor a named instance. For this reason, it's mostly used as a data member of a class. For example:

class Employee
{
private:
 union //anonymous
 {
 int key_ID;
 char * key_name; 
 char key_phone[8];
 };
 double salary;
 std::string name; //requires <string>
 int rank;
//...
public:
Employee();
};

The advantage of using an anonymous union is that you can access its members directly, as if they were ordinary data members of the class:

Employee::Employee() : key_ID(0),//member of an anon. union
salary(0.0), rank(0) //ordinary data members
{}
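Although its members are named like ordinary class members, an anonymous union still stores them all at one address, so writing to one member overwrites the others. The following small sketch demonstrates this. Record is a hypothetical class modeled on Employee, and set_id, id and members_overlap are hypothetical helpers.

```cpp
#include <cassert>

class Record
{
public:
 Record() : key_ID(0) {} // initializes a member of the anonymous union

 void set_id(int id) { key_ID = id; } // members are named directly,
 int id() const { return key_ID; }    // with no union name in between

 // Returns true when all three union members start at the same address
 bool members_overlap() const
 {
  const void * a = &key_ID;
  const void * b = &key_name;
  const void * c = key_phone;
  return a == b && b == c;
 }

private:
 union // anonymous: its members belong to Record's scope
 {
  int key_ID;
  char * key_name;
  char key_phone[8];
 };
};
```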

Anonymous unions aren't confined to classes; you can also declare one at file scope or namespace scope. In these cases, however, it must be declared static, and its members have internal linkage:

static union //declared globally, has internal linkage
{
 int x;
 void *y;
};
namespace NS
{
 static union //a namespace's scope, has internal linkage
 {
 int w;
 char s[4];
 };
}
int main()
{
x=0;
NS::s[0]='a';
}

A union Facelift

C++ introduced another enhancement, namely the ability to declare member functions in a union, including constructors and destructors. Note, however, that virtual member functions (including a virtual destructor) are not allowed:

union Key
{
private:
 int ID;
 char * name;
 char phone[8];
public:
//ctor, dtor, copy ctor and assignment op
 Key();
 ~Key();
 Key(const Key & ref);
 Key& operator=(const Key & ref);
//ordinary member functions are also allowed
 int Assign(int n) 
 {
 ID=n;
 return ID;
 }
};
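The declarations above are not complete, because the special member functions are never defined. The following compilable sketch shows a union with constructors and ordinary member functions defined inline. The Key(int) constructor and the GetID, SetName and GetName accessors are hypothetical additions, not part of the original example.

```cpp
#include <cassert>

union Key
{
private:
 int ID;
 char * name;
 char phone[8];
public:
 Key() : ID(0) {}          // a union constructor may initialize one member
 explicit Key(int n) : ID(n) {}
 int Assign(int n) { ID = n; return ID; }  // ordinary member function
 int GetID() const { return ID; }          // meaningful only while ID is active
 void SetName(char * s) { name = s; }      // makes name the active member
 char * GetName() const { return name; }
 // virtual void f();  // would not compile: unions can't have virtual functions
};
```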

That said, a union can't serve as a base class, nor can it be derived from another class. Note also that only ordinary unions may contain member functions; anonymous unions can't have member functions of any kind, nor can they contain static, private, or protected data members.

Summary

Today unions have limited use, if any, in high-level applications. Yet it's important to know how to use them because they are still widely used in legacy code and in low-level APIs. The C++ creators attempted to upgrade unions into an object-oriented entity by allowing member functions and private and protected data members in a union. An anonymous union is a special kind of union that has neither a tag name nor an instance name.