Home > Articles > Programming > C/C++

C++ Reference Guide

Hosted by

Bit Fields

Last updated Jan 1, 2003.

Networking, file management, cryptography, data compression and serial port communication are only a few of the application domains in which programmers need to manipulate bits directly. Although the smallest native datatype of C++ is char, which is equivalent to a byte, the language does enable you to manipulate bits directly without resorting to assembly programming or inefficient libraries, as I will show momentarily.

Performance Analysis

Consider a typical billing system of an international telecom company. This system monitors every telephone call and records it in a database. A typical billing record contains the customer's ID, a timestamp, codes that indicate the type of the call, e.g., local, long distance and so on, and the tariff. The database stores millions of new records every day. Furthermore, it must keep the customer's billing information for at least one year. Consequently, an entire database contains around one billion records at any given time. Backups increase the amount of stored data even further. There could be 5 billion records stored in the main database and backup copies at any given time. Obviously, a billing record should be as small as possible. A non-optimized definition of a billing record might look like this:

struct BillingRec
{
 long cust_id;
 long timestamp;
 enum CallType
 {
 toll_free,
 local,
 regional,
 long_distance,
 international,
 cellular
 } type;
 enum CallTariff
 {
 off_peak,
 medium_rate,
 peak_time
 } tariff;
};

BillingRec occupies no fewer than 16 bytes of memory on a typical 32-bit system. To detect how many bytes it occupies on your machine, use the following expression:

sizeof(BillingRecord);

The members cust_id and timestamp occupy four bytes each, as expected. However, the two enums occupy additional eight bytes when we can safely store both of them in a single byte. Remember: one redundant bit causes a waste of more than 0.5 GB of storage! Clearly, this data structure needs to be optimized in order to save space.

Bit Field Usage

A bit-field is a data member of a struct or a class which contains 1 or more bits. In the following example, we declare a struct that contains two bit fields f1 and f2, each of which occupies 4 bits:

struct A
{
 unsigned int f1: 4; // range: 0 - 15
 unsigned int f2: 4; 
};

A bit-field declaration begins with its underlying type followed by the field's name. The underlying type can be signed char, short, int, long, their unsigned counterparts (an unsigned bit field may not hold a negative value) and bool. The declaration ends with a colon followed by the number of bits.

Space Optimization

Our first optimization step is to check the order of BillingRec members. In some cases, it's possible to save space simply by changing the members' order. In this case, however, all members occupy the same size so we have to try another strategy, namely squeezing the enum values into two bit fields:

struct BillingRec // optimized version
{
 //...cust_id and timestamp remain as before
 enum CallType
 {
 toll_free,
 local,
 regional,
 long_distance,
 international,
 cellular
 };
 enum CallTariff
 {
 off_peak,
 medium_rate,
 peak_time
 };
 unsigned int call: 3; // range: 0-7
 unsigned int tariff: 2; // range: 0-3
};

The trick is that instead of creating instances of the enum types, we use two bit fields that have enough bits to store every valid enumerator. You treat bit fields as ordinary data members. For example, you can zero-initialize a struct that contains bit fields. The following program creates a BillingRec instance and fills its members with data:

#include <ctime>
int main()
{
 BillingRec rec;
 rec.cust_id = 1425;
 rec.timestamp = time();
 rec.call = BillingRec::cellular;
 rec.tariff = BillingRec::off_peak;
}

The introduction of bit fields has reduced the size of BillingRec to 12 bytes. Although there's still a waste of 3 bytes here due member alignment restrictions, we have shaved 4 off bytes, thereby shrinking the billing database to 67% of its original size. More aggressive optimizations, such as changing the first two members of BillingRec to unsigned short if possible, or replacing them with bit fields would save more even space.

Summary

Often, programmers are concerned about optimization unnecessarily. If our database consisted of a few hundred records, there would be no reason to lean over backward and squeeze members into bit fields. Remember that almost everything about bit fields is implementation dependent. For example, whether bits are stored left-to-right or right-to-left depends on the actual hardware architecture. Furthermore, each compiler uses a different member alignment model, which is why the size of the optimized BillingRec is 12 bytes rather than 9. You cannot take a bit field's address nor can you create an arrays of bits. Finally, on most implementations the use of bit fields incurs speed overhead. Therefore, when you optimize your code, measure the effect of a certain optimization and its tradeoffs before you decide to use it.