C++ Reference Guide

A Garbage Collector for C++

Last updated Jan 1, 2003.

After the approval of the ANSI C++ standard in 1997, standards committee members were all but certain that the next C++ revision round would incorporate hashed containers and a garbage collector (GC). With hashed containers, they were right. However, a standard GC seems more elusive than ever before. Here, I discuss some of the difficulties that incorporating this feature into C++ might introduce.

A Garbage Collector's Role

The so-called "new programming languages" boast of their built-in GC. Yet garbage collection (not unlike other "novel" features, say multithreading) is anything but new. The earliest commercially available implementation known to me was Lisp (ca. 1958), although there must have been experimental models that predate it.

If garbage collection is almost 50 years old, one wonders why many programming languages have deliberately chosen not to support it. The answer, as usual, is that a GC is no silver bullet. Although it solves the onerous problem of memory management, its associated costs -- both in terms of performance and design complications -- outweigh its advantages, at least in some application domains. But there's a more crucial problem here: many users mistakenly assume that by plugging a GC into their programming language, all resource management problems disappear magically. This is certainly not the case. Let's see why.

The primary reason for using a GC is to automate raw memory management. As such, a built-in GC fixes memory leaks of the following type:

int *pi=new int;
int n;
pi=&n; //memory leak

It also enables you to use dynamically-allocated objects in novel ways:

struct S
{
 void func() {}
private:
 int x;
};
int main()
{
 (new S)->func(); //create a temp-like object on the free-store
}

Adding a GC only to enable such programming styles isn't worth the trouble. In the first example, the memory leak problem is easily averted by using an automatic or static int. In the second example, the use of a real (i.e., auto) temporary object avoids the memory leak problem.
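
A minimal sketch of the two alternatives just mentioned may make the point concrete. The struct mirrors S above; the function names and the values 7 and 42 are made up for illustration, and func() returns a value here only so that the call has an observable result.

```cpp
struct S
{
    int func() { return x; }   // returns x so the call can be checked
private:
    int x = 42;                // illustrative value, not from the original listing
};

// First example, rewritten: no free-store allocation, so nothing can leak.
int use_automatic_int()
{
    int n = 7;
    int *pi = &n;              // points to an automatic object; no delete needed
    return *pi;
}

// Second example, rewritten: a true temporary, destroyed at the end
// of the full expression in which it was created.
int use_temporary()
{
    return S().func();
}
```

Neither function touches the free store, so there is nothing for a GC to reclaim.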

Some of you may have noticed a potential trap in the latter example. To observe it more clearly, replace class S with a realistic example:

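class File
{
public:
 File(const char * name);
 ~File() { fclose(pf); } // closes the file when the object is destroyed
 void Insert(const Record & rec); // declaration assumed from the call below; Record is the application's record type
private:
 FILE * pf;
};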

File *pfile=new File("mydata.dat");
pfile->Insert(record);
pfile=new File("salaries.dat");// #1
//..

The line marked #1 causes more than a memory leak; it might also cause data corruption. The file mydata.dat is never closed after the Insert() operation. Normally, File's destructor would be responsible for that. But, since there is no explicit delete statement before pfile is assigned a new value, that destructor never gets called! If this code were executed in a garbage-collected environment, the memory leak problem wouldn't be an issue; the GC at some point would recycle the raw memory to which pfile pointed initially.

The data corruption problem is a different story, though. Even if the GC in question knows to invoke the destructors of unreferenced objects, you can't tell when this will actually happen. A GC may remain dormant for hours or even days before it awakens. This means that the file might remain locked indefinitely! For this reason, object-oriented languages with a built-in GC require the programmer to release resources explicitly.
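
The effect of the line marked #1 can be observed in ordinary C++ with a small sketch. Handle and its open_count counter are hypothetical stand-ins for File and its open file; the extra pointer named first exists only so that the demonstration can clean up after itself.

```cpp
struct Handle
{
    static int open_count;        // number of handles currently "open"
    Handle()  { ++open_count; }
    ~Handle() { --open_count; }   // cleanup happens only if the destructor runs
};
int Handle::open_count = 0;

// Mirrors line #1: the pointer is reassigned without a preceding delete.
int still_open_after_reassignment()
{
    int before = Handle::open_count;
    Handle *p = new Handle;
    Handle *first = p;            // kept only so this demo can clean up at the end
    p = new Handle;               // as in #1: the first Handle is abandoned
    delete p;                     // destroys the second Handle only
    int still_open = Handle::open_count - before;   // the first is still "open"
    delete first;                 // demo cleanup; the original code never does this
    return still_open;
}

// The same sequence with automatic objects: each destructor runs at scope exit.
int still_open_with_scopes()
{
    int before = Handle::open_count;
    { Handle h1; }
    { Handle h2; }
    return Handle::open_count - before;
}
```

In the first function one handle is still open after the reassignment, even though no pointer in the original code would refer to it any longer. A GC could eventually reclaim its memory, but nothing in the language says when, or whether, its cleanup code would run.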

In other words, the GC is only responsible for reclaiming raw memory; it doesn't manage other types of resources. If you wanted to make this code GC-proof, you'd need to define a special member function and call it explicitly before disposing of that object:

File *pfile= new File("mydata.dat");
pfile->Insert(record);
pfile->dispose(); //release all locked resources
pfile=new File("salaries.dat");// #1
//..
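
On the class side, such a member function might look like the following sketch. It is only one possible design: the name dispose() comes from the snippet above, whereas is_open(), the "a" open mode, and the decision to call dispose() from the destructor as a fallback are assumptions made for this illustration.

```cpp
#include <cstdio>

class File
{
public:
    explicit File(const char *name) : pf(std::fopen(name, "a")) {}
    File(const File&) = delete;             // a raw FILE* must not be copied blindly
    File& operator=(const File&) = delete;

    // Deterministic release: the caller invokes this as soon as it is done.
    void dispose()
    {
        if (pf) {
            std::fclose(pf);
            pf = nullptr;                   // makes a second dispose() call harmless
        }
    }

    bool is_open() const { return pf != nullptr; }

    // Fallback only (an assumption of this sketch): under a GC this
    // might run much later than expected, or not at all.
    ~File() { dispose(); }

private:
    std::FILE *pf;
};
```

Making dispose() idempotent matters because, once release is manual, nothing stops two pieces of client code from both calling it.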

"Wait a minute, what's the destructor's role then?" I'm glad you asked! In garbage-collected environments, you have to design your classes differently: resources that must be released as soon as possible are freed manually, through an explicit member function call. Cleanup operations that aren't time-critical may be left in the destructor.

This idiom complicates your design more than it might seem at first. The first challenge is to decide which resources must be released deterministically (that is, by calling an explicit member function at a well-known time) and which ones needn't be. Experience with GC-enabled languages shows that non-deterministic resource release functions (known as finalizers) are rarely used. If you examine a typical class library written in these languages, you will notice that it contains hardly any finalizers. Instead, the classes contain plenty of release(), dispose(), and cleanup() member functions that are typically called from a finally block.
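
C++ has no finally block, so the closest hand-written equivalent of that pattern would look roughly like the sketch below. Resource, process(), and use_resource() are hypothetical names; process() merely stands in for arbitrary work that may throw.

```cpp
#include <stdexcept>

struct Resource
{
    bool released = false;
    void dispose() { released = true; }   // explicit, deterministic release
};

// Stands in for arbitrary work that may fail.
void process(Resource&, bool fail)
{
    if (fail) throw std::runtime_error("processing failed");
}

// Every exit path has to remember to call dispose().
bool use_resource(Resource &r, bool fail)
{
    try {
        process(r, fail);
    }
    catch (...) {
        r.dispose();   // error path
        throw;         // let the caller see the original exception
    }
    r.dispose();       // normal path
    return true;
}
```

The release call appears twice, once per exit path, and each new early return would need its own copy. That is the kind of design burden the explicit-release idiom brings with it.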

GC and RAII

We can conclude that a GC-enabled programming language necessitates a different design of classes. Therefore, if C++ were to incorporate a built-in GC, its Standard Library would have to be rewritten from scratch; worse yet, C++ programmers would have to learn different resource management idioms. Forget about smart pointers such as auto_ptr, or RAII in general.
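
For comparison, here is a minimal sketch of what RAII provides today and what a GC-oriented design would give up. Connection and its live counter are hypothetical; std::unique_ptr is used because it is the modern successor of auto_ptr, which was later deprecated and removed from the standard.

```cpp
#include <memory>
#include <stdexcept>

struct Connection
{
    static int live;              // number of connections currently alive
    Connection()  { ++live; }
    ~Connection() { --live; }     // release is tied to the object's lifetime
};
int Connection::live = 0;

void work(bool fail)
{
    std::unique_ptr<Connection> c(new Connection);
    if (fail) throw std::runtime_error("work failed");
}   // ~Connection runs here on both the normal and the exceptional exit

int live_after_work(bool fail)
{
    try { work(fail); } catch (const std::runtime_error&) {}
    return Connection::live;
}
```

No explicit release call appears anywhere; the destructor runs at a known point on every exit path. This guarantee depends on deterministic destruction, which is exactly what a collector does not promise.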

Is this the worst thing that can happen? Not quite. Many believe that the solution to the wholesale redesign problem is to make GC optional. I suspect that doing so would complicate matters even further.

Let's look at the File class example once more. When the class is used in a GC-enabled environment, you have to call dispose() explicitly and remove any resource release operations from the destructor. Now imagine that this class is part of a third-party library that may be installed in both GC-enabled and GC-free environments. The poor vendor would have to support two different versions of this class, one for each environment. A similar phenomenon is witnessed today with vendors supporting single-threaded and multithreaded versions of the same library (the burden of which must have been so overwhelming that Microsoft, for example, recently decided to get rid of its single-threaded APIs).
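
To get a feel for that burden, consider the following sketch of a class maintained for both environments. Everything in it is hypothetical: LIB_HAS_GC is an invented configuration macro, and Log with its open_logs counter stands in for a class that owns a real resource.

```cpp
class Log
{
public:
    static int open_logs;          // stands in for a real OS resource
    Log() { ++open_logs; }
#ifdef LIB_HAS_GC                  // hypothetical macro set by a GC-enabled build
    void dispose() { if (!released) { --open_logs; released = true; } }
    ~Log() {}                      // may run late or never, so it releases nothing
private:
    bool released = false;
#else
    ~Log() { --open_logs; }        // deterministic: runs at scope exit or delete
#endif
};
int Log::open_logs = 0;

// Client code has to be written twice as well.
int open_after_use()
{
    {
        Log log;
#ifdef LIB_HAS_GC
        log.dispose();             // mandatory in the GC build
#endif
    }
    return Log::open_logs;
}
```

The conditional sections leak from the library into every caller, so both the vendor and its users end up maintaining and testing two code paths.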

Conclusions

My impression is that GC has lost its appeal in the C++ community in recent years. The use of Standard Library containers and smart pointers, together with the reliance on automatic and static storage, makes a GC rather redundant in state-of-the-art C++ programs. More importantly, developers have realized that a GC can manage only one resource type, namely raw memory.

By contrast, the complications that a GC incurs in an object-oriented environment are rather overwhelming. Will C++0x get a standard GC? I very much doubt it.