
Exploring the CLR


Garbage Collection

Dynamically allocated memory is the bane of most programmers' existence. Although you need heap memory to write most non-trivial applications, managing this memory correctly is an error-prone nightmare. Only the most careful and diligent programmers get it right, and failure to do so results in applications that leak memory (because the programmer forgot to free memory that is no longer being used) or that crash sporadically because the programmers have deallocated memory that is still needed by their application. These two bugs are the two biggest time-wasters in software development for programmers and for testers who spend countless hours finding and documenting these bugs. They also take a huge toll on the productivity of end-users who waste countless hours dealing with the results of such bugs: brittle software that crashes or gradually eats up all the memory on a client until it or some other application crashes.

The basic idea of garbage collection is that programmers should be able to allocate as much memory as they see fit (within reason of course), and, when they are not using a block of memory, the system should simply reclaim it. In this model, there is no need for delete, dealloc, or release functions.

The managed heap in the CLR uses garbage collection to automatically free memory that is no longer being used. As a programmer using the .NET Framework, you don't have to do anything special to use the managed heap. You simply allocate instances of objects from the managed heap using the new operator in your language of choice. When a request for more memory cannot be satisfied with the available memory, the CLR's garbage collection algorithm will run (or you can explicitly run the garbage collection algorithm by calling the Collect method on the System.GC class). The garbage collector figures out which blocks of memory are no longer being used by your application, frees that memory, and compacts the used memory into a contiguous block. The rest of this section explains how the .NET Framework's garbage collection algorithm works.

The first premise that you must accept before you can understand garbage collection is that, in order for a block of memory to be used (now or in the future) by an application, that memory must be reachable through a pointer/reference; that is, at least one pointer/reference must point to it. If there are no pointers/references pointing to a block of memory, it can no longer be used by an application. COM took advantage of this fact to implement its life cycle management scheme. With COM, each object is responsible for maintaining a count of the references that currently point to it by implementing the AddRef and Release methods in the IUnknown interface. Consumers of a COM object use the AddRef method in IUnknown to increment the reference count and the Release method to decrement it. The object is supposed to delete itself when its reference count goes to zero. The problem with this approach is that it is manual. In order for COM reference counting to work correctly, component developers must implement the IUnknown interface correctly, and consumers of those components must use the interface correctly.
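To make the manual bookkeeping concrete, here is a minimal sketch of reference counting written in C#. It is not real COM interop: the RefCountedResource class and its IsReleased property are hypothetical names invented for illustration, and the sketch ignores thread safety, which a real implementation would have to handle.

```csharp
using System;

// Hypothetical stand-in for a COM-style object; not real COM interop.
public class RefCountedResource
{
    private int refCount = 1;       // the creator holds the first reference
    private bool released = false;

    public bool IsReleased { get { return released; } }

    // Called by a consumer that takes an additional reference.
    public int AddRef()
    {
        return ++refCount;
    }

    // Called by a consumer that is finished with a reference.
    public int Release()
    {
        int remaining = --refCount;
        if (remaining == 0)
        {
            // A real COM object would delete itself at this point.
            released = true;
        }
        return remaining;
    }
}

public static class RefCountDemo
{
    public static void Main()
    {
        RefCountedResource res = new RefCountedResource(); // count is 1
        res.AddRef();                                      // count is 2
        res.Release();                                     // count is 1
        Console.WriteLine(res.IsReleased);                 // prints False
        res.Release();                                     // count is 0
        Console.WriteLine(res.IsReleased);                 // prints True
    }
}
```

If a consumer forgets one of the Release calls, the count never reaches zero and the object leaks; if it calls Release once too often, the object is torn down while someone else still holds a reference. Those are exactly the two bugs described at the start of this section.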

Garbage collection takes us error-prone developers out of the memory management process. The CLR determines if a block of memory can still be used or not by first assuming that all memory is garbage. It then starts at the roots of the application and builds a graph of all the objects that are reachable from the roots. The roots of an application include static and global pointers, local variables, method parameters on the stack, and even CPU registers. If an object is not part of this graph, it is unreachable from any reference/pointer within the application and is therefore garbage. The garbage collector then compacts all the nongarbage objects by shifting them down in memory using the memcpy function so that no gaps are in the heap.

This process is best illustrated by Figure 3–15. In the scenario illustrated by this picture, Object 1 is currently loaded in a CPU register and contains a pointer to Object 7. Objects 3 and 4 are pointed to by stack pointers (either parameters to a method or local variables to a method). Object 5 is referenced by a static object pointer. Object 5 also contains a pointer to Object 3. Objects 2 and 6 are not currently pointed to by any of the roots or any objects reachable from the roots, so they are garbage. The garbage collector will start at the roots and build a graph. Objects 2 and 6 will obviously not be in the graph because they are not referenced by the roots or any objects reachable from the roots. The garbage collector will then remove all the gaps in memory and position the next object pointer that contains the address of the next available block of memory after Object 7, as shown in Figure 3–16.

Figure 3–15 The heap prior to garbage collection.


Figure 3–16 After garbage collection.
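The graph-building step can be sketched in a few lines of C#. This is only a simulation of the idea, not how the CLR is actually implemented: objects are represented by their numbers from Figure 3–15, and the MarkSketch class, the Mark method, and the references table are hypothetical names made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class MarkSketch
{
    // Returns the ids of all objects reachable from the given roots.
    // 'references' maps an object id to the ids of the objects it points to.
    public static HashSet<int> Mark(IEnumerable<int> roots,
        Dictionary<int, int[]> references)
    {
        HashSet<int> reachable = new HashSet<int>();
        Stack<int> pending = new Stack<int>(roots);
        while (pending.Count > 0)
        {
            int id = pending.Pop();
            if (!reachable.Add(id))
                continue; // already visited
            int[] targets;
            if (references.TryGetValue(id, out targets))
            {
                foreach (int target in targets)
                    pending.Push(target);
            }
        }
        return reachable;
    }

    public static void Main()
    {
        // The scenario from Figure 3-15: Object 1 points to Object 7,
        // and Object 5 points to Object 3.
        Dictionary<int, int[]> references = new Dictionary<int, int[]>();
        references[1] = new int[] { 7 };
        references[5] = new int[] { 3 };

        // Roots: a CPU register (1), the stack (3 and 4), a static (5).
        int[] roots = { 1, 3, 4, 5 };

        HashSet<int> reachable = Mark(roots, references);
        for (int id = 1; id <= 7; id++)
        {
            Console.WriteLine("Object {0}: {1}", id,
                reachable.Contains(id) ? "reachable" : "garbage");
        }
    }
}
```

Running the sketch reports Objects 2 and 6 as garbage and the other five as reachable, which matches the outcome described for the figure.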


For performance reasons, the garbage collector may elect not to compact memory if most of the objects survive the collection. The CLR also maintains, for performance reasons, a separate managed heap for large objects. Objects in this heap are garbage collected like the regular heap, but, to avoid copying large objects, the CLR does not compact this heap.

So far so good, but there's actually a lot more to the garbage collection algorithm than this simple explanation. First, many objects contain cleanup logic that must be run when the object is destroyed. For instance, if a business object holds a connection to a database, you may want the connection to be closed when the object is destroyed. Most object-oriented programming languages support the notion of a destructor, which is a method that gets called automatically when the object is destroyed. Cleanup logic, such as closing a database connection or freeing any other resource used by the object, is usually placed in this method. The cleanup method in the .NET Framework is called Finalize, and the process is called Finalization. Curiously, the C# language uses the same destructor syntax as C++, and it also refers to its cleanup method as a destructor. C# destructors also will automatically call the destructor of their base class. The Finalize method in other .NET programming languages does not do this. The destructor for a C# class is declared as follows:

public class Manager : Employee
{
  public Manager(int id,string name,decimal salary,
    decimal bonus) : base(id,name,salary)
  {
    this.mBonus=bonus;
  }
  public override decimal GetSalary()
  {
    return base.GetSalary()+mBonus;
  }
  ~Manager()
  {
    // This destructor will also call the destructor 
    // in its base class.
    MessageBox.Show(
      "Finalize method called in Manager.");
  }
  private decimal mBonus;
}

Therefore, in this case, where I have a Manager class that inherits from an Employee class, the destructor for the Employee class will be called immediately after the destructor for the Manager class.

You can use this new knowledge of MSIL and ildasm to see what is really going on behind the scenes when you create a destructor. Here is the (slightly simplified) MSIL code for the destructor in the Manager class:

void Finalize() 
{
 .try
 {
  IL_0000: ldstr  "Finalize method called in Manager."
  IL_0005: call  MessageBox::Show(string)
  IL_000a: pop
  IL_000b: leave.s  IL_0014
 } // end .try
 finally
 {
  IL_000d: ldarg.0
  IL_000e: call  gctest.Employee::Finalize()
  IL_0013: endfinally
 } // end handler
 IL_0014: ret
} // end of method Manager::Finalize

Notice that the method is actually called Finalize in the generated MSIL code. The logic in the Manager destructor displays a message box, and then it calls the Finalize in the Employee base class of the Manager class. In beta 1 and 2 of the .NET Framework, you had to override the Finalize method in the System.Object class to implement a cleanup method in C#. The destructor syntax and terminology is unique to the release version of C#. Visual Basic .NET still uses a Finalize method in the release version of the .NET Framework. The following code shows how you would implement a Finalize method in Visual Basic .NET:

Public Class Class1
  Sub New()
    MessageBox.Show("Constructor called")
  End Sub
  Protected Overrides Sub Finalize()
    MessageBox.Show("Destructor called")
  End Sub
End Class

NOTE

Even though Microsoft in the release version of .NET decided to use the term destructor with C#, I prefer a term that I saw in the .NET Framework SDK docs, finalize destructor, and this is the term that I will use throughout the rest of this chapter.

In order to implement finalize destructors, the CLR maintains a pair of queues called the Finalization and the Freachable queue. The Finalization queue contains a list of all nongarbage objects that have Finalize destructors. The Freachable queue contains a list of garbage objects that are waiting for a special runtime thread to execute their finalize destructors. When an object that has a Finalize destructor is instantiated, a pointer to that object is inserted into the Finalization queue; this indicates to the CLR that this object will require Finalization when it is destroyed. If an object that has a destructor is determined to be garbage when the garbage collector runs, the pointer to the object is removed from the Finalization queue and appended to the Freachable queue; this indicates that the object is no longer being used and is waiting for a special thread to run its finalize destructor. The CLR does not run the Finalize destructor immediately because poorly written Finalize destructors may take a long time to execute and cause the garbage collection process to take an unacceptably long period of time. An object that has a Finalize destructor will actually survive a garbage collection in a sort of zombie state, even if the garbage collector determined that the object is garbage. After the garbage collector runs, the object will be referenced by the Freachable queue which is considered to be a root. At some point, a special thread in the CLR will wake up and start calling the Finalize destructors on all of the objects in the Freachable queue. After this thread calls the Finalize destructor on an object, it will remove the reference to the object from the Freachable queue. Now the object is truly garbage, and, the next time the garbage collector runs, the memory occupied by the object will be reclaimed.
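The following sketch lets you observe this behavior from the outside. The Tracked class, its static fields, and the CreateGarbage helper are hypothetical names used only here. The exact timing depends on the runtime and on build settings (a debug build may keep the object alive longer), so treat the output as an observation, not a guarantee.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

public class Tracked
{
    public static int FinalizedCount;
    public static int FinalizerThreadId;

    ~Tracked()
    {
        // Record which thread ran the finalize destructor.
        FinalizerThreadId = Thread.CurrentThread.ManagedThreadId;
        Interlocked.Increment(ref FinalizedCount);
    }
}

public static class FinalizerDemo
{
    // Creating the object in a separate, non-inlined method makes it more
    // likely that no stack slot or register still refers to it afterward.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void CreateGarbage()
    {
        new Tracked();
    }

    public static void Main()
    {
        CreateGarbage();
        GC.Collect();                  // garbage found; finalizer is queued
        GC.WaitForPendingFinalizers(); // wait for the finalizer thread
        GC.Collect();                  // memory can be reclaimed now

        Console.WriteLine("Main thread id:      {0}",
            Thread.CurrentThread.ManagedThreadId);
        Console.WriteLine("Finalizer thread id: {0}",
            Tracked.FinalizerThreadId);
        Console.WriteLine("Finalizers run:      {0}",
            Tracked.FinalizedCount);
    }
}
```

When the finalize destructor does run, the two thread ids should differ, because the destructor executes on the CLR's finalizer thread instead of the thread that created the object.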

There are a few key points to glean from this explanation: (1) Finalize destructors are expensive. An object that has a Finalize destructor will actually require two garbage collections before its memory can be reclaimed. Therefore, think carefully before you add one to your classes. (2) You should not make any assumptions about the thread that your Finalize destructor will run on. It will run on a dedicated thread provided by the CLR, so you will need to avoid accessing thread-local resources in a Finalize destructor. (3) The actual time when a Finalize destructor will run is indeterminate. The CLR will not call the Finalize destructor on an object until (a) the garbage collector runs, (b) the object is determined to be garbage, and (c) the special thread assigned to executing Finalize destructors reaches the object in the Freachable queue. The Finalize destructor may run any time from when the last reference to the object is removed to when the application shuts down. This is totally different from what most developers are used to. With languages like C++, the destructor is called for a stack object as soon as it goes out of scope; the destructor is called for a heap object when you use the "delete" operator on the object.

Because you cannot know when a Finalize destructor will run, it is unwise to leave the cleanup or reclamation of scarce resources to a Finalize destructor. For instance, in most cases in the .NET Framework, it is a bad idea to close a database connection in a Finalize destructor. This is a common thing that people (myself included) did in C++. If your object uses scarce resources like database connections, you should instead put the logic to close the database connection in a Dispose or Close method. By convention, you should use a Close method if the object may be used again after the call to the Close method. You should use Dispose if the object will not be used again. There actually is an IDisposable interface in the System namespace that contains a Dispose method. You should implement this interface to provide your Dispose method. The recommended semantics for this method are as follows: A Dispose method should release all resources that the object on which it was called owns. It should also remove the object from the Finalization queue so its destructor will not get called. Therefore, if you had an Employee class that used a database connection, you would implement the IDisposable interface as follows:

using System;

public class Employee : IDisposable
{
  public Employee(int id,string name,decimal salary)
  {
    this.mName=name;
    this.mID=id;
    this.mSalary=salary;
  }
  public int ID
  {
    get { return mID; }
    set { mID=value; }
  }
  public virtual void Dispose()
  {
    // Explicit cleanup requested by the caller.
    Dispose(true);
    // Cleanup is done, so the finalize destructor
    // does not need to run.
    GC.SuppressFinalize(this);
  }
  protected virtual void Dispose(bool disposing)
  {
    if(!disposed)
    {
      if(disposing)
      {
        // Called from Dispose(): clean up managed
        // resources here.
      }
      // Called from either path: close the database
      // connection here...
      disposed=true;
    }
  }
  public string Name
  {
    get { return mName; }
    set { mName=value; }
  }
  public virtual decimal GetSalary()
  {
    return mSalary;
  }
  ~Employee()
  {
    // Called by the garbage collector: release only
    // unmanaged resources.
    Dispose(false);
  }
  private string mName;
  private int mID;
  private decimal mSalary;
  private bool disposed=false;
}

This code shows the recommended design pattern for implementing the IDisposable interface. There are a number of reasons why this code is so complicated. First, remember that once you implement the IDisposable interface, your class must be able to handle two different cleanup scenarios. One is where the finalize destructor is called by the garbage collector. In this case you need to close the database connection (or free any other unmanaged [non-garbage-collected] resources), but there is no need to clean up managed resources because the garbage collector will do that for you. The other scenario is where the user has explicitly called the Dispose method. In this scenario you should close the database connection (or free any other unmanaged resources) and clean up any managed resources, if necessary. Microsoft recommends that you put both the managed and unmanaged cleanup logic in a protected, virtual method called Dispose that takes a Boolean parameter; this method is an overload of the Dispose method from the IDisposable interface. If you call this method with "true" specified for the parameter, it should clean up both the managed and unmanaged resources; if you pass in "false", it should clean up just the unmanaged resources. You should call this method with "false" from the destructor (the garbage collector will handle the cleanup of managed resources). You should call this method with "true" from the IDisposable.Dispose method, because the method call there is not made within the context of a garbage collection. The IDisposable.Dispose method should also call the SuppressFinalize method on the GC class. The SuppressFinalize method will remove the object from the Finalization queue, so its finalize destructor will not be called. The Finalization call is no longer necessary because the object has already been disposed.
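From the caller's side, the usual way to get deterministic cleanup in C# is the using statement, which calls Dispose when the block exits, even if an exception is thrown. The sketch below uses a hypothetical ConnectionHolder class (and a hypothetical IsDisposed property added so that the effect is visible) in place of the Employee class, so that it can be compiled on its own.

```csharp
using System;

// Hypothetical resource holder used only for illustration.
public class ConnectionHolder : IDisposable
{
    private bool disposed = false;

    public bool IsDisposed { get { return disposed; } }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed)
            return; // safe to call more than once
        // Release the scarce resource here.
        disposed = true;
    }

    ~ConnectionHolder()
    {
        Dispose(false);
    }
}

public static class UsingDemo
{
    public static void Main()
    {
        ConnectionHolder holder = new ConnectionHolder();
        using (holder)
        {
            Console.WriteLine(holder.IsDisposed); // prints False
        }
        // Dispose has been called by the time the block exits.
        Console.WriteLine(holder.IsDisposed);     // prints True
    }
}
```

Because Dispose calls SuppressFinalize, an object cleaned up this way also avoids the cost of surviving an extra garbage collection just to have its finalize destructor run.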

The GC class in the System namespace contains methods for interacting with the garbage collector, including methods for removing an object from the Finalization queue (SuppressFinalize) and re-adding an object to the Finalization queue (ReRegisterForFinalize). You will typically only call ReRegisterForFinalize if you decide to resurrect an object during its Finalize method. Remember that an object with a Finalize destructor will exist in a zombie state after the garbage collector has determined that it is garbage. The garbage collector will remove the object's entry from the Finalization queue and add it to the Freachable queue. When the runtime thread in the CLR executes the Finalize destructor, the Finalize destructor could resurrect the object by assigning the object's "this" pointer to a global or static variable. The garbage collector will not collect the object the next time it runs because it will be reachable from a root. Of course, the object is in a weird state now because its Finalize destructor has been called. Even if you reinitialize the object's state, you still have a problem: because its entry has been removed from the Finalization queue, its Finalize destructor will not be called again. You can remedy this situation by calling ReRegisterForFinalize, which will add the object's entry back to the Finalization queue. In almost all cases, resurrecting an object like this is a bad idea, and it should be avoided.

The CLR will determine when to run the garbage collector, but if you want to start a garbage collection yourself, the GC class contains a method called Collect that causes the garbage collector to run at a particular time. There are two forms of this method. One takes no parameters:

GC.Collect();

The other form of the method takes an integer parameter, which is the highest generation that you want to collect (generation 0 through that generation are collected):

int gen=0;
GC.Collect(gen);

NOTE

The .NET garbage collector is highly optimized and in most cases you are better off letting it decide when to perform a garbage collection rather than trying to do it manually.

Generations are a technique that the garbage collector uses to optimize collection speed. The basic ideas underlying generations are that (1) it is faster to compact a portion of the managed heap than the entire heap, (2) newer objects will have shorter lifetimes, and (3) older objects will have longer lifetimes. Of course, these three points aren't always true, but they have been found through research to be true for most applications. To take advantage of these ideas, the garbage collector in the CLR assumes that all new objects are in generation 0. When the garbage collector runs, any objects that survive the collection are considered to be in generation 1. Any new objects that are created after the garbage collection go into generation 0. When another garbage collection needs to occur, the garbage collector has two choices: (1) It can collect only generation 0, or (2) it can collect generations 0 and 1 (actually, there are three choices because there is also a generation 2, which I will talk about shortly). In most circumstances, the garbage collector will only attempt to garbage-collect generation 0. The exact algorithm that the garbage collector uses to determine whether to garbage-collect only generation 0 or 0 and 1 is obviously a Microsoft secret, but, in general, the garbage collector will only collect generation 1 if performing a garbage collection on generation 0 does not free up enough memory to satisfy a memory allocation request. Any generation 1 objects that survive a collection of generations 0 and 1 are promoted to generation 2. There are again more heuristics built into the garbage collector that determine when it will run a collection on all three generations. Any objects that survive a collection on all three generations will remain in generation 2 because the garbage collector currently only supports three generations (0, 1, and 2). Microsoft does seem to be leaving the door open to support more generations in the future because the GC class in the System namespace has a MaxGeneration property that you can use to determine the highest generation number. This property currently returns 2, but this may change in the future.
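You can watch an object age through the generations with the GetGeneration method on the GC class. The sketch below is only an experiment: the GenerationDemo class and its Observe method are hypothetical names, and the promotion policy is an implementation detail of the runtime, so the numbers it prints are not guaranteed. On the runtimes I have tried, a small object typically starts in generation 0 and moves up one generation with each collection that it survives.

```csharp
using System;

public static class GenerationDemo
{
    // Records the generation of one long-lived object after it is
    // allocated and after each of two forced collections.
    public static int[] Observe()
    {
        object survivor = new object();
        int[] generations = new int[3];

        generations[0] = GC.GetGeneration(survivor);
        GC.Collect();
        generations[1] = GC.GetGeneration(survivor);
        GC.Collect();
        generations[2] = GC.GetGeneration(survivor);

        // Keep the object reachable until the measurements are done.
        GC.KeepAlive(survivor);
        return generations;
    }

    public static void Main()
    {
        Console.WriteLine("Max generation: {0}", GC.MaxGeneration);
        int[] generations = Observe();
        Console.WriteLine("After allocation:      generation {0}",
            generations[0]);
        Console.WriteLine("After one collection:  generation {0}",
            generations[1]);
        Console.WriteLine("After two collections: generation {0}",
            generations[2]);
    }
}
```

Forcing collections like this is useful for experiments, but it promotes every surviving object early, which is one more reason to leave the scheduling of collections to the garbage collector in production code.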

The last garbage collector-related topic that I will discuss is weak references. Weak references give you a way to maintain a reference to an object, while allowing the garbage collector to collect the object if a collection occurs. Normal references are called strong references because, if the garbage collector runs while you are holding a strong reference to an object, the object will not be collected. In order to use the object pointed to by a weak reference, you must first obtain a strong reference from the weak reference. If the garbage collector has collected the object, the conversion from a weak reference to a strong reference will fail, so you will have to re-create the object. Weak references are good for objects that take up a lot of memory, but are easy to re-create. A good example is a directory tree for a file system. A directory tree can be extremely large and therefore may take a lot of time to re-create. For performance reasons, you may like to keep this tree in memory, but requiring the system to keep this tree in memory will put a lot of memory pressure on your system. So you may choose to keep the directory tree in memory, but still allow the garbage collector to reclaim the memory used by the tree if it needs to. Let's look at some code that should make this much clearer.

You can create a weak reference on an object using the code shown in the cmdCreateWeak_Click method that follows. Notice that in cmdUseWeak_Click I obtain a strong reference from the Target property on the WeakReference class and check it for null before I attempt to use the Manager object. Checking the IsAlive property first and reading Target afterward is not safe, because the object could be collected between the two calls.

public class Form1 : System.Windows.Forms.Form
{
  private WeakReference wkRef;
  // Other code omitted from this class.
  //
  private void cmdCreateWeak_Click(object sender,
    System.EventArgs e)
  {
    Manager mgr=new Manager(1,"Alan Gordon",500,100);
    wkRef=new WeakReference(mgr);
  }
  private void cmdUseWeak_Click(object sender,
    System.EventArgs e)
  {
    // Target returns null once the object has been
    // collected. wkRef itself is null if no weak
    // reference has been created yet.
    Manager aManager=
      wkRef==null ? null : (Manager)wkRef.Target;
    if (aManager!=null)
    {
      MessageBox.Show("The object is alive");
      // Use the manager object
      //
    }
    else
      MessageBox.Show(
        "The manager has been collected");
  }
}

You can specify a Boolean trackResurrection parameter in the WeakReference constructor as follows:

wkRef=new WeakReference(mgr,true);

If you specify false for this second parameter (the default), the WeakReference will not track the underlying object (that is, the IsAlive property will return false) after its Finalize destructor has run. This is called a short weak reference. If you specify true for the trackResurrection parameter, the WeakReference will continue to track the object while it exists in the zombie state after its Finalize destructor has run, but before a second garbage collection has finished off the object. This is called a long weak reference. Essentially, the trackResurrection parameter lets you specify whether the WeakReference can be used to resurrect an object whose Finalize destructor has been run.
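The sketch below creates one weak reference of each kind and shows only the parts of their behavior that are deterministic: the TrackResurrection property reports which kind you have, and neither kind loses its target while a strong reference is still held. The WeakReferenceDemo class and its PointsTo helper are hypothetical names. The sketch deliberately does not try to show the difference after finalization, because that depends on garbage collection timing.

```csharp
using System;

public static class WeakReferenceDemo
{
    // True if the weak reference still yields the expected object.
    public static bool PointsTo(WeakReference weakRef, object expected)
    {
        return ReferenceEquals(weakRef.Target, expected);
    }

    public static void Main()
    {
        object target = new object();

        // Short weak reference: trackResurrection defaults to false.
        WeakReference shortRef = new WeakReference(target);
        // Long weak reference: trackResurrection is true.
        WeakReference longRef = new WeakReference(target, true);

        Console.WriteLine(shortRef.TrackResurrection); // prints False
        Console.WriteLine(longRef.TrackResurrection);  // prints True

        GC.Collect();

        // The strong reference in 'target' is still in use below, so the
        // collection above cannot reclaim the object.
        Console.WriteLine(PointsTo(shortRef, target)); // prints True
        Console.WriteLine(PointsTo(longRef, target));  // prints True
        GC.KeepAlive(target);
    }
}
```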
