Multithreading and the C++ Type System

Date: Feb 8, 2002

Article is provided courtesy of Addison Wesley.

Multithreaded programming is unwieldy, to say the least. You need all the help you can get, especially when your own compiler can graciously provide it. Andrei Alexandrescu describes C++ idioms that exploit the C++ type system to achieve detection of race conditions at compile time.

Introduction

Multithreaded programming is so troublesome from both a theoretical and practical viewpoint that it certainly would not have continued as a valid programming paradigm—were it not for its formidable payoff in efficiency.

On shared-memory multiprocessor machines, multithreading is the best, most natural way to exploit machine resources. Plus, many interactive programs spend a significant amount of time waiting for keyboard/mouse input—time that can be used to perform useful ancillary tasks (precomputing hidden edges of a 3D graphic, gathering document statistics, and so on). This explains why multithreading is so attractive, even though, if true parallelism is not present, a multithreaded program will always be slower than its single-threaded equivalent.

Despite its attractiveness, multithreading has consistently escaped formalization attempts. After Dijkstra's seminal work¹, little theoretical progress has been made in perfecting efficient and portable threading models, despite a significant accumulation of practical experience.

Attempts to integrate threads in programming languages have registered a pale success at best. More often than not, they offer limited idioms and inefficient implementations. To date, multithreaded programs that need to be efficient use the same mutexes, semaphores, and events that Dijkstra described 35 years ago, seasoned with platform-specific low-level idiosyncrasies.

This unfortunate state of affairs makes multithreaded programs difficult to design, debug, ensure correct, optimize, maintain, and analyze formally. Consequently, building tools that automate detection of race conditions and deadlocks is highly desirable.

Recent research papers², ³ describe how to create type systems that make it easier to detect multithreading-specific errors. For example, a race condition will cause a type error that is detected at compile time.

An article of mine⁴ describes practical compile-time race condition detection in C++. The method exploits the volatile keyword not as a semantic vehicle, but only as a participant to the C++ type system. The programmer qualifies the shared user-defined objects with volatile. Those objects can be manipulated only by applying a const_cast. Finally, a helper object can ensure that the const_cast is performed only in conjunction with locking a synchronization object. Effectively, an object's type (volatile-qualified or not) depends on whether its corresponding synchronization object is locked or not. The main caveat of the technique is that the use of the obscure volatile qualifier might appear confusing to the unwitting maintainer.

The following text describes more C++ idioms that exploit the C++ type system to achieve detection of race conditions at compile time.

Multithreaded Programming in Object-Oriented Languages

First, let's recap some typical multithreaded programming idioms in object-oriented languages.

Typically, object-oriented programs use object-level locking by associating a synchronization object (mutex) with each object that is susceptible to be shared between threads. Then, code that manipulates the state of the object can synchronize by locking that object. Inside a synchronized section, the mutex associated with the object is locked, and consequently that object's fields can be accessed safely.

In C++, this fundamental idiom is typically implemented with a helper Lock object. Consider, for example, modeling a bank account class that supports simultaneous deposits and withdrawals from multiple locations (arguably the "Hello, World" of multithreaded programming). In the code below, guard's constructor locks the passed-in object this, and guard's destructor unlocks this.

class BankAccount {
  Mutex mtx_;
  int balance_;
public:
  void Deposit(int amount) {
         Lock guard(*this);
    balance_ += amount;
  }
  void Withdraw(int amount) {
         Lock guard(*this);
    balance_ -= amount;
  }
  void AcquireMutex() {
    mtx_.Acquire();
  }
  void ReleaseMutex() {
    mtx_.Release();
  }
};

Lock is a separate class whose implementation is independent of BankAccount's semantics. All Lock has to do with the BankAccount object is call a method such as AcquireMutex() in its constructor and ReleaseMutex() in its destructor. This applies to any type that supports locking. To work with all such types of objects, Lock is usually a template class, implemented as shown below:

template <class T> class Lock {
  T& obj_;
public:
  Lock(T& obj) : obj_(obj) {
    obj.AcquireMutex();
  }
  ~Lock() {
    obj_.ReleaseMutex();
  }
};

So now you use Lock<BankAccount> to lock a BankAccount, Lock<Widget> to lock a Widget, and so on. Notice that the template approach generates a different type of lock for each type locked (Lock<BankAccount> is a distinct type from Lock<Widget> and so on). As in a detective movie, this detail will later become of paramount importance.

The object-level locking idiom doesn't cover the entire richness of a threading model. For example, the model above is quite deadlock-prone when you try to coordinate multi-object transactions. Nonetheless, object-level locking is useful in many cases, and in combination with other mechanisms can provide a satisfactory solution to many threaded access problems in object-oriented programs.

Doomed If You Do, and Doomed If You Don't

When designing a class susceptible to sharing by multiple threads, one inevitable question occurs: Should that class use external, caller-provided, or internal locking?

External locking is easiest to support because it puts the entire burden on the user of the class. External locking means that all synchronization occurs in client code. You only guarantee that separate threads can manipulate distinct instances of your class concurrently. All you have to do to support external locking is to make sure that your class doesn't use static/global data—or if it does, that it uses proper synchronization. (Recently, the C++ community began using the term thread-neutral to describe such classes.)

The BankAccount class above uses internal locking. Basically, a class that uses internal locking guarantees that any concurrent calls to its public member functions don't corrupt an instance of that class. This is typically ensured by having each public member function acquire a lock on the object upon entry. This way, for any given object of that class, there can be only one member function call active at any moment, so the operations are nicely serialized.

This approach is reasonably easy to implement and has an attractive simplicity. Unfortunately, "simple" might sometimes morph into "simplistic."

Internal locking is insufficient for many real-world synchronization tasks. Imagine that you want to implement an ATM withdrawal transaction with the BankAccount class. The requirements are simple. The ATM transaction consists of two withdrawals—one for the actual money and one for the $2 commission. The two withdrawals must appear in strict sequence; that is, no other transaction can exist between them.

The obvious implementation is erratic:

void ATMWithdrawal(BankAccount& acct, int sum) {
  acct.Withdraw(sum);
  acct.Withdraw(2);
}

The problem is that between the two calls above, another thread can perform another operation on the account, thus breaking the second design requirement.

In an attempt to solve this problem, let's lock the account from the outside during the two operations:

void ATMWithdrawal(BankAccount& acct, int sum) {
     Lock<BankAccount> guard(acct);
  acct.Withdraw(sum);
  acct.Withdraw(2);
}

Notice that now acct is being locked by Withdraw after it has already been locked by guard. When running such code, one of two things happens.

Your mutex implementation might support the so-called recursive mutex semantics. This means that the same thread can lock the same mutex several times successfully. In this case, the implementation works but has a performance overhead due to unnecessary locking. (The locking/unlocking sequence in the two Withdraw calls is not needed but performed anyway—and that costs time.)
Your mutex implementation might not support recursive locking, which means that as soon as you try to acquire it the second time, it blocks—so the ATMWithdrawal function enters the dreaded deadlock.

The caller-ensured locking approach is more flexible and the most efficient, but very dangerous. In an implementation using caller-ensured locking, BankAccount still holds a mutex, but its member functions don't manipulate it at all. Deposit and Withdraw are not thread-safe anymore. Instead, the client code is responsible for locking BankAccount properly.

Obviously, the caller-ensured locking approach has a safety problem. BankAccount's implementation code is finite, and easy to reach and maintain, but there's an unbounded amount of client code that manipulates BankAccount objects. In designing applications, it's important to differentiate between requirements imposed on bounded code and unbounded code. If your class makes undue requirements on unbounded code, that's usually a sign that encapsulation is out the window.

To conclude, if in designing a multithreaded class you settle on internal locking, you expose yourself to inefficiency or deadlocks. On the other hand, if you rely on caller-provided locking, you make your class error-prone and difficult to use. Finally, external locking completely avoids the issue by leaving it all to the client code.

Note, however, that classes supporting external locking are not the bottom of the pile. Some classes are even less friendly to threads. For example:

class Counted {
  static unsigned int instances_;
public:
  Counted() { ++instances_; }
  Counted(const Counted&) { ++instances_; }
  ~Counted() { --instances_; }
};
unsigned int Counted::instances_ = 0;

The purpose of Counted is to tally its own number of instances at any moment. If integer arithmetic is not atomic on your system (and it isn't on many), you need to serialize (with a global mutex) the creation and destruction of all Counted objects throughout the system. This is feasible but often impractical.

Locks as Permits

So what to do? Ideally, the BankAccount class should do the following:

Support both locking models (internal and external).
Be efficient; that is, use no unnecessary locking.
Be safe; that is, BankAccount objects cannot be manipulated without appropriate locking.

Let's make a worthwhile observation: Whenever you lock a BankAccount, you do so by using a Lock<BankAccount> object. Turning this statement around, wherever there's a Lock<BankAccount>, there's also a locked BankAccount somewhere. Thus, you can think of—and use—a Lock<BankAccount> object as a permit. Owning a Lock<BankAccount> gives you rights to do certain things. The Lock<BankAccount> object should not be copied or aliased (it's not a transmissible permit). As long as a permit is still alive, the BankAccount object stays locked. When the Lock<BankAccount> is destroyed, the BankAccount's mutex is released.

The net effect is that at any point in your code, having access to a Lock<BankAccount> object guarantees that a BankAccount is locked. (You don't know exactly which BankAccount is locked, however—an issue that we'll address soon.)

For now, let's make a couple of enhancements to the Lock class template defined in the previous section. We'll call the enhanced version StrictLock. Essentially, a StrictLock's role is only to live on the stack as an automatic variable. StrictLock must adhere to a non-copy and non-alias policy. StrictLock disables copying by making the copy constructor and the assignment operator private. While we're at it, let's disable operator new and operator delete; StrictLocks are not intended to be allocated on the heap. StrictLock avoids aliasing by using a slightly less orthodox and less well-known technique: disable address taking.

template <class T> class StrictLock {
  T& obj_;
  StrictLock(); // disable default constructor
  StrictLock(const StrictLock&); // disable copy constructor
  StrictLock& operator=(const StrictLock&); // disable assignment
  void* operator new(std::size_t); // disable heap allocation
  void* operator new[](std::size_t);
  void operator delete(void*);
  void operator delete[](void*);
  StrictLock* operator&(); // disable address taking
public:
  StrictLock(T& obj) : obj_(obj) {
    obj.AcquireMutex();
  }
  ~StrictLock() {
    obj_.ReleaseMutex();
  }
};

Silence can be sometimes louder than words—what's forbidden to do with a StrictLock is as important as what you can do. Let's see what you can and what you cannot do with a StrictLock instantiation:

You can create a StrictLock<T> only starting from a valid T object. Notice that there is no other way you can create a StrictLock<T>.
```
BankAccount myAccount("John Doe", "123-45-6789");
StrictLock<BankAccount> myLock(myAccount); // ok
```
You cannot copy StrictLocks to one another. In particular, you cannot pass StrictLocks by value to functions or have them returned by functions:
```
extern StrictLock<BankAccount> Foo(); // compile-time error
extern void Bar(StrictLock<BankAccount>); // compile-time error
```

However, you still can pass StrictLocks by reference to and from functions:

// ok, Foo returns a reference to Lock<BankAccount>
extern StrictLock<BankAccount>& Foo();
// ok, Bar takes a reference to Lock<BankAccount>
extern void Bar(StrictLock<BankAccount>&);

You cannot allocate a StrictLock on the heap. However, you still can put StrictLocks on the heap if they're members of a class.
```
StrictLock<BankAccount>* pL =
  new StrictLock<BankAccount>(myAcount); //error!
  // operator new is not accessible
class Wrapper {
  Lock memberLock_;
  ...
};
Wrapper* pW = new Wrapper; // ok
```
(Making StrictLock a member variable of a class is not recommended. Fortunately, disabling copying and default construction makes StrictLock quite an unfriendly member variable.)
You cannot take the address of a StrictLock object. This interesting feature, implemented by disabling unary operator&, makes it very unlikely to alias a StrictLock object. Aliasing is still possible by taking references to a StrictLock:
```
Lock<BankAccount> myLock(myAccount); // ok
Lock<BankAccount>* pAlias = &myLock; // error!
  // Lock<BankAccount>::operator& is not accessible
Lock<BankAccount>& rAlias = myLock; // ok
```
Fortunately, references don't engender as bad aliasing as pointers because they're much less versatile (references cannot be copied or reseated).
You can even make Lock final; that is, impossible to derive from. This task is left in the form of an exercise to the reader.

All these rules were put in place with one purpose—enforcing that owning a StrictLock<T> is a reasonably strong guarantee that (1) you locked a T object, and (2) that object will be unlocked at a later point.

Now that we have such a strict StrictLock, how do we harness its power in defining a safe, flexible interface for BankAccount? The idea is as follows:

Each of BankAccount's interface functions (in our case, Deposit and Withdraw) comes in two overloaded variants.
One version keeps the same signature as before, and the other takes an additional argument of type StrictLock<BankAccount>. The first version is internally locked; the second one requires external locking. External locking is enforced at compile time by requiring client code to create a StrictLock<BankAccount> object.
BankAccount avoids code bloating by having the internal locked functions forward to the external locked functions, which do the actual job.

A little code is worth 1,000 words, a (hacked into) saying goes, so here's the new BankAccount class:

class BankAccount {
  int balance_;
public:
     void Deposit(int amount, StrictLock<BankAccount>&) {
    // Externally locked
    balance_ += amount;
  }
  void Deposit(int amount) {
    Lock guard(*this); // Internally locked
    Deposit(amount, guard);
  }
    void Withdraw(int amount, StrictLock<BankAccount>&) {
    // Externally locked
    balance_ -= amount;
  }
  void Withdraw(int amount) {
    Lock guard(*this); // Internally locked
    Withdraw(amount, guard);
  }
};

Now, if you want the benefit of internal locking, you simply call Deposit(int) and Withdraw(int). If you want to use external locking, you lock the object by constructing a StrictLock<BankAccount> and then you call Deposit(int, StrictLock<BankAccount>&) and Withdraw(int, StrictLock<BankAccount>&). For example, here's the ATMWithdrawal function implemented correctly:

void ATMWithdrawal(BankAccount& acct, int sum) {
  StrictLock<BankAccount> guard(acct);
  acct.Withdraw(sum, guard);
  acct.Withdraw(2, guard);
}

This function has the best of both worlds—it's reasonably safe and efficient at the same time.

It's worth noting that StrictLock being a template gives extra safety compared to a straight polymorphic approach. In such a design, BankAccount would derive from a Lockable interface. StrictLock would manipulate Lockable references so there's no need for templates. This approach is sound; however, it provides fewer compile-time guarantees. Having a StrictLock object would only tell that some object derived from Lockable is currently locked. In the templated approach, having a StrictLock<BankAccount> gives a stronger guarantee—it's a BankAccount that stays locked.

Whose Lock Is It, Anyway?

There's a weasel word in there—I mentioned that ATMWithdrawal is reasonably safe. It's not really safe because there's no enforcement that the StrictLock<BankAccount> object locks the appropriate BankAccount object. The type system only ensures that some BankAccount object is locked. For example, consider the following phony implementation of ATMWithdrawal:

void ATMWithdrawal(BankAccount& acct, int sum) {
  BankAccount fakeAcct("John Doe", "123-45-6789");
  StrictLock<BankAccount> guard(fakeAcct);
  acct.Withdraw(sum, guard);
  acct.Withdraw(2, guard);
}

This code compiles warning-free but obviously doesn't do the right thing—it locks one account and uses another.

It's important to understand what can be enforced within the realm of the C++ type system and what needs to be enforced at runtime. The mechanism we've put in place so far ensures that some BankAccount object is locked during the call to BankAccount::Withdraw(int, StrictLock<BankAccount>&). We must enforce at runtime exactly what object is locked.

If our scheme still needs runtime checks, how is it useful? An unwary or malicious programmer can easily lock the wrong object and manipulate any BankAccount without actually locking it.

First, let's get the malice issue out of the way. C is a language that requires a lot of attention and discipline from the programmer. C++ made some progress by asking a little less of those, while still fundamentally trusting the programmer. These languages are not concerned with malice (as Java is, for example). After all, you can break any C/C++ design simply by using casts "appropriately" (if appropriately is an, er, appropriate word in this context).

The scheme is useful because the likelihood of a programmer forgetting about any locking whatsoever is much greater than the likelihood of a programmer who does remember about locking, but locks the wrong object.

Using StrictLocks permits compile-time checking of the most common source of errors, and runtime checking of the less frequent problem.

Let's see how to enforce that the appropriate BankAccount object is locked. First, we need to add a member function to the StrictLock class template. The StrictLock<T>::GetObject function returns a reference to the locked object.

template <class T> class StrictLock {
  ... as before ...
public:
  T* GetObject() const { return &obj_; }
};

Second, BankAccount needs to compare the locked object against this:

class BankAccount {
  int balance_;
public:
  void Deposit(int amount, StrictLock<BankAccount>& guard) {
    // Externally locked
    if (guard.GetObject() != this)
      throw "Locking Error: Wrong Object Locked";
    balance_ += amount;
  }
  ...
};

The overhead incurred by the test above is much lower than locking a recursive mutex for the second time.

Improving External Locking

Now let's assume that BankAccount doesn't use its own locking at all, and has only a thread-neutral implementation:

class BankAccount {
  int balance_;
public:
  void Deposit(int amount) {
    balance_ += amount;
  }
  void Withdraw(int amount) {
    balance_ -= amount;
  }
};

Now you can use BankAccount in single-threaded and multi-threaded applications alike, but you need to provide your own synchronization in the latter case.

Say we have an AccountManager class that holds and manipulates a BankAccount object:

class AccountManager {
  Mutex mtx_;
  BankAccount checkingAcct_;
  BankAccount savingsAcct_;
  ...
};

Let's also assume that, by design, AccountManager must stay locked while accessing its BankAccount members. The question is, how can we express this design constraint using the C++ type system? How can we state "You have access to this BankAccount object only after locking its parent AccountManager object"?

The solution is to use a little bridge template ExternallyLocked that controls access to a BankAccount.

template <class T, class Owner>
class ExternallyLocked {
  T obj_;
public:
  ExternallyLocked() {}
  ExternallyLocked(const T& obj) : obj_(obj) {}
  T& Get(StrictLock<Owner>&) { return obj_; }
  void Set(const T& obj, Lock<Owner>&) { obj_ = obj; }
};

ExternallyLocked cloaks an object of type T, and actually provides full access to that object through the Get and Set member functions, provided you pass a reference to a Lock<Owner> object.

Instead of making checkingAcct_ and savingsAcct_ of type BankAccount, AccountManager holds objects of type ExternallyLocked<BankAccount, AccountManager>:

class AccountManager {
  Mutex mtx_;
  ExternallyLocked<BankAccount, AccountManager> checkingAcct_;
  ExternallyLocked<BankAccount, AccountManager> savingsAcct_;
  ...
};

The pattern is the same as before—to access the BankAccount object cloaked by checkingAcct_, you need to call Get. To call Get, you need to pass it a Lock<AccountManager>. The one thing you have to take care of is to not hold pointers or references you obtained by calling Get. If you do that, make sure that you don't use them after the Lock has been destroyed. That is, if you alias the cloaked objects, you're back from "the compiler takes care of that" mode to "you must pay attention" mode.

Typically, you use ExternallyLocked as shown below. Suppose you want to execute an atomic transfer from your checking account to your savings account:

void AccountManager::Checking2Savings(int amount) {
  Lock<AccountManager> guard(*this);
  checkingAcct_.Get(guard).Withdraw(amount);
  savingsAcct_.Get(guard).Deposit(amount);
}

We achieved two important goals. First, the declaration of checkingAcct_ and savingsAcct_ makes it clear to the code reader that that variable is protected by a lock on an AccountManager. Second, the design makes it impossible to manipulate the two accounts without actually locking a BankAccount. ExternallyLocked is what could be called active documentation.

Conclusion

Writing multithreaded code requires skill, discipline, and attention. This article presents idioms and techniques that let the compiler help you write correct, race condition-free multithreaded code. By using locks as permits, you can have the C++'s type system remind the programmer that locks are needed in order to access certain functions or objects.