
Introduction to Multithreading, Part I

Last updated Dec 5, 2008.

After the approval of the Committee Draft in September 2008, it's time to introduce the new concurrency facilities of C++0x. Before we can tackle the technical details of threads, condition variables, mutexes, and so on, it's important to establish a common framework. I will first explain the rationale behind multithreading and contrast it with the traditional single-threaded execution model, using a real-world analogy.

Concurrency In The 21st Century

Concurrency is a general term that refers to various models of parallel execution of software. Concurrency can be implemented by multiprocessing, i.e., executing multiple processes simultaneously; by launching lightweight processes inside a parent process; or by launching threads within the context of a single process. Before we delve into the gory details of each concurrency model, let's see what concurrency is all about and what its advantages and disadvantages are.

Concurrency and the Hollywood Principle

The "Hollywood principle" is a jocular term based on the "don't call us, we'll call you" cliché. In the Design Patterns parlance, the Hollywood principle refers to the familiar notion of callbacks and other forms of event-driven programming. I took the liberty to borrow this phrase and assign a completely new meaning to it.

Suppose your journal editor has given your team of three talented authors an assignment: watch three DVD films and review them. Skipping certain portions of any of these films, fast-forwarding, or basing your review on the trailer alone are all considered cheating. However, the team is allowed to split this Hollywood-related task however it sees fit: either each author watches one film and writes one review, or all three members watch the same films and write the three reviews together. Assume that you have only one DVD player that can take one disc at a time, and a single TV screen. The film viewing process (to which I will henceforth refer as "the task") now becomes the subject of our discussion. What is the fastest way to complete the task?

Don't try to be too creative; the only legal option is to watch the three films in a row, one after the other, alone or together. The total time needed for that task can't be lower than the constant x, which is defined as:

x = duration(film1) + duration(film2) + duration(film3);

In practice, you will notice that the actual time needed for completing the task is higher than x because of the extra context switching time needed for ejecting a disc once you have finished watching it, inserting a new disc, getting back to your comfortable seats, and pressing PLAY. Context switching time will be represented as the variable c. Thus, the total time needed for completing the task is w, which is defined as:

w = x + c; // total time needed for task completion

Context switching may also include popcorn preparation, ordering a pizza, and checking emails on your BlackBerry between films. Considering the restrictions stated earlier, the only legal way to reduce w is to minimize c. For example, you can order a special high-volume disc that includes all three films without any copyright nonsense, ads, and "the making of". You can pre-order the pizza, turn off your mobile phone, and remove other external distractions that might get in the way. Still, w will never be smaller than x. Now let's get back to the computer world.
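To make the arithmetic concrete, here is a minimal C++ sketch of the same calculation. The film durations and the context-switching costs are hypothetical values chosen purely for illustration:

#include <iostream>

int main()
{
    // Hypothetical film lengths in minutes; only the relationship
    // w = x + c matters, not the actual values.
    const int film1 = 120, film2 = 95, film3 = 110;

    const int x = film1 + film2 + film3; // the theoretical lower bound
    int c = 15;                          // context switching: swapping discs, pizza, email
    int w = x + c;                       // total time needed for task completion
    std::cout << "x = " << x << ", w = " << w << " minutes\n";

    c = 5;           // the special high-volume disc shrinks c...
    w = x + c;
    std::cout << "with a smaller c, w = " << w << " minutes\n";
    // ...but as long as c > 0, w remains greater than x.
}

Minimizing c brings w closer to x, but never below it.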

When Hollywood Meets Reality

Imagine a program (a task) that has to complete three operations. Each operation is the equivalent of watching a film. The computer running that program has only one CPU and one monitor; these are the equivalents of the single DVD player and the single TV screen, respectively. The fastest way to run this program is to perform the three operations serially, just as watching all three films in a row without any breaks is the fastest way to complete the assignment. Frankly, such serial execution is what single-processor machines do best, and fastest. However, you probably know that computers with a single processor supported some form of concurrency even in the 1960s. How did they accomplish that feat? They cheated. They didn't execute anything in parallel. Instead, they would switch context frequently, giving their users the impression that several operations were taking place simultaneously. In our Hollywood analogy, this is similar to watching a single scene from film1, then pressing STOP, ejecting the disc, inserting film2, watching a scene, pressing STOP and so on. If you tried this at home you would notice two things: first, the total time needed to complete the task would soar, because the context-switching overhead c now grows with every scene; second, you would be utterly exhausted by the endless cycle of stopping, ejecting, inserting, and resuming.
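The following sketch simulates that kind of pretend-concurrency. It is my own illustration (the film names and scene counts are made up): a round-robin loop that interleaves "scenes" from three films on a single CPU, where every pass through the inner loop stands for a full STOP/eject/insert/PLAY cycle:

#include <iostream>

int main()
{
    // Each film is a sequence of "scenes". Pretend-concurrency on a
    // single CPU interleaves them in round-robin order.
    const char* films[] = { "film1", "film2", "film3" };
    const int scenes_per_film = 3;

    for (int scene = 0; scene < scenes_per_film; ++scene) // time slices
        for (int f = 0; f < 3; ++f)                       // "context switch" to the next film
            std::cout << films[f] << ": scene " << scene + 1 << '\n';

    // Nothing ever runs in parallel; the three films merely appear
    // to make progress at the same time.
}

Note that this loop consumes exactly as many scenes as serial viewing would; it only changes the order in which they are consumed, and in real life each switch would cost extra time.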

Is Concurrency A Bluff?

Trust me, single-processor computers that pretend to be executing programs concurrently get exhausted too: they spend precious system resources on frequent context switching. So why did so many applications, operating systems, and programming languages use concurrency when it increases the overall time of completing a task instead of reducing it? First, most users are completely unaware of the significant amount of resources spent on context switching. Second, interactive applications often wait for a human operator to press a key, enter data into a spreadsheet cell, or read the End User License Agreement before pressing "I accept". Under these circumstances, it makes sense to interrupt an idle process that is waiting for I/O and switch to another process briefly. Finally, hardware architectures have been designed and optimized to enable efficient context switching. Thus, instead of pressing STOP, ejecting a disc, and inserting a new disc, these optimized architectures can skip from frame T of film U to frame X of film Y and back to frame T within microseconds. Today, single-processor machines are very efficient at context switching, but at the end of the day they are still most efficient when performing one operation after another, without any context switching. In other words, the most efficient way to complete your editor's assignment is to watch the three films in a row, wasting as little time as possible on context switching. For convenience, I will refer to this conclusion as the "Hollywood Principle theorem".
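As a modest preview of the facilities this series will introduce, the sketch below uses std::thread as specified in the C++0x Committee Draft to make such overlapping explicit. The function names and workloads are my own illustration: one thread blocks on console input, consuming almost no CPU time, while the other keeps the processor busy:

#include <iostream>
#include <string>
#include <thread>   // C++0x <thread>, as specified in the Committee Draft

// Stands in for an interactive process: it sits idle until the
// human operator finally presses ENTER.
void wait_for_input()
{
    std::string line;
    std::getline(std::cin, line);
    std::cout << "got: " << line << '\n';
}

// Stands in for a CPU-bound process that can use the time slices
// the idle, I/O-bound thread would otherwise waste.
void crunch_numbers()
{
    long long sum = 0;
    for (long long i = 0; i < 100000000; ++i)
        sum += i;
    std::cout << "sum = " << sum << '\n';
}

int main()
{
    std::thread io(wait_for_input);   // blocks on I/O almost immediately
    std::thread cpu(crunch_numbers);  // runs while io is waiting
    io.join();
    cpu.join();
}

Even on a single processor, crunch_numbers() finishes long before a leisurely operator presses ENTER, which is exactly why the context switches pay off here.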

Reality Bites

The Hollywood Principle theorem holds true so long as the original assumptions are unchanged. What would happen if, instead of one DVD player, your team had two players? What if your team also had two TV screens? In the next part, I will show how these factors affect your optimization options.