An Advanced Architecture
The GCC design has gradually evolved since the project’s creation, and adopted some more modern design principles. Static single assignment (SSA) was one of the more recent large changes, back in 2005. The principle of SSA is that each variable should hold only one value ever. This rule is enforced in the language in something like Erlang, or in the intermediate form in a C compiler. Consider some C code of the following form:
a = b; a += c;
This code doesn’t conform to the SSA principle, so the compiler would replace it with something like this:
a1 = b; a2 = c;
All future uses of a would be replaced with references to a2, until the next assignment to a. This form allows a number of optimizations to be accomplished quite easily. Describing it as "new" is somewhat misleading, however; the original paper proposing it was published in 1985. Most modern compilers, including the newer versions of PCC, use some kind of SSA form.
While GCC has accreted design components, another compiler was written from scratch based on the latest ideas in compiler research. The Low Level Virtual Machine (LLVM) is designed to focus heavily on optimization. Like Java and .NET, it uses a virtual machine to define an intermediate form; however (as the name suggests), this machine is quite low-level, and not tied to any particular language.
Originally, LLVM used code taken from GCC to handle parsing. This approach changed slightly with release 2.1 in September 2007. A new, Apple-developed front end was introduced, with support for C, C++, and Objective-C, under the name "clang.’’
Part of the motivation for developing clang came from a criticism often leveled at GCC: that it’s difficult to separate out the front-end and back-end code. When you edit code in something like Microsoft’s Visual Studio, you can use the same code for parsing the code to generate syntax-highlighting information as you use for code generation. The same is true of most LISP and Smalltalk environments. This isn’t the case with something like Apple’s Xcode, however, which has to implement its own parser for syntax highlighting and code completion. This setup is less than ideal for developers, because parsing errors in the IDE don’t necessarily correspond to code that won’t compile, and vice versa.
Extracting the front end from GCC for this purpose is difficult for two reasons. First, GCC is released under the GNU General Public License, which means that any other program built using it is required to be under the same license. Second, GCC intentionally ties the front and back ends into the rest of the code quite closely, to avoid "semi-proprietary" forks. In contrast, LLVM is BSD-licensed and has comparatively clean separation between the various layers.