When C was created, it was fast because it was almost trivial to translate C code into equivalent machine code. This was only a short-term benefit, however; in the 30 years since C was created, processors have changed a great deal, and mapping C code onto a modern microprocessor has become steadily more difficult. Because so much legacy C code is still around, though, a huge amount of research effort (and money) has been applied to the problem, so we can still get good performance from the language.
As a simple example, consider the vector unit found in most desktop processors (for example, MMX, SSE, or 3DNow! in x86 chips; AltiVec in PowerPC chips). These are also known as Single Instruction, Multiple Data (SIMD) units because they perform the same operation on multiple inputs simultaneously. One instruction might take four integers, add them to four other integers, and produce the four sums as its result. Now imagine some code that could be adapted to take advantage of this capability.
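To make the idea concrete, here is a small sketch (not from the original text) using the vector-extension syntax supported by GCC and Clang. The `v4si` typedef and `v4_add` function are my own illustrative names; on hardware with a 128-bit vector unit, the single `+` below typically compiles to one SIMD add instruction performing all four additions at once.

```c
/* Assumes GCC or Clang: a 128-bit vector of four 32-bit ints. */
typedef int v4si __attribute__((vector_size(16)));

/* Adds four pairs of integers; with a SIMD unit available, this is
 * one vector instruction rather than four scalar ones. */
v4si v4_add(v4si a, v4si b)
{
    return a + b;
}
```

Individual lanes of the result can be read with ordinary subscript syntax (`c[0]`, `c[1]`, and so on) in both compilers.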
In C, the usual representation of a vector is an array. Unfortunately, you can’t define operations on arrays, so adding the values in two arrays is written as a loop with scalar operations in the loop body. To vectorize that loop, a C compiler needs to spot two facts:
- There are no dependencies between loop iterations.
- The operations in the loop are capable of being mapped to vector operations.
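The kind of loop in question might look like the following sketch (the function name is mine). Before a compiler can vectorize it, it must prove, among other things, that `out` does not alias `a` or `b`; otherwise a store in one iteration could feed a load in the next, violating the first condition above.

```c
#include <stddef.h>

/* The scalar loop a vectorizing C compiler must analyze: no
 * iteration reads a value written by an earlier iteration, and the
 * body is a plain addition that maps directly to a vector add. */
void add_arrays(const int *a, const int *b, int *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

In C99, declaring the pointer parameters `restrict` is one way the programmer can assert the no-aliasing condition instead of leaving the compiler to prove it.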
This is non-trivial. Now consider a slightly higher-level language, FORTRAN (which, incidentally, predates C by more than a decade). FORTRAN has a vector datatype, with operations defined on it. The only difference between FORTRAN vectors and those understood by the CPU is that FORTRAN vectors can be of arbitrary length. All the compiler needs to do is split them into chunks of the length the hardware understands.
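That chunk-splitting step can be sketched in C (the function name and the GCC/Clang vector extension are my assumptions, not anything from the original text): process the arrays four elements at a time with vector operations, then finish any leftover elements with scalar code.

```c
#include <stddef.h>
#include <string.h>

/* Assumes GCC or Clang vector extensions. */
typedef int v4si __attribute__((vector_size(16)));

/* Adds two arbitrary-length vectors the way a FORTRAN-style compiler
 * might: full 4-wide SIMD chunks first, then a scalar tail. */
void vec_add(const int *a, const int *b, int *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {          /* 4-wide vector chunks */
        v4si va, vb, vc;
        memcpy(&va, a + i, sizeof va);    /* handles unaligned data */
        memcpy(&vb, b + i, sizeof vb);
        vc = va + vb;                     /* one SIMD add per chunk */
        memcpy(out + i, &vc, sizeof vc);
    }
    for (; i < n; i++)                    /* leftover elements */
        out[i] = a[i] + b[i];
}
```

A length of 6, for example, becomes one vector add covering elements 0 through 3 and two scalar adds for elements 4 and 5.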