C: The Cross-Platform Assembler
C was originally intended as a portable substitute for assembly language for implementing UNIX. C semantics are very similar to those of the PDP-11; for example, C includes shift operators but no rotate operator, and the PDP-11 had no plain rotate instruction (its ROL and ROR instructions rotate through the carry bit). C handled register allocation for you, but everything else was designed to be trivial to map to assembly language.
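To make the shift-versus-rotate point concrete, here is a minimal sketch of how a rotation is usually spelled in C, given that the language offers only shifts. The function name rotl32 is an assumption chosen for illustration, not something from a standard header, and the sketch assumes a 32-bit operand.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits using only shifts and OR.
   rotl32 is a hypothetical helper name, not a standard function.
   Masking both counts keeps every shift in the range 0..31, so the
   expression stays well defined when n is 0 or a multiple of 32. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}
```

For example, rotl32(0x12345678u, 8) yields 0x34567812. Many optimizing compilers recognize this pattern and emit a single rotate instruction where the target has one, but that is a quality-of-implementation detail rather than something the language guarantees.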
Because it was so close to the real hardware, C code written by a competent programmer typically ran quickly. In recent years, by contrast, high-performance code has gained a big boost from running on the vector units found on most modern CPUs.
Traditional CPU instructions are single instruction, single data (SISD), also called scalar operations. A single instruction does something to a single set of operands, such as adding a single integer to another integer. These operations are easily represented in C. Vector units, on the other hand, execute single instruction, multiple data (SIMD) operations. Each instruction does the same thing to several sets of operands. A vector add operation, for example, might take a vector of four integers, add them to another four integers, and give four integers as a result.
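The difference between the two kinds of add can be sketched in portable C. This is only a model of what a four-wide vector add computes: the type vec4i and the functions add_scalar and add_vec4 are hypothetical names invented for this illustration, and no real vector instructions or intrinsics are involved. A real vector unit would perform the four additions with one instruction rather than a loop.

```c
#include <stddef.h>

/* Scalar (SISD) view: one operation on one pair of operands. */
int add_scalar(int a, int b)
{
    return a + b;
}

/* Hypothetical stand-in for a vector register holding four integers. */
typedef struct {
    int lane[4];
} vec4i;

/* SIMD view, modelled in plain C: the same addition is applied to
   each of the four lanes. The loop stands in for what hardware would
   do in a single vector add instruction. */
vec4i add_vec4(vec4i a, vec4i b)
{
    vec4i r;
    for (size_t i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}
```

Adding the lanes {1, 2, 3, 4} to {10, 20, 30, 40} gives {11, 22, 33, 44}. The scalar version maps directly onto a C operator; the vector version has no built-in spelling in standard C, which is why it has to be modelled with a struct and a loop here.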
These days, most C compilers try to emit vector instructions. This is non-trivial. For one thing, vector units are often quite picky about alignment; while a CPU might only require alignment on 4-byte boundaries for loads, its vector unit could need data aligned on 16-byte boundaries. For another, the compiler has to prove that it can perform the operations in parallel without altering the semantics of the program.
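Both obstacles can be illustrated with a short sketch. The helper names is_aligned16 and add_arrays are assumptions made for this example, and the 16-byte figure is simply the one used above; actual requirements vary by vector unit. The alignment check relies on converting a pointer to uintptr_t, which is conventional but implementation-defined. The restrict qualifier (C99) is one way a programmer can hand the compiler the no-overlap guarantee it would otherwise have to prove; whether a given compiler then vectorizes the loop depends on the compiler and its options.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if p lies on a 16-byte boundary.
   is_aligned16 is a hypothetical helper for illustration. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Element-wise add. The restrict qualifiers promise that dst, a and b
   do not overlap. Without that promise, a store to dst[i] could change
   a later a[i] or b[i], so handling several elements at once might
   give a different result than the one-at-a-time loop. */
void add_arrays(int *restrict dst, const int *restrict a,
                const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

A caller could, for instance, declare its arrays with _Alignas(16) (C11) and confirm the alignment with is_aligned16 before calling add_arrays. Passing overlapping arrays to add_arrays would break the promise made by restrict and is undefined behavior.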