
Problems with Memory

C exposes a PDP-11's memory model to programmers. It tries to do so in an abstract way, but the abstraction is quite low-level. For example, you can do arithmetic on pointer values in C. The result of pointer arithmetic that goes outside an object (a block of allocated memory), other than to the position just past its end, is undefined. However, because most implementations represent a C pointer as a plain integer address, it often appears to work anyway.

The biggest problem that you're likely to find relates to alignment. On modern CPUs, a misaligned load or store is typically more expensive than an aligned one, especially when it crosses a cache-line boundary. Some architectures, such as x86, are quite forgiving. SPARC, at the other extreme, will just trap and abort. ARM may or may not work, depending on the version of the core and the OS.

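If you declare a struct in C, the compiler automatically inserts enough padding between fields (and, if needed, after the last one) to ensure that every field is correctly aligned, so loads and stores of those fields are aligned. Arrays get the same guarantee: because the compiler rounds a struct's size up to a multiple of its alignment, every element of an array is aligned too.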

In some cases, it's possible to confuse the compiler. For example, if you have a long long*, the compiler will assume that it is aligned on an 8-byte boundary, on systems where that's the alignment requirement for long long. It will then emit an aligned load or store instruction, and on architectures that enforce alignment your code will fail if the pointer is misaligned. If the pointer is generated by pointer arithmetic, or by a cast from another type, it might not have the correct alignment. This is particularly common when you store pointers in collections that use void* for the elements and then cast the result back to a specific type, because nothing checks that the pointer you get back actually has the alignment that type requires.

If the compiler knows that a pointer may be unaligned, it can work around this problem, but the result isn't very fast. It needs to test whether the lowest few bits of the address are 0; if they are, it jumps to the fast path. Otherwise, it performs two loads of adjacent words and then masks and rotates the results together. You can typically tell the compiler about this possibility with a compiler-specific keyword or attribute. For example, Microsoft C lets you add the __unaligned modifier to a pointer declaration to generate this behavior.

Unaligned stores come with one additional problem: they break atomicity. This problem is actually slightly more general, since unaligned loads can be affected as well. If you do an aligned 32-bit store, the memory location will always contain either the old value or the new one. If the value is misaligned, the store may need two store instructions. Between them, memory holds a value made up of some bits from the old value and some from the new one. Multithreaded code checking a monotonically incrementing counter, for example, can break, because another thread may observe a value that is neither the old count nor the new one.

This problem is more subtle, because on some x86 chips you'll encounter it only when the unaligned load or store spans a cache line. It's quite possible to write code that works fine on x86 but crashes quickly on other architectures. Worse, because this is a threading-related bug, it's incredibly difficult to find.
