Memory Ordering and Atomicity
The initial x86 chips were very simple, executing instructions one at a time, in order. Because backward compatibility was a significant requirement for subsequent chips, the architecture became strongly ordered. This means that memory operations must appear, from the perspective of other code, to happen in the order in which they appear in the instruction stream. The chip is free to reorder work internally as long as that appearance holds. (The guarantee is strong rather than absolute: on modern x86 a store can still become visible after a later load from a different address.)
At the opposite end of the spectrum was the Alpha. This chip was allowed to reorder memory accesses as much as it liked, and came with a large collection of memory barriers to force specific orderings where required.
In single-threaded code, order isn't very important. In multithreaded code, it becomes a major issue. You can maintain consistency in some lockless data structures by ensuring that updates are performed in a certain order. If the CPU reorders those updates, the data structure might suddenly break.
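As a rough illustration of an update that depends on order, the sketch below publishes a value through a flag. It is written in Rust purely for illustration; the names Slot, payload, ready, and run_once are hypothetical and do not come from any particular codebase. The writer must store the payload before setting the flag, and the release/acquire pair tells both the compiler and the CPU to preserve that order.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical one-slot "mailbox": `payload` is the data,
// `ready` is the flag that publishes it.
struct Slot {
    payload: AtomicU32,
    ready: AtomicBool,
}

fn run_once() -> u32 {
    let slot = Arc::new(Slot {
        payload: AtomicU32::new(0),
        ready: AtomicBool::new(false),
    });

    // Reader thread: wait for the flag, then read the payload.
    let consumer = {
        let slot = Arc::clone(&slot);
        thread::spawn(move || {
            // Acquire: once `ready` is seen as true, every write the
            // producer made before its release store is visible here.
            while !slot.ready.load(Ordering::Acquire) {
                thread::yield_now();
            }
            slot.payload.load(Ordering::Relaxed)
        })
    };

    // Writer: step 1, write the data.
    slot.payload.store(42, Ordering::Relaxed);
    // Step 2, publish it. Release forbids moving step 1 after step 2.
    slot.ready.store(true, Ordering::Release);

    consumer.join().unwrap()
}
```

Calling run_once() returns 42. If both flag operations were weakened to Ordering::Relaxed, the language would no longer guarantee that result: a weakly ordered CPU, or the compiler, would be free to make the flag visible before the payload. A strongly ordered chip such as x86 would often hide that bug, which is the kind of CPU-dependent behavior this section is concerned with.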
These problems are among the hardest to debug, because they depend both on two threads being in a particular position relative to each other and on the CPU actually performing the reordering. The latter part is the worst, as it means that code may work perfectly on one CPU and then fail on the next revision of the same architecture.
The performance gain from allowing memory reordering is small, and it doesn't make up for the extra headaches that come from difficult-to-find failures.