
Floating-Point Accuracy

One thing that can confuse people new to C is the handling of floating-point values on x86, and how this differs from pretty much every other architecture. The 8087 floating-point coprocessor used an 80-bit internal representation, which continued for all subsequent x87 FPUs.

When you declare a floating-point value in C, you use either float or double, defining either a 32-bit or a 64-bit floating-point quantity. Most modern compilers use the IEEE 754 representations for these in memory. On x87, this gives some hard-to-predict results. A sequence of floating-point operations that fits in the x87 registers is calculated with 80-bit precision, and only the final result is rounded to 32 or 64 bits when it is stored.
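One way to see which behavior a given build promises is the C99 FLT_EVAL_METHOD macro from <float.h>. The following is a minimal sketch; the helper names eval_method_name and print_eval_method are made up for illustration, and the strings are paraphrases of the standard's wording.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Paraphrase the meaning of a C99 FLT_EVAL_METHOD value.
 * (Hypothetical helper, not part of any library.) */
const char *eval_method_name(int method)
{
    switch (method) {
    case 0:  return "each operation is evaluated in its own type";
    case 1:  return "float and double are evaluated as double";
    case 2:  return "everything is evaluated as long double";
    default: return "indeterminable or implementation-defined";
    }
}

/* Print what this compiler and target report about intermediate precision. */
void print_eval_method(void)
{
    int method = (int)FLT_EVAL_METHOD;

    printf("FLT_EVAL_METHOD = %d (%s)\n", method, eval_method_name(method));
    printf("mantissa bits: float %d, double %d, long double %d\n",
           FLT_MANT_DIG, DBL_MANT_DIG, LDBL_MANT_DIG);
}
```

Calling print_eval_method() from main in an x87 build and again in an SSE build of the same source is a quick way to check whether intermediate results are being kept at a wider precision than the declared type.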

If you compile with SSE, or for a non-x86 chip, the calculations will instead be performed at the precision specified with the type. This design can cause confusion when you use floats, because compiling for a different architecture can mean that you suddenly go from running a calculation at 80-bit precision to 32-bit precision. If you have a tight loop that does a lot of floating-point operations, you may do a few hundred operations without needing to spill to the stack. The cumulative errors when you reduce the precision can then be quite large.
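The effect can be imitated portably. This sketch adds the same float value many times, once rounding to 32 bits after every addition and once keeping the running total in a wider accumulator and rounding only at the end. The function names are hypothetical, and double is used as a stand-in for the 80-bit x87 register so that the code behaves the same on any target.

```c
#include <assert.h>
#include <stdio.h>

/* Add x to an accumulator n times, rounding to float after every addition.
 * The volatile qualifier forces each partial sum through a 32-bit store,
 * which is what happens when nothing is kept in a wider register. */
float sum_rounded_each_step(float x, int n)
{
    assert(n >= 0);
    volatile float acc = 0.0f;

    for (int i = 0; i < n; i++)
        acc = acc + x;
    return acc;
}

/* The same sum, kept in a wider accumulator and rounded to float once.
 * Assumption: double stands in for the x87 extended format here. */
float sum_rounded_once(float x, int n)
{
    assert(n >= 0);
    double acc = 0.0;

    for (int i = 0; i < n; i++)
        acc += x;
    return (float)acc;
}

/* Print both results side by side for a long-running sum. */
void print_accumulation_gap(void)
{
    const int n = 1000000;

    printf("rounded every step: %f\n", sum_rounded_each_step(0.1f, n));
    printf("rounded once:       %f\n", sum_rounded_once(0.1f, n));
}
```

Both functions return a float and take the same inputs, so the difference between the two printed values comes only from where the rounding happens.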

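This isn't just a problem between architectures, but also between compilers, and even between flags with the same compiler. With GCC on x86, for example, the -mfpmath=387 and -mfpmath=sse options select the x87 or the SSE unit for the same source code.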

Floating-point code these days is generally quite portable, in the "your code will run" sense, but not always in the "your code will run at an acceptable speed" sense. On x86, there's generally little or no speed difference between using float and double, so there's a tendency to use double everywhere. It will be slightly slower if your compiler is able to vectorize your operations, because a 128-bit SSE register holds only two doubles instead of four floats, and it will cause slightly more cache churn, but that's about it. In microbenchmarks, they'll be the same speed.

A modern high-end ARM chip, in contrast, has a fast vector unit that can do 32-bit floating-point arithmetic, and a slow FPU that can do 64-bit arithmetic. A cheaper core aimed at embedded systems won't have an FPU at all. 64-bit floating-point operations will be much slower than 32-bit ones in all cases. On the cheaper cores, they'll be significantly slower than integer operations.

With SSE for floating-point operations, a modern x86 chip runs floating-point multiplies, for example, almost as fast as integer multiplies (with x87, they're much slower). In contrast, an ARM chip with a hardware FPU is likely to be noticeably slower at floating-point than at integer arithmetic. One without a hardware FPU will run floating-point operations at between 1% and 10% of the speed of integer operations. Given that such chips tend to be quite slow anyway, code that performs a lot of floating-point calculations can be amazingly slow.

There's a temptation when writing code for x86 to use floating-point in cases where fixed-point might be more appropriate, because on x86 floating-point is only slightly slower and is more flexible. This can make your code unusably slow on other platforms.
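For readers who haven't used fixed-point, here is a minimal sketch of a 16.16 format, in which a 32-bit integer holds 16 integer bits and 16 fractional bits. The type and function names (q16_16, q16_mul, and so on) are invented for this illustration and don't come from any particular library. Overflow is not checked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16.16 fixed-point type: the stored integer is value * 65536. */
typedef int32_t q16_16;

#define Q16_ONE ((q16_16)65536)   /* the value 1.0 in 16.16 */

/* Convert a small integer to 16.16 (caller keeps it within 16 bits). */
static inline q16_16 q16_from_int(int v)
{
    return (q16_16)(v * Q16_ONE);
}

/* Integer part, truncated toward zero. */
static inline int q16_int_part(q16_16 v)
{
    return (int)(v / Q16_ONE);
}

/* Addition and subtraction are plain integer operations. */
static inline q16_16 q16_add(q16_16 a, q16_16 b)
{
    return a + b;
}

/* Multiply: widen to 64 bits, then drop the extra 16 fractional bits.
 * Division is used instead of >> so negative values are well-defined. */
static inline q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) / Q16_ONE);
}

/* Divide: pre-scale the numerator so the quotient stays in 16.16. */
static inline q16_16 q16_div(q16_16 a, q16_16 b)
{
    assert(b != 0);
    return (q16_16)(((int64_t)a * Q16_ONE) / b);
}
```

Everything here compiles to integer instructions, so it runs at integer speed on a core with no FPU. The cost is a fixed range and a fixed resolution of 1/65536, which is the flexibility that floating-point would otherwise provide.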
