Intrinsics or Assembly
The point of C is that it's low-level and gives you a programming model that's close to how the hardware actually works. Unfortunately, it often isn't quite low-level enough. For example, C99 doesn't define atomic operations (this is fixed in C11, which was known as C1x during drafting). If you need features like this, you have two options:
- Use inline (or out-of-line) assembly, limiting your code to one architecture.
- Use intrinsic functions, limiting your code to one compiler.
The second is less of a limitation than it sounds, because compilers often implement each other's intrinsics. For example, clang and ICC implement a lot of GCC's intrinsic functions. In these cases, nothing that you can do will make your code fully portable, but you can make it easier to port.
In general, it's better to use intrinsic functions than inline assembly. The reason is very simple: If you want to use a compiler that doesn't provide the same intrinsics, then you can typically provide a header that implements them with some inline assembly.
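As a rough sketch of such a compatibility header, the following wraps an atomic fetch-and-add. The name port_fetch_add and the PORT_FORCE_ASM switch are hypothetical, invented here for illustration; __sync_fetch_and_add is a real GCC intrinsic that clang and ICC also provide. The fallback assumes an x86 target and a compiler that accepts GCC-style extended assembly, so treat it as one possible port rather than a general solution.

```c
/* Hypothetical portability header: port_fetch_add and PORT_FORCE_ASM are
 * illustrative names, not part of any real library. */

#if defined(__GNUC__) && !defined(PORT_FORCE_ASM)

/* GCC-compatible compilers (GCC, clang, ICC) provide this intrinsic.
 * It atomically adds value to *ptr and returns the old contents. */
static inline int port_fetch_add(volatile int *ptr, int value)
{
    return __sync_fetch_and_add(ptr, value);
}

#elif defined(__i386__) || defined(__x86_64__)

/* Assumed fallback: x86 only, GCC-style extended asm syntax.
 * LOCK XADD stores the sum in *ptr and leaves the old contents
 * of *ptr in the register operand. */
static inline int port_fetch_add(volatile int *ptr, int value)
{
    __asm__ __volatile__("lock; xaddl %0, %1"
                         : "+r"(value), "+m"(*ptr)
                         :
                         : "memory", "cc");
    return value;
}

#else
#error "No port_fetch_add implementation for this compiler/architecture"
#endif
```

Callers only ever see port_fetch_add(&counter, 1), so porting to a new compiler means adding one more branch to this header rather than touching every call site.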
If you need to do something that intrinsics don't support, consider putting the assembly into a separate inline function, rather than writing it directly at the point of use. This technique makes it easier to replace with an intrinsic later, which is preferable from an optimization standpoint, because the compiler can reason about the behavior of intrinsics, but knows nothing about arbitrary blobs of assembly beyond which registers they clobber.
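A minimal sketch of this wrapping technique follows, assuming for the sake of illustration a compiler with no byte-swap intrinsic. The helper name swap32 is hypothetical. The assembly path is x86-specific and uses GCC-style extended asm; other targets fall back to plain C.

```c
#include <stdint.h>

/* Hypothetical helper: reverses the byte order of a 32-bit value.
 * All assembly is confined to this one function. */
static inline uint32_t swap32(uint32_t x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* x86 only: BSWAP reverses the bytes of a 32-bit register in place. */
    __asm__("bswap %0" : "+r"(x));
    return x;
#else
    /* Portable C fallback for other compilers and architectures. */
    return (x >> 24) |
           ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) |
           (x << 24);
#endif
}
```

If an intrinsic becomes available later (GCC and clang provide __builtin_bswap32, for example), only the body of swap32 has to change, and every caller benefits from the compiler's better understanding of the operation.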
It's also worth making sure that there are no library functions that do what you want. I came across one example where code was randomly crashing on FreeBSD with SIGFPE. There was a comment in the code with some inline assembly saying "This is needed to stop it crashing on FreeBSD." The assembly code contained x87 instructions disabling floating-point exceptions.
Unfortunately, poking the FPU status registers in the debugger showed that exceptions were being turned back on. Replacing this assembly with a call to fedisableexcept() meant that the kernel was responsible for ensuring that they were disabled, setting the FPU status correctly on context switch. This was both more reliable and more portable: it will work with every FPU type that may signal an exception for some invalid operations, not just for x87.