
Caveats

Shared memory is relatively cheap, but it makes it difficult to port your code to a system without a unified address space, such as a cluster. Newer AMD chips have a unified address space but not a uniform memory architecture (some regions of memory are faster to access from some cores), so shared-memory performance can be harder to predict in advance.

Operations in C are often translated into multiple machine instructions. Just because something looks like a single operation in C doesn't mean that it will be executed atomically.

Locks typically are very expensive. Getting or releasing a lock requires a system call, at least. The example in the preceding section had no code being executed outside of the lock body. In cases like this, where you're likely to spend a lot of your time on the locking operations, you may be able to optimize things considerably by using atomic read-modify-write instructions. Unfortunately, this setup renders your code non-portable between CPU architectures.

Locks are also easy to forget to use, and tracking down the one place where you access a data structure without the correct locking can be a pain.

A good rule is that debugging complexity for an application using shared memory scales exponentially with the number of threads accessing a data structure. In part 2 of this series, we’ll look at using message-passing approaches to mitigate this problem.

Read Part 2 in the POSIX programming series here.
