One good primitive to have in our toolbox is a technique for locking files, so we don't accidentally create a race condition. Note, however, that file locking on most operating systems is discretionary, and not mandatory, meaning that file locks are only enforced by convention, and can be circumvented. If you need to make sure that unwanted processes do not circumvent locking conventions, then you need to make sure that your file (and any lock files you may use) are in a directory that cannot be accessed by potential attackers. Only then can you be sure that no one malicious is circumventing your file locking.
At a high level, performing file locking seems really easy. We can just use the open() call, and pass in the O_EXCL flag, which asks for exclusive access to the file. When that flag is used, the call to open() only succeeds when the file is not in use. Unfortunately, this call does not behave properly if the file we are trying to open lives on a remote NFS-mounted file system running version 1 or version 2 of NFS. This is because the NFS protocol did not begin to support O_EXCL until version 3. Unfortunately, most NFS servers still use version 2 of the protocol. In such environments, calls to open() are susceptible to a race condition. Multiple processes living on the network can capture the lock locally, even though it is not actually exclusive.
Therefore, we should not use open() for file locking if we may be running in an environment with NFS. Of course, you should probably avoid NFS-mounted drives for any security-critical applications, because NFS versions prior to version 3 send data over the network in the clear, and are subject to attack. Even in NFS version 3 (which is not widely used at the time of this writing), encryption is only an option. Nevertheless, it would be nice to have a locking scheme that would work in such an environment. We can do so by using a lock file. The Linux manual page for open(2) tells us how to do so:
The solution for performing atomic file locking using a lockfile is to create a unique file on the same fs (e.g., incorporating hostname and pid), use link(2) to make a link to the lockfile. If link() returns 0, the lock is successful. Otherwise, use stat(2) on the unique file to check if its link count has increased to 2, in which case the lock is also successful.
Of course, when we wait for a lock to be released, we risk waiting forever. In particular, sometimes programs crash after locking a file, without ever unlocking the file, leading to a classic case of deadlock. Therefore, we'd like to be able to "steal" a lock, grabbing it, even though some other process holds it. Unfortunately, "stealing" a lock subjects us to a possible race condition. The more we steal locks, the more likely it is that we'll end up corrupting data by writing to a file at the same time that some other process does, by not giving processes enough time to make use of their lock. If we wait a long time before stealing a lock, however, we may see unacceptable delays. The proper value for a timeout depends on your application.
On this book's companion Web site, we provide example code for file locking that behaves properly in NFS environments.