Racing the Kernel
Some people still think that shared-memory concurrency is a good programming model for taking advantage of multicore processors. Unfortunately, it makes reasoning about your code very difficult, which encourages bugs (and security holes).
The most striking example was a flaw found a little while ago in most system-call interception frameworks. They all tend to work in roughly the same way:
1. The userspace process issues a system call.
2. The interception framework validates the arguments and decides which privilege level the call should have (or if it should proceed at all).
3. The kernel handles the call.
Unfortunately, this has a slightly non-obvious flaw. Many system calls take pointers as arguments. The kernel is typically mapped into each process's address space (but marked as accessible only in privileged mode), so the system call handler in the kernel can access the destination of these pointers cheaply (without copying). Even if it can't access them directly on platforms where the kernel has a completely separate address space, it can typically still map the relevant parts of the process's address space cheaply. If you issue a system call with a pointer, there's an optional step 2a:
2a. Another thread modifies the data pointed to by the pointer argument.
In this case the system call handler proceeds with data that it believes has been validated, but it no longer is. A trivial example is the bind() system call, which takes the information about the local address as a pointer argument. The interception framework would first check that you were requesting to bind to a non-privileged port, and then allow the call. Your second thread would change the port number to a privileged port, and the call would proceed. This issue led to a number of privilege-elevation vulnerabilities.
The same problem is possible in userspace code using privilege separation, if it uses shared memory. The simplest solution is always to copy an entire memory region into the privileged process before processing it. This technique is fine for small amounts of data, but not ideal for larger amounts. Unfortunately, no good solution exists for this problem, other than not to use shared memory, and that's typically slower. Even something like a pipe requires data to be copied into a shared buffer and then copied out. Future operating systems may include something partway between a pipe and a shared memory buffer, where the buffer is in the receiver's address space but writes can be done only via the kernel, and permitted only when the receiver indicates that space is available. This capability isn't likely to appear in the short term, though.
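The copy-first defence described above can be sketched as follows. Again this is illustrative, with hypothetical names: `safe_bind` snapshots the shared struct into private memory before validating it, so a racing writer can change only the shared copy, never the one that was checked and is then used.

```c
/* Copy-then-validate: snapshot shared data into private memory
   before checking it, so the checked copy cannot change underneath us.
   Names are hypothetical, for illustration only. */
#include <netinet/in.h>
#include <string.h>

/* Hypothetical kernel handler, as before. */
static int kernel_bind(const struct sockaddr_in *sa) {
    return ntohs(sa->sin_port);
}

/* Validate and use a private snapshot, never the shared original. */
static int safe_bind(const struct sockaddr_in *shared) {
    struct sockaddr_in copy;
    memcpy(&copy, shared, sizeof copy); /* snapshot first */
    if (ntohs(copy.sin_port) < 1024)
        return -1;                      /* reject privileged ports */
    return kernel_bind(&copy);          /* use only the private copy */
}
```

After the memcpy, a concurrent write to `*shared` has no effect on the outcome; the cost is one full copy of the region per call, which is exactly the overhead the text notes for larger amounts of data.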