File Descriptors as Capabilities
Capsicum extends file descriptors to include the notion of what you are allowed to do with the file. Ordinary file descriptors already have some limited support for this. If, for example, you specify O_RDONLY to the open() system call, then you will get an error if you try writing to the resulting file descriptor. This is largely advisory: fstat() will not give you the path back, but nothing stops a process that knows or can recover the original path from simply opening the file again in a different mode.
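The advisory nature of the open mode is easy to demonstrate. The following is a minimal sketch using only POSIX calls; readonly_is_advisory is a hypothetical helper name chosen for illustration, and it assumes that path names an existing file the caller is allowed to modify (the helper writes one byte to it).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if a descriptor opened O_RDONLY rejects write() with EBADF
 * and the same path can nevertheless be reopened for writing,
 * 0 if either half of that does not hold, and -1 if the file cannot
 * be opened at all. Note that it overwrites the first byte of the file.
 */
int readonly_is_advisory(const char *path)
{
    assert(path != NULL);

    int ro = open(path, O_RDONLY);
    if (ro < 0)
        return -1;

    /* The read-only descriptor refuses the write... */
    ssize_t n = write(ro, "x", 1);
    int rejected = (n < 0 && errno == EBADF);
    close(ro);

    /* ...but the process can just ask again, this time for write access. */
    int wr = open(path, O_WRONLY);
    if (wr < 0)
        return 0;
    int reopened = (write(wr, "x", 1) == 1);
    close(wr);

    return rejected && reopened;
}
```

The restriction lives in the descriptor, not in the process: as long as the process can still name the file, it can obtain a more powerful descriptor.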
This is where Capsicum enters the picture. After a call to cap_enter(), the program is in capability mode and is not allowed to create any new file descriptors via most of the standard mechanisms.
In particular, system calls like open() and socket() will simply fail. This has the advantage that it's a very simple test to perform and therefore quite easy to get right: the kernel checks a single flag and refuses the call if the process is in capability mode.
Capability file descriptors behave just like normal ones. You can pass them to any system call that expects a file descriptor, but you may get an error if the descriptor does not carry the rights that the call requires. These rights include read and write permissions, along with a variety of finer-grained ones.
For example, you can create a socket that can be used with the bind() and accept() system calls to accept incoming connections, but with nothing else. This makes it easy to write a server that is incapable of creating outgoing connections, so a worm that infected it could not dial home to report the infection, nor propagate further except to the clients that connect. This is quite useful: the worm would need both a client exploit and a server exploit to spread, and could not simply jump from server to server.
The basic structure of a simple program using Capsicum is to create some file descriptors with open(), socket(), or similar calls; use cap_new() to restrict them to a certain subset of operations; and then call cap_enter() and run the rest of the code. In this limited mode, any bugs are very difficult to exploit.
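The create, restrict, and enter pattern can be sketched as follows. One assumption should be stated loudly: this sketch does not use cap_new(). It uses cap_rights_init() and cap_rights_limit() from <sys/capsicum.h>, the interface provided by later FreeBSD releases. The helper name open_then_sandbox and the choice of CAP_READ and CAP_FSTAT are illustrative, and on systems without Capsicum the function simply fails with ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>   /* assumption: a FreeBSD release with this header */
#endif

/*
 * Hypothetical helper, for illustration only.
 * Opens `path`, restricts the descriptor to reading and fstat(), then
 * enters capability mode. Returns the restricted descriptor, or -1 with
 * errno set (ENOSYS where Capsicum is unavailable).
 */
int open_then_sandbox(const char *path)
{
    assert(path != NULL);

    /* Step 1: acquire resources while the global namespace is visible. */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

#ifdef __FreeBSD__
    /* Step 2: reduce the descriptor to the operations actually needed. */
    cap_rights_t rights;
    cap_rights_init(&rights, CAP_READ, CAP_FSTAT);

    /* Step 3: enter capability mode; this cannot be undone. */
    if (cap_rights_limit(fd, &rights) < 0 || cap_enter() < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
#else
    /* No Capsicum here: refuse rather than pretend to be sandboxed. */
    close(fd);
    errno = ENOSYS;
    return -1;
#endif
}
```

The ordering matters: everything that needs a path name must happen before cap_enter(), because afterward calls such as open() are expected to fail.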
Being able to limit the set of files that you can access is useful, but it assumes that you know in advance exactly which files you will need, and you often don't. A simple example is temporary files, where a program may create a variety of them over its lifetime. You certainly don't want to hold a file descriptor at program start for every temporary file that you might ever need.
The simplest solution is provided via the openat() system call. This is similar to the open() system call, but takes a file descriptor for a directory and resolves the path relative to that directory. If you wanted to access a temporary directory, then you would do something like this:
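    char template[] = "/tmp/myAppXXXXXX";   /* POSIX mkdtemp() wants six X's */
    mkdtemp(template);                      /* fills in the X's in place */
    /* A directory cannot be opened for writing, so open it read-only */
    int tmp = open(template, O_RDONLY | O_DIRECTORY);
    int tmpdir = cap_new(tmp, CAP_LOOKUP | CAP_CREATE | CAP_READ | CAP_WRITE);
    close(tmp);
    cap_enter();
    /* Resolve the name relative to the capability, not the closed descriptor.
       O_CREAT requires a mode argument; 0600 is an assumed value. */
    int tmpfile = openat(tmpdir, "newtempfile", O_CREAT | O_RDWR, 0600);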
This new temporary file can be created and used, giving roughly equivalent functionality to a chroot. There are several differences, most notably that the process can have several of these environments. For example, a web server could hold one file descriptor open for each user's public_html directory, but still be unable to see any of the rest of the filesystem.
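The directory-relative lookup itself is plain POSIX and can be tried without Capsicum. The sketch below is illustrative only: open_base_dir and write_under are hypothetical helper names. Keep in mind that outside capability mode openat() alone does not confine anything, since an absolute path or a name containing ".." can still reach outside the directory; the confinement described above comes from capability mode.

```c
#define _POSIX_C_SOURCE 200809L  /* request openat() and O_DIRECTORY */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens a directory to serve as a base for openat(). */
int open_base_dir(const char *dir)
{
    assert(dir != NULL);
    return open(dir, O_RDONLY | O_DIRECTORY);
}

/*
 * Hypothetical helper: creates (or truncates) `name` inside the directory
 * referred to by `dirfd` and writes `len` bytes of `data` to it.
 * Returns 0 on success and -1 on failure.
 */
int write_under(int dirfd, const char *name, const char *data, size_t len)
{
    assert(name != NULL && data != NULL);

    /* The name is resolved relative to dirfd, not the working directory. */
    int fd = openat(dirfd, name, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, data, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}
```

A server in the style of the public_html example would call open_base_dir() once per user directory before entering capability mode, and afterward use only the resulting descriptors with openat().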