Date: Mar 14, 2008
Article is provided courtesy of Prentice Hall Professional.
David Chisnall takes a look at the GNU Project's infamous HURD kernel, exploring some of the features that make it unique and some that have found their way into other systems.
In 1984, Richard Stallman founded the GNU Project with the goal of creating a completely Free Software UNIX-like operating system. By the late ’80s, the project had more or less succeeded. Only one major component was missing: a kernel. A kernel is both the most and least important part of the system. The most important, because without it the computer can’t run any of the other programs. The least important, because as long as a kernel implements the POSIX interfaces then it’s more or less interchangeable with any other kernel that does so.
In 1987, the GNU Project began looking for a kernel. One option was 4.2 BSD, with the remaining AT&T code rewritten. Another was a new research kernel being developed at CMU, called Mach.
Since those days, HURD has acquired the same reputation in operating system circles that Duke Nukem Forever has among gamers.
The Speed of Sound
The Mach kernel was very different from the BSD kernel. I’ve discussed a few BSD kernels in this series of articles, and they all have one thing in common: They’re very boring. This isn’t necessarily a criticism. The main task for a kernel is to stay out of the way of users, and a boring kernel does this very well. If you’re planning on writing code for a kernel, however, a boring kernel just isn’t as much fun.
Mach was a very exciting kernel. It was one of the first attempts at a new idea in kernel design: the microkernel. The microkernel took the UNIX philosophy of small tools doing one thing well, and applied that principle to the kernel. A microkernel should provide process isolation well. Everything else should be handled by other processes. Drivers should be in their own process, with limited access to the hardware; each filesystem should be in its own process; and so on.
A few operating systems have been written based on Mach, with OS X being the most popular. All of these systems use what’s known as the "single server" approach. The microkernel provides some basic hardware abstraction, and all of the other operating system services are provided by a single process, typically a modified version of BSD. Porting is relatively easy, because only the small Mach kernel is hardware-specific. This approach also makes it possible to run two or more complete BSD-like environments side by side.
Otherwise, however, there aren’t many advantages to the "single server" approach, and there are some disadvantages. Mach generally is considered to be one of the major reasons that microkernels are not popular today. The problem is the way in which system calls are handled on BSD-like versus Mach-like kernels. A system call is the way in which a typical userspace process interacts with the system. Any time you want to do something that a userspace process can’t do for itself—allocating more memory; reading from or writing to a device such as the screen, keyboard, or disk; and so on—you do it via a system call.
In a BSD-like kernel, you push the arguments onto the stack as you would for a normal function call. Then, rather than issuing a call (or jump) instruction, you write the number of a system call to a specific register and issue an interrupt. The kernel’s interrupt handler reads the value and calls the correct function, which then performs a privileged operation. On some architectures, including modern x86 variants, a special system call instruction is used in place of the interrupt.
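To make that sequence concrete, here is a toy model of the dispatch path in Python. It is only a sketch: the "register", the "stack", the call numbers, and every function name are invented for illustration and do not correspond to any real kernel's table.

```python
# Toy model of BSD-style system call dispatch. All names and numbers
# here are hypothetical; a real kernel does this in C and assembly.

def sys_getpid(process, args):
    return process["pid"]

def sys_write(process, args):
    fd, data = args
    process["files"][fd].append(data)
    return len(data)

# The kernel's table, indexed by system call number.
SYSCALL_TABLE = {1: sys_getpid, 2: sys_write}

def trap(process):
    """Stand-in for the interrupt handler: read the call number from the
    'register', take the arguments off the 'stack', and dispatch."""
    number = process["registers"]["syscall_no"]
    args = process["stack"].pop()
    handler = SYSCALL_TABLE.get(number)
    if handler is None:
        return -1  # unknown system call
    return handler(process, args)

def syscall(process, number, *args):
    """What the userspace stub does: push arguments, load the number, trap."""
    process["stack"].append(args)
    process["registers"]["syscall_no"] = number
    return trap(process)

proc = {"pid": 42, "registers": {}, "stack": [], "files": {1: []}}
written = syscall(proc, 2, 1, b"hello")  # "write" five bytes to descriptor 1
pid = syscall(proc, 1)                   # "getpid"
```

The point of the model is that the whole path is a table lookup followed by an ordinary function call inside the kernel, which is why this style of system call is cheap.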
In the Mach kernel, most things that were typically implemented as system calls in a BSD kernel are performed by sending Mach messages to a specific process that handles them. Mach messages are roughly an order of magnitude more expensive than BSD system calls. A large part of this expense comes from the fact that the microkernel performs access-right checking on every message, which is very expensive. More modern microkernels, such as L4 and Xen, don’t do this. Xen doesn’t even implement the message passing in the microkernel; it just provides a mechanism for sharing memory, and allows processes (virtual machines) to set up ring buffers in the shared pages and use them for message passing.
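The ring-buffer idea can be sketched in a few lines of Python. This is a simplified illustration, not Xen's actual ring layout: the bytearray stands in for a shared memory page, the class and method names are my own, and a real implementation would also need memory barriers and a notification mechanism.

```python
class RingBuffer:
    """Single-producer, single-consumer ring of fixed-size slots.

    The bytearray stands in for a page shared between two domains; on a
    real system the head and tail counters would live in that page too.
    """

    def __init__(self, slots, slot_size):
        self.slots = slots
        self.slot_size = slot_size
        self.page = bytearray(slots * slot_size)
        self.head = 0  # count of messages consumed so far
        self.tail = 0  # count of messages produced so far

    def send(self, message):
        if len(message) > self.slot_size:
            raise ValueError("message larger than a slot")
        if self.tail - self.head == self.slots:
            return False  # ring full; the producer must retry later
        start = (self.tail % self.slots) * self.slot_size
        padded = message.ljust(self.slot_size, b"\0")
        self.page[start:start + self.slot_size] = padded
        self.tail += 1
        return True

    def receive(self):
        if self.head == self.tail:
            return None  # ring empty
        start = (self.head % self.slots) * self.slot_size
        raw = bytes(self.page[start:start + self.slot_size])
        self.head += 1
        # Toy framing: padding is stripped, so messages must not end in NUL.
        return raw.rstrip(b"\0")

ring = RingBuffer(slots=2, slot_size=8)
ring.send(b"read")
ring.send(b"write")
overflow = ring.send(b"sync")  # False: both slots are occupied
first = ring.receive()
```

Neither side needs the kernel to copy or inspect anything once the page is shared; the producer only writes the tail and the consumer only writes the head.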
All the Servers You Want
Unlike systems such as XNU and OSF/1, the HURD developers decided to go for a multi-server approach. Rather than having a single process running something like a BSD kernel, they would have different processes for each important subsystem.
This approach offers a number of advantages:
- The most obvious advantage is that it simplifies development, because you can easily restart an important OS component while testing it. This advantage is less obvious now, since most monolithic kernels include some support for virtualization, allowing something similar. For example, DragonFly BSD, which I discussed in a previous article, has a slightly modified version of the kernel that can run as a process, allowing all of the standard debugging tools to be used.
- Another advantage is parallelism. Since each subsystem is an isolated process, each can run on a separate CPU. All of the locking is done implicitly via the message-passing mechanism, making a multi-server microkernel a very scalable beast. In contrast, traditional monolithic kernels have close ties between kernel components, for performance reasons. A lot of work in Linux and FreeBSD over the last few years has been in the direction of adding the same level of separation that a multi-server microkernel gets by design.
- Isolation is perhaps the biggest advantage. Security in a monolithic kernel is binary; something either runs in kernelspace and can access the entire system, or it runs in userspace and can’t. With HURD, code can have a much finer granularity of permission.
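The "implicit locking" point in the second item is easiest to see in code. The sketch below uses Python threads and a queue as stand-ins for server processes and Mach ports, which is a large simplification; the server and message names are invented for the example.

```python
import queue
import threading

def counter_server(inbox):
    """Owns its state outright; the only way to reach it is a message."""
    count = 0
    while True:
        op, reply = inbox.get()
        if op == "increment":
            count += 1
        elif op == "read":
            reply.put(count)
        elif op == "stop":
            break

def client(inbox, times):
    for _ in range(times):
        inbox.put(("increment", None))

inbox = queue.Queue()
server = threading.Thread(target=counter_server, args=(inbox,))
server.start()

# Four clients hammer the server concurrently.
clients = [threading.Thread(target=client, args=(inbox, 1000)) for _ in range(4)]
for c in clients:
    c.start()
for c in clients:
    c.join()

# Ask for the total only after every increment has been queued.
reply = queue.Queue()
inbox.put(("read", reply))
total = reply.get()
inbox.put(("stop", None))
server.join()
```

The server code contains no lock of its own. Because requests arrive one at a time through the queue, the state is never touched by two callers at once, and the same structure would hold if each server ran on its own CPU.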
Everything Is a File?
Mach used a port as a basic abstraction for communicating, since it wasn’t restricted to UNIX-like behavior. In contrast, HURD aims to be a POSIX-compliant OS. As such, it uses the filesystem as a namespace in which all objects exist. Whenever you open a file in HURD, you get a Mach port to a translator that implements a protocol used for interacting with file-like objects.
This translator mechanism allows some quite neat extensions. Users need no special permissions to mount a filesystem in a directory they own, other than permission to access the underlying storage device. If this device is a physical disk, then the user might need more privileges, but not if it’s a regular file (such as an ISO 9660 image), or a remote server. This setup allows users to attach SMB or NFS shares, or even SVN repositories, as regular filesystems (in HURD terminology, "setting the translator" with the settrans command). HURD users don’t run a specific FTP program, for example; they just mount the FTP server as they would any other filesystem.
This idea has been taken up in other operating systems. Projects such as FUSE allow filesystem drivers to be run in userspace on monolithic kernels, and many of these systems now include SMB and FTP drivers in kernelspace (a concept considered the height of bloat when HURD was first conceived).
These translators are not limited to filesystem drivers. The IDE driver, which sits under the filesystem drivers, is another example; it exports device nodes in /dev. Other device drivers can have stacked implementations in the same way. For example, a sound card driver might be set as a translator for /dev/dsp. On top of this, a user might set a translator for ~/mp3 that would play any MP3 data written to the file. A more complex translator might be set for ~/playlist, which would send files linked into that directory to ~/mp3 for playing.
Something similar is possible on most *NIX systems with named pipes, but a HURD translator has two key differences. The first is that a translator can represent a directory, while a pipe can’t. The second is that the interface to a translator is a Mach port, and so can be extended trivially to respond to other messages beyond those of the standard filesystem. A similar mechanism is used on other UNIX-like systems with IOCTLs on device nodes; however, these are only available to special files created by the kernel, while HURD translators can be unprivileged processes.
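The stacking described above can be modeled in a few lines. This Python sketch is purely illustrative: the Namespace class, its settrans method (named after the command, but not its real interface), and the fake "decoding" step are all assumptions made for the example.

```python
class Translator:
    """Base protocol: every translator answers file-like messages."""
    def write(self, data):
        raise NotImplementedError

class SoundCard(Translator):
    """Stands in for a driver set on /dev/dsp; records the PCM it is given."""
    def __init__(self):
        self.played = []

    def write(self, data):
        self.played.append(data)
        return len(data)

class Mp3Player(Translator):
    """Set on ~/mp3; 'decodes' what is written and forwards it downward."""
    def __init__(self, namespace, device_path):
        self.namespace = namespace
        self.device_path = device_path

    def write(self, data):
        pcm = b"pcm:" + data  # placeholder for real MP3 decoding
        self.namespace.open(self.device_path).write(pcm)
        return len(data)

    def status(self):
        # An extra message beyond the standard file protocol.
        return "idle"

class Namespace:
    """The filesystem as a map from paths to translators."""
    def __init__(self):
        self.nodes = {}

    def settrans(self, path, translator):
        self.nodes[path] = translator

    def open(self, path):
        return self.nodes[path]

ns = Namespace()
card = SoundCard()
ns.settrans("/dev/dsp", card)
ns.settrans("~/mp3", Mp3Player(ns, "/dev/dsp"))
accepted = ns.open("~/mp3").write(b"frame1")
```

The status method is the interesting part: a named pipe can only carry bytes, whereas whatever sits behind the node here is free to answer additional requests.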
Many of these ideas will be familiar to Plan 9 users.
Plug and Play
Beyond filesystem translators, there are a few processes to which all others have a connection. The first of these with which a user will interact is the password server, present in the filesystem, which handles interactions with the /etc/passwd file. It provides a simple mapping between passwords and authentication tokens.
Because the password server is just another process, it’s trivial to replace. Irrespective of how the password-to-authentication mapping is performed (via a simple text file, NIS, LDAP, etc.), the interface remains the same. Adding a new authentication mechanism is therefore fairly simple.
The authentication tokens handed out by the password server are ports created by the authentication server. The authentication server provides mechanisms for the root user to create new authentication ports, and to compare the rights granted by a pair of ports.
Processes communicate with their authentication server via a port that’s set up before the process is created. Therefore, not every process on a system has to use the same authentication server. A program consisting of multiple processes might want to designate one of them as an authentication server, which would be used by the others to determine how they communicate. Other users on the system could then run other processes that interacted with this program, as long as they met some arbitrary requirements for the authentication server.
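The division of labor between the two servers can be sketched as follows. This is a toy model under loud assumptions: random hex strings stand in for Mach ports, the class and method names are invented, and the plaintext password table exists only to keep the example short.

```python
import hmac
import secrets

class AuthServer:
    """Hands out opaque tokens ('ports') and remembers the IDs behind each."""

    def __init__(self):
        self._ports = {}
        self.root_port = self._make_port({0})

    def _make_port(self, uids):
        port = secrets.token_hex(8)  # stand-in for an unforgeable port
        self._ports[port] = frozenset(uids)
        return port

    def create_port(self, caller_port, uids):
        """Only a caller holding root's identity may mint new ports."""
        if 0 not in self._ports.get(caller_port, frozenset()):
            raise PermissionError("caller is not root")
        return self._make_port(uids)

    def same_rights(self, port_a, port_b):
        """Compare the rights granted by a pair of ports."""
        return self._ports[port_a] == self._ports[port_b]

class PasswordServer:
    """Maps a password to a token. The lookup could just as well be backed
    by NIS or LDAP; callers only ever see login()."""

    def __init__(self, auth, root_port, accounts):
        self._auth = auth
        self._root_port = root_port
        self._accounts = accounts  # user -> (password, uid); toy storage

    def login(self, user, password):
        stored, uid = self._accounts[user]
        if not hmac.compare_digest(stored.encode(), password.encode()):
            raise PermissionError("bad password")
        return self._auth.create_port(self._root_port, {uid})

auth = AuthServer()
passwords = PasswordServer(auth, auth.root_port, {"alice": ("s3cret", 1000)})
first = passwords.login("alice", "s3cret")
second = passwords.login("alice", "s3cret")
equivalent = auth.same_rights(first, second)  # distinct ports, same rights
```

Swapping in a different PasswordServer changes nothing for its clients, and nothing stops a group of processes from being handed a different AuthServer instance altogether.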
The other core server is the process server. This is fairly similar in concept to the process table in a BSD-like kernel. It maintains a mapping between POSIX-style process IDs and Mach "tasks" (processes) in the underlying microkernel. It stores some metadata about each process, for use by tools such as ps and top. It’s also a central registry for IPC. Any process can register a Mach port with the process server.
Since the process server is the only way of getting access to the process table, something equivalent to FreeBSD "jails" or Solaris "zones" could be achieved by a user running her own authentication, process, and password servers, giving a different translator as the root directory port to the newly jailed process. None of this activity requires any special privilege, although giving the new process tree its own IP address would require it to have different instances of a few of the network-related servers, which might require root access for them to be allowed to bind to the specified IPs.
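A rough Python model shows why a second process server amounts to a jail. Everything here is hypothetical: the method names, the string "tasks" and "ports", and the idea that ports are registered under a name are assumptions made for the sketch.

```python
import itertools

class ProcessServer:
    """Maps POSIX-style PIDs to tasks and keeps a registry of ports."""

    def __init__(self):
        self._next_pid = itertools.count(1)
        self._tasks = {}     # pid -> task handle in the microkernel
        self._meta = {}      # pid -> metadata for ps/top-like tools
        self._registry = {}  # name -> registered port

    def register_task(self, task, owner, command):
        pid = next(self._next_pid)
        self._tasks[pid] = task
        self._meta[pid] = {"owner": owner, "command": command}
        return pid

    def task_for(self, pid):
        return self._tasks[pid]

    def listing(self):
        """What a ps-like tool talking to this server would see."""
        return [(pid, m["owner"], m["command"])
                for pid, m in sorted(self._meta.items())]

    def register_port(self, name, port):
        self._registry[name] = port

    def lookup_port(self, name):
        return self._registry.get(name)

# Two independent servers: the system's, and one a user started herself.
host = ProcessServer()
jail = ProcessServer()

host_pid = host.register_task("task-a", "root", "init")
host.register_task("task-b", "alice", "emacs")
jail_pid = jail.register_task("task-c", "alice", "sh")
jail.register_port("logger", "port-17")
```

A process that was only ever given the jail server's port has its own PID 1, sees a one-entry process table, and has no way to name anything registered with the host.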
A Different Core
A couple of years ago, the HURD developers began to realize that, no matter how shiny their design was, it would always be held back by the fact that it used a first-generation microkernel at the core. An effort began to port the system to a much more modern microkernel—L4.
L4 is sometimes called a nanokernel because it tries to be as small as possible. It’s very fast, and delegates even more work to servers outside of kernelspace. One of the most interesting features of the L4 port of HURD is how it handles paging.
Flat memory space is my favorite example of a leaky abstraction. A lot of processes actually do care about whether their memory is on disk or in RAM, and would alter their behavior accordingly. A good example is something like Firefox, which fills up available RAM with page caches. Often, swapping these pages back in from disk can be slower than re-fetching from the Internet.
L4/HURD puts paging entirely within the application’s control. When memory is constrained, applications receive a message asking them to free some memory. Something like a web browser could discard some caches, while other apps might swap some pages to disk that they were unlikely to use in the near future.
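As a sketch of what that protocol might look like from the application's side, consider the following Python model. The MemoryManager class, the on_memory_pressure message, and the unit-less sizes are all invented for illustration; the real L4/HURD interfaces are not shown in this article.

```python
class MemoryManager:
    """Tracks usage and asks applications to give memory back."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.apps = []

    def register(self, app):
        self.apps.append(app)

    def used(self):
        return sum(app.usage() for app in self.apps)

    def relieve_pressure(self):
        """Message each app in turn until total usage fits in RAM."""
        for app in self.apps:
            excess = self.used() - self.capacity
            if excess <= 0:
                break
            app.on_memory_pressure(excess)
        return self.used() <= self.capacity

class Browser:
    """Drops cached pages, since re-fetching may beat swapping."""

    def __init__(self, core, cache):
        self.core = core
        self.cache = cache

    def usage(self):
        return self.core + self.cache

    def on_memory_pressure(self, wanted):
        freed = min(wanted, self.cache)
        self.cache -= freed
        return freed

class Editor:
    """Has no cache to drop, so it writes cold pages out to disk itself."""

    def __init__(self, resident):
        self.resident = resident
        self.swapped = 0

    def usage(self):
        return self.resident

    def on_memory_pressure(self, wanted):
        moved = min(wanted, self.resident // 2)
        self.resident -= moved
        self.swapped += moved
        return moved

manager = MemoryManager(capacity=100)
browser = Browser(core=30, cache=60)
editor = Editor(resident=40)
manager.register(browser)
manager.register(editor)
fits = manager.relieve_pressure()  # the browser alone sheds 30 units of cache
```

Each application applies its own policy: the browser throws data away, the editor pages it out, and the manager never has to guess which pages matter.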
Another option for a replacement kernel, and one which is receiving a lot more attention, is Coyotos. This is a capability-based system, which fits quite well with the HURD security model. The EROS kernel, of which Coyotos is an evolution, has performance comparable to that of L4. Coyotos is somewhat simpler than EROS, and has the interesting goal of being formally verifiable. It’s written in BitC, a Lisp-like language with low-level semantics. With a formally verified microkernel and clean separation of system servers, a Coyotos/HURD system could be very secure and stable.
Can I Use It?
At the moment, HURD boots, and can run GNOME (and nine CDs of other Debian packages). A few things are still missing, such as entropy collection, which is required for secure random-number generation (and most cryptographic algorithms). The project has been making slow progress for 20 years, and is likely to continue. It’s not the most full-featured—or even stable—kernel, but as a research platform it’s very interesting.
Transitioning to a more modern microkernel at the core will make the system considerably faster, so either L4/HURD or Coyotos/HURD is likely to be a very interesting platform for multicore systems before long.
Even in its current state, HURD exists to prove a point: It’s possible to have a complete and usable system running nothing other than GNU code.