Date: Oct 13, 2006
Emulation, in one form or another, is increasingly common in modern computing environments. David Chisnall takes a look at some of the common mechanisms used.
Emulation is the process of allowing one machine to pretend to be another one. A variety of emulation options are available, depending on the system(s) involved. Which option is right for you and your situation? This article describes some of the possibilities available to you.
System Call Translation
System call translation is a very common form of emulation found in most Free *NIX systems and in some other places. In a UNIX-like system, the kernel handles a set of fairly well-defined system calls. A program marshals its arguments in a particular way, typically placing them in specific processor registers, and then executes a system call instruction. This causes the processor to switch to a higher privilege level and vector into the kernel’s system call handler, which constructs a stack frame and jumps to the function corresponding to the system call number.
The system call numbers for each system are fixed. Once a system call has been added, changing its number would require all applications that use it to be recompiled, breaking backward compatibility. The numbers are not the same across operating systems, though. The read system call, for example, is number 3 on both Linux and FreeBSD, while the readv call is 120 on FreeBSD and 145 on Linux. (The Linux numbers given here are those for 32-bit x86; Linux numbering varies by architecture.)
In some situations, it’s useful to be able to run software on FreeBSD that was actually written for Linux. Some proprietary software, for example, is supported only on Linux. Since Linux and FreeBSD both implement the POSIX specification, and both use ELF format binaries, it ought to be easy, but it isn’t.
The first problem comes from the fact that, on x86, Linux uses the MS-DOS system call convention and passes parameters in registers, while FreeBSD uses the UNIX convention, passing them on the stack. But both systems use the same mechanism for entering the kernel (raising interrupt 80h), and they both store the system call number in the EAX register.
Before you can run a Linux binary on FreeBSD, you have to "brand" it. When a Linux-branded binary is running, the kernel routes each system call the program makes through a small amount of wrapper code. The kernel inspects the system call number as normal, but instead of vectoring into the native handler for that number, it calls a wrapper function, which then calls the equivalent FreeBSD system call handler.
For many POSIX system calls, this translation is quite easy. Actions such as reading and forking are well-defined, and have the same arguments on any system. For others, it’s less easy. If one of the parameters of a function is a structure that’s not fully defined by the specification, the wrapper may need to translate between Linux and FreeBSD formats before processing the call, and then translate the other way afterward. In the worst case, the functionality may be implemented very differently on the two systems, requiring the wrapper to contain an almost-complete implementation of the call.
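The wrapper arrangement described above can be sketched as a pair of dispatch tables. This is a toy model in Python, not kernel code: only the read and readv numbers come from the text, while every function name, the table layout, and the argument difference being "translated" are assumptions made up for illustration.

```python
# Toy model of system call translation; not real kernel code.
# Only the numbers for read (3) and readv (120 / 145) come from the text;
# all names and the argument "formats" below are hypothetical.

def native_read(fd, nbytes):
    """Stand-in for the host kernel's native read handler."""
    return f"read(fd={fd}, nbytes={nbytes})"


def native_readv(fd, iov_lengths):
    """Stand-in for the host kernel's native readv handler."""
    return f"readv(fd={fd}, iov={list(iov_lengths)})"


# Native (FreeBSD-style) table: system call number -> handler.
NATIVE_TABLE = {3: native_read, 120: native_readv}


def linux_readv_wrapper(fd, iov):
    # Pretend the foreign caller passes (base, length) pairs while the
    # native handler wants only lengths: translate, then call native.
    return NATIVE_TABLE[120](fd, [length for _base, length in iov])


# Table consulted for Linux-branded processes: Linux number -> wrapper.
LINUX_TABLE = {
    3: NATIVE_TABLE[3],        # same arguments, so no translation is needed
    145: linux_readv_wrapper,  # different number, arguments translated
}


def syscall(branded_linux, number, *args):
    """Dispatch a system call through the table matching the binary's brand."""
    table = LINUX_TABLE if branded_linux else NATIVE_TABLE
    handler = table.get(number)
    if handler is None:
        return "ENOSYS"  # no such call in this personality
    return handler(*args)
```

A branded process calling number 145 ends up in the same native handler that an unbranded process reaches through number 120; the only extra work is the table lookup and the argument conversion.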
This form of emulation is typically very fast. The time spent performing the translation is usually 1–2% of the time spent completing the system call.
Very few programs actually make system calls directly. Most call the C standard library instead, which then issues the calls. If the program can be persuaded to use a native version of the library, this can provide a more cohesive experience. An example is WINE, which allows Windows programs to be run in *NIX.
Simple system call translation isn’t really an option in this case; the Windows kernel is very different from a UNIX-like system, so a large translation layer would be needed. Windows programs also interact differently with the system. Most graphical *NIX programs use X11, which has a well-defined protocol for communicating with the screen. Since it was designed to be network-transparent, it’s fairly loosely coupled, so it doesn’t matter to the X server whether the drawing commands come from a native or an emulated client (or even from a client running on a different machine on the network).
Windows applications don’t use X11, however; they use the Windows GDI or DirectX. While it might be technically possible—although a lot of effort—to create a system-call translation layer on a system like Linux that would allow running the Windows GUI, it wouldn’t be very helpful, because it wouldn’t integrate at all with the rest of the system.
Instead, the WINE project moves the emulation up the stack, providing replacements for most of the libraries that ship with Windows. When a Windows program wants to draw something onscreen, it calls a function defined in gdi32.dll. On a Windows system, this will then issue calls to the display driver and produce the desired effect. On *NIX, these calls go to the WINE version of gdi32.dll, which translates them into X11 drawing calls. Similar translations happen in other parts of the API.
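As a rough illustration of translation at the library level, the sketch below exposes a GDI-style rectangle call and forwards it as an X11-style request. The class names are hypothetical and this is not how WINE's gdi32.dll is written; the one real detail it leans on is that GDI's Rectangle takes two corners while X11's rectangle request takes an origin plus a width and height. Pixel-exact edge rules are ignored.

```python
# Sketch of library-level translation. All names are hypothetical.

class RecordingXServer:
    """Stand-in for an X11 connection: records the requests it receives."""

    def __init__(self):
        self.requests = []

    def draw_rectangle(self, x, y, width, height):
        self.requests.append(("draw_rectangle", x, y, width, height))


class Gdi32Shim:
    """Exposes a GDI-style call and forwards it as an X11-style request."""

    def __init__(self, xserver):
        self.xserver = xserver

    def rectangle(self, left, top, right, bottom):
        # GDI-style callers pass two corners; X11-style requests take an
        # origin plus a size, so convert between the two conventions.
        self.xserver.draw_rectangle(left, top, right - left, bottom - top)
        return True


server = RecordingXServer()
Gdi32Shim(server).rectangle(10, 20, 110, 70)
print(server.requests)
```

The calling program sees only the familiar library interface; everything below that interface has been swapped for a native implementation.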
As with system call translation, this arrangement can result in the code running faster than on the native platform if the new implementation is more efficient than the old one.
Moving in the opposite direction, you get CPU emulation. This is perhaps the best-known form of emulation. The full name of the WINE project, WINE Is Not an Emulator, comes from a desire to distance itself from this kind of emulation, which is typically quite slow.
CPU emulation generally falls into two categories: pure CPU emulation and full system emulation. The latter category is very popular among classic gaming enthusiasts. Emulating an entire system is a complicated undertaking, and doing it quickly is even more difficult. One of the best products for this purpose was Connectix Virtual PC for the Mac. (Note that this was a very different product from Virtual PC for Windows, in spite of the similar names.) This was purchased by Microsoft, which did very little with it.
The x86 instruction set is widely regarded as a convoluted mess. Many instructions have side effects that are sometimes useful, but often completely ignored. Most arithmetic instructions, for example, set several status bits. For a completely accurate emulation, the values of these status bits must be computed. This can mean that a single x86 instruction requires a string of PowerPC instructions to achieve the same results. One trick that Virtual PC employed was to analyze a sequence of x86 instructions, and not bother computing the values of status bits that were never checked.
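The status-bit trick can be approximated with a backward liveness pass over a straight-line block of instructions. The sketch below is only an illustration of the general idea, not Virtual PC's actual algorithm: the instruction representation is invented, and all status bits are treated as a single unit, whereas real x86 instructions read and write different subsets of them.

```python
# Backward liveness pass deciding which instructions must compute flags.
# The instruction encoding is invented for illustration.

from collections import namedtuple

Insn = namedtuple("Insn", "name reads_flags writes_flags")


def flags_needed(block, live_out=True):
    """Return, per instruction, whether its status bits must be computed.

    live_out says whether the flags may be read after the block ends;
    assuming True is the conservative choice.
    """
    needed = [False] * len(block)
    live = live_out
    for i in range(len(block) - 1, -1, -1):
        insn = block[i]
        if insn.writes_flags:
            needed[i] = live  # compute only if something reads them later
            live = False      # earlier flag values are overwritten here
        if insn.reads_flags:
            live = True       # this instruction consumes the earlier flags
    return needed


block = [
    Insn("add", reads_flags=False, writes_flags=True),
    Insn("sub", reads_flags=False, writes_flags=True),
    Insn("cmp", reads_flags=False, writes_flags=True),
    Insn("jz", reads_flags=True, writes_flags=False),
]
print(flags_needed(block))
```

In this block only the cmp needs its status bits computed, because the values produced by add and sub are overwritten before anything reads them.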
In some cases, you don’t need to emulate an entire system. If you want to run an x86 Linux application on PowerPC Linux, emulating a CPU, graphics card, hard disk, and so on and then running a second copy of Linux on top of it would be an enormous waste of effort, since you would end up duplicating a lot of running code. A similar situation occurs when running PowerPC applications in OS X on an Intel processor, or m68k applications on a PowerPC Macintosh.
In this case, the problem is conceptually quite similar to the kind of situation that requires system call translation, but in reverse. The system calls (and even many of the library calls) are the same, but the program calling them cannot run on the native machine. The solution is a cut-down emulator that emulates only the CPU. Every time the emulated program calls a standard library function or makes a system call, the emulator catches the call and passes it to the native implementation for processing. This setup can provide a significant speed boost; many applications spend much of their CPU time inside things like standard GUI libraries, so running those natively improves speed considerably.
In the Free Software world, the most popular example of this form of emulation is QEMU, which allows Linux binaries compiled for one architecture to be run on another, passing the system calls to the native kernel.
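The shape of a CPU-only emulator can be shown with a toy interpreter: ordinary instructions are interpreted one at a time, while a system call instruction is handed straight to the host. The instruction set, register names, and call numbering here are all invented for the sketch and bear no relation to QEMU's internals.

```python
# Toy CPU-only emulator: guest instructions are interpreted, but system
# calls are passed to the host. Instruction set and numbering are invented.

import os


def run(program, host_syscalls):
    """Interpret a toy guest program, forwarding system calls to the host."""
    regs = {"r0": 0, "r1": 0, "r2": 0}
    for op, *args in program:
        if op == "li":        # load immediate: ("li", reg, value)
            regs[args[0]] = args[1]
        elif op == "add":     # add: ("add", dst, src)
            regs[args[0]] += regs[args[1]]
        elif op == "syscall":
            # r0 holds the call number; r1 and r2 hold its arguments.
            # The result is written back to r0.
            regs["r0"] = host_syscalls[regs["r0"]](regs["r1"], regs["r2"])
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return regs


# Hypothetical numbering: toy call 0 is serviced by the host's getpid.
HOST_SYSCALLS = {0: lambda _a, _b: os.getpid()}

result = run([("li", "r0", 0), ("syscall",)], HOST_SYSCALLS)
print(result["r0"] == os.getpid())
```

No kernel, disk, or display is emulated: the guest program borrows the host's, which is where the speed advantage over full system emulation comes from.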
Although it’s not really emulation, virtualization deserves a mention here. Whereas emulation is the process of one machine pretending to be of another type, virtualization is the process of one machine pretending to be two or more of the same type.
The x86 architecture is believed to be one of the most difficult to virtualize. Actually, this isn’t entirely true; starting with the 80386, all x86 CPUs have supported one form of virtualization. The virtual 8086 mode allows the creation of independent virtual 8086 machines. This capability was used in Windows 3.x to allow multiple MS-DOS applications to be run at once. These days, it isn’t particularly useful; very few programs that would run on an 8086 are still around, and those that are would run much faster in a full system emulator than they ever did on a real 8086.
The problem with the x86 instruction set is that it contains a small number of sensitive instructions that cannot be trapped. When attempting to virtualize something like IBM’s POWER architecture, you can run the guest OS at a lower privilege level than normal. Trying to execute a privileged instruction then causes a trap that’s caught by the hypervisor, which emulates the instruction on the guest’s behalf. With x86, there are a small number of instructions for which this isn’t possible, because they execute without trapping even at the lower privilege level.
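The difference between an instruction that traps and one that does not can be modeled in a few lines. In this sketch the instruction names and the single piece of virtual state are hypothetical; the point is only that the hypervisor can emulate what it is told about, and that an instruction which runs silently gives the guest the wrong answer.

```python
# Toy model of trap-and-emulate. Instruction names are hypothetical.

class PrivilegeTrap(Exception):
    """Raised when deprivileged code executes a trapping instruction."""


class VirtualCPU:
    def __init__(self):
        self.interrupts_enabled = True  # the guest's *virtual* state


TRAPPING = {"disable_interrupts", "enable_interrupts"}
SILENT = {"read_interrupt_flag"}  # sensitive, but executes without a trap


def hardware_execute(insn, real_interrupts_enabled=True):
    """Model the real CPU running guest code at reduced privilege."""
    if insn in TRAPPING:
        raise PrivilegeTrap(insn)
    if insn in SILENT:
        return real_interrupts_enabled  # exposes the host's state
    return None


def hypervisor_run(vcpu, insns):
    """Run guest instructions, emulating any that trap."""
    results = []
    for insn in insns:
        try:
            results.append(hardware_execute(insn))
        except PrivilegeTrap:
            # Emulate the instruction against the guest's virtual state.
            vcpu.interrupts_enabled = (insn == "enable_interrupts")
            results.append(None)
    return results


vcpu = VirtualCPU()
out = hypervisor_run(vcpu, ["disable_interrupts", "read_interrupt_flag"])
print(vcpu.interrupts_enabled, out)
```

The guest disabled its virtual interrupts, yet the silent read still reports them as enabled, because the hypervisor never had a chance to intervene.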
The two common approaches to avoid this problem are binary rewriting and para-virtualization. Binary rewriting, as used by VMware, scans the instruction stream and replaces any occurrences of these instructions with calls to the virtualizer’s replacements. Para-virtualization is similar, but performs the replacement at compile time; the guest OS must be modified to call special hypervisor functions, rather than the standard privileged instructions. This is the approach used by Xen.
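Binary rewriting can be sketched as a single pass over an instruction stream. This is a deliberately simplified model with invented instruction and function names: a real rewriter also has to cope with variable-length instructions, jump targets that move, and code that is generated or modified at run time.

```python
# Simplified binary rewriting: swap sensitive instructions for calls into
# the virtualizer. Instruction and function names are hypothetical.

def rewrite(stream, replacements):
    """Return a copy of stream with sensitive opcodes replaced by calls."""
    out = []
    for insn in stream:
        opcode = insn[0]
        if opcode in replacements:
            out.append(("call", replacements[opcode]))
        else:
            out.append(insn)  # harmless instructions are left untouched
    return out


guest_code = [("mov", "r0", 1), ("read_flags",), ("add", "r0", "r1")]
print(rewrite(guest_code, {"read_flags": "vmm_read_flags"}))
```

Para-virtualization produces a similar result by a different route: the calls to the hypervisor are already present in the guest's source code, so nothing needs to be scanned at run time.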
A virtualized system must perform some emulation; the privileged instructions are emulated even if the unprivileged ones are not. Usually, however, virtualization extends beyond the CPU. It’s useful for every virtual machine to have a network interface, for example. Unfortunately, your host machine only has one. Here you start to have a problem; most network interfaces were not designed with virtualization in mind. The simple solution is to provide every guest with an emulated NIC and multiplex them in the host. This technique is relatively easy on any *NIX system that already supports virtual interfaces (most do).
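Multiplexing several emulated NICs over one physical interface amounts to a small software switch in the host. The sketch below assumes frames are delivered by destination MAC address; the class names are hypothetical and details such as transmit paths, VLANs, and promiscuous mode are left out.

```python
# Minimal software switch sharing one physical interface among emulated
# NICs. Class names are hypothetical.

class VirtualNIC:
    def __init__(self, mac):
        self.mac = mac
        self.inbox = []  # frames delivered to this guest


class HostSwitch:
    """Delivers frames from the physical interface to the right guest."""

    BROADCAST = "ff:ff:ff:ff:ff:ff"

    def __init__(self):
        self.nics = {}

    def attach(self, nic):
        self.nics[nic.mac] = nic

    def receive(self, dst_mac, payload):
        """Deliver an incoming frame; return how many guests received it."""
        if dst_mac == self.BROADCAST:
            targets = list(self.nics.values())
        else:
            nic = self.nics.get(dst_mac)
            targets = [nic] if nic is not None else []
        for nic in targets:
            nic.inbox.append(payload)
        return len(targets)


switch = HostSwitch()
guest_a = VirtualNIC("02:00:00:00:00:01")
guest_b = VirtualNIC("02:00:00:00:00:02")
switch.attach(guest_a)
switch.attach(guest_b)
switch.receive(guest_b.mac, b"hello")
print(guest_a.inbox, guest_b.inbox)
```

Each guest sees what looks like its own network card, while the host decides which frames on the shared wire belong to which guest.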
The situation is somewhat more complex when it comes to graphics. Most UNIX-like systems do some form of graphics virtualization already, in the form of virtual terminals. You can even run a GUI on some of these. If you’ve tried this, you may have noticed that when you switch from a text terminal to a terminal running X, the screen flickers quite a few times, something that doesn’t happen between text terminals. The reason is that X has absolutely no idea what the current state of the video hardware is. Worse, many graphics cards don’t have well-documented ways of switching between all of their supported modes. The solution is to drop back to a very simple mode, redo the initialization of the more complex graphics modes, and then redraw the screen. Ideally the switch would be transparent, but that’s impossible because most graphics cards don’t have any mechanism for retrieving and restoring their internal state.
It is theoretically possible in a system like Xen to write virtualization-aware graphics drivers, but it wouldn’t be a trivial undertaking. The Vista driver model also requires graphics cards to support virtualization, so this should be available in the Windows world shortly after the GNU HURD port of Duke Nukem Forever is released.
You Think That’s a Computer You’re Using?
Emulation shows up in many places in a modern computer. At the trivial level, things like OpenGL and DirectX emulate missing functionality on graphics hardware. At the more advanced level, entire operating systems or computers are emulated for legacy support.
This trend is likely to continue as even mainstream operating systems gain virtualization support. Eventually it’s possible that running in a completely virtualized environment will become the norm. Even today, it’s possible to keep a Xen virtual machine image on a USB flash drive and use it on any Xen-supported system where you need to work.