A Privileged Position
The hypervisor's job is to allocate resources to guest domains, to protect guest domains from each other, to provide clean, portable device interfaces, and, if necessary, to provide a virtual machine that can be efficiently virtualized. To do all this it must occupy a privileged position on the system.
In a traditional, nonvirtualized system, the operating system must occupy a privileged position relative to user level applications. Much like a police officer, it uses this privileged position to protect user level applications from each other. To enable this, most processor architectures have at least two privilege levels. The operating system then runs at a higher privilege level than user level code, allowing it to force user level applications to "follow the rules."
In a virtualized system, we have the hypervisor as well as guest operating systems and user level applications. Just as an operating system arbitrates between multiple user level applications, the Xen hypervisor arbitrates between guest operating systems. Thus the Xen hypervisor must run at a higher privilege level than the guest operating systems. However, within the guest domains, we still want the operating system to run at a higher privilege level than the user level applications.
Fortunately, the x86 architecture provides more than two privilege levels. In fact, the x86 architecture has four privilege levels called protection rings. Ring 0 is the most privileged, and ring 3 is the least privileged. In a traditional, nonvirtualized system, the operating system executes in ring 0 and the user level applications in ring 3 with rings 1 and 2 typically going unused. In a Xen system, the hypervisor executes in ring 0, guest operating systems in ring 1, and user level applications remain in ring 3.
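The ring assignments described above can be sketched in a few lines of C. On x86, the low two bits of a segment selector carry a privilege level, so masking a selector with 0x3 yields a ring number. The function names and the ring 2 label below are assumptions made for illustration, not part of Xen or any operating system.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* On x86, the low two bits of a segment selector hold a privilege level:
 * 0 is the most privileged ring, 3 the least privileged. */
unsigned ring_of_selector(uint16_t selector)
{
    return selector & 0x3u;
}

/* Hypothetical helper: names what runs in each ring, following the text.
 * under_xen == 0 describes a traditional system, nonzero a Xen system.
 * Ring 2 is labeled unused in both cases as an assumption of this sketch. */
const char *occupant(unsigned ring, int under_xen)
{
    assert(ring <= 3);
    switch (ring) {
    case 0:  return under_xen ? "Xen hypervisor" : "operating system";
    case 1:  return under_xen ? "guest operating system" : "(unused)";
    case 2:  return "(unused)";
    default: return "user level applications";
    }
}

/* Print the traditional and Xen layouts side by side, one ring per line. */
void print_ring_layout(void)
{
    for (unsigned ring = 0; ring <= 3; ring++)
        printf("ring %u: %-24s | %s\n",
               ring, occupant(ring, 0), occupant(ring, 1));
}
```

Calling print_ring_layout() from a main function prints the traditional layout in the left column and the Xen layout in the right. Any selector whose low two bits are 01, for example, maps to ring 1, where a Xen guest operating system runs.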
These protection rings give the hypervisor the leverage it needs to enforce resource sharing and isolation among guest domains. In ring 0, the hypervisor has full control of the physical hardware. Guest domains access the physical hardware only as allowed and coordinated by the hypervisor. All resources needed by guest domains are granted by the hypervisor.
Guest domains make requests of the hypervisor through a set of hypercalls, much as user level applications make requests of the operating system through a set of system calls. On x86 Linux, system calls have traditionally been made with a software interrupt instruction, int 0x80. Similarly, hypercalls are made with the instruction int 0x82. The hypervisor responds to requests by sending the domain an asynchronous event, much like a UNIX signal or an interrupt on real hardware. Here is what an example hypercall might look like in C code:
hypercall_ret = xen_op(operation, arg1, arg2, arg3, arg4);
Listing 3.1 shows the resulting assembly language for the xen_op routine that sets up the parameters for the Xen hypercall and then actually fires the Xen interrupt.
Listing 3.1. Assembly for a Xen Hypercall
_xen_op:
    mov eax, 4(esp)     ; operation number
    mov ebx, 8(esp)     ; arg1
    mov ecx, 12(esp)    ; arg2
    mov edx, 16(esp)    ; arg3
    mov esi, 20(esp)    ; arg4
    int 0x82            ; trap into the hypervisor
    ret
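To illustrate what might happen on the other side of the int 0x82 trap, here is a minimal user-space sketch of a dispatch table, the same pattern operating systems commonly use for system calls. It is a simulation only and does not trap into anything. The operation numbers, handler names, error code, and the xen_op_sim function are all hypothetical and do not correspond to Xen's real hypercall interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation numbers for this sketch; not Xen's real ones. */
enum { OP_VERSION = 0, OP_SUM = 1, OP_COUNT };

/* Hypothetical error code returned for an unknown operation. */
#define ERR_NO_SUCH_OP (-1L)

typedef long (*op_handler)(long, long, long, long);

/* Toy handlers standing in for real hypervisor services. */
long op_version(long a, long b, long c, long d)
{
    (void)a; (void)b; (void)c; (void)d;
    return 3;   /* arbitrary version number for the demo */
}

long op_sum(long a, long b, long c, long d)
{
    return a + b + c + d;
}

/* Dispatch table indexed by operation number. */
const op_handler op_table[OP_COUNT] = {
    [OP_VERSION] = op_version,
    [OP_SUM]     = op_sum,
};

/* Simulated entry point: validates the operation number (the value the
 * listing loads into eax) and calls the matching handler with the four
 * arguments (the values loaded into ebx, ecx, edx, and esi). */
long xen_op_sim(unsigned long operation,
                long arg1, long arg2, long arg3, long arg4)
{
    if (operation >= OP_COUNT)
        return ERR_NO_SUCH_OP;   /* never index past the table */
    assert(op_table[operation] != NULL);
    return op_table[operation](arg1, arg2, arg3, arg4);
}
```

The bounds check matters because the operation number comes from a less privileged caller. A real hypervisor must treat every value passed in by a guest as untrusted.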
With hardware support for virtualization such as Intel's VT-x and AMD's AMD-V extensions, these additional protection rings become less critical. These extensions provide root and nonroot modes that each have rings 0 through 3. The Xen hypervisor can run in root mode while the guest OS runs in nonroot mode in the ring for which it was originally intended.
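Whether a given processor offers these extensions can be checked with the CPUID instruction. The sketch below assumes a GCC or Clang toolchain, whose cpuid.h header provides __get_cpuid. The function names are hypothetical. The feature bits tested are the VMX flag (leaf 1, ECX bit 5) for VT-x and the SVM flag (leaf 0x80000001, ECX bit 2) for AMD-V. A set bit means only that the CPU advertises the feature; firmware may still have it disabled.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#endif

/* CPUID leaf 1, ECX bit 5: Intel VMX (VT-x). */
int ecx_has_vmx(uint32_t ecx_leaf1)
{
    return (int)((ecx_leaf1 >> 5) & 1u);
}

/* CPUID leaf 0x80000001, ECX bit 2: AMD SVM (AMD-V). */
int ecx_has_svm(uint32_t ecx_ext1)
{
    return (int)((ecx_ext1 >> 2) & 1u);
}

/* Returns 1 if the CPU advertises VT-x or AMD-V, 0 if it advertises
 * neither, and -1 when not compiled for x86 (nothing to query). */
int cpu_has_hw_virt(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    int found = 0;

    /* __get_cpuid returns 0 if the requested leaf is unsupported. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        found |= ecx_has_vmx(ecx);
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        found |= ecx_has_svm(ecx);
    return found;
#else
    return -1;
#endif
}
```

The bit tests are kept in separate functions so they can be exercised with fixed register values on any machine, independent of the CPU the code happens to run on.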