3.6 Kernel Page Tables
When the system first starts, paging is not enabled because page tables do not magically initialize themselves. Each architecture implements this differently so only the x86 case will be discussed. The page table initialization is divided into two phases. The bootstrap phase sets up page tables for just 8MiB so that the paging unit can be enabled. The second phase initializes the rest of the page tables. We discuss both of these phases in the following sections.
The assembler function startup_32() in arch/i386/kernel/head.S is responsible for enabling the paging unit. While all normal kernel code in vmlinuz is compiled with the base address at PAGE_OFFSET + 1MiB, the kernel is actually loaded at the 1MiB mark (0x00100000) of physical memory. The first megabyte below it is used by some devices for communication with the BIOS and is skipped. Until the paging unit is enabled, the bootstrap code in this file treats 1MiB as its base address by subtracting __PAGE_OFFSET from any address it uses. Before the paging unit can be enabled, a page table mapping therefore has to be established that maps the first 8MiB of physical memory at the virtual address PAGE_OFFSET.
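The address arithmetic used before paging is enabled can be sketched in a few lines of C. This is only an illustration: the value of PAGE_OFFSET assumes the common 3GiB/1GiB split, and the helper names boot_virt_to_phys() and boot_phys_to_virt() are hypothetical, not functions from head.S.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: the common 3GiB/1GiB split on 32-bit x86. */
#define PAGE_OFFSET      0xC0000000u
/* The kernel image is loaded at the 1MiB mark of physical memory. */
#define KERNEL_LOAD_PHYS 0x00100000u

/* Hypothetical helper: turn a link-time (virtual) address into the
 * physical address the code must use while paging is still off. */
uint32_t boot_virt_to_phys(uint32_t vaddr)
{
    assert(vaddr >= PAGE_OFFSET); /* only kernel addresses are meaningful here */
    return vaddr - PAGE_OFFSET;
}

/* Hypothetical helper: the inverse direction, physical to link-time address. */
uint32_t boot_phys_to_virt(uint32_t paddr)
{
    return paddr + PAGE_OFFSET;
}
```

Under these assumptions, a symbol linked at PAGE_OFFSET + 1MiB (0xC0100000) is found at physical address 0x00100000, which is where the image was actually loaded.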
Initialization begins at compile time with the static definition of an array called swapper_pg_dir, which is placed at 0x00101000 using linker directives. The bootstrap code then establishes page table entries for two pages, pg0 and pg1. If the processor supports the Page Size Extension (PSE) bit, it will be set so that the translated pages are 4MiB in size rather than the normal 4KiB. The first pointers to pg0 and pg1 are placed to cover the region 1-9MiB; the second pointers to pg0 and pg1 are placed at PAGE_OFFSET+1MiB. This means that, when paging is enabled, both physical and virtual addresses within the kernel image resolve to the correct pages. The rest of the kernel page tables will be initialized by paging_init().
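The effect of pointing two sets of directory entries at the same pair of page tables can be modelled in user space. The sketch below is a toy, not the code from head.S: the arrays toy_pg_dir and toy_pg stand in for swapper_pg_dir, pg0 and pg1, a directory slot stores a table number rather than a physical address, and the model assumes 4KiB pages, a PAGE_OFFSET of 0xC0000000 and directory entries that each cover an aligned 4MiB slot, so the two tables span the first 8MiB, which contains the image loaded at 1MiB.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET 0xC0000000u  /* assumed 3GiB/1GiB split */
#define PAGE_SHIFT  12           /* 4KiB pages */
#define PGDIR_SHIFT 22           /* each directory entry spans 4MiB */
#define ENTRIES     1024
#define PRESENT     0x1u

/* Toy stand-ins for swapper_pg_dir, pg0 and pg1. A directory slot holds
 * (table number << 1) | PRESENT instead of a real physical address. */
uint32_t toy_pg_dir[ENTRIES];
uint32_t toy_pg[2][ENTRIES];

void toy_bootstrap_init(void)
{
    /* The two tables together map physical 0..8MiB, one frame per entry. */
    for (uint32_t t = 0; t < 2; t++)
        for (uint32_t i = 0; i < ENTRIES; i++)
            toy_pg[t][i] = ((t * ENTRIES + i) << PAGE_SHIFT) | PRESENT;

    uint32_t kernel_slot = PAGE_OFFSET >> PGDIR_SHIFT;    /* slot 768 */
    for (uint32_t t = 0; t < 2; t++) {
        toy_pg_dir[t] = (t << 1) | PRESENT;               /* low alias  */
        toy_pg_dir[kernel_slot + t] = (t << 1) | PRESENT; /* high alias */
    }
}

/* Software page table walk. Returns 1 and stores the physical address
 * on success, or 0 if the address is not mapped. */
int toy_translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t pde = toy_pg_dir[vaddr >> PGDIR_SHIFT];
    if (!(pde & PRESENT))
        return 0;
    uint32_t pte = toy_pg[pde >> 1][(vaddr >> PAGE_SHIFT) & (ENTRIES - 1)];
    if (!(pte & PRESENT))
        return 0;
    *paddr = (pte & ~0xFFFu) | (vaddr & 0xFFFu);
    return 1;
}
```

After toy_bootstrap_init() runs, both 0x00100000 and PAGE_OFFSET + 0x00100000 translate to physical 0x00100000 in this model, which is why the instruction stream stays valid across the moment paging is switched on. Addresses beyond the first 8MiB of kernel space are still unmapped at this stage.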
After this mapping has been established, the paging unit is turned on by setting a bit in the cr0 register, and a jump takes place immediately to ensure the Instruction Pointer (EIP register) is correct.
The function responsible for finalizing the page tables is called paging_init(). The call graph for this function on the x86 can be seen in Figure 3.4.
Figure 3.4 Call Graph: paging_init()
The function first calls pagetable_init() to initialize the page tables necessary to reference all physical memory in ZONE_DMA and ZONE_NORMAL. Remember that high memory in ZONE_HIGHMEM cannot be directly referenced and that mappings are set up for it temporarily. For each pgd_t used by the kernel, the boot memory allocator (see Chapter 5) is called to allocate a page for the PMD, and the PSE bit will be set if available to use 4MiB TLB entries instead of 4KiB. If the PSE bit is not supported, a page for PTEs will be allocated for each pmd_t. If the CPU supports the PGE flag, it also will be set so that the page table entry will be global and visible to all processes.
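The trade-off between the two cases can be made concrete with a small sketch. The flag values below are the x86 hardware bit positions for present, writable, page size and global; the helper names make_large_pde() and pte_pages_needed() are hypothetical and the code is a simplified model of the decision, not an excerpt from pagetable_init().

```c
#include <assert.h>
#include <stdint.h>

/* x86 page table entry flag bits. */
#define F_PRESENT 0x001u
#define F_RW      0x002u
#define F_PSE     0x080u  /* entry maps a 4MiB page directly */
#define F_GLOBAL  0x100u  /* translation is kept across CR3 reloads */

#define LARGE_PAGE_SIZE (4u << 20)  /* 4MiB */

/* Hypothetical helper: build a directory entry that maps one 4MiB
 * page at phys, marking it global when the CPU supports PGE. */
uint32_t make_large_pde(uint32_t phys, int cpu_has_pge)
{
    assert((phys & (LARGE_PAGE_SIZE - 1)) == 0); /* must be 4MiB aligned */
    uint32_t pde = phys | F_PRESENT | F_RW | F_PSE;
    if (cpu_has_pge)
        pde |= F_GLOBAL;
    return pde;
}

/* Hypothetical helper: number of pages the boot memory allocator must
 * supply to hold the PTEs that map lowmem_bytes of physical memory. */
uint32_t pte_pages_needed(uint32_t lowmem_bytes, int cpu_has_pse)
{
    if (cpu_has_pse)
        return 0; /* directory entries map the 4MiB pages themselves */
    return (lowmem_bytes + LARGE_PAGE_SIZE - 1) / LARGE_PAGE_SIZE;
}
```

As an illustrative figure, mapping 896MiB of low memory without PSE needs 224 pages of PTEs in this model, while with PSE no PTE pages are needed at all because each directory entry covers 4MiB on its own.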
Next, pagetable_init() calls fixrange_init() to set up the fixed address space mappings at the end of the virtual address space starting at FIXADDR_START. These mappings are used for purposes such as the local Advanced Programmable Interrupt Controller (APIC) and the atomic kmappings between FIX_KMAP_BEGIN and FIX_KMAP_END required by kmap_atomic(). Finally, the function calls fixrange_init() a second time to initialize the page table entries required for normal high memory mappings with kmap().
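How a fixed mapping slot turns into a virtual address can be sketched as follows. The layout is an assumption for illustration: FIXADDR_TOP is taken to be 0xFFFFE000, slots are assumed to grow downward from it one page at a time, and the enum is a hypothetical, shortened slot list because the real one depends on the kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define FIXADDR_TOP 0xFFFFE000u  /* assumed value for 2.4-era i386 */

/* Hypothetical slot list; the real list varies with configuration. */
enum toy_fixed_addresses {
    TOY_FIX_APIC_BASE,
    TOY_FIX_KMAP_BEGIN,
    TOY_FIX_KMAP_END = TOY_FIX_KMAP_BEGIN + 7,
    TOY_END_OF_FIXED_ADDRESSES
};

/* Slot index to virtual address: slot 0 sits at FIXADDR_TOP and each
 * later slot is one page lower. */
uint32_t toy_fix_to_virt(unsigned int idx)
{
    assert(idx < TOY_END_OF_FIXED_ADDRESSES);
    return FIXADDR_TOP - ((uint32_t)idx << PAGE_SHIFT);
}

/* Lower bound of the fixed region in this toy layout. */
uint32_t toy_fixaddr_start(void)
{
    return FIXADDR_TOP - ((uint32_t)TOY_END_OF_FIXED_ADDRESSES << PAGE_SHIFT);
}
```

Because every slot has an address known at compile time, code that needs one of these mappings can compute the virtual address without consulting any run-time data structure; only the page table entry behind it has to be filled in.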
After pagetable_init() returns, the page tables for kernel space are fully initialized, and the static PGD (swapper_pg_dir) is loaded into the CR3 register so that the paging unit uses the static table.
The next task of paging_init() is to call kmap_init() to initialize each of the PTEs with the PAGE_KERNEL protection flags. The final task is to call zone_sizes_init(), which initializes all the zone structures used.