3.3 Using Page Table Entries
Several macros that are important for navigating and examining page table entries are defined in <asm/pgtable.h>. To navigate the page directories, three macros are provided that break up a linear address into its component parts. pgd_offset() takes an address and the mm_struct for the process and returns the PGD entry that covers the requested address. pmd_offset() takes a PGD entry and an address and returns the relevant PMD. pte_offset() takes a PMD and an address and returns the relevant PTE. The remainder of the linear address provided is the offset within the page. The relationship between these fields is illustrated in Figure 3.1.
The second set of macros determines whether the page table entries are present or may be used.
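The way a linear address breaks into its component parts can be sketched in userspace C. This is a minimal illustration only, assuming the 32-bit x86 layout without PAE, where the PMD level is folded away and the address splits into 10, 10 and 12 bits. The helper names pgd_index_of(), pte_index_of() and page_offset_of() are hypothetical; the kernel macros return pointers to entries, whereas these helpers only compute the index each macro would use.

```c
#include <stdint.h>

/* Assumed layout: 32-bit x86 without PAE (10 + 10 + 12 bits). */
#define PAGE_SHIFT   12
#define PGDIR_SHIFT  22
#define PTRS_PER_PGD 1024
#define PTRS_PER_PTE 1024
#define PAGE_SIZE    (1UL << PAGE_SHIFT)

/* Hypothetical helper: index of the PGD entry covering addr. */
static inline unsigned long pgd_index_of(uint32_t addr)
{
    return (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

/* Hypothetical helper: index of the PTE within its page table. */
static inline unsigned long pte_index_of(uint32_t addr)
{
    return (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
}

/* Hypothetical helper: the remaining bits are the offset within the page. */
static inline unsigned long page_offset_of(uint32_t addr)
{
    return addr & (PAGE_SIZE - 1);
}
```

Under these assumptions, the address 0xC0123456 selects PGD entry 0x300, PTE 0x123 within that page table, and byte 0x456 within the page.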
pte_none(), pmd_none() and pgd_none() return 1 if the corresponding entry does not exist.
pte_present(), pmd_present() and pgd_present() return 1 if the corresponding page table entries have the PRESENT bit set.
pte_clear(), pmd_clear() and pgd_clear() will clear the corresponding page table entry.
pmd_bad() and pgd_bad() are used to check entries when they are passed as input parameters to functions that may change the value of the entries. The conditions under which they return 1 vary between the few architectures that define these macros. However, for those that do define them, the two most important checks are that the entry is marked as present and accessed.
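The difference between an entry that does not exist and one that exists but is not present can be shown with a toy model. The sketch below is not the kernel's implementation: the type toy_pte_t, the toy_ prefixed functions and the bit position of TOY_PAGE_PRESENT are all assumptions made for illustration, loosely modeled on x86.

```c
#include <stdint.h>

/* Toy stand-in for pte_t: a single 32-bit word. */
typedef struct { uint32_t val; } toy_pte_t;

#define TOY_PAGE_PRESENT 0x001u /* assumed position of the PRESENT bit */

/* An entry with no bits set at all is treated as "none". */
static inline int toy_pte_none(toy_pte_t pte)
{
    return pte.val == 0;
}

/* An entry is present only if the PRESENT bit is set. */
static inline int toy_pte_present(toy_pte_t pte)
{
    return (pte.val & TOY_PAGE_PRESENT) != 0;
}

/* Clearing wipes the whole entry, so it becomes "none" again. */
static inline void toy_pte_clear(toy_pte_t *ptep)
{
    ptep->val = 0;
}
```

In this model an entry holding 0x01234000 is neither none nor present, which is why the two tests are kept separate.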
Many parts of the VM are littered with page table walk code, and it is important to recognize it. A very simple example of a page table walk is the function follow_page() in mm/memory.c. The following is an excerpt from that function. The parts unrelated to the page table walk are omitted.
pgd_t *pgd;
pmd_t *pmd;
pte_t *ptep, pte;

pgd = pgd_offset(mm, address);
if (pgd_none(*pgd) || pgd_bad(*pgd))
    goto out;

pmd = pmd_offset(pgd, address);
if (pmd_none(*pmd) || pmd_bad(*pmd))
    goto out;

ptep = pte_offset(pmd, address);
if (!ptep)
    goto out;

pte = *ptep;
It simply uses the three offset macros to navigate the page tables and the _none() and _bad() macros to make sure it is looking at a valid page table.
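The same walk pattern can be tried outside the kernel with a toy two-level table. This is only a sketch under stated assumptions: the types toy_pgd_t and toy_pte_t, the function toy_walk() and the TOY_ constants are hypothetical, the layout mimics 32-bit x86 without PAE, and a NULL table pointer stands in for what pgd_none() would detect.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT  12      /* assumed 4 KiB pages */
#define TOY_PGDIR_SHIFT 22      /* assumed 10-bit directory index */
#define TOY_ENTRIES     1024
#define TOY_PRESENT     0x1u    /* assumed PRESENT bit */

typedef struct { uint32_t val; } toy_pte_t;
typedef struct { toy_pte_t *table; } toy_pgd_t; /* NULL table means "none" */

/*
 * Walk the toy tables for addr. Returns a pointer to the PTE that maps
 * addr, or NULL if a level is missing or the PTE is not present.
 */
static inline toy_pte_t *toy_walk(toy_pgd_t *pgd_base, uint32_t addr)
{
    toy_pgd_t *pgd;
    toy_pte_t *ptep;

    /* Top level: pick the directory entry covering addr. */
    pgd = &pgd_base[(addr >> TOY_PGDIR_SHIFT) & (TOY_ENTRIES - 1)];
    if (pgd->table == NULL)
        return NULL;

    /* Bottom level: pick the PTE inside that page table. */
    ptep = &pgd->table[(addr >> TOY_PAGE_SHIFT) & (TOY_ENTRIES - 1)];
    if (!(ptep->val & TOY_PRESENT))
        return NULL;

    return ptep;
}
```

A caller would allocate an array of TOY_ENTRIES toy_pgd_t entries, attach a page table to one of them and then call toy_walk() with an address that falls in the range that entry covers.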
The third set of macros examines and sets the permissions of an entry. The permissions determine what a userspace process can and cannot do with a particular page. For example, the kernel page table entries are never readable by a userspace process.
The read permissions for an entry are tested with pte_read(), set with pte_mkread() and cleared with pte_rdprotect().
The write permissions are tested with pte_write(), set with pte_mkwrite() and cleared with pte_wrprotect().
The execute permissions are tested with pte_exec(), set with pte_mkexec() and cleared with pte_exprotect(). It is worth noting that, with the x86 architecture, there is no means of setting execute permissions on pages, so these three macros act the same way as the read macros.
The permissions can be modified to a new value with pte_modify(), but its use is almost nonexistent. It is only used in the function change_pte_range() in mm/mprotect.c.
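The test, set and clear pattern for a permission can be sketched with the write bit. The code below is a toy model, not the kernel source: toy_pte_t, the toy_ prefixed functions and the value of TOY_PAGE_RW are assumptions chosen to resemble x86. As with the kernel macros, the set and clear helpers take an entry by value and return the modified copy rather than changing it in place.

```c
#include <stdint.h>

/* Toy stand-in for pte_t: a single 32-bit word. */
typedef struct { uint32_t val; } toy_pte_t;

#define TOY_PAGE_RW 0x002u /* assumed position of the write-permission bit */

/* Test: is the page writable? */
static inline int toy_pte_write(toy_pte_t pte)
{
    return (pte.val & TOY_PAGE_RW) != 0;
}

/* Set: return a copy of the entry with write permission granted. */
static inline toy_pte_t toy_pte_mkwrite(toy_pte_t pte)
{
    pte.val |= TOY_PAGE_RW;
    return pte;
}

/* Clear: return a copy of the entry with write permission removed. */
static inline toy_pte_t toy_pte_wrprotect(toy_pte_t pte)
{
    pte.val &= ~TOY_PAGE_RW;
    return pte;
}
```

Because the helpers return a new value, a caller has to store the result back into the page table for the change to take effect.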
The fourth set of macros examines and sets the state of an entry. Only two bits are important in Linux: the dirty bit and the accessed bit. To check these bits, the macros pte_dirty() and pte_young() are used. To set the bits, the macros pte_mkdirty() and pte_mkyoung() are used. To clear them, the macros pte_mkclean() and pte_mkold() are available.
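The state bits follow the same pattern as the permission bits. The sketch below is a toy model under stated assumptions: toy_pte_t and the toy_ prefixed functions are hypothetical, and the values of TOY_PAGE_ACCESSED and TOY_PAGE_DIRTY are chosen to resemble x86 rather than taken from the text.

```c
#include <stdint.h>

/* Toy stand-in for pte_t: a single 32-bit word. */
typedef struct { uint32_t val; } toy_pte_t;

#define TOY_PAGE_ACCESSED 0x020u /* assumed position of the accessed bit */
#define TOY_PAGE_DIRTY    0x040u /* assumed position of the dirty bit */

/* Check the two state bits. */
static inline int toy_pte_young(toy_pte_t pte)
{
    return (pte.val & TOY_PAGE_ACCESSED) != 0;
}

static inline int toy_pte_dirty(toy_pte_t pte)
{
    return (pte.val & TOY_PAGE_DIRTY) != 0;
}

/* Set the bits; each helper returns the modified copy. */
static inline toy_pte_t toy_pte_mkyoung(toy_pte_t pte)
{
    pte.val |= TOY_PAGE_ACCESSED;
    return pte;
}

static inline toy_pte_t toy_pte_mkdirty(toy_pte_t pte)
{
    pte.val |= TOY_PAGE_DIRTY;
    return pte;
}

/* Clear the bits again. */
static inline toy_pte_t toy_pte_mkold(toy_pte_t pte)
{
    pte.val &= ~TOY_PAGE_ACCESSED;
    return pte;
}

static inline toy_pte_t toy_pte_mkclean(toy_pte_t pte)
{
    pte.val &= ~TOY_PAGE_DIRTY;
    return pte;
}
```

The two bits are independent in this model, so an entry can be young but clean (read recently, never written) or old but dirty (written at some point, not touched since the accessed bit was last cleared).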