Fundamental Kernel Data Structures: A First Pass
HP-UX kernel data structures are a central focus of our discussion. We break them down into several areas: kernel memory tables, process tables, disk space tables, file system tables, and input/output tables. As we explore these areas, keep your eyes open for similarities in approach and design. Many teams of programmers work on the various modules and subsystems that make up the kernel. They frequently borrow methods and algorithms from one another, and there seems to be a never-ending attempt to tweak and tune them for improved performance. This type of cross-pollination helps the kernel mature and improve.
Kernel Memory Tables
A prime concern of the kernel is the management of the system's memory resources (see Figure 3-11). Memory comes in many flavors: physical, virtual, and logical.
Figure 3-11. Kernel Memory Tables
HP-UX runs on processors that have a 32-bit instruction word size. The primary unit of memory allocation is called a physical pageframe. On current HP-UX systems, a pageframe is 4096 bytes (1024 32-bit words), a size that has remained constant for many years. While a page of this size reduces the number of on-demand page-in operations required for a process and its threads, it also creates challenges for the memory management schemes. We explore this fully in Chapter 6.
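Because a pageframe is 4096 bytes (2^12), any address divides cleanly into a page number and a byte offset within that page. The following is a minimal sketch of that arithmetic, assuming 32-bit addresses; the macro and function names are illustrative only and are not HP-UX kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants (hypothetical names): a 4096-byte pageframe. */
#define PAGE_SHIFT 12u                 /* 4096 == 1u << 12 */
#define PAGE_SIZE  (1u << PAGE_SHIFT)  /* 4096 bytes */
#define PAGE_MASK  (PAGE_SIZE - 1u)    /* low 12 bits: offset within page */
#define WORD_BYTES 4u                  /* one 32-bit word */

/* Page number: which 4096-byte page the address falls in. */
static uint32_t page_number(uint32_t addr) { return addr >> PAGE_SHIFT; }

/* Offset: byte position within that page (0..4095). */
static uint32_t page_offset(uint32_t addr) { return addr & PAGE_MASK; }

/* Rebuild an address from a page number and an in-page offset. */
static uint32_t make_addr(uint32_t pnum, uint32_t off)
{
    assert(off < PAGE_SIZE);
    return (pnum << PAGE_SHIFT) | off;
}
```

For example, address 0x12345 lies in page 0x12 at offset 0x345, and a 4096-byte page holds 4096 / 4 = 1024 words.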
Several primary data structures are required to track and map the system's physical memory. The pfdat_ptr[x] array, commonly called the free-page table, keeps track of which pages are currently in use by the kernel and which have been assigned to a process. It is a partitioned table, which allows physical memory to be mapped around holes in the physical memory map. As a general rule, if a table name ends in _ptr, it is most likely a partitioned table.
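The idea behind a partitioned table can be sketched as follows, assuming each partition describes one contiguous range of physical pageframe numbers (PFNs) and a hole simply has no partition. The names `pfd_entry`, `pfd_partition`, and `pfd_lookup` are hypothetical, and the layout is far simpler than the real pfdat_ptr[x].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pageframe entry; real entries hold much more state. */
struct pfd_entry {
    int in_use;  /* nonzero if allocated to the kernel or a process */
};

/* One partition covers a contiguous run of physical pageframe numbers. */
struct pfd_partition {
    uint32_t first_pfn;         /* first PFN covered by this partition */
    uint32_t count;             /* number of PFNs covered */
    struct pfd_entry *entries;  /* count entries, indexed from first_pfn */
};

/* Find the entry for a PFN, or NULL if the PFN falls in a memory hole. */
static struct pfd_entry *pfd_lookup(const struct pfd_partition *parts,
                                    size_t nparts, uint32_t pfn)
{
    assert(parts != NULL || nparts == 0);
    for (size_t i = 0; i < nparts; i++) {
        if (pfn >= parts[i].first_pfn &&
            pfn - parts[i].first_pfn < parts[i].count)
            return &parts[i].entries[pfn - parts[i].first_pfn];
    }
    return NULL; /* no partition covers this PFN: it is in a hole */
}
```

Only populated ranges need entries, so a large hole in the physical memory map costs nothing beyond the absence of a partition.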
With the release of HP-UX 11i, a process may be assigned larger contiguous sets of physical pageframes under a newly introduced Variable Page Size (VPS) feature, also called Performance-Optimized Page size (POPs) in some sales and training literature. To accommodate these features, the pfdat_ptr table has been modified to allow contiguous free pages to be pooled into larger views, ponds, and pools of various colors.
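To illustrate what grouping contiguous free pages involves, here is a sketch that scans a free-page map for a naturally aligned run of 2^order free pageframes. It assumes a simple one-byte-per-page free map and the common hardware rule that a large page starts on a boundary that is a multiple of its own size; it is not how HP-UX implements VPS or its page pools, and `find_free_run` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the first PFN of a naturally aligned run of (1 << order) free
 * pageframes, or -1 if none exists. free_map[pfn] is nonzero when free.
 * Hypothetical helper, not an HP-UX routine.
 */
static long find_free_run(const unsigned char *free_map, size_t npages,
                          unsigned order)
{
    size_t run = (size_t)1 << order;
    assert(free_map != NULL && run > 0);
    /* Step in run-sized strides so every candidate start is aligned. */
    for (size_t start = 0; start + run <= npages; start += run) {
        size_t i = 0;
        while (i < run && free_map[start + i])
            i++;
        if (i == run)
            return (long)start;  /* whole aligned run is free */
    }
    return -1;
}
```

A real allocator would keep such runs on per-size free lists rather than rescanning, which is roughly the role the pools described above play.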
Virtual Address Space
The virtual address space (VAS) does not reference a physical system entity; instead, it is the conceptual memory space onto which the underlying hardware platform (HP PA-RISC) and the kernel must map all potential regions of use. This phantom map is a key concept to master as we study the kernel's theory of operation.
The kernel memory management structures must allow the hardware to map virtual pageframes to the physical pageframes that contain current process code or data. The primary data structure for this task is the htbl2.0[x]. If a needed pageframe is not currently memory-resident, it is up to the kernel to handle the resulting page fault and load the page as soon as possible.
Both the HP PA-RISC hardware and the HP-UX kernel require this virtual-to-physical pageframe map. The hardware calls this table the page directory (or pdir[x]) and uses its entries, defined as page data entries (or pdes), to update the CPU's translation lookaside buffer (TLB). The hardware and kernel use different names because the hardware does not specify the use of every bit in this structure: the kernel designers use the undefined bits for their own purposes.
On the older 32-bit HP PA-RISC-based systems (called narrow systems), this table is named htbl[x].
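Conceptually, a hashed virtual-to-physical table works like the sketch below: an (address space, virtual page) pair is hashed to a bucket, and a chain of entries is searched for a matching tag. The hash function, field names, table size, and chaining scheme are assumptions chosen for clarity; they do not reproduce the actual htbl2.0[x] or pdir[x] entry formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64u  /* hypothetical table size (power of two) */

/* Simplified translation entry; real pdes carry protection bits and more. */
struct xlate_entry {
    uint32_t space;            /* address-space identifier (part of tag) */
    uint32_t vpn;              /* virtual page number (part of tag) */
    uint32_t pfn;              /* physical pageframe number */
    struct xlate_entry *next;  /* collision chain */
};

static struct xlate_entry *buckets[NBUCKETS];

/* Hypothetical hash: mix space and VPN, keep the low bits. */
static unsigned hash_vpage(uint32_t space, uint32_t vpn)
{
    return (unsigned)((space * 31u) ^ vpn) & (NBUCKETS - 1u);
}

/* Add a translation at the head of its bucket's chain. */
static void xlate_insert(struct xlate_entry *e)
{
    unsigned h = hash_vpage(e->space, e->vpn);
    e->next = buckets[h];
    buckets[h] = e;
}

/* Returns 0 and sets *pfn on a hit; -1 means no translation exists. */
static int xlate_lookup(uint32_t space, uint32_t vpn, uint32_t *pfn)
{
    assert(pfn != NULL);
    for (struct xlate_entry *e = buckets[hash_vpage(space, vpn)];
         e != NULL; e = e->next) {
        if (e->space == space && e->vpn == vpn) {
            *pfn = e->pfn;
            return 0;
        }
    }
    return -1;
}
```

In this sketch, a miss is the point at which the kernel's page-fault handling would take over.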
The htbl2.0[x] only provides for the mapping of virtual pageframes to physical pageframes. While this is the direction of translation most frequently needed by kernel functions, occasionally the kernel must identify which virtual pageframe has been assigned to a particular physical pageframe. This requirement is fulfilled by the pfn_to_virt_ptr[x] table. The table is also used to link alias data structures when they are required. An alias is used if more than one virtual pageframe has been mapped to a single physical pageframe, an important feature that allows copy-on-write semantics during the fork() system call.
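The reverse direction can be sketched as a per-pageframe chain of virtual mappings: the first mapping hangs directly off the pageframe's slot, and any aliases (for example, parent and child sharing a page after fork()) are chained from it. The names `vmapping`, `pfn_to_virt`, `rmap_add`, and `rmap_count` are hypothetical; the real pfn_to_virt_ptr[x] entries and alias structures differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One virtual mapping of a physical pageframe (hypothetical layout). */
struct vmapping {
    uint32_t space;          /* address-space identifier */
    uint32_t vpn;            /* virtual page number */
    struct vmapping *alias;  /* next virtual page mapped to the same PFN */
};

/* Reverse-map slot for one pageframe: head of its mapping chain. */
struct pfn_to_virt {
    struct vmapping *first;  /* NULL if the pageframe is unmapped */
};

/* Record another virtual page that maps this pageframe. */
static void rmap_add(struct pfn_to_virt *slot, struct vmapping *m)
{
    assert(slot != NULL && m != NULL);
    m->alias = slot->first;
    slot->first = m;
}

/* Count mappings; more than one means the pageframe is aliased. */
static size_t rmap_count(const struct pfn_to_virt *slot)
{
    size_t n = 0;
    for (const struct vmapping *m = slot->first; m != NULL; m = m->alias)
        n++;
    return n;
}
```

In a copy-on-write scheme, a write fault on a pageframe with more than one mapping would typically give the writer a private copy, which is one reason the kernel needs this reverse view.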
Process Logical Memory Space
Conceptually, we also need to consider a process's view of memory. Linking-loaders create executable image files (for C programs, the common name is a.out). These files and their headers contain information about which system resources the program needs in order to run. For a program to run, its page images must be loaded into consecutive pages in the VAS, because when the image was created, all references to data and procedure calls were coded as absolute addresses within the process's logical address space.
To facilitate the sharing of process code, dynamic shared library code, shared memory-mapped files, and other shared objects, related consecutive pages in the program's image are said to occupy regions of address space. Mapping these logical process address regions to kernel-managed virtual memory regions is the job of the kernel's many region data structures.
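The mapping from a process's logical address ranges to regions can be pictured as a small table of (start, length, region) entries searched by address. The sketch below assumes the entries are sorted and non-overlapping; `region`, `proc_region`, and `find_region` are hypothetical simplifications of the kernel's region data structures. Two processes sharing a library simply point at the same region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A region of pages that may be shared by several processes. */
struct region {
    const char *name;  /* e.g. "text", "libc text" */
    uint32_t npages;   /* size of the region in pages */
};

/* One process's view: a logical address range attached to a region. */
struct proc_region {
    uint32_t vaddr;      /* start of the range (page aligned) */
    uint32_t len;        /* length in bytes */
    struct region *reg;  /* region backing this range */
};

/* Binary search entries sorted by vaddr; NULL if addr is unmapped. */
static const struct proc_region *find_region(const struct proc_region *tab,
                                             size_t n, uint32_t addr)
{
    size_t lo = 0, hi = n;
    assert(tab != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < tab[mid].vaddr)
            hi = mid;                      /* search lower ranges */
        else if (addr - tab[mid].vaddr >= tab[mid].len)
            lo = mid + 1;                  /* search higher ranges */
        else
            return &tab[mid];              /* addr lies in this range */
    }
    return NULL;
}
```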
The region structure contains a page-by-page database indicating whether each page is currently in physical memory, stored as an image on a front-store (an executable program file), stored as an image on a back-store (a swap page), or still awaiting initialization (as with uninitialized data pages, or BSS).
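The four cases above can be modeled as a per-page state that tells a fault handler where a page's contents come from. The enum values and the `fault_action` helper are hypothetical names that simply mirror the cases listed; the zero-fill for pages awaiting initialization reflects the C rule that uninitialized (BSS) data starts as zero.

```c
#include <assert.h>

/* Where a region page currently lives (hypothetical encoding). */
enum page_state {
    PG_RESIDENT,     /* already in physical memory */
    PG_FRONT_STORE,  /* image in the executable file (front-store) */
    PG_BACK_STORE,   /* image on a swap page (back-store) */
    PG_AWAIT_INIT    /* not yet initialized, e.g. BSS */
};

/* The work a page-fault handler would need to do for each state. */
enum fault_io {
    IO_NONE,       /* just map the existing pageframe */
    IO_READ_FILE,  /* read the page from the executable file */
    IO_READ_SWAP,  /* read the page from swap */
    IO_ZERO_FILL   /* allocate a pageframe and fill it with zeros */
};

static enum fault_io fault_action(enum page_state s)
{
    switch (s) {
    case PG_RESIDENT:    return IO_NONE;
    case PG_FRONT_STORE: return IO_READ_FILE;
    case PG_BACK_STORE:  return IO_READ_SWAP;
    case PG_AWAIT_INIT:  return IO_ZERO_FILL;
    }
    assert(0 && "invalid page state");
    return IO_NONE;
}
```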
Managing Memory for Internal Kernel Usage
So far, we have discussed only the structures used to manage memory for the system's many processes. This type of memory management is done at the granularity of the pageframe. Additional structures are used inside the kernel to allocate smaller blocks of memory for the kernel's many dynamic tables and linked lists. Until the 11i release, HP-UX used the rather classic "kernel bucket" memory allocation scheme. It has been replaced with an "arena" allocation approach, a change made to improve flexibility, reduce waste, and facilitate page reclamation.
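The waste argument can be made concrete with a little arithmetic. A classic bucket allocator rounds each request up to a power-of-two bucket size, whereas an arena dedicated to one object type can hand out blocks of exactly that type's size plus alignment. The sketch below compares the two under those assumptions; the 16-byte minimum bucket and 8-byte alignment are illustrative choices, not HP-UX parameters, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_BUCKET  16u  /* assumed smallest bucket size */
#define ARENA_ALIGN 8u   /* assumed per-object alignment in an arena */

/* Bucket scheme: round up to the next power of two (>= MIN_BUCKET). */
static size_t bucket_size(size_t n)
{
    size_t b = MIN_BUCKET;
    assert(n > 0);
    while (b < n)
        b <<= 1;
    return b;
}

/* Arena scheme: one object type per arena, sized exactly, then aligned. */
static size_t arena_size(size_t n)
{
    assert(n > 0);
    return (n + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
}
```

A 600-byte object, for instance, occupies a 1024-byte bucket (424 bytes wasted) but only 600 bytes in a type-specific arena. Real arena allocators also group objects of one type onto the same pages, which can make it easier to return whole pages to the system; that aspect is not modeled here.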
All Together Now
It may seem at first glance that many structures are playing in the same sandbox. To some degree, this is an accurate assessment, and for it to work, all of the tables must play nice together! Each table has been optimized to provide support for a particular piece of the puzzle and must be meticulously managed to avoid system corruption. There are many levels of checks and balances used to maintain the memory management system's integrity.