
Virtual Memory in the IA-64 Linux Kernel


Linux processes execute in a virtual environment that makes it appear as if each process had the entire address space of the CPU available to itself. This virtual address space extends from address 0 all the way to the maximum address. On a 32-bit platform, such as IA-32, the maximum address is 2^32 - 1, or 0xffffffff. On a 64-bit platform, such as IA-64, it is 2^64 - 1, or 0xffffffffffffffff.

While it is obviously convenient for a process to be able to access such a huge address space, there are really three distinct, but equally important, reasons for using virtual memory.

  1. Resource virtualization. On a system with virtual memory, a process does not have to concern itself with the details of how much physical memory is available or which physical memory locations are already in use by some other process. In other words, virtual memory takes a limited physical resource (physical memory) and turns it into an infinite, or at least an abundant, resource (virtual memory).

  2. Information isolation. Because each process runs in its own address space, it is not possible for one process to read data that belongs to another process. This improves security because it reduces the risk of one process being able to spy on another process and, e.g., steal a password.

  3. Fault isolation. Processes with their own virtual address spaces cannot overwrite each other's memory. This greatly reduces the risk of a failure in one process triggering a failure in another process. That is, when a process crashes, the problem is generally limited to that process alone and does not cause the entire machine to go down.

In this chapter, we explore how the Linux kernel implements its virtual memory system and how it maps to the underlying hardware. This mapping is illustrated specifically for IA-64. The chapter is structured as follows. The first section provides an introduction to the virtual memory system of Linux and establishes the terminology used throughout the remainder of the chapter. The introduction is followed by a description of the software and hardware structures that form the virtual memory system. Specifically, the second section describes the Linux virtual address space and its representation in the kernel. The third section describes the Linux page tables, and the fourth section describes how Linux manages the translation lookaside buffer (TLB), which is a hardware structure used to accelerate virtual memory accesses. Once these fundamental structures are introduced, the chapter describes the operation of the virtual memory system. Section five explores the Linux page fault handler, which can be thought of as the engine driving the virtual memory system. Section six describes how memory coherency is maintained, that is, how Linux ensures that a process sees the correct values in the virtual memory locations it can access. Section seven discusses how Linux switches execution from one address space to another, which is a necessary step during a process context switch. The chapter concludes with section eight, which provides the rationale for some of the virtual memory choices that were made for the virtual memory system implemented on IA-64.

4.1 Introduction to the Virtual Memory System

The left half of Figure 4.1 illustrates the virtual address space as it might exist for a particular process on a 64-bit platform. As the figure shows, the virtual address space is divided into equal-sized pieces called virtual pages. Virtual pages have a fixed size that is an integer power of 2. For example, IA-32 uses a page size of 4 Kbytes. To maximize performance, IA-64 supports multiple page sizes and Linux can be configured to use a size of 4, 8, 16, or 64 Kbytes. In the figure, the 64-bit address space is divided into 16 pages, meaning that each virtual page would have a size of 2^64/16 = 2^60 bytes or 1024 Pbytes (1 Pbyte = 2^50 bytes). Such large pages are not realistic, but the alternative of drawing a figure with several billion pages of a more realistic size is, of course, not practical either. Thus, for this section, we continue to illustrate virtual memory with this huge page size. The figure also shows that virtual pages are numbered sequentially. We can calculate the virtual page number (VPN) from a virtual address by dividing it by the page size and taking the integer portion of the result. The remainder is called the page offset. For example, dividing virtual address 0x40000000000003f8 by the page size yields 4 and a remainder of 0x3f8. This address therefore maps to virtual page number 4 and page offset 0x3f8.

Let us now turn attention to the right half of Figure 4.1, which shows the physical address space. Just like the virtual space, it is divided into equal-sized pieces, but in physical memory, those pieces are called page frames. As with virtual pages, page frames also are numbered. We can calculate the page frame number (PFN) from a physical address by dividing it by the page frame size and taking the integer portion of the result. The remainder is the page frame offset. Normally, page frames have the same size as virtual pages. However, there are cases where it is beneficial to deviate from this rule. Sometimes it is useful to have virtual pages that are larger than a page frame. Such pages are known as superpages. Conversely, it is sometimes useful to divide a page frame into multiple, smaller virtual pages. Such pages are known as subpages. IA-64 is capable of supporting both, but Linux does not use them.

Figure 4.1. Virtual and physical address spaces.

While it is easiest to think of physical memory as occupying a single contiguous region in the physical address space, in reality it is not uncommon to encounter memory holes. Holes usually are caused by one of three entities: firmware, memory-mapped I/O devices, or unpopulated memory. All three cause portions of the physical address space to be unavailable for storing the content of virtual pages. As far as the kernel is concerned, these portions are holes in the physical memory. In the example in Figure 4.1, page frames 2 and 3 represent a hole. Note that even if just a single byte in a page frame is unusable, the entire frame must be marked as a hole.

4.1.1 Virtual-to-physical address translation

Processes are under the illusion of being able to store data to virtual memory and retrieve it later on as if it were stored in real memory. In reality, only physical memory can store data. Thus, each virtual page that is in use must be mapped to some page frame in physical memory. For example, in Figure 4.1, virtual pages 4, 5, 8, 9, and 11 are in use. The arrows indicate which page frame in physical memory they map to. The mapping between virtual pages and page frames is stored in a data structure called the page table. The page table for our example is shown on the left-hand side of Figure 4.2.

Figure 4.2. Virtual-to-physical address translation.

The Linux kernel is responsible for creating and maintaining page tables but employs the CPU's memory management unit (MMU) to translate the virtual memory accesses of a process into corresponding physical memory accesses. Specifically, when a process accesses a memory location at a particular virtual address, the MMU translates this address into the corresponding physical address, which it then uses to access the physical memory. This is illustrated in Figure 4.2 for the case in which the virtual address is 0x40000000000003f8. As the figure shows, the MMU extracts the VPN (4) from the virtual address and then searches the page table to find the matching PFN. In our case, the search stops at the first entry in the page table since it contains the desired VPN. The PFN associated with this entry is 7. The MMU then constructs the physical address by concatenating the PFN with the frame offset from the virtual address, which results in a physical address of 0x70000000000003f8.

4.1.2 Demand paging

The next question we need to address is how the page tables get created. Linux could create appropriate page-table entries whenever a range of virtual memory is allocated. However, this would be wasteful because most programs allocate much more virtual memory than they ever use at any given time. For example, the text segment of a program often includes large amounts of error handling code that is seldom executed. To avoid wasting memory on virtual pages that are never accessed, Linux uses a method called demand paging. With this method, the virtual address space starts out empty. This means that, logically, all virtual pages are marked in the page table as not present. When accessing a virtual page that is not present, the CPU generates a page fault. This fault is intercepted by the Linux kernel and causes the page fault handler to be activated. There, the kernel can allocate a new page frame, determine what the content of the accessed page should be (e.g., a new, zeroed page, a page loaded from the data section of a program, or a page loaded from the text segment of a program), load the page, and then update the page table to mark the page as present. Execution then resumes in the process with the instruction that caused the fault. Since the required page is now present, the instruction can now execute without causing a page fault.

4.1.3 Paging and swapping

So far, we assumed that physical memory is plentiful: Whenever we needed a page frame to back a virtual page, we assumed a free page frame was available. When a system has many processes or when some processes grow very large, the physical memory can easily fill up. So what is Linux supposed to do when a page frame is needed but physical memory is already full? The answer is that in this case, Linux picks a page frame that backs a virtual page that has not been accessed recently, writes it out to a special area on the disk called the swap space, and then reuses the page frame to back the new virtual page. The exact place to which the old page is written on the disk depends on what kind of swap space is in use. Linux can support multiple swap space areas, each of which can be either an entire disk partition or a specially formatted file on an existing filesystem (the former is generally more efficient and therefore preferable). Of course, the page table of the process from which Linux "stole" the page frame must be updated accordingly. Linux does this update by marking the page-table entry as not present. To keep track of where the old page has been saved, it also uses the entry to record the disk location of the page. In other words, a page-table entry that is present contains the page frame number of the physical page frame that backs the virtual page, whereas a page-table entry that is not present contains the disk location at which the content of the page can be found.

Because a page marked as not present cannot be accessed without first triggering a page fault, Linux can detect when the page is needed again. When this happens, Linux needs to again find an available page frame (which may cause another page to be paged out), read the page content back from swap space, and then update the page-table entry so that it is marked as present and maps to the newly allocated page frame. At this point, the process that attempted to access the paged-out page can be resumed, and, apart from a small delay, it will execute as if the page had been in memory all along.

The technique of stealing a page from a process and writing it out to disk is called paging. A related technique is swapping. It is a more aggressive form of paging in the sense that it does not steal an individual page but steals all the pages of a process when memory is in short supply. Linux uses paging but not swapping. However, because both paging and swapping write pages to swap space, Linux kernel programmers often use the terms "swapping" and "paging" interchangeably. This is something to keep in mind when perusing the kernel source code.

From a correctness point of view, it does not matter which page is selected for page out, but from a performance perspective, the choice is critical. With a poor choice, Linux may end up paging out the page that is needed in the very next memory access. Given the large difference between disk access latency (on the order of several milliseconds) and memory access latency (on the order of tens of nanoseconds), making the right replacement choices can mean the difference between completing a task in a second or in almost three hours!

The algorithm that determines which page to evict from main memory is called the replacement policy. The provably optimal replacement policy (OPT) is to choose the page that will be accessed farthest in the future. Of course, in general it is impossible to know the future behavior of the processes, so OPT is of theoretical interest only. A replacement policy that often performs almost as well as OPT yet is realizable is the least recently used (LRU) policy. LRU looks into the past instead of the future and selects the page that has not been accessed for the longest period of time. Unfortunately, even though LRU could be implemented, it is still not practical because it would require updating a data structure (such as an LRU list) on every access to main memory. In practice, operating systems use approximations of the LRU policy instead, such as the clock replacement or not frequently used (NFU) policies [11, 69].

In Linux, the page replacement is complicated by the fact that the kernel can take up a variable amount of (nonpageable) memory. For example, file data is stored in the page cache, which can grow and shrink dynamically. When the kernel needs a new page frame, it often has two choices: It could take away a page from the kernel or it could steal a page from a process. In other words, the kernel needs not just a replacement policy but also a memory balancing policy that determines how much memory is used for kernel buffers and how much is used to back virtual pages. The combination of page replacement and memory balancing poses a difficult problem for which there is no perfect solution. Consequently, the Linux kernel uses a variety of heuristics that tend to work well in practice.

To implement these heuristics, the Linux kernel expects the platform-specific part of the kernel to maintain two extra bits in each page-table entry: the accessed bit and the dirty bit. The accessed bit is an indicator that tells the kernel whether the page was accessed (read, written, or executed) since the bit was last cleared. Similarly, the dirty bit is an indicator that tells whether the page has been modified since it was last paged in. Linux uses a kernel thread, called the kernel swap daemon kswapd, to periodically inspect these bits. After inspection, it clears the accessed bit. If kswapd detects that the kernel is starting to run low on memory, it starts to proactively page out memory that has not been used recently. If the dirty bit of a page is set, kswapd must write the page to disk before the page frame can be freed. Because this is relatively costly, kswapd preferentially frees pages whose accessed and dirty bits are cleared to 0. By definition, such pages were not accessed recently and do not have to be written back to disk before the page frame can be freed, so they can be reclaimed at very little cost.

4.1.4 Protection

In a multiuser and multitasking system such as Linux, multiple processes often execute the same program. For example, each user who is logged into the system is, at a minimum, running a command shell (e.g., the Bourne-Again shell, bash). Similarly, server processes such as the Apache web server often use multiple processes running the same program to better handle heavy loads. If we looked at the virtual space of each of those processes, we would find that they share many identical pages. Moreover, many of those pages are never modified during the lifetime of a process because they contain read-only data or the text segment of the program, which also does not change during the course of execution. Clearly, it would make a lot of sense to exploit this commonality and use only one page frame for each virtual page with identical content.

With N processes running the same program, sharing identical pages can reduce physical memory consumption by up to a factor of N. In reality, the savings are usually not quite so dramatic, because each process tends to require a few private pages. A more realistic example is illustrated in Figure 4.3: Page frames 0, 1, and 5 are used to back virtual pages 1, 2, and 3 in the two processes called bash 1 and bash 2. Note that a total of nine virtual pages are in use, but thanks to page sharing, only six page frames are needed.

Figure 4.3. Two processes sharing the text segment (virtual pages 1 to 3).

Of course, page sharing cannot be done safely unless we can guarantee that none of the shared pages are modified. Otherwise, the changes of one process would be visible in all the other processes and that could lead to unpredictable program behavior.

This is where the page permission bits come into play. The Linux kernel expects the platform-specific part of the kernel to maintain three such bits per page-table entry. They are called the R, W, and X permission bits and respectively control whether read, write, or execute accesses to the page are permitted. If an access that is not permitted is attempted, a page protection violation fault is raised. When this happens, the kernel responds by sending a segmentation violation signal (SIGSEGV) to the process.

The page permission bits enable the safe sharing of page frames. All the Linux kernel has to do is ensure that all virtual pages that refer to a shared page frame have the W permission bit turned off. That way, if a process attempted to modify a shared page, it would receive a segmentation violation signal before it could do any harm.

The most obvious place where page frame sharing can be used effectively is in the text segment of a program: By definition, this segment can be executed and read, but it is never written to. In other words, the text segment pages of all processes running the same program can be shared. The same applies to read-only data pages.

Linux takes page sharing one step further. When a process forks a copy of itself, the kernel disables write access to all virtual pages and sets up the page tables such that the parent and the child process share all page frames. In addition, it marks the pages that were writable before as copy-on-write (COW). If the parent or the child process attempts to write to a copy-on-write page, a protection violation fault occurs. When this happens, instead of sending a segmentation violation signal, the kernel first makes a private copy of the virtual page and then turns the write permission bit for that page back on. At this point, execution can return to the faulting process. Because the page is now writable, the faulting instruction can finish execution without causing a fault again. The copy-on-write scheme is particularly effective when a program does a fork() that is quickly followed by an execve(). In such a case, the scheme is able to avoid almost all page copying, save for a few pages in the stack or data segment [9, 64].

Note that the page sharing described here happens automatically and without the explicit knowledge of the process. There are times when two or more processes need to cooperate and want to explicitly share some virtual memory pages. Linux supports this through the mmap() system call and through System V shared memory segments [9]. Because the processes are cooperating, it is their responsibility to map the shared memory segment with suitable permission bits to ensure that the processes can access the memory only in the intended fashion.
