Home > Articles > Operating Systems, Server

This chapter is from the book

4.6 Memory Coherency

A given memory location is said to be coherent if all CPUs and I/O devices in a machine observe one and the same value in that location. Modern machines make aggressive use of memory caches and thereby introduce many potential sources for incoherence. For example, to maximize performance, caches usually delay stores by the CPU as long as possible and write them back to main memory (or the next level in the cache hierarchy) only when absolutely necessary. As a result, the same memory location may have different values in the various caches or in main memory. Modern machines also often use separate caches for instructions and data, thereby introducing the risk of the same location being cached both in the instruction and data cache, but with potentially different values.

A particularly insidious source of incoherence arises from virtually-tagged caches that tag the cache contents not with the memory location (physical address) but with the virtual address with which the location was accessed. Consider the scenario where the same location is accessed multiple times with different virtual addresses. This is called virtual aliasing because different virtual addresses refer to the same memory location. Linux frequently does this and with virtually-tagged caches, the memory location may end up being cached multiple times! Moreover, updating the memory location through one virtual address would not update the cache copies created by the aliases, and we would again have a situation where a memory location is incoherent.

I/O devices are yet another source of incoherence: a device may write a memory location through DMA (see Chapter 7, Device I/O) and this new value may or may not be observed by the memory caches.

We would like to emphasize that, by itself, an incoherent memory location is not an issue. A problem arises only if an incoherent value is observed, e.g., by a CPU or an I/O device. When this happens, the result is usually catastrophic. For example, suppose we are dealing with a platform that uses separate instruction and data caches. If the operating system reads a page of code from the text section of an executable file and copies it to the user space of a process, the data cache will be updated but the instruction cache will remain unchanged. Thus, when the process attempts to execute the newly loaded code, it may end up fetching stale instructions and the process may crash. To avoid such problems, memory locations must be made coherent before a stale value can be observed. Depending on the platform architecture, maintaining coherency may be the responsibility of hardware or software. In practice, the two often share the responsibility, with the hardware taking care of certain sources of incoherence and the software taking care of the rest.

4.6.1 Maintenance of coherency in the Linux kernel

To accommodate the wide variety of possible memory coherence schemes, Linux defines the interface shown in Figure 4.38. Every platform must provide a suitable implementation of this interface. The interface is designed to handle all coherence issues except DMA coherence. DMA is handled separately, as we see in Chapter 7, Device I/O.

Figure 4.38Figure 4.38. Kernel interface to maintain memory coherency.

The first routine in this interface is flush_cache_all(). It must ensure that for data accessed through the kernel address space, all memory locations are coherent. Linux calls this routine just before changing or removing a mapping in the page-table-mapped kernel segment or the kmap segment. On platforms with virtually-tagged caches, the routine is usually implemented by flushing all data caches. On other platforms, this routine normally performs no operation.

The second routine, flush_icache_range(), ensures that a specific range of memory locations is coherent with respect to instruction fetches and data accesses. The routine takes two arguments, start and end. The address range that must be made coherent extends from start up to and including end - 1. All addresses in this range must be valid kernel or mapped user-space virtual addresses. Linux calls this routine after writing instructions to memory. For example, when loading a kernel module, Linux first allocates memory in the page-table-mapped kernel segment, copies the executable image to this memory, and then calls flush_icache_range() on the text segment to ensure the d_caches and i_caches are coherent before attempting to execute any code in the kernel module. On platforms that do not use separate instruction or data caches or that maintain coherency in hardware, this routine normally performs no operation. On other platforms, it is usually implemented by flushing the instruction cache for the given address range.

The next three routines are all used by Linux to inform platform-specific code that the virtual-to-physical translation of a section of a (user) address space is about to be changed. On platforms with physically indexed caches, these routines normally perform no operation. However, in the presence of virtually indexed caches, these routines must ensure that coherency is maintained. In practice, this usually means that all cache lines associated with any of the affected virtual addresses must be flushed from the caches. The three routines differ in the size of the section they affect: flush_cache_mm() affects the entire address space identified by mm structure pointer mm; flush_cache_range() affects a range of addresses. The range extends from start up to and including end -1, where start and end are both user-level addresses and are given as the second and third arguments to this routine. The third routine, flush_cache_page(), affects a single page. The vm-area that covers this page is identified by the vma argument and the user-space address of the affected page is given by argument addr. These routines are closely related to the TLB coherency routines flush_tlb_mm(), flush_tlb_range(), and flush_tlb_page() in the sense that they are used in pairs. For example, Linux changes the page table of an entire address space mm by using code that follows the pattern shown below:

flush cache mm( mm);
... change page table of mm...
flush tlb mm( mm);

That is, before changing a virtual-to-physical translation, Linux calls one of the flush_cache routines to ensure that the affected memory locations are coherent. In the second step, it changes the page table (virtual-to-physical translations), and in the third step, it calls one of the flush_tlb routines to ensure that the TLB is coherent. The order in which these steps occur is critical: The memory locations need to be made coherent before the translations are changed because otherwise the flush routine might fault. Conversely, the TLB must be made coherent after the translations are changed because otherwise another CPU might pick up a stale translation after the TLB has been flushed but before the page table has been fully updated.

The three routines just discussed establish coherence for memory locations that may have been written by a user process. The next three routines complement this by providing the means to establish coherence for memory locations written by the kernel.

The first routine is flush_dcache_page(). The Linux kernel calls it to notify platform-specific code that it just dirtied (wrote) a page that is present in the page cache. The page is identified by the routine's only argument, pg, which is a pointer to the page frame descriptor of the dirtied page. The routine must ensure that the content of the page is coherent as far as any user-space accesses are concerned. Because the routine affects coherency only in regards to user-space accesses, the kernel does not call it for page cache pages that cannot possibly be mapped into user space. For example, the content of a symbolic link is never mapped into user space. Thus, there is no need to call flush_dcache_page() after writing the content of a symbolic link. Also, note that this routine is used only for pages that are present in the page cache. This includes all non-anonymous pages and old anonymous pages that have been moved to the swap cache.

Newly created anonymous pages are not entered into the page cache and must be handled separately. This is the purpose of clear_user_page() and copy_user_page(). The former creates an anonymous page, which is cleared to 0. The latter copies a page that is mapped into user space (e.g., as a result of a copy-on-write operation). Both routines take an argument called pg as their last argument. This is a pointer to the page frame descriptor of the page that is being written. Apart from this, clear_user_page() takes two other arguments: to and uaddr. The former is the kernel-space address at which the page resides, and uaddr is the user-space address at which the page will be mapped (because anonymous pages are process-private, there can be only one such address). Similarly, copy_user_page() takes three other arguments: from, to, and uaddr. The first two are the kernel-space addresses of the source and the destination page, and uaddr is again the user-space address at which the new page will be mapped. Why do these two routines have such a complicated interface? The reason is that on platforms with virtually indexed caches, it is possible to write the new page and make it coherent with the page at uaddr without requiring any explicit cache flushing. The basic idea is for the platform-specific code to write the destination page not through the page at address to, but through a kernel-space address that maps to the same cache lines as uaddr. Because anonymous pages are created frequently, this clever trick can achieve a significant performance boost on platforms with virtually indexed caches. On the other hand, on platforms with physically indexed caches, these operations normally perform no operation other than clearing or copying the page, so despite having rather complicated interfaces, the two routines can be implemented optimally on all platforms.

4.6.2 IA-64 implementation

The IA-64 architecture guarantees that virtual aliases are supported in hardware but leaves open the possibility that on certain CPUs there may be a performance penalty if two virtual addresses map to the same memory location and the addresses do not differ by a value that is an integer multiple of 1 Mbyte. This implies that Linux can treat IA-64 as if all caches were physically indexed, and hence flush_cache_all(), flush_cache_mm(), flush_cache_range(), and flush_cache_page() do not have to perform any operation at all. To avoid the potential performance penalty, Linux/ia64 maps shared memory segments at 1-Mbyte–aligned addresses whenever possible.

While IA-64 generally requires that coherence is maintained in hardware, there is one important exception: When a CPU writes to a memory location, it does not have to maintain coherence with the instruction caches. For this reason, the IA-64 version of flush_icache_range() must establish coherence by using the flush cache (fc) instruction. This instruction takes one address operand, which identifies the cache line that is to be written back (if it is dirty) and then is evicted from all levels of the cache hierarchy. This instruction is broadcast to all CPUs in an MP machine, so it is sufficient to execute it on one CPU to establish coherence across the entire machine. The architecture guarantees that cache lines are at least 32 bytes in size, so the routine can be implemented by executing fc once for every 32-byte block in the address range from start to end -1. Care must be taken not to execute fc on an address that is outside this range; doing so could trigger a fault and crash the kernel.

Is implementing flush_icache_range() sufficient to guarantee coherence between the data and instruction caches? Unfortunately, the answer is no. To see this, consider that Linux calls this routine only when it positively knows that the data it wrote to memory will be executed eventually. It does not know this when writing a page that later on gets mapped into user space. For this reason, flush_dcache_page(), clear_user_page(), and copy_user_page() logically also must call flush_icache_range() on the target page. Given that there are usually many more data pages than code pages, this naive implementation would be prohibitively slow. Instead, Linux/ia64 attempts to delay flushing the cache with the following trick: The Linux kernel reserves a 1-bit field called PG arch 1 in every page frame descriptor for platform-specific purposes. On IA-64, this bit indicates whether or not the page is coherent in the instruction and data caches. A value of 0 signifies that the page may not be coherent and a value of 1 signifies that the page is definitely_coherent. With this setup, flush_dcache_page(), clear_user_page(), and copy_user_page() can be implemented so that they simply clear the PG arch 1 bit of the dirtied page. Of course, before mapping an executable page into user space, the kernel still needs to flush the cache if the PG arch 1 bit is off. Fortunately, we can use the platform-specific update_mmu_cache() for this purpose. Recall from Section 4.4.2 that this routine is called whenever a translation is inserted or updated in the page table. On IA-64, we can use this as an opportunity to check whether the page being mapped has the execute permission (X) bit enabled. If not, there is no need to establish coherency with the instruction cache and the PG arch 1 bit is ignored. On the other hand, if the X bit is enabled and the PG arch 1 bit is 0, coherency must be established by a call to flush_icache_range(). After this call returns, the PG arch 1 bit can be turned on, because it is now known that the page is coherent. This approach ensures that if the same page is mapped into other processes, the cache flush does not have to be repeated again (assuming it has not been dirtied in the meantime).

There is just one small problem: Linux traditionally maps the memory stack and memory allocated by the brk() system call with execute permission turned on. This has a devastating effect on the delayed i_cache flush scheme because it causes all anonymous pages to be flushed from the cache even though usually none of them are ever executed. To fix this problem, Linux/ia64 is lazy about turning on the X bit in PTEs. This works as follows: When installing the PTE for a vm-area that is both writable and executable, Linux/ia64 does not turn on the X bit. If a process attempts to execute such a page, a protection violation fault is raised by the CPU and the Linux page fault handler is invoked. Because the vm-area permits execute accesses, the page fault handler simply turns on the X bit in the PTE and then updates the page table. The process can then resume execution as if the X bit had been turned on all along. In other words, this scheme ensures that for vm-areas that are mapped both executable and writable, the X bit in the PTEs will be enabled only if the page experiences an execute access. This technique has the desired effect of avoiding practically all cache flushing on anonymous pages.

The beauty of combining the delayed i_cache flush scheme with the lazy execute bit is that together they are able to not just delay but completely avoid flushing the cache for pages that are never executed. As we see in Chapter 7, Device I/O, the effectiveness of this pair is enhanced even further by the Linux DMA interface because it can be used to avoid the need to flush even the executable pages. The net effect is that on IA-64 it is virtually never necessary to explicitly flush the cache, even though the hardware does not maintain i_cache coherence for writes by the CPU.

InformIT Promotional Mailings & Special Offers

I would like to receive exclusive offers and hear about products from InformIT and its family of brands. I can unsubscribe at any time.

Overview


Pearson Education, Inc., 221 River Street, Hoboken, New Jersey 07030, (Pearson) presents this site to provide information about products and services that can be purchased through this site.

This privacy notice provides an overview of our commitment to privacy and describes how we collect, protect, use and share personal information collected through this site. Please note that other Pearson websites and online products and services have their own separate privacy policies.

Collection and Use of Information


To conduct business and deliver products and services, Pearson collects and uses personal information in several ways in connection with this site, including:

Questions and Inquiries

For inquiries and questions, we collect the inquiry or question, together with name, contact details (email address, phone number and mailing address) and any other additional information voluntarily submitted to us through a Contact Us form or an email. We use this information to address the inquiry and respond to the question.

Online Store

For orders and purchases placed through our online store on this site, we collect order details, name, institution name and address (if applicable), email address, phone number, shipping and billing addresses, credit/debit card information, shipping options and any instructions. We use this information to complete transactions, fulfill orders, communicate with individuals placing orders or visiting the online store, and for related purposes.

Surveys

Pearson may offer opportunities to provide feedback or participate in surveys, including surveys evaluating Pearson products, services or sites. Participation is voluntary. Pearson collects information requested in the survey questions and uses the information to evaluate, support, maintain and improve products, services or sites, develop new products and services, conduct educational research and for other purposes specified in the survey.

Contests and Drawings

Occasionally, we may sponsor a contest or drawing. Participation is optional. Pearson collects name, contact information and other information specified on the entry form for the contest or drawing to conduct the contest or drawing. Pearson may collect additional personal information from the winners of a contest or drawing in order to award the prize and for tax reporting purposes, as required by law.

Newsletters

If you have elected to receive email newsletters or promotional mailings and special offers but want to unsubscribe, simply email information@informit.com.

Service Announcements

On rare occasions it is necessary to send out a strictly service related announcement. For instance, if our service is temporarily suspended for maintenance we might send users an email. Generally, users may not opt-out of these communications, though they can deactivate their account information. However, these communications are not promotional in nature.

Customer Service

We communicate with users on a regular basis to provide requested services and in regard to issues relating to their account we reply via email or phone in accordance with the users' wishes when a user submits their information through our Contact Us form.

Other Collection and Use of Information


Application and System Logs

Pearson automatically collects log data to help ensure the delivery, availability and security of this site. Log data may include technical information about how a user or visitor connected to this site, such as browser type, type of computer/device, operating system, internet service provider and IP address. We use this information for support purposes and to monitor the health of the site, identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents and appropriately scale computing resources.

Web Analytics

Pearson may use third party web trend analytical services, including Google Analytics, to collect visitor information, such as IP addresses, browser types, referring pages, pages visited and time spent on a particular site. While these analytical services collect and report information on an anonymous basis, they may use cookies to gather web trend information. The information gathered may enable Pearson (but not the third party web trend services) to link information with application and system log data. Pearson uses this information for system administration and to identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents, appropriately scale computing resources and otherwise support and deliver this site and its services.

Cookies and Related Technologies

This site uses cookies and similar technologies to personalize content, measure traffic patterns, control security, track use and access of information on this site, and provide interest-based messages and advertising. Users can manage and block the use of cookies through their browser. Disabling or blocking certain cookies may limit the functionality of this site.

Do Not Track

This site currently does not respond to Do Not Track signals.

Security


Pearson uses appropriate physical, administrative and technical security measures to protect personal information from unauthorized access, use and disclosure.

Children


This site is not directed to children under the age of 13.

Marketing


Pearson may send or direct marketing communications to users, provided that

  • Pearson will not use personal information collected or processed as a K-12 school service provider for the purpose of directed or targeted advertising.
  • Such marketing is consistent with applicable law and Pearson's legal obligations.
  • Pearson will not knowingly direct or send marketing communications to an individual who has expressed a preference not to receive marketing.
  • Where required by applicable law, express or implied consent to marketing exists and has not been withdrawn.

Pearson may provide personal information to a third party service provider on a restricted basis to provide marketing solely on behalf of Pearson or an affiliate or customer for whom Pearson is a service provider. Marketing preferences may be changed at any time.

Correcting/Updating Personal Information


If a user's personally identifiable information changes (such as your postal address or email address), we provide a way to correct or update that user's personal data provided to us. This can be done on the Account page. If a user no longer desires our service and desires to delete his or her account, please contact us at customer-service@informit.com and we will process the deletion of a user's account.

Choice/Opt-out


Users can always make an informed choice as to whether they should proceed with certain services offered by InformIT. If you choose to remove yourself from our mailing list(s) simply visit the following page and uncheck any communication you no longer want to receive: www.informit.com/u.aspx.

Sale of Personal Information


Pearson does not rent or sell personal information in exchange for any payment of money.

While Pearson does not sell personal information, as defined in Nevada law, Nevada residents may email a request for no sale of their personal information to NevadaDesignatedRequest@pearson.com.

Supplemental Privacy Statement for California Residents


California residents should read our Supplemental privacy statement for California residents in conjunction with this Privacy Notice. The Supplemental privacy statement for California residents explains Pearson's commitment to comply with California law and applies to personal information of California residents collected in connection with this site and the Services.

Sharing and Disclosure


Pearson may disclose personal information, as follows:

  • As required by law.
  • With the consent of the individual (or their parent, if the individual is a minor)
  • In response to a subpoena, court order or legal process, to the extent permitted or required by law
  • To protect the security and safety of individuals, data, assets and systems, consistent with applicable law
  • In connection the sale, joint venture or other transfer of some or all of its company or assets, subject to the provisions of this Privacy Notice
  • To investigate or address actual or suspected fraud or other illegal activities
  • To exercise its legal rights, including enforcement of the Terms of Use for this site or another contract
  • To affiliated Pearson companies and other companies and organizations who perform work for Pearson and are obligated to protect the privacy of personal information consistent with this Privacy Notice
  • To a school, organization, company or government agency, where Pearson collects or processes the personal information in a school setting or on behalf of such organization, company or government agency.

Links


This web site contains links to other sites. Please be aware that we are not responsible for the privacy practices of such other sites. We encourage our users to be aware when they leave our site and to read the privacy statements of each and every web site that collects Personal Information. This privacy statement applies solely to information collected by this web site.

Requests and Contact


Please contact us about this Privacy Notice or if you have any requests or questions relating to the privacy of your personal information.

Changes to this Privacy Notice


We may revise this Privacy Notice through an updated posting. We will identify the effective date of the revision in the posting. Often, updates are made to provide greater clarity or to comply with changes in regulatory requirements. If the updates involve material changes to the collection, protection, use or disclosure of Personal Information, Pearson will provide notice of the change through a conspicuous notice on this site or other appropriate way. Continued use of the site after the effective date of a posted revision evidences acceptance. Please contact us if you have questions or concerns about the Privacy Notice or any objection to any revisions.

Last Update: November 17, 2020