3.2 Caches

Registers are the user-accessible data storage areas closest to the functional units. Typically, CISC and RISC processors have fewer than 100 registers, so these cannot hold much data. Memory, on the other hand, is very far from the processor in terms of processor clocks, so intermediate storage areas are desirable. Cache is such an area: a type of memory located between the processor and main memory.

The largest data caches on high performance processors are usually a few megabytes in size. Cache may be on-chip (physically part of the processor chip) or off-chip.

3.2.1 Cache Line

A cache line is the smallest unit of data that can be transferred to or from memory, and it may contain more than one datum. Cache lines are the blocks of data that are brought into the cache from memory; they are usually between 32 and 128 bytes in size. Since memory is many times larger than cache, multiple memory addresses must map to the same cache line. Otherwise, only a small portion of memory could make use of the cache.

There are advantages and disadvantages to longer (i.e., larger) cache lines. Measured in clocks, the overhead of identifying and initiating a cache line transfer is roughly the same regardless of line size. If the transfer time per byte is also the same, then this fixed overhead is amortized over more bytes with a longer line, so the resulting bandwidth (bytes/second) is higher. Hence, processors with longer cache lines typically have higher memory bandwidth performance. Long cache lines are good for algorithms that access memory with unit stride, since every element of each cache line is used. On the other hand, if an algorithm accesses memory in a random manner (e.g., indirect accesses), then computers with short cache lines may perform better, since the algorithm may use only a single datum from any given cache line.

3.2.2 Cache Organization

Generally speaking, caches are organized so that a cache line can be placed in a certain set of locations in the cache. If there are multiple sets, then the cache is referred to as being set associative. Usually the cache is organized so that any cache line from memory can map to one cache line in any of n sets. This is referred to as an n-way set associative cache. Fully associative caches are those in which a cache line can be placed anywhere in the cache.

The simplest (and cheapest) type of cache is direct mapped, which is actually a one-way set associative cache. If the cache size is one MB, then any two memory locations that are a multiple of one MB apart map to the same cache line. For example, data located at hexadecimal addresses 0x00100000, 0x00200000, 0x00300000, ..., 0xFFF00000 are multiples of one MB apart and all must use the same location in a direct mapped cache. An improvement on the direct mapped cache is a two-way set associative cache. When a cache line is brought into cache, it is placed in one, but not both, of the two sets. If the cache line size and total cache size are held constant, a two-way set associative cache will experience less cache thrashing than a direct mapped cache. Increasing the associativity usually continues to improve performance, but the additional benefit diminishes beyond a certain degree of associativity.

A fully associative cache is the most advanced (and expensive) type of cache. Due to the complexities involved and resulting expense, these caches tend to be small in size.

3.2.3 Cache Mechanisms

Cache line replacement strategy and write policies are two important performance issues relating to caches. Replacement strategies apply to n-way set associative caches for n > 1; with a direct mapped cache there is nothing to decide, since each line has only one possible location. Some processors replace the cache line in the set that was Least Recently Used (LRU), while others choose the set randomly or use round-robin replacement. Other replacement strategies exist, but these two are the most common, and they represent opposite ends of the performance spectrum. LRU usually performs better than random replacement, but it is more difficult to implement.

When a processor performs a store instruction, it typically writes the data into the cache-resident line containing the address. In general, there are two policies for handling the cache line after data has been written into it. In the first policy, the data is written both to the cache line in the cache and to main memory. This is referred to as write through, since the write is made to main memory through the cache. The second policy, which is much more common, is referred to as write back. In this case, when a processor executes a store instruction, the data is written only to the cache line in the cache. The modified cache line is written to memory only when necessary (usually when it is displaced from the cache by another cache line). Note that this write back may occur much later than the actual store instruction's execution. As a result, write back caches usually generate fewer memory transactions, which is a good thing as we shall see later. All caches discussed in this chapter are assumed to be write back caches.

3.2.4 Cache Thrashing

The main problem with direct mapped caches is that severe cache thrashing can occur. To illustrate what is meant by cache thrashing, consider the following:

REAL*8 X(*), Y(*) 
... 
DO I = 1, N 
  Y(I) = X(I) + Y(I) 
ENDDO 

This requires loading X(I), loading Y(I), adding X(I) and Y(I), and storing Y(I) for each iteration (value of I).

Suppose we are executing this loop on a machine with a single, direct mapped, 1 MB cache and that a cache line is 32 bytes in length. If X and Y are arrays of eight-byte data (e.g., data declared as REAL*8 in Fortran), then X and Y could be allocated as illustrated in Figure 3-2. That is, X and Y are a multiple of the cache size apart in memory. On the first iteration, elements X(1) through X(4) are loaded into cache and X(1) is loaded into a register. Note that operations in Figure 3-2 that force cache lines to be moved between the cache and memory are highlighted in bold. Then Y(1) through Y(4) are loaded into cache from memory and Y(1) is loaded into a register. Note that the cache line containing Y(1) displaces the cache line containing X(1) through X(4). Completing the iteration, X(1) and Y(1) are added and Y(1) is stored. On the second iteration, X(2) must be loaded. This requires that elements X(1) through X(4) again be loaded into cache. But this displaces the cache line containing Y(1) and hence forces that line to be stored to memory (because Y(1) has been modified) before the line containing X(1) through X(4) can be reloaded. Thus each load of X and each load of Y requires that data be moved to and from memory, and the cache is nearly useless. This process of repeatedly displacing and reloading cache lines is referred to as cache thrashing.

Figure 3-2. Example of cache thrashing.


If the cache happened to be a two-way set associative cache, then most, if not all, of this thrashing would be eliminated. This is because the cache line for X would likely reside in one set of the cache and the cache line for Y in the other. Both round-robin and least-recently-used replacement strategies would address this situation very well. A random replacement approach might take a few iterations before getting it right. That is, since the set number to be replaced is generated randomly, the result could easily be that the first set is replaced multiple times before selecting the second set (or vice-versa).

The previous example was derived from an example where the arrays X and Y were declared as follows:

REAL*8 X(7340032), Y(7340032) 

Note that 7340032 in hexadecimal is 0x700000. Since each element is eight bytes, this places the arrays 56 MB apart in memory, an integral multiple of the one MB cache size.

If the first array had been “padded” so that it was not a multiple of the cache size apart, then this cache thrashing could have been avoided. Suppose X and Y were declared as follows:

REAL*8 X(7340036), Y(7340036) 

Then many memory transactions are eliminated. Note that the padding is four elements; since each element is eight bytes, this effectively pads the array by one cache line (32 bytes = 4 × 8 bytes). With the arrays declared as above, we have the sequence of operations outlined in Figure 3-3. Not only does the second iteration require no memory transactions, neither do iterations 3 and 4! At the beginning of iteration five, the cache line containing Y(1) through Y(4) must be stored to memory, so we have roughly three memory transactions for every four iterations. Compare this to the previous example, where we had three memory transactions for every single iteration.

Figure 3-3. Cache thrashing eliminated by padding arrays.


To demonstrate the difference in performance, the example above was executed on an HP V-Class computer with a two MB direct-mapped cache. With N set to 7340032 in both cases, the resulting times to execute the loop are shown in Table 3-1.

Table 3-1. Performance of Y(I) = X(I) + Y(I) for Loop with Fixed Length.

    Array Dimension    Elapsed Time (sec.)
    7340032            7.21
    7340036            3.52

3.2.5 Caching Instructions Versus Data

One compelling reason for using direct mapped caches is that they are easier to implement than associative caches. One benefit of this simplicity is that processors can access direct mapped caches faster than associative caches. This can be extremely important for instructions as well as data: it may not matter how much data you have in registers if the processor continuously stalls waiting on instructions.

The area of memory reserved for the executable instructions of a program is referred to as the text area. Nearly all workloads require more megabytes of data than of text. Some architectures use a single cache for both instructions and data; such caches are generally referred to as unified caches. One can easily imagine a situation where a series of repeatedly executed instructions (e.g., a loop) happens to map to the same cache line(s) as the data. This sort of cache thrashing is very difficult to identify, not to mention remedy. Modern computer architectures have alleviated this problem in two ways: by making unified caches associative, and by separating the instruction cache from the data cache, that is, by having two literally independent caches. Unified associative caches still run the risk of text displacing data (and vice versa), but separate instruction and data caches eliminate this problem. So, what is becoming the most common implementation in the industry? A combination of the two, of course! Most new architectures feature separate instruction and data caches that are small and very close to the processor, with another, larger unified cache a little farther away. This brings us to the next topic.

3.2.6 Multiple Levels of Caches

It’s already been established that faster memory is more expensive than slower memory. In particular, cache is much faster and more expensive than main memory. There is so much truth in this that one sometimes wonders if cache should be spelled c-a-s-h! Given the advantages of cache to today’s processor performance, it is easy to see how an intermediate level of cache might be beneficial and economically feasible. Consider the Compaq AlphaServer DS20, which has a 500 MHz Alpha 21264 processor with a 64 KB instruction cache and a 64 KB data cache on chip, plus an intermediate (or secondary) cache that is four MB in size. The primary benefit of such caches is, of course, to reduce the average time per memory access.

Suppose we have a workload which we have studied and we are able to produce a rough estimate of how often memory accesses occur as a function of the data storage available. Table 3-2 is a representative listing of this information.

Table 3-2. Memory Accesses Relative to Data Storage Available for a Synthetic Workload.

    Size Range       Cumulative Percentage    Percentage of Accesses
                     of Memory Accesses       in This Range
    0 - 4 KB         72%                      72%
    4 - 16 KB        83%                      11%
    16 - 64 KB       87%                      4%
    64 - 256 KB      90%                      3%
    256 KB - 1 MB    92%                      2%
    1 - 4 MB         95%                      3%
    4 - 16 MB        99%                      4%
    16 - 64 MB       100%                     1%

Now consider three different cache architectures with the same processor and memory system. Suppose they have cache structure and attributes as shown in Table 3-3. Note that only Architecture 3 has two caches.

Table 3-3. Data Storage Sizes and Associated Access Times for Hypothetical Architectures.

    Architecture    L1 Cache Size    L1 Access Time    L2 Cache Size    L2 Access Time
                                     (Cycles)                           (Cycles)
    1               1 MB             3                 None             N/A
    2               4 MB             5                 None             N/A
    3               16 KB            1                 4 MB             6

If we assume it is 100 clocks from the processor to main memory for all three architectures, then, using the data in Table 3-2 and Table 3-3, we can produce the expected number of clock cycles for an average data access for all three architectures.

For architecture 1, 92% of the accesses are in one MB of data storage or less. So:

Expected latency for architecture 1 = 92% × 3 cycles + 8% × 100 cycles = 10.76 cycles.

Now, architecture 2 has a four MB cache and so 95% of the accesses are estimated to occur within it. Thus:

Expected latency for architecture 2 = 95% × 5 cycles + 5% × 100 cycles = 9.75 cycles.

So, all else being equal, architecture 1 is likely to be faster for this application than architecture 2 even though it has a smaller cache! The remaining cache architecture has the following:

Expected latency for architecture 3 = 83% × 1 cycle + 12% × 6 cycles + 5% × 100 cycles = 6.55 cycles

What an improvement! Even though architecture 3's primary cache is very small and its secondary cache is the same size as, and slower than, architecture 2's cache, its average access time is much lower than that of either of the other cache architectures.
