Software Optimization for High Performance Computing: Creating Faster Applications
Some day, on the corporate balance sheet, there will be an entry which reads, “Information”; for in most cases, the information is more valuable than the hardware which processes it.
—Grace Murray Hopper
Many applications perform relatively simple operations on vast amounts of data. In such cases, the performance of a computer's data storage devices impacts overall application performance more than processor performance does. Data storage devices include, but are not limited to, processor registers, caches, main memory, disks (hard disks, compact discs, etc.), and magnetic tape. In this chapter we will discuss performance aspects of memory systems and caches and how application developers can avoid common performance problems. This will be followed by a brief overview of disk file system performance issues.
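To make "simple operations on vast amounts of data" concrete, consider a loop that updates one array from another and count its floating-point work against its memory traffic. This is an illustrative sketch, not code from this chapter. It assumes 8-byte `double`s and arrays much larger than any cache, and the function names `daxpy` and `daxpy_flops_per_byte` are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a*x[i] + y[i] over n elements (the classic DAXPY kernel).
   Per iteration: 2 floating-point operations (one multiply, one add),
   but 3 memory accesses of 8 bytes each: load x[i], load y[i], store y[i]. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Arithmetic intensity of the loop above in flops per byte, assuming every
   access goes to memory. Write-allocate caches, which read a line before it
   is written, would add a further 8 bytes per iteration and lower the ratio. */
double daxpy_flops_per_byte(void)
{
    const double flops_per_iter = 2.0;
    const double bytes_per_iter = 3.0 * (double)sizeof(double);
    return flops_per_iter / bytes_per_iter;
}
```

With 8-byte doubles the ratio is 2/24, or about 0.083 flop per byte. For large `n`, such a loop spends most of its time waiting for `x` and `y` to arrive, so a faster memory system would be expected to help it far more than a faster floating-point unit.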
Suppose for a moment that you are a carpenter. You have a tool belt, a lightweight tool box, a tool chest permanently attached to your vehicle, and a shop that contains more tools and larger machinery. For any particular carpentry job you put a different set of tools into your tool belt, tool box, and tool chest to reduce the number of trips you have to make: climbing up and down the ladder to the tool box, walking back and forth to the tool chest, and driving to and from the shop. The combination of tools in your tool belt varies with the job, simply because it isn't practical to carry everything on your belt. The same applies to the tool box, the tool chest, and even the shop. The tools you need most often are kept closest to you.
Computer architectures have adopted an analogous strategy of keeping data close to the processor. Moreover, the distance to a storage device, measured in processor clocks, increases as its capacity increases. The processor's registers are, of course, the closest storage devices. The next closest storage devices are the caches, which usually range in size from a few hundred bytes to several megabytes (MB). Caches are usually made with static random access memory (SRAM) chips. Beyond the caches lies the main memory system. Most main memory systems are built from dynamic random access memory (DRAM) chips. Some are built with SRAMs (e.g., the Cray T90), which makes them faster than DRAM-based systems, but expensive. At the next level of the storage hierarchy is the magnetic disk. Magnetic disks are truly the workhorses of data storage, playing important roles in virtual memory and file systems. As storage devices become larger, they are typically farther away from the processor, and the path to them becomes narrower and sometimes more complicated. The typical memory hierarchy and its basic components are illustrated in Figure 3-1.
Figure 3-1. Typical memory hierarchy for modern computers.
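The statement that distance, in processor clocks, grows with capacity can be observed with a pointer-chasing probe, a widely used microbenchmark technique rather than anything taken from this chapter. The sketch below assumes a POSIX system (for `clock_gettime`); `build_cycle`, `chase_ns_per_load`, and the `xorshift64` helper are hypothetical names chosen for illustration. Each load's address comes from the previous load, so the processor cannot overlap them, and the average time per load approximates the latency of whichever level of the hierarchy holds the array.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime on strict compilers */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Small xorshift64 generator (Marsaglia); used instead of rand() because
   RAND_MAX may be as small as 32767, which would limit the jump distance. */
static uint64_t rng_state = 1;

static uint64_t xorshift64(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Fill next[0..n-1] with one random cycle through all n slots (Sattolo's
   algorithm), so following next[] from any slot visits every slot once
   before returning to the start. */
void build_cycle(size_t *next, size_t n, uint64_t seed)
{
    assert(n >= 1 && next != NULL);
    rng_state = seed ? seed : 1;   /* xorshift state must be nonzero */
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64() % i);   /* 0 <= j < i */
        size_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }
}

static volatile size_t chase_sink;  /* keeps the chase loop from being optimized away */

/* Perform `steps` dependent loads p = next[p] and return the average time
   per load in nanoseconds. */
double chase_ns_per_load(const size_t *next, size_t steps)
{
    struct timespec t0, t1;
    size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++)
        p = next[p];                 /* address depends on the previous load */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    chase_sink = p;
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return steps ? ns / (double)steps : 0.0;
}
```

To look for the hierarchy, one would build cycles whose total size (`n * sizeof(size_t)`) ranges from well inside the smallest cache to well beyond the largest, call `chase_ns_per_load` with `steps` several times larger than `n`, and compare the results; the time per load would be expected to rise in steps as the array outgrows each level. The random cycle matters: a sequential chain such as `next[i] = i + 1` would let hardware prefetching hide much of the latency being measured.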