Storage Hardware Overview
A basic understanding of server storage hardware is essential for any DBA who wants to effectively configure and administer the Database Engine of an instance of SQL Server 2012. The most elementary metric of hardware storage performance is IOPS, meaning input/output operations per second. Another elementary metric of performance is “throughput,” which is the number of megabytes per second that an I/O subsystem can read or write in a sustained fashion.
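To see how the two metrics relate, consider a rough, illustrative calculation (the I/O size chosen here is an assumption for the sake of the example): throughput is simply IOPS multiplied by the average I/O size, so a device sustaining 200 I/O operations per second at SQL Server’s 64KB extent size delivers about 200 × 64KB, or roughly 12.5MB per second, of throughput.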
This section discusses the fundamental server hardware components used for storage and I/O and how those components affect SQL Server performance, scalability, and cost. Storage performance and scalability are usually balanced against what the budget allows. There are always trade-offs between what is best and what is the best that the checkbook can afford.
The hardware storage subsystem is based on these fundamental components:
- Hard Disks
- RAID (Redundant Arrays of Inexpensive Disks)
- Disk Controllers and Host Bus Adapters (HBA)
- Network Attached Storage (NAS), Storage Area Networks (SAN), and Logical Units (LUN)
- Solid State Drives (SSD)
The following sections address the highlights of each of the preceding concepts.
Understanding Hard Disks
Hard disks have been with us for many decades and are familiar to most readers. Disks, also known as spindles, have natural limitations on their performance. A disk is essentially a mechanical spinning platter with a moving armature and read/write head that travels over the surface of the platter as needed to read and write data. Naturally, physics limits the speed at which a mechanical device like this can perform reads and writes. Plus, the farther apart the needed data sits on the surface of the disk, the farther the armature has to travel and the longer the I/O operation takes. That’s why defragmentation, or the process of putting related data back together in contiguous disk sectors, is so important for hard disk-based I/O subsystems.
Depending on the rotational speed of the platter and the “seek” time of the armature, a hard disk is rated for the number of I/O operations it can perform per second, called IOPS. For example, a modern, current-generation hard disk might sustain ~200 IOPS with a seek time of ~3.5ms. A bit of math quickly reveals that a SQL Server with a single hard disk would be overwhelmed by an application that needed to do any more than a couple hundred transactions per second. (And don’t forget that Windows needs to do I/O of its own!) So, the typical solution for high-performance database applications is to add more hard disks.
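To make that math concrete with purely hypothetical numbers: if a single disk sustains roughly 200 IOPS and each transaction requires about three random I/O operations (say, a data page read, a data page write, and a transaction log write), that disk tops out somewhere around 65 transactions per second before requests begin to queue, well short of what many busy OLTP applications demand.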
Time spent waiting for an I/O operation to complete is known as latency. Latency is usually very low for sequential operations—that is, an I/O that starts at one sector on the platter and moves directly to the next sector on the platter. Latency is usually higher for random operations—that is, an operation where the I/O starts at one sector but then must proceed to a random location elsewhere on the platter. Microsoft provides a lot of guidance about acceptable latency for a SQL Server I/O subsystem, with transaction log latency recommended to be 10ms or less and database file latency recommended to be 20ms or less. These are very fast I/O response times, so doubling those recommendations is likely to be acceptable for most business applications.
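One practical way to check these numbers from inside SQL Server is the sys.dm_io_virtual_file_stats dynamic management function, which tracks cumulative read and write stalls for every database file. The following query is a minimal sketch; the figures it returns are averages accumulated since the instance last started, so a short burst of slow I/O can hide inside an otherwise healthy average.

-- Average read and write latency per database file, cumulative since instance startup
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       1.0 * vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       1.0 * vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON vfs.database_id = mf.database_id
 AND vfs.file_id = mf.file_id
ORDER BY avg_read_latency_ms DESC;

Compare the results against the guidance above: transaction log files averaging much more than 10ms, or data files averaging much more than 20ms, deserve a closer look.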
Although latency and IOPS are the most important characteristics of a hard disk for OLTP applications, business intelligence (BI) applications are typically heavy on read operations. Consequently, BI applications usually seek to maximize “disk throughput” because they make frequent use of large, sequential reads culling through many gigabytes of data. A typical, current-generation hard disk today has a throughput of ~125MB/second at 15,000 RPM. In BI applications, DBAs frequently monitor disk throughput along with disk latency. Refer to Chapter 15, “Monitoring SQL Server 2012,” for more details on monitoring SQL Server 2012’s performance.
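As a rough, illustrative calculation: at ~125MB/second, a single disk needs roughly 4,000 seconds, a bit more than an hour, to scan 500GB of data sequentially. That is one reason BI systems typically spread large scans across many spindles rather than relying on a single fast disk.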
The next section describes the most common way to combine multiple disks for greater I/O performance: the redundant array of inexpensive disks (RAID).
Understanding RAID Technologies
If a single hard disk is insufficient for the I/O needs of the SQL Server instance, the usual approach is to add more hard disks. It is possible to add more spindles and then place specific SQL Server objects, such as tempdb, onto a single additional hard disk, thereby improving I/O performance by segregating the I/O. Bad idea! Hard disks are prone to failure, and the failure of that single disk can crash SQL Server. Instead, DBAs use redundant arrays of inexpensive disks (RAID) to add more hard disks while providing greater fault tolerance.
RAID is described in terms of many “levels.” With database technology, the most commonly used levels are RAID1, RAID5, and RAID10. RAID0 is also covered here but is not recommended for databases. Each is described in a bit more detail in the following sections.
RAID0, called striping, spreads IOPS evenly across two or more disks. RAID0 is very fast for reads and writes, but if any one disk in the array fails, the whole array crashes.
Figure 3.1 shows two disks in a RAID0 configuration.
Figure 3.1. RAID Level 0, striped array of disks.
In this example, assume we have three transactions of two blocks each. So, Transaction1 needs to write two blocks to disk: Block A and Block B. Transaction2 needs to write Blocks C and D. Finally, Transaction3 needs to write Blocks E and F.
Each transaction takes only half the time to write to the RAID0 set as it would with a single disk because the two blocks are written simultaneously, one on Disk 1 and one on Disk 2, instead of serially on a single disk. Of course, the downside is that if either drive fails, you lose the whole set, given that half of the data is on Disk 1 and the other half is on Disk 2.
RAID1, called mirroring, is essentially a set of two disks in which every read and every write operation is performed on both disks simultaneously. Fault tolerance is improved because either disk can fail without crashing the array, allowing SQL Server to continue operating. The DBA or server administrator can then replace the failed drive without an emergency drill. RAID1 is essentially as fast as a single hard disk, yet it remains available when a single disk in the array fails.
Figure 3.2 represents two disks in RAID1 configuration.
Figure 3.2. RAID Level 1, mirrored array of disks.
In this example, assume as before that we have three transactions of two blocks each, this time written to a RAID1 set. So, Transaction1 needs to write two blocks to disk: Block A and Block B. Transaction2 needs to write Blocks C and D. Finally, Transaction3 needs to write Blocks E and F.
Each transaction takes about the same time to write to the RAID1 set as it would with a single disk because the two blocks are written serially on Disk 1 and Disk 2, just as they would be on a single disk. The big benefit here is that if either drive fails, you still have a full and complete copy of the data on the surviving disk. And because there is no parity bit calculation, write speed is superior to that of RAID5.
RAID5 is a group of at least three disks in which every read is striped across all the disks in the array. This means that read-centric IOPS are faster than with a single disk because the required data can be pulled from more than one disk simultaneously. Write performance, however, is slower than read performance on RAID5 because every write I/O also includes an additional parity write. This parity information enables the array to reassemble any lost data should one of the drives in the array fail. That means a RAID5 array can survive a single drive failure, but not more than one drive failure at a time.
It also means that RAID5 is good for read-heavy applications, but is not as good for write-heavy applications. RAID5 can be expanded beyond three disks, but is typically never bigger than seven disks because of the considerable time needed to reconstruct a failed drive from the parity bits written across all those other drives.
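As a rough, illustrative calculation using the simplified model above (one extra parity write per logical write): three disks rated at ~200 IOPS each offer about 600 IOPS of raw capacity. A purely read-oriented workload could approach that figure, but a purely write-oriented workload would land closer to 300 IOPS, because every logical write consumes a second I/O for parity.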
Figure 3.3 represents three disks in RAID5 configuration.
Figure 3.3. RAID Level 5, striped array of disks with parity.
In this example, assume as before that we have three transactions of two blocks each, this time written to a RAID5 set. So, Transaction1 needs to write two blocks to disk: Block A and Block B. Transaction2 needs to write Blocks C and D. Finally, Transaction3 needs to write Blocks E and F.
The first difference you’ll notice is that RAID5 requires a minimum of three disks. Each transaction takes longer to write to the RAID5 set than it would with a single disk because the two blocks are written as a stripe across two of the disks while the third disk has a calculated parity block written to it. The calculation and extra block write take more time. A transaction that reads two blocks off of the RAID5 set would be faster than a single-disk read, because the blocks are striped and can be read simultaneously. The benefit here is that if any single drive fails, you have enough information to recalculate the missing data using the parity blocks. The other benefit is that RAID5 is cheaper than the other fault-tolerant RAID configurations described here.
But RAID5 has drawbacks, too. First, write operations are a lot slower than on a single directly attached disk or on the other RAID configurations described here. Second, even though RAID5 provides inexpensive fault tolerance, should a drive fail, the process of recalculating the values of the missing data from the parity blocks can be time consuming. So a full recovery, while easy to do, can be slower than on RAID1 or RAID10.
RAID10 is also called RAID1+0. RAID10 is a group of at least four disks in which each RAID1 pair of disks is also striped, RAID0 style, with one or more other RAID1 pairs. This approach uses at least twice as many disks as RAID1. But it means that the array can sustain two failed drives simultaneously without crashing, as long as the failed disks are not both in the same RAID1 pair. RAID10 has both fast read and write speeds, but it is more expensive and consumes more space than RAID1 and RAID5.
Figure 3.4 represents four disks in RAID10 configuration.
Figure 3.4. RAID Level 10, mirrored plus striping.
In this example, assume as before that we have three transactions of two blocks each, this time written to a RAID10 set. So, Transaction1 needs to write two blocks to disk: Block A and Block B. Transaction2 needs to write Blocks C and D. Finally, Transaction3 needs to write Blocks E and F.
The first difference you’ll notice is that RAID10 requires a minimum of four disks. Each transaction is faster to write to the RAID10 set than it would be with a single disk because the two blocks are written as a stripe across the pairs of mirrored disks. A transaction that reads two blocks off of the RAID10 set would be much faster than a single-disk read, because the blocks are striped and can be read simultaneously. So, both writes and reads are typically faster than with any other option presented here.
In addition to speed of both read and write operations, RAID10 sets are more fault tolerant than any other configuration. That is because each mirrored pair within the RAID10 set can sustain a single disk failure, meaning that more than one disk can fail without causing the array to crash.
But RAID10’s most obvious and dramatic drawback is the cost of requiring at least four disks. The disks themselves are costly, and they also consume plenty of rack space and energy. If you can afford them, RAID10 arrays are certainly the best option. But they constitute a much greater expense than DASD.
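To put the cost difference in rough, illustrative terms: eight 600GB disks yield about 2.4TB of usable space in RAID10 (half the raw capacity goes to mirroring), roughly 4.2TB in RAID5 (about one disk’s worth of capacity goes to parity), and the full 4.8TB in RAID0, which offers no fault tolerance at all. The same usable capacity therefore costs considerably more in disks on RAID10 than on RAID5.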
Disk Controllers and Host Bus Adapters (HBA)
Disks are typically configured and controlled using one of two popular standards: SCSI (pronounced “scuzzy”) and IDE/ATA (usually “A-T-A” for short). SCSI is by far the most popular standard for servers, whereas ATA (usually labeled as SATA) is the most popular standard for home and personal use. Servers using hard disks that adhere to these standards can be directly cabled to the disks, resulting in the acronym DASD, for Direct Attached Storage Device. DASD can also be cabled using standard network protocols like IP over Ethernet cable or Fibre Channel. Note that Fibre Channel and iSCSI are seen primarily on SANs.
Host bus adapters (HBAs) control the movement of data between the server motherboard and the hard disks. HBAs typically include performance options, such as a built-in cache, which can improve performance by buffering writes to the hard disk and then sending them in a burst.
HBAs are important in storage and I/O discussions because it is possible, with enough active processes, to saturate a single HBA. For example, imagine a scheduled job that attempts to back up all databases simultaneously on a SQL Server with multiple terabytes of data while other I/O-heavy operations are also processing. A situation like that could attempt to push through more I/O than the HBA can sustain. HBAs are also important because their firmware may need to be independently updated and maintained when faced with updates to Windows or other hardware components. Remember to stay apprised of vendor recommendations and updates when using high-end HBAs.
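As a rough, illustrative calculation (the HBA speed here is an assumption for the example): a 4Gbps Fibre Channel HBA tops out at roughly 400MB per second. At that rate, backing up 2TB of data takes well over an hour even if the HBA carries no other traffic, so simultaneous backups and heavy query activity can easily end up queuing behind one another on the same adapter.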
Network Attached Storage (NAS), Storage Area Networks (SAN), and Logical Units (LUN)
The Internet is full of information about digital storage. However, some of it is old, outdated, or simply not useful. With SQL Server and storage, Network Attached Storage (NAS) is one such area where DBAs should steer clear. Think of NAS servers as more of a file-and-print server technology, not suitable for SQL Server database files, transaction log files, or backup files. NAS may be useful in some Extract-Transform-Load (ETL) applications, but only when its use and potential failure won’t crash the production SQL Server database.
Storage Area Networks (SAN) are usually slightly slower than DASD, if only because they are more heavily used by a variety of applications than DASD is. SAN storage is typically expensive and also carries a degree of management overhead in that it should be set up, configured, and administered only by a dedicated IT professional. But what it loses in speed, it makes up for in flexibility, redundant components, and manageability.
For example, it is very easy to extend, reduce, or reconfigure storage on a SAN in Logical Units (LUNs) without stopping important Windows services, like SQL Server. These LUNs can be made available to Windows servers as if they were regular disk drives, when in fact they are usually RAID volumes or even just portions of RAID volumes. In fact, it is easy for SAN administrators to virtualize storage so that it can be quickly moved around on-the-fly.
This is great for SAN administrators, but it can be bad for DBAs if the storage the application has been depending upon is shared with another application in the enterprise. For example, it is possible to configure a large RAID10 volume as two LUNs. These two LUNs share the same underlying disks and, if both LUNs are busy, contend with one another for IOPS on the underlying RAID volume. It’s not good, but it is certainly a possibility.
Depending on the SAN in use, enormous amounts of cache may also be available to help speed I/O processing. These caches can also be quickly and easily configured (and reconfigured) by SAN administrators to better balance and tune applications that use the SAN.
Unfortunately, many if not most SAN administrators think about storage only as measured by volume, not IOPS or disk throughput. Consequently, it is up to DBAs to know how much I/O performance their applications need and to monitor I/O performance within SQL Server to ensure that they are achieving it. Many a SAN administrator has been startled to find that the SQL Server DBA knows more about the I/O speed (or lack thereof) on the assigned LUNs than they do, usually because of a misconfigured setting somewhere on the SAN.
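A quick way to see whether SQL Server is spending significant time waiting on storage, regardless of whose hardware sits underneath, is to examine the PAGEIOLATCH wait types. This query is a minimal sketch; like the file statistics shown earlier, the counters are cumulative since the instance last started.

-- Time spent waiting for data pages to be read from the I/O subsystem
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGEIOLATCH%'
ORDER BY wait_time_ms DESC;

Consistently high average waits here are a useful, vendor-neutral talking point when discussing LUN performance with the SAN team.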
The bottom line for DBAs when administering storage on a SAN is to follow the SAN vendor’s recommendations wherever possible, and then to monitor and performance-tune the SQL Server instance as if the LUN(s) were normal disks.
Solid State Disks (SSD)
The new kids on the block for storage and I/O subsystems are several kinds of solid state disks (SSDs). SSDs are treated just like hard disks when configured, either as individual devices or in RAID volumes. Because they are entirely electronic in nature, they consume less power, perform faster, and resist damage from impact better than hard disks. SSDs are becoming increasingly popular in the database administration community because they are remarkably faster than hard disks, especially for random I/O operations.
From a hardware perspective, a variety of different types of memory chips might be used within an SSD. But the two most common types of memory in SSDs are DRAM, which is volatile, and NAND flash memory, which is slower but nonvolatile. Volatile memory loses data when it loses power, whereas nonvolatile memory does not. When assessing flash memory-based SSDs, note that multilevel cell (MLC) flash memory is slower and less reliable than single-level cell (SLC) flash memory.
SSDs, however, have a few special considerations. First, the memory blocks within an SSD can be erased and rewritten only a limited number of times. (DRAM-based SSDs do not have this limitation.) Enterprise-quality flash drives work around this limitation by overprovisioning the amount of storage on the SSD and by using algorithms called wear leveling, thus ensuring that the SSD lasts about as long as a similarly priced hard disk. Second, SSDs require a lot of free memory blocks to perform write operations. Whereas hard disks simply overwrite an unused sector, SSDs must first clear out previously written blocks, a housekeeping process assisted by the TRIM command.
Finally, whereas SQL Server indexes residing on hard disks need frequent defragmentation, indexes residing on SSDs have no such requirement. Because all memory blocks on the SSD are only a few electrons away, all read access is pretty much the same speed whether the index pages are contiguous or not.
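For context, fragmentation is easy to measure from inside SQL Server with the sys.dm_db_index_physical_stats dynamic management function. The sketch below assumes a hypothetical table named dbo.SalesOrder; on SSD-backed storage, a high fragmentation percentage here matters far less for read performance than it would on spinning disks.

-- Check logical fragmentation for the indexes on one (hypothetical) table
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.SalesOrder'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON ips.object_id = i.object_id
 AND ips.index_id = i.index_id;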
Now that you understand the most important aspects of storage hardware, let’s discuss the principles and management tasks that correlate the storage and I/O elements of SQL Server back to the system hardware.
The following “Notes from the Field” section introduces one of the most commonly used methods of improving I/O performance: segregation of workload.