Technology Platform, Techniques, and Alternatives
It wasn't long ago that IT professionals shrugged off storage as a straightforward, albeit very boring, aspect of maintaining a computing infrastructure. But in the last few years, a push toward shared enterprise storage (see Figure 11) has given rise to several deployment options.
For instance, when does a network-attached storage (NAS) device do a better job storing hoards of enterprise data than a storage area network (SAN)? And how do these newer technologies compare with local storage, in which a hard disk is directly accessed by a server via a SCSI cable connection?
Recently, switch maker McData, leading server vendor Compaq, and partner MierCom (forever known from this point on as the Global Test Alliance [GTA]) kicked the competitive tires of these storage technology alternatives to see how performance varied across several common storage scenarios. Their test bed was set up to loosely emulate file servers, Web servers, video servers, and other application servers with regard to the data that they routinely transfer to and from a storage location. The GTA varied the storage location between a local SCSI-attached disk drive, a disk drive on a storage server across a Gigabit Ethernet LAN, and a disk drive in a SAN disk array connected over a Fibre Channel SAN.
Which setup worked best? It depends. The GTA tests show that the right storage route to take depends on the storage network environment, the size of the files being stored or retrieved, the type of PCI bus connection, and how your users access the stored data. Specifically, their tests indicate the following:
The NAS environment (in which data moves between a server initiator and a storage target over a Gigabit Ethernet network) can deliver better data-transfer performance than a SAN in certain cases, such as when file sizes are small.
SANs really outperform the NAS alternative when data reads or writes are sequential and file sizes are large, such as when a server is delivering streaming video or when a server is backing up large data volumes.
When connecting a server to a SAN, performance is virtually the same whether the SAN adapter uses a 32-bit or 64-bit PCI-bus connection.
For a Gigabit Ethernet network interface card (NIC) in a NAS environment, performance was typically better via a 64-bit PCI-bus connection than a 32-bit PCI-bus connection. But the difference wasn't much: only about 10% in tests.
In all cases, writing data to a storage device takes more time and resources than reading it, which yields much lower data-transfer performance for writes.
With random data reads (when there's no correlation between data from one read to the next), data-transfer performance is much lower than with sequential reads of large data files in all scenarios tested.
With random reads, data-transfer performance over a Gigabit Ethernet NAS is nearly as good as reading data from a local disk drive on a SCSI bus.
The data presented is among the first such published storage-comparison results. Still, readers are cautioned to keep two points in mind.
First, these results are based on the particular equipment that the GTA deployed. A SAN disk array other than the Hitachi 5800 used, for example, might exhibit different performance characteristics.
Second, due to the broad differences among SAN, NAS, and SCSI environments, the results should not necessarily be viewed as perfect apples-to-apples comparisons. For example, while direct SCSI data storage exhibits the best data-transfer performance in some scenarios, it is not generally accessible by multiple servers concurrently, as standalone storage nodes in the NAS or SAN environments are.
Also, although the GTA used an off-the-shelf Compaq server as a NAS storage target, a specialized Hitachi Disk Storage Array was employed as the target node in the SAN environment (see Figure 14). There are specialized NAS storage nodes available, too, but the GTA's attempts to procure one for this testing were unsuccessful.
Figure 14 In GTA's comparison of competing storage technologies, three test beds were set up to measure the effectiveness of each deployment option.
How the GTA Did It
The test scenarios that the GTA created involved an application-processing server, which, depending on the application, could be an e-mail server, a Web server, a database server, or a video server. This server was the initiator of each storage operation, meaning that it issued all disk read and/or write requests.
Those requests were sent to and processed by a storage target, which varied depending on the environment. In the NAS environment, the storage target was a Compaq ProLiant server, accessed via an IP-based Gigabit Ethernet network. In the SAN environment, the storage target was a Hitachi 5800 Disk Array, which was built for the purpose of being a SAN node. In the SCSI environment, the storage target was one of the application server's internal disk drives.
GTA used the same Compaq server configuration as the initiator in all the scenarios. This was a fairly robust Compaq ProLiant ML370, with dual 866MHz Pentium III processors and 1GB of RAM.
GTA changed the initiator server configuration only when changing from a NAS to a SAN environment. Then GTA replaced the 3Com Gigabit Ethernet NIC with an Emulex LP7000e Fibre Channel host bus adapter.
In the SAN and NAS environments, GTA also compared data-transfer performance between 32-bit and 64-bit PCI-bus connections. This was the connection inside the application server used by the Gigabit NIC and the SAN host bus adapter. The 3Com Gigabit NIC used, model 3C985B-SX, can be plugged into a 32-bit PCI slot or 64-bit PCI slot within the Compaq server. The Emulex LP7000e HBA comes in different models for 32-bit and 64-bit PCI-bus connections.
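The PCI-bus comparison makes sense in light of the theoretical bus bandwidths. A quick back-of-the-envelope calculation (assuming the common 33MHz PCI clock of the period; sustained throughput is lower due to arbitration and protocol overhead) shows why the 64-bit slot helps a Gigabit NIC only modestly:

```python
# Back-of-the-envelope PCI bandwidth versus the Gigabit Ethernet line rate.
# Assumes a 33MHz PCI clock; real sustained rates are lower than these peaks.

PCI_CLOCK_HZ = 33_000_000  # 33MHz PCI clock (assumed, typical for the era)

def pci_peak_mbps(bus_width_bits: int, clock_hz: int = PCI_CLOCK_HZ) -> float:
    """Peak PCI transfer rate in megabytes per second: width (bytes) x clock."""
    return bus_width_bits / 8 * clock_hz / 1_000_000

# Gigabit Ethernet line rate: 1Gbps = 125 MBps
gige_line_rate = 1_000_000_000 / 8 / 1_000_000

print(f"32-bit PCI peak: {pci_peak_mbps(32):.0f} MBps")   # 32/8 * 33e6 = 132 MBps
print(f"64-bit PCI peak: {pci_peak_mbps(64):.0f} MBps")   # 64/8 * 33e6 = 264 MBps
print(f"Gigabit Ethernet line rate: {gige_line_rate:.0f} MBps")
```

A 32-bit slot's 132MBps peak leaves little headroom above Gigabit Ethernet's 125MBps line rate, which is consistent with the modest (roughly 10%) gain the GTA measured when moving the NIC to a 64-bit slot.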
The SCSI environment is not affected by whether a Gigabit Ethernet or Fibre Channel storage network is in place. The internal disk drive was connected directly to the processor motherboard of the Compaq server via a SCSI bus. No network I/O or NICs were involved.
Another key component to this testing was a sophisticated, public domain software test tool from Intel called Iometer. This software is well suited for this mixed-technology environment because it measures and reports average data transfer in megabytes per second, whether the data is being sent to a local SCSI-connected disk, out over a Gigabit Ethernet network via a NIC, or out over a SAN via a host bus adapter. Iometer issues disk reads and/or writes to any defined disk drive, which can be a local drive or a network drive mapped to a NAS node, or a drive on a remote SAN disk array. Iometer, which consists of client and server software components, can also perform the same tests across multiple platforms concurrently and consolidate the results, or it can perform a test via multiple threads (instances of the same software process running concurrently and independently) on the same processor. This was the method GTA used for running two and five servers against the same storage target at the same time.
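Iometer itself is a compiled tool, but its core measurement loop can be sketched in a few lines of Python (a simplified illustration of the technique, not Iometer's actual implementation; the function name is ours): issue fixed-size reads at sequential or random offsets against a target file and report the average transfer rate in MBps.

```python
import os
import random
import time

def measure_read_throughput(path: str, block_size: int = 4096,
                            num_ops: int = 1000, sequential: bool = True) -> float:
    """Issue block-sized reads against `path` and return MBps, Iometer-style."""
    file_size = os.path.getsize(path)
    max_offset = file_size - block_size
    with open(path, "rb") as f:
        start = time.perf_counter()
        for i in range(num_ops):
            if sequential:
                # Walk the file front to back, wrapping at the end
                offset = (i * block_size) % max_offset
            else:
                # Random access: no correlation between successive reads
                offset = random.randrange(0, max_offset)
            f.seek(offset)
            f.read(block_size)
        elapsed = time.perf_counter() - start
    return num_ops * block_size / elapsed / 1_000_000
```

Pointing such a loop at a local disk, a mapped NAS share, or a SAN-attached volume is what lets one tool produce comparable MBps numbers across all three environments.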
In its research on how to characterize different real-world storage applications, the GTA found that storage scenarios vary in three regards: the relative percentage of storage requests that are reads versus writes, whether disk access is random or sequential, and the typical file size. Based on this information, GTA developed five scenarios for this comparative testing.
In GTA's first file-server scenario, the server was designed to imitate an application server, such as an e-mail or file server, that conducts many typically small reads and writes continuously. This scenario is characterized by 80% reads, 20% writes. File size is fixed at 4KB, and disk access is random in all cases. This scenario tests how well small files can be served across a Gigabit Ethernet network versus a Fibre Channel SAN.
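The first scenario's access mix can be expressed as a simple request generator (a sketch only; the 80/20 split and 4KB size come from the scenario description above, while the function name is illustrative):

```python
import random

def make_file_server_workload(num_ops: int, read_pct: float = 0.80,
                              block_size: int = 4096, seed: int = 42):
    """Generate (operation, size) pairs for an 80% read / 20% write, 4KB mix."""
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    ops = []
    for _ in range(num_ops):
        op = "read" if rng.random() < read_pct else "write"
        ops.append((op, block_size))
    return ops

workload = make_file_server_workload(10_000)
reads = sum(1 for op, _ in workload if op == "read")
print(f"reads: {reads / len(workload):.0%}")  # close to 80%
```

Each (operation, size) pair would then be issued at a random offset, matching the scenario's all-random disk access.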
The cumulative data-transfer rates achieved in this scenario (see Figure 15) are relatively scant: less than 1MBps. This is the impact of moving fairly small files, running a mix of reads and writes, and using random disk access, all of which tend to slow things down. In this scenario, data-transfer performance for all three storage environments is fairly comparable. It is only when five or more servers are collectively accessing the disk storage that the SAN environment provides slightly greater aggregate throughput. A SAN might be a slightly better choice in this type of scenario, but only if you expect to have multiple servers concurrently accessing the same disk storage.
Figure 15 In this series of tests, GTA measured how different storage technologies handle an application server reading and writing small files (4KB) to a target storage device. GTA ran tests with one, two, and five servers initiating the storage operation.
GTA's second file-server scenario (see Figure 16) was similar to the first, with one exception. Rather than fixing all the file sizes at 4KB, larger files were added to the mix: 10% of the files were 8KB, and another 10% were 16KB. This scenario tested how the storage alternatives compared with some larger file sizes added in.
Figure 16 In this series of tests, GTA measured how different storage technologies handled an application server reading and writing a mix of small and large files (4KB, 8KB, and 16KB) to a target storage device. Tests were run with one, two, and five servers initiating the storage operation.
GTA's tests with this scenario showed that, as file sizes increased, data-transfer throughput also increased. As with the first file-server scenario, though, there was no clear winner among NAS, SAN, and local SCSI disk. It's noteworthy that, even with five servers collectively accessing the same disk storage, only 1% to 2% of the Gigabit Ethernet or Fibre Channel bandwidth was used. The transport capacity of Fibre Channel and Gigabit Ethernet far exceeds what these small-file workloads demand.
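The 1% to 2% figure follows directly from the link rates. Both Gigabit Ethernet and the 1Gbps Fibre Channel of the period carry roughly 125MBps raw (protocol overhead ignored), so aggregate transfer rates in the low single-digit MBps barely register. The throughput values below are illustrative, chosen to land in the stated range:

```python
# Rough link utilization for small-file workloads.
# A 1Gbps link carries 1000 Mbit/s / 8 = 125 MBps raw (overhead ignored).
link_rate_mbps = 1_000 / 8

for throughput in (1.25, 2.5):  # illustrative aggregate MBps values
    utilization = throughput / link_rate_mbps
    print(f"{throughput} MBps -> {utilization:.0%} of link capacity")
```

At these rates the network transport is nowhere near the bottleneck; the disks and the random-access pattern are.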
In GTA's third scenario, one, two, and then five Web servers were serving the same set of Web pages and files. All disk operations were reads; all disk access was random. File sizes were variable, ranging from 20% very small (512 bytes) to 10% fairly large (128KB). This scenario showed how well Web pages can be served over the different storage/transport options.
GTA was surprised by the results of the testing with this scenario (see Figure 17). Given random-access retrieval of a range of file sizes, the data-transfer rates achieved in the NAS environment clearly outperformed the SAN. Indeed, NAS throughput was roughly double SAN throughput in all cases. And despite the hype concerning the throughput speed offered by SANs, it was surprising to see Gigabit Ethernet perform so much better than a Fibre Channel SAN in any situation.
Figure 17 In this series of tests, GTA measured how different storage technologies handle a Web server randomly reading and writing files of various sizes to and from a target storage device. Tests were run with one, two, and five servers initiating the storage operation.
In GTA's fourth scenario, one, two, and five video servers delivered streaming video. As with the previous scenario, all disk operations were reads. However, all disk access here was sequential. The same 64KB file size was used in all cases. This scenario tested the relative performance of serving streaming video over the different storage/transport options.
When comparing the video-server results with those of the Web-server scenario, GTA saw the opposite outcome (see Figure 18). With sequential disk access to consistent, fairly sizable (64KB) files, the SAN environment outperformed the NAS alternative by a considerable margin: from more than double the throughput for a single video server to nearly four times the throughput when five video servers were reading the same disk files across the SAN or NAS. In the case of five video servers, the cumulative SAN throughput, 47MBps, tapped roughly half the Fibre Channel SAN's bandwidth. These tests indicate that if you are going to serve large amounts of video from a shared-storage node, your best bet is a SAN deployment.
Figure 18 In this series of tests, GTA measured how the different storage technologies handled a video server sequentially reading large video files (64KB) from a target storage device. Tests were run with one, two, and five servers initiating the storage operation.
In GTA's final scenario, one, two, and five application servers were writing folders and directories to the storage target in large 1MB files. All disk operations were writes, and disk access was 100% sequential. This scenario tested how well large files are transported and written sequentially to a backup storage disk, emulating server backup to tape.
In this scenario, application servers were writing massive amounts of sequential data to a storage target's disk (see Figure 19). In the NAS and SAN environments, the maximum disk-write throughput appeared to have been reached, because the storage data-transfer rate did not increase with two or more servers compared with a single-server initiator. With the Hitachi 5800 disk array in the SAN environment, the peak GTA reached was about 30MBps; with the Compaq NAS server, the write capacity to a single disk peaked at about 5MBps. The specialized SAN storage node clearly outperformed the off-the-shelf server acting as a NAS node in the test bed. GTA doesn't know how well a specialized NAS device would have fared by comparison, but given these two storage nodes, the SAN alternative delivers much better performance.
Figure 19 In this series of tests, GTA measured how the different storage technologies reacted when an application server wrote files to specified directories on a target storage device. Tests were run with one, two, and five servers initiating the storage operation.
The SCSI option did well here for backing up a single server. Indeed, performance was comparable to doing backup over a SAN. However, a key motivation for doing a backup in the first place is to create and maintain a copy of a server's data in a location where it will be safe if something takes out the server. Local SCSI doesn't accomplish that end.
There are many other scenarios that could still be tested. For example, it would be interesting to see how data-transfer performance compares if disk storage was striped across multiple target disk drives instead of just one. It would also be interesting to see how different, specialized storage nodes (such as those from Network Appliance in the case of NAS, or EMC in the case of SANs) perform by comparison. However, neither vendor was willing to participate in this novel test bed.
The data presented represents a first step toward quantifying which of the various storage alternatives does the best job for a particular set of requirements. As GTA's testing shows, there are cases in which each delivers the best relative data-transfer performance. Therefore, it is clear that, as far as storage technologies go, one size does not fit all. Indeed, the moral of this story could be that users need to gain a better understanding of their storage needs before they sign on the bottom line for a SAN- or NAS-based storage network.