1.2 Core Concepts
The SNIA taxonomy for storage virtualization is divided into three basic categories: what is being virtualized, where the virtualization occurs, and how it is implemented. As illustrated in Figure 1.1, virtualization can be applied to a diversity of storage categories.
Figure 1.1 The SNIA storage virtualization taxonomy separates the objects of virtualization from location and means of execution.
What is being virtualized may include disks (cylinder, head, and sector addresses virtualized into logical block addresses), blocks (logical blocks from disparate storage systems pooled into a common asset), tape systems (tape drives and tape systems virtualized into a single tape entity, or subdivided into multiple virtual entities), file systems (entire file systems virtualized into shared file systems), and files or records (which may be virtualized on different volumes). Where virtualization occurs may be on the host, in storage arrays, or in the network via intelligent fabric switches or SAN-attached appliances. How the virtualization occurs may be via in-band or out-of-band separation of control and data paths. While the taxonomy reflects the complexity of the subject matter, the common denominator of the various whats, wheres, and hows is that storage virtualization provides the means to build higher-level storage services that mask the complexity of all underlying components and enable automation of data storage operations.
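To make the lowest-level "what" concrete, the short sketch below shows how a cylinder/head/sector geometry collapses into a flat logical block address space. The geometry constants are hypothetical; real devices have long performed this translation in firmware, which is precisely the point of the abstraction.

    # Minimal sketch: virtualizing cylinder/head/sector (CHS) geometry into
    # logical block addresses (LBAs). Geometry values are hypothetical.
    HEADS_PER_CYLINDER = 16
    SECTORS_PER_TRACK = 63

    def chs_to_lba(cylinder: int, head: int, sector: int) -> int:
        """Map a CHS tuple to a flat logical block address (sectors are 1-based)."""
        return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK + (sector - 1)

    def lba_to_chs(lba: int) -> tuple:
        """Invert the mapping, recovering the physical coordinates."""
        cylinder, remainder = divmod(lba, HEADS_PER_CYLINDER * SECTORS_PER_TRACK)
        head, sector_index = divmod(remainder, SECTORS_PER_TRACK)
        return cylinder, head, sector_index + 1

    # Round trip: the host sees only the LBA, never the physical geometry.
    assert lba_to_chs(chs_to_lba(2, 5, 7)) == (2, 5, 7)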
The ultimate goal of storage virtualization should be to simplify storage administration. This can be achieved by a layered approach, binding multiple levels of technologies on a foundation of logical abstraction. Concealing the complexity of physical storage assets by revealing only a simplified logical view of storage is just the first step toward streamlining storage management. Treating multiple physical disks or arrays as a single logical entity separates the user of storage capacity from the physical characteristics of disk assets, including physical location and the unique requirements of the physical devices. Storage capacity for individual servers, however, must still be configured, assigned, and monitored by someone. Although one layer of complexity has been addressed, the logical abstraction of physical storage alone does not lift the burden of tedious manual administration from the shoulders of storage managers.
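As a minimal illustration of this abstraction layer, assuming a simple concatenation scheme and hypothetical disk sizes, the following sketch pools several physical disks into one logical address space and resolves a logical block back to its physical location.

    # Minimal sketch of block-level abstraction: several physical disks, each
    # described only by a capacity in blocks, are concatenated into one logical
    # address space. Disk sizes are hypothetical.
    def build_pool(disk_sizes):
        """Return (total_blocks, extents), each extent being (disk_id, start, size)."""
        extents, offset = [], 0
        for disk_id, size in enumerate(disk_sizes):
            extents.append((disk_id, offset, size))
            offset += size
        return offset, extents

    def resolve(logical_block, extents):
        """Translate a logical block address into (disk_id, physical_block)."""
        for disk_id, start, size in extents:
            if start <= logical_block < start + size:
                return disk_id, logical_block - start
        raise ValueError("logical block beyond pool capacity")

    total, extents = build_pool([1000, 2000, 1500])   # three unequal disks
    assert total == 4500
    assert resolve(2500, extents) == (1, 1500)        # lands on the second disk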
To fulfill its promise, storage virtualization requires automation of the routine, soul-numbing tasks currently performed by storage administrators. Allocating additional storage capacity to a server, for example, or increasing total storage capacity by introducing a new array to the SAN are routine and recurring tasks begging for automation.
Ideally, storage automation should be policy-based to further reduce manual intervention. Virtualization intelligence should automatically determine whether a specific storage transaction warrants high-availability storage or less expensive storage, requires immediate off-site data replication or simple backup to tape on a predetermined schedule, or becomes part of a lifecycle management mechanism and is retired at the appropriate time. A tiered infrastructure that leverages classes of storage provides policy engines with repositories suited to the requirements of different types of storage transactions.
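A rough sketch of how such a policy engine might behave is shown below. The rule set, tier names, and transaction attributes are hypothetical and stand in for whatever classes of storage a real environment defines.

    # Minimal sketch of policy-based placement: transaction attributes are matched
    # against ordered rules, each naming a class of storage and the data-protection
    # action to schedule. Rules and tiers are hypothetical.
    POLICIES = [
        # (predicate, storage_class, protection)
        (lambda tx: tx["availability"] == "high", "tier-1-mirrored", "replicate-offsite"),
        (lambda tx: tx["retention_days"] > 365,   "tier-3-archive",  "backup-to-tape"),
        (lambda tx: True,                         "tier-2-standard", "nightly-backup"),
    ]

    def place(transaction):
        """Return the (storage_class, protection) chosen by the first matching rule."""
        for predicate, storage_class, protection in POLICIES:
            if predicate(transaction):
                return storage_class, protection

    # A banking transaction demands high availability; an ordinary data set does not.
    assert place({"availability": "high",   "retention_days": 30}) == ("tier-1-mirrored", "replicate-offsite")
    assert place({"availability": "normal", "retention_days": 30}) == ("tier-2-standard", "nightly-backup")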
Finally, storage virtualization should become application-aware, so that policy-based automation responds to specific data types and identifies the unique needs of each upper-layer application. Digital video, for example, gains more consistent performance if it is written to the outer, longer tracks of physical disks. Likewise, financial transactions for banking or e-commerce would benefit from frequent point-in-time copy policies to safeguard the most current transactions. An intelligent entity within the storage network that monitors and identifies applications and, based on preset policies, automates the handling of data by class of storage brings storage virtualization much closer to the concept of a storage utility.
Application-aware storage virtualization provides the potential for dynamic communication between upper-layer applications and the storage services beneath them. As demonstrated by Microsoft's initiative to provide enhanced interfaces between the operating system and storage utilities such as snapshots, mirroring, and multi-pathing, it will become possible for upper-layer applications to more fully leverage underlying storage services. Storage virtualization-enabled applications could, for example, seek out the services that most closely align with their current requirements for capacity or class of storage or, via APIs, inform the storage network of unique policies that should be enforced.
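The fragment below is a purely hypothetical illustration of this kind of negotiation, with an application declaring its data type and policy needs and the virtualization layer answering with the services it would apply. It does not represent Microsoft's interfaces or any vendor API.

    # Hypothetical sketch of an application-aware request: the application describes
    # its data and policy needs; the virtualization layer replies with the services
    # it will apply. All names and mappings are illustrative only.
    from dataclasses import dataclass

    @dataclass
    class StorageRequest:
        data_type: str                      # e.g. "video", "financial"
        capacity_gb: int
        snapshot_interval_min: int = None   # optional policy hint from the application

    def negotiate(request):
        """Map an application's declared needs onto underlying storage services."""
        if request.data_type == "video":
            return {"layout": "outer-tracks", "class": "streaming-optimized"}
        if request.data_type == "financial":
            interval = request.snapshot_interval_min or 15
            return {"class": "tier-1-mirrored", "snapshot_every_min": interval}
        return {"class": "tier-2-standard"}

    print(negotiate(StorageRequest("financial", capacity_gb=200, snapshot_interval_min=5)))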
The viability of storage virtualization is enhanced by, but not dependent on, interoperability between storage assets. Although storage virtualization vendors highlight the benefits their products bring to heterogeneous data centers that may include HP, IBM, EMC, HDS, or other storage, some customers are quite happy with single-vendor, homogeneous storage solutions. Logical abstraction of physical storage, automation of tedious tasks, policy-driven data handling, and application awareness have significant value for both single-vendor and multivendor storage networks. Interoperability, however, is a key component of the storage utility, since a utility should accommodate any type of application, operating system, computer platform, SAN infrastructure, storage array, or tape subsystem without manual intervention.
As shown in Figure 1.2, storage virtualization technology is a layered parfait of increasingly sophisticated functionality that drives toward greater degrees of simplicity. Current products provide bits and pieces of a virtualized solution, from elementary storage pooling to limited automation and policy engines. Vendors and customers, however, are still struggling toward more comprehensive, utility-like storage virtualization strategies that fully leverage the potential of the technology.
Figure 1.2 Storage virtualization enables successive layers of advanced functionality to fully automate storage administration.

Layers, from top to bottom: Virtualization-Enabled Applications; Application-Aware Storage Virtualization; Policy-based Management; Automation of Storage Processes; Logical Abstraction Layer; Physical Storage Systems.

Services enabled by these layers: Dynamic Capacity Allocation, Tape Backup Processes, Storage Consolidation, Heterogeneous Storage, Tiered Storage (ILM), Point-in-time Snapshots, Replication/Mirroring, Auditing/Service Billing.
Where the intelligence to do all these virtual things resides is interesting from a technical standpoint, but of less interest to the ultimate consumers of storage resources. The transparency that storage virtualization provides for storage assets should eventually apply to the storage virtualization solution itself. The abstraction layer that masks physical from logical storage may reside on host systems such as servers, within the storage network in the form of a virtualization appliance, as an integral option within the SAN interconnect in the form of intelligent SAN switches, or on storage array or tape subsystem targets. In common usage, these alternatives are referred to as host-based, network-based, or array-based virtualization. Each approach has strengths and weaknesses that we will describe in subsequent chapters.
In addition to differences in where virtualization intelligence is located, vendors have different methods for implementing virtualized storage transport. The in-band method places the virtualization engine squarely in the data path, so that both block data and the control information that governs its virtual appearance transit the same link. The out-of-band method provides separate paths for data and control, presenting an image of virtual storage to the host over one link and allowing the host to retrieve data blocks directly from physical storage over another. In-band and out-of-band virtualization techniques are sometimes referred to as symmetrical and asymmetrical, respectively, but for the sake of accuracy and a simpler virtualization vocabulary, this text uses the hyphenated band terminology.
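The distinction can be sketched as two read paths, as below; the mapping table and function names are hypothetical. In-band, the engine resolves the virtual-to-physical mapping and moves the data itself; out-of-band, the host obtains the mapping over a control link and issues the physical I/O directly.

    # Minimal sketch contrasting the two transport methods for a single block read.
    # The mapping table, disk names, and call names are all hypothetical.
    MAPPING = {("vdisk0", 42): ("physical_disk_3", 9042)}   # virtual -> physical

    def read_physical(disk, lba):
        return "<block %d from %s>" % (lba, disk)

    def in_band_read(virtual_disk, lba):
        # In-band: the host sends the I/O to the virtualization engine, which
        # resolves the mapping AND moves the data; control and data share one path.
        disk, physical_lba = MAPPING[(virtual_disk, lba)]
        return read_physical(disk, physical_lba)

    def metadata_lookup(virtual_disk, lba):
        # Out-of-band control path: only the mapping crosses this link.
        return MAPPING[(virtual_disk, lba)]

    def out_of_band_read(virtual_disk, lba):
        # Out-of-band data path: armed with the mapping, the host issues the
        # physical read itself; the engine never touches the data.
        disk, physical_lba = metadata_lookup(virtual_disk, lba)
        return read_physical(disk, physical_lba)

    assert in_band_read("vdisk0", 42) == out_of_band_read("vdisk0", 42)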
Simplifying storage administration through virtualization technology has many aspects. Centralizing management, streamlining tape backup processes, consolidating storage assets, enhancing capacity utilization, facilitating data integrity via snapshots, and so on, are not really attributes of storage virtualization, but rather beneficiaries of it. Storage consolidation, for example, is enabled by networking storage assets in a SAN. If there is only one large disk array to manage, virtualization may not contribute significantly to ease of use. If there are multiple disk arrays, and in particular arrays from different vendors in the SAN, storage virtualization can help streamline management by aggregating the storage assets into a common pool. Current vendor literature is often punctuated with exclamations about the many benefits of storage virtualization, and then proceeds to focus on backup, snapshots, and so on. In some cases, customers may indeed benefit from these enhanced utilities, but may not need to virtualize anything to use them. As always, the starting point for assessing a new technology is to understand your application requirements and existing practices, and to measure its potential benefit against real need.
In the following chapters, we will examine the basic problems that storage virtualization is trying to solve, the different approaches technologists are proposing, and the intersections between current capabilities and productive use by customers. In the process, hopefully, the confusion factor generated by this subject can be kept to a minimum as the virtual state of this technology is separated from its real one.