Jiro Storage Networks
Perhaps the most difficult part of writing this book was deciding how much information to include about storage networks and storage techniques in general. On the one hand, the Federated Management Architecture (FMA) and Jiro can be applied to virtually any management solution. On the other hand, FMA was originally built with a direct focus on storage, so many of its architectural decisions are easier to justify when that context is clear from the start.
The basis for much of the content of this book is the concept of a storage network. The partitioning of storage data, management, and operations from an overall production network into a dedicated storage network is a relatively new trend and a quickly evolving field of study. There are many different reasons to separate storage traffic from a production network:
You avoid having users overwhelm a network and cut off storage traffic, or vice versa.
You allow the storage network to be optimized for particular quality of service (QoS) attributes that may differ from quality of service parameters required in a production network.
You prevent confusion between storage management and network management, two tasks that have largely different concerns and needs.
You allow the storage network to use a network protocol optimized for storage access and data movement.
There are more reasons to maintain a split between a production network and a storage network, but further detail is best left to one of the many available books on storage networks. In addition to storage networks, FMA and Jiro must be able to manage storage that is available on a production network in two other forms.
Direct attach storage, which is attached directly to a host's bus. A typical example of this is the hard drive in your personal computer or server.
Network attach storage (NAS), which is a class of systems that provide file services to host computers. A host system that uses NAS uses a file system device driver to access data using file access protocols such as the Network File System (NFS) or the Common Internet File System (CIFS). NAS systems interpret these commands and perform the internal file and device I/O operations necessary to execute them.
To manage storage as a whole, most people first think of the hardware required: routers, switches, disk devices, tape devices, and more. What people sometimes forget is the wide variety of software that goes into day-to-day storage management; without that software, storage management of any kind could not be achieved. Software components for managing storage include the following:
Device drivers: layers of code on hosts that translate operating system requests to device requests.
Management console: software that allows monitoring of particular resources.
Backup management tools: policy-based tools for scheduling and maintaining backups and archives of live data.
Volume and file manager: tools that allow hosts to access data in hierarchical formats using custom file systems with adequate security.
As an enterprise or medium-sized business grows, more storage is required. Furthermore, as businesses become distributed or embrace the Web, the amount of time that storage must remain online increases. For many businesses, it is essential for storage to remain online 24 × 7 × 365. The availability requirement alone is a primary driver for storage networks. It is difficult to replace a hard drive that is directly attached to a host without bringing the host down during installation.
According to research by IDC, production storage between 1999 and 2004 is on course to grow by 10,000 petabytes, that is, 10,000,000,000,000,000,000 bytes of information. Accompanying this increase in storage will be an increase in storage management costs, all coupled with a tight labor market. This combination spells trouble for end users. Storage administrators and companies with storage issues will attempt to solve problems in a variety of ways:
Flexibility. The main goal of flexibility is to predict future storage network requirements early, decreasing the impact and maintenance effort when growth is needed. One example is to outsource a large part of the storage networking needs to a company that specializes in storage networks, such as a storage service provider (SSP). The biggest single issue with an SSP is trust: Does your company trust another company with data sent offsite? There are other ways to increase storage flexibility, including redesigning the existing storage network in a modular and expandable fashion.
Time balancing. Who is impacted, and what is the company's tolerance for spending and paying for time? For example, acknowledging that you cannot afford to reengineer the storage network or hire additional resources means that you will impact your employees and customers with maintenance time as your storage needs increase. Furthermore, the company will not be able to take advantage of new storage opportunities that could create more efficient use of time. The company could also choose to substantially increase the amount of time spent dealing with storage networking. This approach acknowledges the value of the employee and customer information, but if the company lacks the ability to be flexible, time invested in the network will increase linearly (or exponentially) with the amount of storage added.
Resources. Adding administrators to address storage networking needs increases the total cost of ownership (TCO) but does not necessarily increase the efficiency of the storage network. Resources can be acquired in the form of onsite storage networking consultants who are dedicated to the maintenance of your systems. To some degree, the issue of trust is relieved with this option, although it does require higher capital expenditure.
No matter how a business chooses to address its ever-increasing storage needs (probably through a combination of these approaches), there is another variable that can aid in creating an effective storage management plan: storage management software. The simple facts are that stored information is increasing exponentially, and it is unlikely that the number of storage management professionals will increase exponentially during the same time. The only answer to this dilemma is to create effective storage management tools that allow storage management experts, whether they are onsite or hired hands, to more effectively manage increased storage without increasing the number of experts or their training time.
A tool that proactively monitors your storage network and asks for help only when necessary is sometimes referred to as the Holy Grail of storage management. In many cases, this level of management can be achieved if you are willing to build storage networks with products from a single vendor. By choosing a single vendor solution, however, you are tied into its pricing and support mechanisms, forcing you to trust a single vendor with your data and your budget.
The truth is that the storage industry suffers from commodity pricing. By allowing businesses to choose a quality of service level and a corresponding price point for the quality of service, the industry enables businesses to grow their network without bounds and based on their own constraints of budgeting vs. QoS. The problem today with heterogeneous storage networks is that each vendor of a component within the storage network often uses its own management techniques.
From the point of view of the storage manager, we are back to the first problem: increasing the amount of storage increases the number of storage management issues that must be dealt with. For example, by purchasing two fibre channel switches from two different companies, you require your storage management experts to understand two management consoles.
The Federated Management Architecture from Sun is meant to bring heterogeneous environments back down to a single point of control. Furthermore, the architecture dictates policy-based solutions that can grow without bound along with the storage network.
This chapter discusses the nuts and bolts of data centers, including management techniques and protocols as well as the hardware and software involved in a storage solution. After discussing storage and storage management, we explain how FMA and Jiro fit into the storage management picture.
The important thing to take from this chapter is not necessarily an understanding of heterogeneous storage networks vs. homogeneous storage networks, or one type of hardware vs. another type of hardware. The essential information is simply that all these types of hardware and software exist. They all must be managed, no matter who is doing the management for you. Your goal should be to try to understand how a device ends up being managed by software, and how software itself also requires management from a policy-based solution.
2.1 Storage Hardware
Beyond host computer systems, there are two primary categories of hardware to consider. In general, there are the physical devices that store data and the network support that helps move the data to and from the correct locations. Both categories contain many different kinds of devices. A few of the devices in each category are profiled here.
Each type of device and configuration has trade-offs. For example, the managed fibre channel switch profiled later seems like a perfect device for network management. The drawbacks of a switch versus an average low-cost hub are that switches involve propagation delay and tend to be expensive.
On the other hand, low-price hubs give no indication of trouble in a network, can be difficult to manage, and share bandwidth between all attached devices (switches can allocate all bandwidth to multiple zones). These limitations have a direct impact on the ability of a storage administrator and storage management software to detect problems in the storage network.
Again, you should devote thought to each storage network before spending the company's budget. Even within a single data center, a wide variety of hardware devices can be employed to fit the characteristics and QoS of a particular department or area.
2.1.1 Disk Devices
If you are coming from a PC-centric background, when you think of storage, you think of the drives that are attached to the bus in your system. This isn't far from the truth of implementation for many large installations. Host file servers often contain direct attach storage, which is physically contained within a host. The host then shares these disks through a network file protocol such as NFS or CIFS. To expand storage, the system administrator brings down the host, adds a drive to the server tower, configures it, and shares it.
In large data centers, storage is more partitioned than in the physical containment model used in hosts. There are many reasons for this partitioning. One is that mainframes have traditionally been very good at separating storage from the systems. Another reason is simply that large data centers have encountered problems with the old model and have already started to partition into storage networks as their solution. Physical drives fit into rack-mounted cabinets that are 19 inches wide and of variable height depending on the contents of the rack-mounted equipment.
Redundant arrays of independent disks (RAID) hardware enables high-performance data retrieval and high availability of data through the use of multiple disks. Basically, to enable high performance, data is spread over multiple disks to allow parallel reads and writes to the disks. By having more disk arms moving, you relieve a major performance bottleneck: the disk arm. To enable high availability, data is striped across disks, and then parity bits are used to enable recovery of lost data. In the basic RAID levels, parity is used to enable recovery of one lost disk in the disk array. So if four disks are being used and one crashes, the crashed disk can be replaced and the data retrieved from the parity bits.
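The parity recovery described above is simple XOR arithmetic: the parity block is the XOR of the data blocks, so XORing the surviving blocks with the parity reconstructs any single lost block. The following is a minimal sketch of that idea for a hypothetical four-disk array (three data disks plus one parity disk); real RAID controllers work on fixed-size stripes in hardware or driver code, not like this toy class.

```java
import java.util.Arrays;

// Sketch: XOR parity as used by the basic RAID levels (hypothetical example).
public class ParityDemo {
    // XOR two equal-length blocks byte by byte.
    static byte[] xor(byte[] a, byte[] b) {
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] d0 = {1, 2, 3}, d1 = {4, 5, 6}, d2 = {7, 8, 9};
        // The parity disk holds the XOR of all the data blocks.
        byte[] parity = xor(xor(d0, d1), d2);
        // Disk 1 "crashes": rebuild its block from the survivors plus parity.
        byte[] rebuilt = xor(xor(d0, d2), parity);
        System.out.println(Arrays.equals(rebuilt, d1)); // prints true
    }
}
```

Note that XOR parity can recover only one lost disk per parity block, which matches the single-disk-failure guarantee of the basic RAID levels described above.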
RAID levels 0 through 5 give different degrees of redundancy or performance. Advanced RAID techniques combine the RAID levels to try to give both performance and high availability. The basic RAID levels are
Level 0: striping
Level 1: mirrors
Level 3: dedicated parity disk
Level 4: parallel access with parity disk
Level 5: parallel access with distributed parity
Combining some of the RAID levels makes implementations more expensive (in terms of hardware and possibly performance), but it yields the benefits of both techniques. For example, RAID level 0 combined with level 1 can give fast read and write access as well as good data redundancy.
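The performance benefit of level 0 comes from simple address arithmetic: consecutive logical blocks are dealt round-robin across the disks, so sequential reads and writes keep every disk arm busy in parallel. A minimal sketch of that mapping, assuming a hypothetical four-disk stripe set (the class and method names here are illustrative, not from any real volume manager):

```java
// Sketch: RAID level 0 striping maps a logical block to (disk, stripe offset).
public class StripeMap {
    // Returns {disk index, stripe offset on that disk} for a logical block.
    static int[] locate(long logicalBlock, int disks) {
        int disk = (int) (logicalBlock % disks);  // round-robin across disks
        long stripe = logicalBlock / disks;       // offset within each disk
        return new int[] { disk, (int) stripe };
    }

    public static void main(String[] args) {
        // Consecutive blocks land on different disks, enabling parallel I/O.
        for (long b = 0; b < 8; b++) {
            int[] loc = locate(b, 4);
            System.out.println("block " + b + " -> disk " + loc[0]
                    + ", stripe " + loc[1]);
        }
    }
}
```

A mirrored pair of such stripe sets is the level 0 + level 1 combination mentioned above: the mapping gives the speed, and the duplicate copy gives the redundancy.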
RAID devices are put in the hardware section, but the location of the RAID implementation varies widely. RAID can be implemented in three places:
Onboard a physical disk array
In a controller card residing in a server system
In software, such as a logical volume manager
Where you implement RAID capabilities affects both the cost and the effectiveness of the implementation. For example, using software RAID implementations may be inexpensive, but it creates a burden on the host that implements the RAID capabilities. The software is burdened with manipulating the distribution of data across physical devices. This robs memory and valuable processor cycles from the file-serving processes. The result is that increased traffic to the host increases the demands on the file-sharing software as well as on the software RAID controller, a double hit to the server at a time when you would prefer to lighten the load on the processor to aid in processing requests. To relieve the host, RAID implementation can be moved to controller cards or onto the disk arrays themselves. Typically, this locks the RAID implementation into a single vendor, but it can create a very effective implementation. The decision of where to implement RAID in a storage network is an important one.
Just a bunch of disks, more widely known as JBOD devices, are low-cost devices that contain . . . a bunch of disks. There are many different ways to configure the disks. Typically, the JBOD is in a rack enclosure, and you hot-swap drives in and out of it. Whereas a true RAID device has its RAID capabilities onboard, a JBOD must be controlled by software or an external RAID controller if you want to use some or all of its disks in RAID configurations.
Network attach storage on the low end fits into the category of disk devices. NAS devices fall into several price groups. On the high end, a NAS device is typically a rack-mounted system that attaches to an IP network and contains one or more disk drives that can be arranged in various RAID configurations. In the low-end price range, you will likely find software-based RAID, limited management capabilities, and very limited backup capabilities. Furthermore, on the low end, stand-alone devices are available that can sit on desktops or even in the home. Onboard any NAS device is what could be termed a specialized operating system, optimized for file serving. Many general functions of the kernel and operating system, such as graphics capabilities and extraneous port-handling drivers (for USB or parallel devices), are removed, and other optimizations are made for the specific device. The file system, volume management, and security are all built into the operating system and services hosted on the NAS device. Plug in the NAS, and you have instant space available via CIFS- or NFS-attachable directories.
Higher-priced NAS devices contain a huge amount of functionality. They contain everything from built-in tape libraries for archiving and backup to custom file systems built for network sharing of data.
2.1.2 Tape Devices
There are essentially three types of tape storage enclosures that systems can use:
Single tape drive. Targeted at user data backup, single tape drives often exist on servers or single-user computers that contain important data.
Tape autoloader. This device loads tapes automatically and contains a single read/write head. This is really a degenerate case of a tape library (discussed next).
Tape library. Much larger than a tape autoloader, this device often contains multiple read/write heads.
For management purposes, the physical devices are important, but much of the data management will be accomplished through backup/archive manager or hierarchical storage manager (HSM) software, both covered later in this chapter.
2.1.3 Storage Networking Hardware
A variety of devices make up the category of what can be considered storage networking hardware. Later in this chapter we talk more about what it means to create a storage network, but the devices that fall into this category are similar to traditional networking hardware. Hubs, routers, and switches are combined to make up a network infrastructure. Each device has different capabilities as far as network management is concerned, and each is used in a different way.
Hubs. These devices provide a low-cost, easily installable way to expand a storage network. Hubs have two major drawbacks. One is that they tend to be less "manageable" than switches. The second is that bandwidth is shared among all the devices on the hub. A switch, by contrast, can partition devices into zones and maintain full bandwidth to each zone; even in the degenerate configuration in which each attached device is in its own zone, each device has full bandwidth. This configuration is not possible with hubs.
Switches. Like hubs, switches allow network expansion. The difference is that switches have more management capabilities, more configuration options, and typically some ability to debug and maintain performance in a fibre channel network. The switch forms the center point of what is known as a fabric. The switch can route data between the ports of any two devices that are connected to the fabric. You can also create logical partitions of the fabric, known as zones, which give full throughput to all logical partitions. Finally, a switch is often able to detect a misbehaving component and eliminate it from the fabric without impacting the remaining devices. The downside of switches is that they tend to be much more expensive than hubs and can introduce a small amount of propagation delay. Expensive hubs and inexpensive switches can overlap in capabilities. Furthermore, in the future it is likely that low-end hubs will actually become low-end switches as the components used in switches hit lower and lower price points.
Routers. Used for routing network traffic, routers let you add a variety of features to make them an integral part of a storage network. For example, some routers can convert fibre channel protocol traffic to parallel SCSI traffic, allowing you to attach legacy SCSI devices, such as tape libraries, to a fibre channel network.
In some cases, switches and hubs can be used interchangeably. Switches are more manageable than hubs but incur some propagation delay, depending on their zoning options. In addition, a switch will remove a misbehaving device from a storage network automatically and will often signal the administrator in multiple ways, perhaps through a nice red LED.
In addition to the devices that form a network infrastructure, controller cards attach devices to the physical network. These are sometimes termed host bus adapters, or HBAs. If you have multiple HBAs installed on a host, one HBA can fail while a storage network connection continues to be available. An HBA is similar to a network interface card (NIC).
The hubs, switches, and routers discussed in this section come in two forms: one for fibre channel networks and one for IP networks. A quickly advancing standard known as SCSI over IP moves the most popular storage protocol, SCSI, to an IP network. With the advent of SCSI over IP, similar management tools and hardware can be used to manage both the client network and the storage network. Increasing the capabilities of the management tools for these networks and creating one set of hardware for a complete network (storage and production) will lower the total cost of ownership for storage networks.