Hardware Replication Problems
Hardware replication exposes a number of challenges when the replicated data is accessed: because the replication happens at a low level in the I/O stack, the entire stack must be reconstructed, layer by layer:
Physical layer: ensure the replicated disks are consistent and accessible.
Driver layer: detect the replicated disks.
LVM layer: reconstruct the replicated logical groups and volumes.
File system layer: make the replicated file system consistent.
Application layer: make the data ready for the application.
Difficulties can arise in each of these layers, as described below.
Consistency groups: as described earlier, a single I/O generated by an application is translated into several I/Os at the disk level. The storage array must maintain the coherence of these multiple I/Os in order to produce a consistent replicated disk. Enterprise-class storage systems use the notion of consistency groups to ensure such coherence and correct write ordering.
Splitting the pairs: before accessing the cloned disks, the replication must be suspended (split). This suspension can only be done when the primary and secondary disks are fully synchronized. The replicated disks become accessible once the split has fully completed.
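With Hitachi's RAID Manager (CCI), for example, a split might look like the following sketch (the device group name oradg is hypothetical; other arrays use equivalent commands, such as EMC TimeFinder's symmir split):

```shell
# Hypothetical device/consistency group: oradg
pairvolchk -g oradg                   # verify the pair status (should be fully synchronized)
pairsplit -g oradg                    # suspend replication; the secondaries become accessible
pairevtwait -g oradg -s psus -t 300   # block until the split (PSUS state) completes
```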
Drivers are the software components that link the hardware to the OS. No replication takes place at this level. The drivers must therefore be at the correct revision levels and configured for the hardware. The goal is to correctly detect and access the cloned disk drives.
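On Solaris, for instance, detecting the newly visible cloned disks typically amounts to a device rescan (a sketch; the exact steps depend on the HBA and its driver):

```shell
cfgadm -al       # list attachment points; verify the new LUNs are connected
devfsadm -C      # rebuild /dev and /devices, removing stale entries
echo | format    # list the disks the OS now sees; the clones should appear
```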
When doing hardware replication on a logical volume, the entire content of the physical disks is cloned. This includes the configuration section (called the private region in VERITAS Volume Manager, or the metadb in the Solaris Volume Manager (SVM)) and the data section (also called the public region). However, this private region (or metadb) holds disk identification parameters, so the cloned disks and the original disks have the same IDs. This is not a major issue if the replicated data is to be accessed on a different host, but it can be a difficult issue to solve if you want to access the replicated data on the same host.
FIGURE 2 One-Host Configuration
FIGURE 3 Two-Host Configuration
Replicated Data On a Different Host
Accessing the replicated data on a secondary host is equivalent to importing the logical group (or metaset) that contains the logical volumes you want to access. However, because the disks are cloned, the volume manager on the secondary host believes this logical group is still imported on the primary host. This information is stored in the private region and must be cleaned up on every replicated disk (clearimport under VxVM). It is then possible to import the replicated logical group and access its volumes.
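Under VxVM, the sequence on the secondary host might look like this sketch (the disk group name datadg, the disk names, and the mount point are hypothetical):

```shell
vxdctl enable                         # have VxVM rescan the disks
vxdisk clearimport c2t0d0s2 c2t1d0s2  # clear the import locks left by the primary host
vxdg import datadg                    # import the replicated disk group
vxvol -g datadg startall              # start its volumes
mount -F vxfs /dev/vx/dsk/datadg/vol01 /mnt/replica
```

Alternatively, `vxdg -C import datadg` clears the import locks as part of the import itself.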
Replicated Data On the Same Host
In this case the situation becomes challenging. As described previously, disk cloning duplicates the disk IDs. If not properly handled, the LVM can get confused and, in the worst-case scenario, this can lead to silent data corruption. The method described in this article to access the replicated volumes is:
Re-create a new logical group and populate it with the cloned disks.
Re-create every logical volume on this new disk group, using the configuration of the primary group.
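Under VxVM, these two steps can be sketched as follows (the group names datadg and clonedg and the disk c3t0d0 are hypothetical, and the sed rename is a simplification; in practice the disk media names in the saved configuration usually need to be edited as well):

```shell
# 1. Save the primary group's volume layout in a vxmake-readable format
vxprint -g datadg -hmQq > /tmp/datadg.cfg
# 2. Re-initialize the cloned disk (overwriting its private region, and with it
#    the duplicated disk ID), then build a new group on it
vxdisksetup -i c3t0d0
vxdg init clonedg clonedg01=c3t0d0
# 3. Re-create the volumes in the new group from the saved layout
sed 's/datadg/clonedg/g' /tmp/datadg.cfg > /tmp/clonedg.cfg
vxmake -g clonedg -d /tmp/clonedg.cfg
vxvol -g clonedg startall
```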
File System Layer
Once the LVM layer is correctly reconstructed, you must avoid reformatting the volumes. It is necessary, however, to check the file systems for corruption. Indeed, if the primary host crashed, the file system may be corrupted, especially if the crash occurred while a large file was being created. This is why it is strongly advised that you use journaled file systems such as the UNIX File System (UFS) with logging enabled, or the VERITAS File System (VxFS).
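For example, on Solaris, each replicated file system is checked before being mounted; never run newfs or mkfs on the replicated volumes (the device paths below are hypothetical):

```shell
fsck -F ufs -o f /dev/vx/rdsk/clonedg/vol01   # UFS: -o f forces a full check
fsck -F vxfs /dev/vx/rdsk/clonedg/vol02       # VxFS: replays the intent log
```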
Application Layer
Finally, the application can use the replicated data. One last problem can occur, depending on the application: if it relies on host-specific values (such as the node name, IP address, or mount points), you must reconfigure the application with the new values.