DR Best Practices
This section contains general guidelines and specific considerations you must remember before using DR operations on the Sun Fire 3800-6800 servers.
This section contains general guidelines you must follow when you are executing DR commands.
Using the cfgadm(1M) Command or the Sun MC Software for DR
You can use the cfgadm(1M) command on the domain or the DR module in the Sun MC software to perform DR operations. The fine-grained control that the cfgadm(1M) command offers is normally not needed for day-to-day DR work. Use the DR module in the Sun MC software for routine DR operations.
Using a Slot for the Boot Device
You should use the first host bus adapter in the OpenBoot PROM probe list to access the boot disk. This practice ensures that the boot path is fixed and does not change if I/O cards are added to the system and the /etc/path_to_inst file is recreated during a boot operation (for example, boot -ra). The device tree structure to the boot device also remains fixed when the domain is booted from a CD-ROM drive or a networked software image.
Labeling Boot Devices
You should label the boot disk and the boot mirror disk by using the format(1M) command with the volname option. This technique enables you to easily identify these disks. The following format(1M) command output shows disk 0 is labeled as bootdisk, and disk 4 is labeled mir-disk.
# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t0d0<SUN18G cyl7506 alt2 hd19 sec248> bootdisk
          /ssm@0,0/pci@18,700000/pci@1/SUNW,isptwo@4/sd@0,0
       1. c1t0d0<SUN18G cyl7506 alt2 hd19 sec248>
          /ssm@0,0/pci@18,700000/pci@1/SUNW,isptwo@4/sd@0,0
       2. c2t0d0<SUN18G cyl7506 alt2 hd19 sec248>
          /ssm@0,0/pci@18,700000/pci@1/SUNW,isptwo@4/sd@0,0
       3. c3t0d0<SUN18G cyl7506 alt2 hd19 sec248>
          /ssm@0,0/pci@18,700000/pci@1/SUNW,isptwo@4/sd@0,0
       4. c4t0d0<SUN18G cyl7506 alt2 hd19 sec248> mir-disk
          /ssm@0,0/pci@1a,700000/pci@1/SUNW,isptwo@4/sd@0,0
Specify disk (enter its number):
You can change or add labels while partitions are in use.
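Once the disks are labeled, a script can pick the boot disk and its mirror out of the listing by volume name. The following Python sketch is illustrative only: it assumes the listing has been captured as text in the layout shown in the format(1M) output above, and the function name labeled_disks is hypothetical.

```python
import re

# Three entries copied from the format(1M) listing above, used as sample input.
SAMPLE = """\
0. c0t0d0<SUN18G cyl7506 alt2 hd19 sec248> bootdisk
   /ssm@0,0/pci@18,700000/pci@1/SUNW,isptwo@4/sd@0,0
1. c1t0d0<SUN18G cyl7506 alt2 hd19 sec248>
   /ssm@0,0/pci@18,700000/pci@1/SUNW,isptwo@4/sd@0,0
4. c4t0d0<SUN18G cyl7506 alt2 hd19 sec248> mir-disk
   /ssm@0,0/pci@1a,700000/pci@1/SUNW,isptwo@4/sd@0,0
"""

# Matches "N. cXtYdZ<geometry> [volname]"; the volume name is optional.
ENTRY = re.compile(r"^\s*(\d+)\.\s+(c\d+t\d+d\d+)\s*<[^>]*>\s*(\S*)\s*$")


def labeled_disks(listing):
    """Return {volume_name: (disk_number, device_name, physical_path)}."""
    found = {}
    lines = listing.splitlines()
    for i, line in enumerate(lines):
        match = ENTRY.match(line)
        if not match or not match.group(3):
            continue  # not a disk entry, or the disk has no volume name
        # The physical device path is printed on the line after the entry.
        path = lines[i + 1].strip() if i + 1 < len(lines) else ""
        found[match.group(3)] = (int(match.group(1)), match.group(2), path)
    return found
```

Calling labeled_disks(SAMPLE) returns entries only for the labeled disks, so a runbook script can confirm that both bootdisk and mir-disk are present before a DR operation.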
You need two or more paths to the same resources in order to use DR to service the I/O paths and still have access to the resources. The system needs an alternate path to the resources in case a single-point-of-failure occurs in the path. These paths must be as independent as possible.
For storage devices, you must use different I/O assemblies to host each switch and interface. On the storage device, you must have different I/O interfaces or channels. For complex multivendor storage solutions, you must verify the hardware dependencies so that the entire solution provides the needed independence.
Host Bus Adapters for Multipathed Devices
If you configure multipathed devices into a domain, all of the primary paths must be on the same I/O assembly. All of the secondary, or alternate, paths should be together on one other I/O assembly, if possible. This practice ensures that a failure on the primary I/O assembly causes the system to fail over to the alternate paths.
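The path placement rules can be checked mechanically before a DR operation. The following Python sketch is an illustration under stated assumptions: the mapping of each device to a primary and an alternate I/O assembly is maintained by hand, the device and assembly names are hypothetical, and the check that primary and alternate assemblies differ follows from the requirement for independent paths.

```python
def check_multipath_layout(paths):
    """Check a {device: (primary_assembly, alternate_assembly)} mapping.

    Returns a list of problem descriptions; an empty list means the layout
    follows the placement guidelines.
    """
    problems = []
    primaries = {primary for primary, _ in paths.values()}
    alternates = {alternate for _, alternate in paths.values()}
    if len(primaries) > 1:
        problems.append("primary paths span several I/O assemblies: "
                        + ", ".join(sorted(primaries)))
    if len(alternates) > 1:
        problems.append("alternate paths span several I/O assemblies: "
                        + ", ".join(sorted(alternates)))
    shared = primaries & alternates
    if shared:
        problems.append("primary and alternate paths share an I/O assembly: "
                        + ", ".join(sorted(shared)))
    return problems


# Hypothetical layouts for two multipathed devices.
GOOD_LAYOUT = {"array0": ("IB6", "IB8"), "array1": ("IB6", "IB8")}
BAD_LAYOUT = {"array0": ("IB6", "IB8"), "array1": ("IB8", "IB6")}
```

With these inputs, check_multipath_layout(GOOD_LAYOUT) reports no problems, while BAD_LAYOUT is flagged because both assemblies carry primary and alternate paths.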
Application Configuration for Quiesce
You must configure applications so that they support a system quiesce. In client-server environments, set the client timeout values appropriately to accommodate a quiesce of a server. You must configure clients to respond to the SIGHUP signal, which is sent out after a quiesce.
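A client can respond to SIGHUP by recording the signal and acting on it in its main loop. The following Python sketch shows one way to do this. What the client does in response (here, calling a reconnect function supplied by the caller) is an assumption, and the class name QuiesceAwareClient is hypothetical.

```python
import signal


class QuiesceAwareClient:
    """Record SIGHUP in the handler and react to it in the main loop."""

    def __init__(self, reconnect):
        self._reconnect = reconnect  # callable supplied by the application
        self._hup_seen = False

    def install(self):
        # Call once at startup, from the main thread.
        signal.signal(signal.SIGHUP, self._on_sighup)

    def _on_sighup(self, signum, frame):
        # Only set a flag here; do the real work outside the handler.
        self._hup_seen = True

    def poll(self):
        """Call once per main-loop iteration; returns True if it reconnected."""
        if self._hup_seen:
            self._hup_seen = False
            self._reconnect()
            return True
        return False
```

Keeping the handler to a single flag assignment avoids doing network I/O inside a signal handler. The application calls install() at startup and poll() between requests.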
Using cPCI Instead of PCI
The cPCI solution allows greater flexibility in making configuration changes. Individual cPCI cards can be unconfigured or disconnected, whereas all of the PCI cards in a PCI assembly must be unconfigured or disconnected together. The cPCI boards also support the high-availability hot-swap model, so they can be unconfigured or configured without a system login. You should decide whether to use the cPCI option before you order the system.
System Testing DR Operations
You should test all of the DR operations before you put the system into your production network.
Documenting DR Operations
DR is normally used for system upgrades, changes, or maintenance. You should update your system runbooks to include all of the DR steps and special information for quick DR decisions.
The Reconfiguration Coordination Manager (RCM) provides a framework for integrating application dependencies into unconfigure and disconnect operations. User-defined scripts can block an unconfigure or disconnect operation on a device, or make application changes before the device is unconfigured or disconnected. Using RCM simplifies configuring and automating the DR dependencies of user applications (refer to the rcmscript(1M) man page for more information).
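The decision logic of an RCM script can be sketched as a small command dispatcher. The following Python sketch is illustrative only. The command names, the name=value output keys, and the exit statuses (0 for success, 2 for an unsupported command, 3 for a refusal) reflect the RCM script interface as recalled here and are assumptions that must be verified against the rcmscript man page on your system. The guarded resource /dev/rmt/0 and the resource_busy flag are hypothetical.

```python
def handle(command, resource=None, resource_busy=False):
    """Return (exit_status, output_lines) for one RCM script invocation.

    Assumed interface: RCM runs the script with a command name and, for
    per-resource commands, a resource name, then reads name=value lines
    from standard output.
    """
    device = "/dev/rmt/0"  # hypothetical resource guarded by this script
    if command == "scriptinfo":
        return 0, ["rcm_script_version=1",
                   "rcm_script_func_info=Example backup DR guard"]
    if command == "register":
        # Tell RCM which resources this script wants to be consulted about.
        return 0, ["rcm_resource_name=" + device]
    if command == "resourceinfo":
        return 0, ["rcm_resource_usage_info=Used by the example backup job"]
    if command in ("queryremove", "preremove"):
        if resource_busy:
            # Assumed refusal status; this blocks the unconfigure operation.
            return 3, ["rcm_failure_reason=backup job is using " + str(resource)]
        return 0, []
    return 2, []  # assumed status for commands this script does not support
```

A real script would print the returned lines, exit with the returned status, and determine resource_busy by querying the application.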
Specific Considerations for the Sun Fire 3800-6800 Servers
This section contains specific considerations for the Sun Fire 3800-6800 servers. The topics include domain set-up, access control lists, memory placement, POST level, power-off options for I/O assemblies, and firmware levels.
You should always have at least two domains on the Sun Fire 3800-6800 servers. An available domain is required to perform the POST operation on the I/O assemblies. If you disconnect an I/O assembly on a server without an available domain, you will not be able to connect it to another running domain because you will not be able to test the assembly.
On the Sun Fire 6800 server, two domains can be created in a single segment mode, meaning that a free domain would not exist. To dynamically connect an I/O assembly on the Sun Fire 6800 server, the server must be configured in dual segment mode if two domains are set up.
Access Control List
The access control list (ACL) on the Sun Fire SSC prevents uncontrolled access to unassigned or available resources. It also ensures a domain has access only to its preallocated resources.
FIGURE 12 shows an ACL for a domain on a Sun Fire 6800 server. The domain has access to the CPU/Memory boards SB1, SB2, and SB5, as well as access to the I/O assemblies IB7 and IB9. The other CPU/Memory boards and I/O assemblies on the system, which are not in the ACL for this domain, cannot be connected or configured in the domain, even when they are not being used by other domains.
FIGURE 12 Access Control List for a Domain
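The ACL rule amounts to a membership test. The following Python sketch models it with the boards named in the text; it illustrates the rule only and does not show how the system controller implements it. The function name may_connect and the used_by_other_domain parameter are hypothetical.

```python
# Boards and assemblies listed in the ACL of the example domain.
DOMAIN_ACL = frozenset({"SB1", "SB2", "SB5", "IB7", "IB9"})


def may_connect(board, acl=DOMAIN_ACL, used_by_other_domain=False):
    """A board can join the domain only if it is in the ACL and is free."""
    return board in acl and not used_by_other_domain
```

For example, may_connect("SB3") is False even if SB3 is idle, because SB3 is not in the ACL.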
The memory in the domain should be placed so that a minimum number of CPU/Memory boards contain permanent memory and so that the requirements for the copy-rename mechanism are fulfilled for a minimum number of boards. Memory layouts can be divided into evenly and unevenly spread configurations. The layout you choose affects how DR memory operations behave.
The advantage of evenly placing the memory across all of the CPU/Memory boards is that every CPU/Memory board can be used as a target for a copy-rename operation.
The disadvantage is that more than one CPU/Memory board may contain permanent memory, which causes the copy-rename mechanism to apply to more than one CPU/Memory board.
The advantage of unevenly placing the memory across the CPU/Memory boards is that it can minimize the number of CPU/Memory boards containing permanent memory. The copy-rename mechanism then only applies to those boards containing permanent memory.
The disadvantage is that only specific CPU/Memory boards can be used as a target of a copy-rename operation.
You may, however, prefer to use an even approach because the kernel, and therefore the permanent memory, grows dynamically.
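The trade-off between even and uneven layouts can be explored with a small model. The following Python sketch is an illustration under a stated assumption: a board qualifies as a copy-rename target only if it holds no permanent memory and has at least as much memory as the source board. That rule is not given in this section and should be checked against the DR documentation for your platform. The board names and memory sizes are hypothetical.

```python
def copy_rename_targets(memory_mb, permanent):
    """Map each board with permanent memory to its candidate target boards.

    memory_mb: {board: installed memory in MB}
    permanent: set of boards that currently hold permanent (kernel) memory
    """
    targets = {}
    for source in sorted(permanent):
        targets[source] = sorted(
            board for board, size in memory_mb.items()
            # Assumed rule: target is free of permanent memory and is at
            # least as large as the source board.
            if board not in permanent and size >= memory_mb[source]
        )
    return targets


EVEN = {"SB0": 8192, "SB2": 8192, "SB4": 8192}
UNEVEN = {"SB0": 8192, "SB2": 16384, "SB4": 4096}
```

Under this assumption, copy_rename_targets(EVEN, {"SB0"}) lists both SB2 and SB4 as targets, while the uneven layout leaves only SB2.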
Specifying the POST Level
You should specify the POST level on connect and configuration operations. When you add new hardware, the highest level of POST must be used so that you know only good components are connected.
To change the POST level, use the -o and -x options with the cfgadm(1M) command, as shown in the following code example.
# cfgadm -s "select=class(sbd)"
Ap ID     Type          Receptacle     Occupant       Condition
N0.IB6    PCI_I/O_boa   disconnected   unconfigured   unknown
N0.IB8    CPCI_I/O_bo   connected      configured     ok
N0.SB0    CPU_Board     disconnected   unconfigured   unknown
N0.SB2    CPU_Board     connected      configured     ok
# cfgadm -o platform=diag=default -c configure N0.SB0
# cfgadm -s "select=class(sbd)"
Ap ID     Type          Receptacle     Occupant       Condition
N0.IB6    PCI_I/O_boa   disconnected   unconfigured   unknown
N0.IB8    CPCI_I/O_bo   connected      configured     ok
N0.SB0    CPU_Board     connected      configured     ok
N0.SB2    CPU_Board     connected      configured     ok
CODE EXAMPLE 6 Changing the POST Level
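For runbook automation, the listing can be parsed to find unconfigured boards, and the configure command can be built with an explicit POST level. The following Python sketch is illustrative: it assumes the five-column listing layout shown in the code example above, the function names are hypothetical, and the only command form used is the one that appears in that example.

```python
from collections import namedtuple

AttachmentPoint = namedtuple(
    "AttachmentPoint", "ap_id type receptacle occupant condition")

# First listing from the code example above, used as sample input.
SAMPLE = """\
Ap ID Type Receptacle Occupant Condition
N0.IB6 PCI_I/O_boa disconnected unconfigured unknown
N0.IB8 CPCI_I/O_bo connected configured ok
N0.SB0 CPU_Board disconnected unconfigured unknown
N0.SB2 CPU_Board connected configured ok
"""


def parse_cfgadm(listing):
    """Turn a cfgadm attachment point listing into AttachmentPoint records."""
    points = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a five-column entry.
        if len(fields) != 5 or fields[0].startswith("Ap"):
            continue
        points.append(AttachmentPoint(*fields))
    return points


def configure_command(ap_id, diag_level="default"):
    """Build the argument list for configuring ap_id at a given POST level."""
    return ["cfgadm", "-o", "platform=diag=" + diag_level,
            "-c", "configure", ap_id]


def unconfigured(points):
    """Return the attachment point IDs whose occupant is unconfigured."""
    return [p.ap_id for p in points if p.occupant == "unconfigured"]
```

Building the command as an argument list rather than a string avoids shell quoting problems if the list is later passed to a process launcher.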
Using the NOPOWEROFF Option
If I/O assemblies are moved between domains and no hardware is changed, you can use the nopoweroff option to ensure that the boards retain their POST status. In FIGURE 13, you can see the condition of the IB7 I/O assembly. It is OK after being disconnected and unassigned from the domain. You can connect or configure IB7 to another domain without having to test it.
FIGURE 13 I/O Attachment Points After Using the nopoweroff Option
Any new resource, CPU/Memory board or I/O assembly, must have the same level of firmware as all of the other boards and assemblies in the domain.