Campus Cluster Topologies and Components

There are many considerations involved when planning a campus cluster topology, such as:

  • Number of cluster nodes

  • Type of interconnects

  • Type of servers and storage interconnects

  • Distance

  • Availability of a third site

Although the vast majority of clusters deployed today are two-node clusters, a more robust campus cluster consists of four nodes, two at each site, and two-way, host-based mirrors across sites. To protect against local storage failures, controller-based RAID (such as RAID-5) is used within the storage arrays.

Multipathing solutions protect against failures of the storage paths and make it very unlikely that data needs to be completely resynchronized under normal circumstances. This configuration helps ensure that in most failures, failover takes place within the same site, so administrative staff do not have to move to the other data center. At the same time, it maintains full redundancy in case of a total site loss.
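
The same-site preference described above can be sketched as an ordering rule. This is a minimal illustration, assuming a four-node campus cluster with nodes a1 and a2 in site A and b1 and b2 in site B; the Node type and the failover_order function are hypothetical. In practice the order is set through the cluster's own resource group configuration, so this only models the idea of listing local nodes first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    site: str


def failover_order(failed: Node, candidates: list[Node]) -> list[Node]:
    """Return surviving nodes with same-site nodes ahead of remote ones.

    Hypothetical helper for illustration only.
    """
    survivors = [n for n in candidates if n != failed]
    # sorted() is stable: the given order is kept within each group.
    # The key is False for same-site nodes, so they sort first.
    return sorted(survivors, key=lambda n: n.site != failed.site)


nodes = [Node("a1", "A"), Node("a2", "A"), Node("b1", "B"), Node("b2", "B")]
print([n.name for n in failover_order(nodes[0], nodes)])  # ['a2', 'b1', 'b2']
```

If a1 fails, its services go first to a2 in the same site; the site B nodes come into play only if a2 is also unavailable.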

Some financial decision makers may question the expense involved in deploying more than a two-node campus cluster. However, using Solaris Resource Manager software, corporations can allocate some resources to nonclustered services on the remote systems, making use of otherwise idle resources while still reserving resources for failover in case of a disaster.

Quorum Devices in Campus Clusters

During cluster membership changes (for example, those caused by node outages), Sun Cluster 3.0 software uses a quorum mechanism to decide which nodes form the new cluster. Only a group of nodes holding a majority of the quorum votes may form a new cluster. This mechanism helps ensure data integrity even when cluster nodes cannot talk to each other because of broken interconnects. Nodes without a majority are either shut down or prevented from accessing the data disks by reservation mechanisms in the storage layer (SCSI-2 reservations and SCSI-3 persistent group reservations). Thus, only the nodes of the cluster that has quorum have physical access to the data.

Both nodes and quorum disks have quorum votes. By default, each node has one vote. A dedicated quorum disk has as many votes as the sum of the votes of the nodes attached to it, minus one. For example, a dual-ported quorum disk has one vote, and a quorum disk attached to four nodes has three votes. A two-node cluster needs a way to break a tie between its two nodes, so this configuration requires a quorum device; Sun Cluster 3.0 software enforces the configuration of a quorum device in this case.
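
The vote arithmetic above can be checked with a few lines of code. This is a sketch under the stated rules (one vote per node, a quorum disk worth its attached nodes' votes minus one, and a strict majority required to form a cluster); the function names are illustrative and not part of any Sun Cluster interface.

```python
def quorum_device_votes(attached_node_votes: list[int]) -> int:
    """Votes of a dedicated quorum disk: its attached nodes' votes, minus one."""
    return sum(attached_node_votes) - 1


def votes_needed(total_votes: int) -> int:
    """Smallest strict majority of the configured votes."""
    return total_votes // 2 + 1


# A dual-ported quorum disk (two nodes) and one attached to all four nodes.
for node_votes in ([1, 1], [1, 1, 1, 1]):
    qd = quorum_device_votes(node_votes)
    total = sum(node_votes) + qd
    print(f"{len(node_votes)} nodes: QD votes={qd}, total={total}, needed={votes_needed(total)}")
```

Without the quorum disk, a two-node cluster would have two votes and need both, so neither node could continue alone after losing contact with its peer; the quorum disk's single vote is what breaks the tie.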

Quorum rules apply to campus clusters just as they do to ordinary clusters. Because campus clusters are designed to protect against total site failures, it is important to understand the role of the quorum device in two- and three-site configurations.

For example, consider a typical two-site campus cluster setup. For simplicity, the quorum device (QD) is shown as a separate disk. This is not a requirement; the quorum device could also be a disk in the data storage.

As shown in FIGURE 1, the quorum device (QD) is configured in site A. If site A fails completely, two of the three votes become unavailable, leaving the node in site B with only one vote. The node in site B cannot gain quorum and therefore shuts down, which leaves the whole cluster useless even though a node survives and a good copy of the data exists in a local mirror. When an operational node cannot communicate with its peer, it has no way of telling whether the communication paths or the peer itself are down. This situation is known as split brain. Shutting down the node in site B makes sense here because it cannot know what has happened to site A. If the node in site A is still alive, the shutdown prevents data corruption. If it is down, the administrator has to decide how to proceed.

FIGURE 1 Two-Site Campus Cluster Configuration

Without the quorum mechanism, each site could assume it was the only survivor, form a new cluster, and start HA services. If both sites then access the same data simultaneously, data corruption is highly likely.

In a configuration where one site is production and the other is either idle or running nonproduction work, the recommended practice is to configure the quorum device in the production site. If the remote site fails, the production site has enough votes to continue without interruption. If the production site fails, the problem cannot be overcome automatically with a two-site topology.
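
The effect of quorum device placement in a two-site cluster can be sketched by tallying the votes left after each site is lost. This assumes one node per site (one vote each) and a dual-ported quorum device (one vote) that is lost together with the site that hosts it. The site labels and the survives_site_loss function are hypothetical, and the model covers total site loss only, not interconnect failures where both sites are still alive.

```python
def survives_site_loss(site_votes: dict[str, int], lost_site: str) -> bool:
    """True if the sites left after losing lost_site hold a strict majority."""
    total = sum(site_votes.values())
    remaining = total - site_votes[lost_site]
    return 2 * remaining > total  # integer form of remaining > total / 2


# Votes per site: one node vote each, plus the quorum device's vote
# counted in whichever site hosts it.
qd_in_production = {"prod": 2, "remote": 1}
qd_in_remote = {"prod": 1, "remote": 2}

for label, layout in [("QD in production", qd_in_production),
                      ("QD in remote", qd_in_remote)]:
    for lost in layout:
        print(f"{label}, lose {lost}: continues = {survives_site_loss(layout, lost)}")
```

Only the layout with the quorum device in the production site keeps production running after the remote site fails. Neither layout survives the loss of the site that holds the quorum device, which is the case shown in FIGURE 1.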

In this case, two options exist: an administrator either initiates a manual procedure to recover quorum, or the enterprise implements a third site for the quorum device. Sun Cluster 3.0 software supports both methods.

FIGURE 2 illustrates a three-site configuration, with the quorum device in a separate third site, C. In every scenario where a disaster affects only one site, the two unaffected sites remain in operation and provide one quorum vote each. Together they hold the required two of the three votes and can form a new cluster. Therefore, a three-site configuration is highly recommended for enterprises that require fully automatic failover, even in case of a disaster affecting one site.
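
The three-site argument can be verified by enumerating every combination of failed sites. This sketch assumes one node at site A, one at site B, and a quorum device with one vote at site C, matching the vote counts in the text; the names are illustrative.

```python
from itertools import combinations

# One node vote in site A, one in site B, and the quorum device's vote in site C.
SITE_VOTES = {"A": 1, "B": 1, "C": 1}


def has_quorum_after(site_votes: dict[str, int], failed: tuple[str, ...]) -> bool:
    """True if the sites not in failed hold a strict majority of all votes."""
    total = sum(site_votes.values())
    remaining = sum(v for site, v in site_votes.items() if site not in failed)
    return 2 * remaining > total


# Try no failure, every single-site failure, and every two-site failure.
for k in range(len(SITE_VOTES)):
    for failed in combinations(sorted(SITE_VOTES), k):
        print(failed or "no failure", has_quorum_after(SITE_VOTES, failed))
```

Every single-site failure, including the loss of site C itself, leaves two of the three votes. Losing any two sites leaves one vote, which no vote layout of this size can tolerate.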

FIGURE 2 Three-Site Campus Cluster Configuration

As previously noted, enterprises may choose to use a manual procedure to recover from a loss of quorum. In this situation, the surviving node has lost quorum and cannot boot into cluster mode. The administrator must therefore change the quorum device definition in the cluster configuration repository (CCR) on the node in the surviving site so that it refers to an available quorum device, and then reboot that node into cluster mode.

Experience has shown that this technique is very error-prone and requires highly skilled personnel to implement correctly. This option should be considered only if the loss of quorum is caused by the total loss of a site and no system in that site is accessing any data.

A final possibility is to place a third server in the third location. This node then provides the third quorum vote in a three-site configuration.

Cluster Interconnect Hardware Configurations

Typical NICs support only copper or multimode fiber cables, which limits the distance between cluster nodes. Converting the media to single-mode fiber extends the maximum distance.

Transceivers for Fast Ethernet adapters plug into the RJ-45 or the MII port of the NIC and convert to single-mode fiber cables that can then span more than 15 km (in this combination). This type of transceiver has been qualified for campus clusters based on Sun Cluster 2.2 software.

Similar converters exist for Gigabit Ethernet to convert from multimode fiber to single-mode fiber. With single-mode fiber, the maximum distance can be extended to at least 5 km. However, because the public network is not part of the cluster, it is up to the administrator to extend it appropriately.

Data and Volume Manager Configuration

Mirroring data across sites helps ensure that a copy of the data survives the loss of either site. For campus clusters, host-based mirroring using a volume manager is recommended. However, special care should be taken when using Solaris™ Volume Manager (formerly Solstice DiskSuite™) software as the volume manager, especially when distributing its state database replicas across sites. (Refer to the Sun product documentation for details.)

Newer releases of volume managers tend to apply their own placement policies when laying out mirrors. Such policies may not know which storage arrays sit at which site and could place both halves of a mirror in the same site, so control over the placement process becomes even more important. It is highly recommended to use the controls provided by the volume manager to spread mirrors across sites.
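
A simple placement check can catch mirrors whose halves ended up in the same site. In this sketch, the array names, the array-to-site mapping, and the plex names are hypothetical; a real check would read the layout from the volume manager's own reporting tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plex:
    name: str
    array: str  # storage array that holds this plex


def mirror_spans_sites(plexes: list[Plex], array_site: dict[str, str]) -> bool:
    """True if the plexes of one mirror are spread over at least two sites."""
    return len({array_site[p.array] for p in plexes}) >= 2


# Hypothetical arrays: two in site A, one in site B.
ARRAY_SITE = {"array-a1": "A", "array-a2": "A", "array-b1": "B"}

good = [Plex("vol01-01", "array-a1"), Plex("vol01-02", "array-b1")]
bad = [Plex("vol02-01", "array-a1"), Plex("vol02-02", "array-a2")]

print(mirror_spans_sites(good, ARRAY_SITE))  # True
print(mirror_spans_sites(bad, ARRAY_SITE))   # False
```

The second mirror does have two plexes, yet a disaster at site A would destroy both, which is exactly the placement an unconstrained policy might produce.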

The longer distance between sites may introduce latency when accessing data. Some volume managers offer a "preferred plex" read policy, which directs read requests to the local plex and thus avoids the overhead of going to the remote storage.
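
The preferred-plex policy can be modeled as a read rule and a write rule. This is a sketch with hypothetical names; the fallback to another plex when the preferred one is unavailable is an assumption about typical behavior, not a documented detail. The round_trip_us helper uses the common approximation of about 5 µs of one-way propagation delay per kilometer of fiber to show what a remote read costs in propagation alone.

```python
from dataclasses import dataclass

# Rough one-way propagation delay in optical fiber (light travels at
# roughly 200,000 km/s in glass, i.e. about 5 microseconds per km).
FIBER_DELAY_US_PER_KM = 5.0


@dataclass
class MirroredVolume:
    plexes: dict[str, str]  # plex name -> site holding it
    preferred: str          # plex that reads should use when available

    def read_target(self, available: set[str]) -> str:
        """Pick the plex to read from: preferred first, else any healthy plex."""
        if self.preferred in available:
            return self.preferred
        for name in sorted(self.plexes):  # assumed fallback order
            if name in available:
                return name
        raise RuntimeError("no readable plex")

    def write_targets(self) -> list[str]:
        """Writes go to every plex so that the mirror stays consistent."""
        return sorted(self.plexes)


def round_trip_us(distance_km: float) -> float:
    """Propagation-only round trip; ignores switches, WDMs, and disk service time."""
    return 2 * distance_km * FIBER_DELAY_US_PER_KM


# Volume accessed from site A: prefer the plex stored in site A.
vol = MirroredVolume({"vol01-01": "A", "vol01-02": "B"}, preferred="vol01-01")
print(vol.read_target({"vol01-01", "vol01-02"}))  # vol01-01
print(vol.read_target({"vol01-02"}))              # vol01-02
print(round_trip_us(10))                          # 100.0
```

Writes still travel to both sites, so a preferred plex reduces read latency only. After a service fails over to the other site, the local plex changes, so the preference would ideally point to that site's plex.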

Storage Configurations

Fibre Channel has made it far easier to extend the distance between servers and storage devices. However, maximum distance limits still exist and may reduce the usefulness of this technology in certain scenarios. Campus clusters using Sun Cluster 3.0 software today support the following:

  • Upgrades from Sun Cluster 2.2 software campus cluster configurations using Sun Enterprise™ servers, Fibre Channel host bus adapters (code named SOC+), Fibre Channel arbitrated loop (FC-AL), long wave GBICs (LWGBICs), and Sun StorEdge A5x00 storage systems

  • Storage configurations using cascaded Fibre Channel switches with LWGBICs

Wavelength Division Multiplexers

In many areas, single-mode fiber is too costly or not available in sufficient quantities. A typical campus cluster configuration requires two fiber connections for the storage, two for the cluster interconnect, and at least one for the public network. Additionally, many configurations have another network for backup and one for administration that connects to the terminal concentrator and other console ports. In total, a single campus cluster might need seven single-mode fiber connections.

Wavelength division multiplexers (WDMs) carry each stream on a separate wavelength of light, which allows several streams to share a single fiber. WDMs have been successfully tested over distances longer than 10 km. This approach enables enterprises to deploy campus clusters in most geographic locations.
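
The fiber count above, and the saving from multiplexing, reduce to simple arithmetic. The link counts come from the text; the channels_per_fiber value is a hypothetical parameter, because the number of streams a WDM can carry depends on the product.

```python
import math

# Fiber connections listed in the text for one campus cluster.
LINKS = {
    "storage": 2,
    "cluster interconnect": 2,
    "public network": 1,
    "backup network": 1,
    "administration network": 1,
}


def fibers_needed(links: dict[str, int], channels_per_fiber: int = 1) -> int:
    """Fibers required when each fiber carries up to channels_per_fiber streams.

    channels_per_fiber=1 models plain single-mode fiber without WDMs.
    """
    if channels_per_fiber < 1:
        raise ValueError("channels_per_fiber must be at least 1")
    return math.ceil(sum(links.values()) / channels_per_fiber)


print(fibers_needed(LINKS))                        # 7
print(fibers_needed(LINKS, channels_per_fiber=4))  # 2 (hypothetical 4-channel WDM)
```

Placing both redundant storage paths or both interconnects on one fiber would turn that fiber into a single point of failure, so how the streams are grouped matters as much as the total count.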
