Arbitration is the act of deciding. Many forms of arbitration occur in computer systems. I/O devices arbitrate for access to an I/O bus. CPUs arbitrate for access to a multiprocessor interconnect. Network nodes arbitrate for the right to transmit on the network. In clusters, arbitration determines which nodes are part of the cluster. Once they are part of the cluster, arbitration determines how the nodes host the services. To accurately predict the behavior of the cluster and services in the event of a failure, you must understand the arbitration schemes used in the cluster.
Asymmetric arbitration is a technique commonly used when the priority of competing candidates can be established clearly and does not change. An example of this is the SCSI-2 protocol. In SCSI-2, the target address specifies the priority. If multiple targets attempt to gain control of the bus at the same time, the SCSI-2 protocol uses the target address to arbitrate the winner.
By default, the sets the host SCSI target address to 7, the highest priority. This helps ensure the stability of the bus because the host has priority over all of the I/O slave devices. Additional stability in the system is ensured by placing slow devices at a higher priority than fast devices. This is why the default CD-ROM target address is 6 and the default disk drive target addresses begin with 0.
SCSI priority arbitration creates an interesting problem for clusters using the SCSI bus for shared storage. Each address on the bus can be owned by only one target on the bus. Duplicate SCSI target addresses cause a fault condition that is difficult to detect and yields unpredictable behavior. For simple SCSI disks, each node of the cluster must have direct access to the bus. Therefore, one node must have a higher priority than the other node. In practice, this requirement rarely results in a problem, because the disks themselves are the unit of ownership and both nodes have higher priority than the disks.
Asymmetric arbitration is also used for Fibre Channel Arbitrated Loop (FC-AL). FC-AL is often used instead of the SCSI bus as a storage interconnect because of its electrical isolation (electrical faults are not propagated by fiber), its long distance characteristics (components can be separated by many kilometers), and its ability to be managed as a network. The priority of FC-AL nodes or ports is based on the physical loop address. Each candidate that wants to send data must first send an arbitration request around the loop to all other candidates. Once the port that is receiving the arbitration request approves and detects it, the candidate can send data. An obvious consequence is that greater numbers of candidates increase the arbitration time. Also, the candidates can be separated by several kilometers, resulting in additional latency, and the time required to arbitrate may significantly impact the amount of actual throughput of the channel.
Symmetric arbitration is used when the candidates are peers and a priority-based arbitration is not used. This case is commonly found in symmetric multiprocessor (SMP) systems and networks. Arbitration is required so that all candidates can be assured that only one candidate has ownership of the shared component.
10BASE-T and 100BASE-T Ethernet networks use a carrier sense, multiple access, with collision detection (CSMA/CD) arbitration scheme. This scheme allows two or more nodes to share a common bus transmission medium [Madron89]. The node listens for the network to become idle, then begins transmitting. The node continues to listen while transmitting to detect collisions, which happen when two or more nodes are transmitting simultaneously. If a collision is detected, the node will stop transmitting, wait a random period of time, listen for idle, and retransmit. Stability on the network is assured by changing the random wait time to increase according to an algorithm called truncated binary exponential backoff. With this method it is difficult to predict when a clean transmission will be possible, which makes 10BASE-T or 100BASE-T Ethernet unsuitable for isochronous workloads on networks with many nodes. Also, many busy nodes can result in low bandwidth utilization.
The faster versions of Ethernet, 1000BASE-SX and 1000BASE-T (also known as Gigabit Ethernet), are only available as full duplex, switched technologies, thereby eliminating the CSMA/CD arbitration issues on the common medium. For larger networks, the arbitration role is moved into the network switch. Direct, node-to-node connections are full duplex, requiring no arbitration.
Voting and Quorum
Voting is perhaps the most universally accepted method of arbitration. It is time tested for many centuries in many formspopularity contests, selecting members of government, and so forth. This is the method used by software for arbitration of cluster membership (see "Majority Voting and Quorum Principles").
One problem with voting is pluralitythe leading candidate gains more votes than the other candidates, but not more than half of the total votes cast. Many different techniques can be used during these special casesrun-off elections, special vote by a normally nonvoting member, and so forth. Another problem with voting is that ties can occur.
A further problem with voting is that it can be time consuming. The act of voting as well as the process of counting votes is time consuming. This may not scale well for large populations.
Occasionally, voting is confused with a quorum. They are similar but distinct. A vote is usually a formal expression of opinion or will in response to a proposed decision. A quorum is defined as the number, usually a majority of officers or members of a body, that, when duly assembled, is legally competent to transact business [Webster87]. Both concepts are important; the only vote that should ratify a decision is the vote of a quorum of members. For clusters, the quorum defines a viable cluster. If a node or group of nodes cannot achieve a quorum, they should not start services because they risk conflicting with an established quorum.