System Management Services Software: An Inside Look
The Sun Fire™ 15K server is monitored and controlled by a system controller (SC) that runs System Management Services (SMS) software in the Solaris™ Operating Environment (Solaris OE). SMS is highly integrated into the Solaris OE and uses several of its features while performing its role for the platform.
This Sun BluePrints™ OnLine article addresses some of the more advanced topics of SMS software, including the Management Network (MAN) and SMS security. In addition, it provides insight into a new security feature that is available through a patch to SMS version 1.2 and that is integrated in follow-on releases such as the upcoming SMS version 1.3. This new feature provides the much-requested capability to use secure shell for file synchronization between SCs.
This article contains the following sections:
"Understanding the Management Network (MAN)"
"Managing Complete Link Failure"
"Understanding SMS Security Design"
"Using ssh for SC to SC File Propagation"
Understanding the Management Network (MAN)
The Sun Fire 15K server has two SCs within the platform cabinet. One of the SCs is designated as the main SC, and the other is designated as the spare SC. These two SCs must be kept synchronized with one another, and each must be apprised of the other's status. The Sun Fire 15K server allows the hardware to be partitioned into one or more environments, commonly referred to as domains, each capable of running a separate image of the Solaris OE. With this in mind, the main SC has the following additional functions:
Controlling dynamic reconfiguration (DR)
Providing consoles for each domain
Recording message logs for each domain
Assisting with the installation of Solaris OE domains
Synchronizing time between the SC and domains
These functions are implemented over the Management Network (MAN), a network that is internal to the platform chassis. The MAN is not a general-purpose network and should be used only for its intended purpose. To that end, neither the domains nor the SC route traffic to these networks by default, other than for the functions listed above.
FIGURE 1 MAN Overview
The SC-to-domain network is referred to as the I1 network. It is constructed from 18 separate 100BASE-T network interface controllers (NICs) on each SC, which connect to an Ethernet hub on each input/output (I/O) board, forming a point-to-point network between the SC and each I/O board. Because the Ethernet hub has connections from each SC, it is programmable so that only the NIC connecting to the main SC is active. These networks operate at half-duplex because the hub on the I/O board does not support full-duplex transfers.
Between the two SCs themselves, another network referred to as the I2 network exists. This network operates at 100-megabit full-duplex and does not involve the use of hubs.
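The I1 hub arrangement described above can be pictured with a small model. The following Python sketch is illustrative only and is not SMS code: the class name IOBoardHub, the function program_all, and the SC labels "sc0" and "sc1" are hypothetical, and the model assumes one hub per I/O board with one port per SC, as the text describes.

```python
from dataclasses import dataclass

# From the text: each SC has 18 I1 NICs, one per I/O board hub.
IO_BOARD_COUNT = 18

# Hypothetical labels for the two system controllers.
SC_NAMES = ("sc0", "sc1")


@dataclass
class IOBoardHub:
    """Hypothetical model of the programmable hub on one I/O board."""
    board: int
    active_sc: str = "sc0"  # the SC whose NIC link is currently enabled

    def program(self, main_sc: str) -> None:
        """Enable only the port that connects to the main SC."""
        if main_sc not in SC_NAMES:
            raise ValueError(f"unknown SC: {main_sc}")
        self.active_sc = main_sc

    def link_is_active(self, sc: str) -> bool:
        """Return True if the link from the given SC is the active one."""
        return sc == self.active_sc


def program_all(hubs, main_sc):
    """Point every I/O board hub at the SC that currently holds the main role."""
    for hub in hubs:
        hub.program(main_sc)


# Build one hub per I/O board, then model the spare SC taking over as main.
hubs = [IOBoardHub(board=i) for i in range(IO_BOARD_COUNT)]
program_all(hubs, "sc1")
```

In this model, a change of the main role amounts to reprogramming every hub, which mirrors the statement that only the NIC connecting to the main SC is active at any time.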
I1 MAN Functions
- Domain consoles
- Message logging
- Dynamic reconfiguration
- Network boot/Solaris OE installation
- Time synchronization
I2 MAN Functions
- SC heartbeat
- File synchronization
The MAN drivers on the SC create meta-interfaces for individual NICs to reduce complexity and administrative overhead. The drivers also implement the concept of a community network that can be monitored by the SC and that can be configured in a highly available manner by using both external NICs on the SC. A floating IP address is created and owned by the main SC so that external clients can reach the main SC without knowing which SC is performing that role at a given time.
FIGURE 2 Simplified MAN Overview
Note that multiple interfaces can exist on the domain side of the I1 network. On the SC, a single meta-interface, scman0, is created. It reacts to path failures and automatically switches the active network path, provided that an alternate path exists. It also enforces domain isolation, keeping domain traffic exclusively between the domain and the SC, while making the various point-to-point links appear as a normal Ethernet network. On the domain side of the network, there is a corresponding meta-interface called dman0.
The I2 network has its own meta-interface called scman1. Unlike I1 network scman0 interfaces, scman1 interfaces are active on both SCs. The scman1 network has two possible paths, and the network driver will detect path failures and switch paths automatically just as the scman0 meta-interface does.
Automatic path switchover is handled as follows:
I1 network. By default, a domain's active NIC is on the I/O board that contains the golden input/output static random access memory (IOSRAM). IOSRAMs are located on every I/O board, with the golden IOSRAM acting as the master IOSRAM, hence the term golden. This is typically the lowest numbered I/O board in the domain; however, this may not always be the case. dman0 pings the SC through the active NIC every 10 seconds. Every 30 seconds, dman0 checks the inbound packet count. If the packet count has increased, the connection is considered good. If not, a path switch is started and the next available path is selected.
I2 network. By default, the active NIC is eri0, but this may change in the future. scman1 on the main SC pings the spare SC through the active NIC every 10 seconds. Every 30 seconds, scman1 on the main SC checks the inbound packet count. If the packet count has increased, the connection is considered good. If not, the active path is switched to the other NIC, provided it was not previously marked as failed.
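The switchover logic described for both networks can be sketched as follows. This is a minimal Python illustration, not the actual dman0 or scman1 driver code: the names Path, PathMonitor, and check are hypothetical, the second NIC name "nic_b" is a placeholder, and marking the current path as failed when no inbound packets arrive is an assumption drawn from the phrase "previously marked as failed." The sketch models only the 30-second packet count check; the 10-second pings are represented by whatever increments the inbound packet counter.

```python
from dataclasses import dataclass

PING_INTERVAL_S = 10   # from the text: the active NIC is pinged every 10 seconds
CHECK_INTERVAL_S = 30  # from the text: inbound packet count is checked every 30 seconds


@dataclass
class Path:
    """One candidate network path (a NIC) under a meta-interface."""
    name: str
    inbound_packets: int = 0
    failed: bool = False


class PathMonitor:
    """Hypothetical model of the periodic packet count check."""

    def __init__(self, paths):
        if not paths:
            raise ValueError("at least one path is required")
        self.paths = list(paths)
        self.active = 0
        self._last_count = self.paths[0].inbound_packets

    @property
    def active_path(self):
        return self.paths[self.active]

    def check(self):
        """Run once per CHECK_INTERVAL_S; return True if the link looks good."""
        current = self.active_path.inbound_packets
        if current > self._last_count:
            # Packet count increased, so the connection is considered good.
            self._last_count = current
            return True
        # Assumption: a path with no new inbound packets is marked as failed.
        self.active_path.failed = True
        # Select the next available path that is not marked as failed.
        for offset in range(1, len(self.paths)):
            idx = (self.active + offset) % len(self.paths)
            if not self.paths[idx].failed:
                self.active = idx
                self._last_count = self.paths[idx].inbound_packets
                break
        return False
```

For example, with two paths named "eri0" and "nic_b", a check that sees no increase in the inbound packet count on "eri0" makes "nic_b" the active path. If "nic_b" later shows no traffic either, no switch occurs because the only alternative is already marked as failed. How a failed path is cleared and returned to service is not covered in this section and is left out of the sketch.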