Sun Fire Midframe Server Best Practices for Firmware Update 5.13.x
The Sun Fire 3800, 4800, 4810, and 6800 midframe servers provide new functionality to monitor, diagnose, and administer the system, which can increase overall system reliability, availability, and serviceability (RAS). Much of the new functionality is available through the Sun Fire System Controller (SC), which is a central component of the Sun Fire midframe server.
This article revisits existing best practices that were published in "Sun Fire™ Midframe Server Best Practices for Administration", Sun BluePrints Online, October 2001. Because the October 2001 article was used as the basis for this article, reading it is not a prerequisite. However, familiarity with the earlier article will help you understand the differences between the firmware updates.
This article introduces new best practices based on the enhancements that have been made to the SC in firmware update 5.13.x. If you are already familiar with the prior update, you can concentrate on the new enhancements, which include:
SC failover (see "Configuring the Sun Fire SC Failover")
SNTP support (see "Configuring SNTP")
This article also includes discussions on the following topics:
Error analysis and diagnosis
SC maintenance procedures
While many recommendations made here apply to the majority of cases, not all recommendations may apply to every circumstance.
This section contains descriptions of how to configure the Sun Fire Midframe platform. The topics include:
"Configuring the RS-232 Serial Port"
"Configuring the Ethernet Port"
"Configuring a Switched Private Network"
"Configuring the Sun Fire SC Failover"
"Setting the Date and Time on the Platform"
"Changing POST Levels and Other Settings"
"Configuring the Midframe Service Processor"
"Configuring the Sun MC Software"
"Preparing for Firmware Updates"
"Configuring the Sun Explorer Software Utility"
Configuring the RS-232 Serial Port
You can access the SC through the built-in RS-232 serial port or through its 10/100BASE-T Ethernet port. Be sure that access to the serial port is available during the initial setup of the SC because it is the only connection on which the SC power-on self-test (SCPOST) output can be viewed. The port settings should be 9600 bps, 8 data bits, no parity, and 1 stop bit (9600-8-N-1).
You can also access the serial port by using a network terminal server or by using the serial port on a Midframe Service Processor (MSP). For more information about why an MSP is needed and how to configure it, refer to "Configuring the Midframe Service Processor".
After you have set up the SC, serial port access should continue to be available on demand to provide an alternate access path to the SC in the event of a network problem, firmware updates, or SC reboots or resets. Serial port access is also required to monitor certain SC and platform related errors because it is where these errors will be displayed.
Configuring the Ethernet Port
The Ethernet port should be used as the primary connection path for the speed, multisession access, and logging capabilities it provides. Ethernet connections to the SC are accomplished by using a Telnet session. A 100BASE-T link is strongly recommended for the SC Ethernet connection and required for use with Sun™ Management Center (Sun MC) software. The Ethernet port should not be used instead of the RS-232 serial port connection, but should be used in addition to the RS-232 port.
Configuring a Switched Private Network
You should configure the SC on a switched private network. If you are configuring two SCs for the network, assign each SC a separate IP address so that they do not conflict with each other on the network. If the SC failover functionality introduced in firmware 5.13.x is used, a third IP address representing the logical hostname can be assigned.
FIGURE 1 illustrates a simplified network topology. The MSP is a workstation placed on the private Ethernet network of the Sun Fire platform to provide administrative support functions to the platform and the SCs.
FIGURE 1 Simplified Network Topology
The serial connection in the illustration above can be replaced with a terminal concentrator if the same MSP is monitoring multiple Sun Fire platforms. If the terminal concentrator supports encrypted logins and sessions (for example, by using secure shell), the terminal concentrator may be connected to the public Ethernet network. A terminal concentrator is recommended to improve the ability of a single MSP to monitor multiple platforms.
Configuring the Sun Fire SC Failover
You should use two SCs in a Sun Fire system to provide failover of the SC functionality in case one of the SCs fails and to keep the domains in the system running. Prior to firmware version 5.13.0, if the main SC suffered a failure, administrative capabilities such as the ability to access domain consoles would be lost. With the introduction of firmware 5.13.0, full failover is available, so if the main SC fails, the spare SC can take over administrative functions, in addition to the system clock functionality.
When enabled, the two SCs communicate with each other by using an internal communications link. They also exchange health information and synchronize internal configuration information over the link. The SC that is acting as the main system controller also generates a heartbeat. If the heartbeat unexpectedly disappears, the spare SC takes over the main functionality.
Before enabling the SC failover feature, both SCs and all of the boards in a Sun Fire platform should be at the same firmware version. While it is possible to have mixed versions of firmware under certain circumstances, it is recommended that all of the boards and the SCs use the same version of firmware.
You can determine the firmware version by using the showboards command, as follows:
heslab-16-sc0:SC> showboards -p version

Component   Compatible   Version
---------   ----------   -------
SSC0        Reference    5.13.2
/N0/IB6     Yes          5.13.2
/N0/IB8     Yes          5.13.2
/N0/SB0     Yes          5.13.2
/N0/SB2     Yes          5.13.2

heslab-16-sc0:SC>
The above output does not include the version of firmware from the spare SC. To gather that information, you must connect to the spare SC and use the showsc command to determine the ScApp revision, as in the following:
heslab-16-sc1:sc> showsc

SC: SSC1
Spare System Controller
SC Failover: disabled
SC date: Sun Oct 06 14:06:58 PDT 2002
SC uptime: 2 days 6 hours 35 minutes 8 seconds
ScApp version: 5.13.2
RTOS version: 23

heslab-16-sc1:sc>
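Because the two commands are run on different SCs, comparing versions by eye is easy to get wrong. The following is a minimal sketch, assuming that the output of showboards -p version and showsc has been captured into plain-text files on the MSP (for example, from a logged Telnet session). The function name check_fw_versions and the file names are hypothetical, and the parsing assumes the layout shown in the examples above.

```shell
#!/bin/sh
# check_fw_versions FILE...
# Reads saved "showboards -p version" and "showsc" output and reports every
# firmware version that differs from the first version found.
# Assumption: a version looks like 5.13.2 and is the last field on its line.
check_fw_versions() {
    awk '
        # "RTOS version: 23" has no dots, so it is not treated as a version.
        $NF ~ /^[0-9]+\.[0-9]+\.[0-9]+$/ {
            if (ref == "") {
                ref = $NF                  # first version seen is the reference
            } else if ($NF != ref) {
                print $1 " is at " $NF ", expected " ref
                bad = 1
            }
        }
        END { exit bad + 0 }               # non-zero when a mismatch was found
    ' "$@"
}
```

Calling check_fw_versions boards.txt sc1.txt prints nothing and returns 0 when every board and the spare SC report the same version. A spare SC at a different level is reported on a line that starts with ScApp.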
The SC failover software introduces a number of new commands and settings that need attention. You should use the showfailover command to check on failover status. The -v option gives the most information about the configuration.
heslab-16-sc0:SC> showfailover -v

SC: SSC0
Main System Controller
SC Failover: enabled and active.
Clock failover enabled.

heslab-16-sc0:SC>
The above information shows that the showfailover -v command was run on SSC0, that the SC is currently in the role of main, and that both SC administrative function failover and clock failover are enabled and functioning. You should run the showfailover -v command whenever you reboot an SC to ensure that the SC failover functionality has restarted and is functioning properly.
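If you log SC sessions on the MSP, the post-reboot check can be reduced to a single line of output. The following is a sketch only: it assumes that the showfailover -v output has been saved to a file, and the function name failover_status is hypothetical. It matches the status strings that appear in the examples in this article and knows nothing about other strings the firmware might print.

```shell
#!/bin/sh
# failover_status FILE
# Summarizes saved "showfailover -v" (or "showsc") output in one line and
# returns 0 only when failover is reported as "enabled and active".
failover_status() {
    awk '
        /Main System Controller/  { role = "main" }
        /Spare System Controller/ { role = "spare" }
        /SC Failover:/ {
            state = $0
            sub(/^.*SC Failover:[ \t]*/, "", state)   # keep only the status text
        }
        END {
            if (role == "")  role = "unknown"
            if (state == "") state = "not reported"
            print "role=" role " failover=" state
            rc = (state ~ /^enabled and active/) ? 0 : 1
            exit rc
        }
    ' "$1"
}
```

A non-zero return value is a prompt to look at the SC directly. It does not show why failover is inactive.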
You can obtain an additional piece of information about the SC failover status by using the showplatform -p sc command, as in the following example.
heslab-16-sc0:SC> showplatform -p sc

SC POST diag Level: max
SC Failover: enabled and active.
Logical Hostname: heslab-16-sc

heslab-16-sc0:SC>
In the above example, the value for the logical hostname is displayed. Each SC continues to have a unique IP address assigned to it. The logical hostname is a third IP address that always points at whichever SC is currently functioning in the role of main. In the figure below, the logical SC is the logical hostname.
FIGURE 2 Sun Fire SC Logical Hostname
The following is an example of how to set up the Sun Fire SC failover functionality.
heslab-16-sc0:SC> setupplatform -p sc

SC
--
SC POST diag Level [max]: max
Enable SC Failover? [no]: yes
Logical Hostname or IP Address : heslab-16-sc

heslab-16-sc0:SC>
To force the spare SC to assume the role of main, use the setfailover force command. This should not be necessary under normal operating conditions, but the functionality should be tested during a maintenance window after you enable the failover functionality to verify correct failover operations.
The use of SNTP, as discussed in "Configuring SNTP", is strongly recommended with SC failover. If SNTP is not enabled, the time on the two SCs needs to be checked to ensure that they are the same. Because the domains on a Sun Fire platform derive their time based on the time set on the SCs, the time on the running domains could be affected after an SC failover if the time on the SCs is not synchronized.
Even though SC Failover copies configuration information from the main SC to the spare SC, it is not a replacement for backing up the SC. Users should still perform a dumpconfig of the SC after enabling failover and on a regular basis afterwards.
Setting the Date and Time on the Platform
When a Sun Fire platform is installed, the platform time should be set from the platform shell and in each individual domain using the setdate command. Each domain shell can have a separate time setting, so setting each one individually is required.
The following shows an example of how to set the date and time.
heslab-16-sc0:SC> setdate 09081228
Sun Sep 08 12:28:00 PDT 2002
heslab-16-sc0:SC>
You can set the time and date on the domains in a similar manner. Use the setdate -h command for additional help and options for setting the time.
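When several domain shells have to be set one after another, typing the date string by hand invites mistakes. The sketch below builds the command line on the MSP in the same mmddHHMM form as the example above. It assumes that the MSP clock is correct and that the SC and the MSP are set to the same time zone. The function name setdate_cmd is hypothetical.

```shell
#!/bin/sh
# setdate_cmd
# Prints a setdate command line in the mmddHHMM form used in the example
# above, taken from the local clock of the machine that runs this function.
setdate_cmd() {
    printf 'setdate %s\n' "$(date +%m%d%H%M)"
}
```

The printed line can be pasted into the platform shell and into each domain shell. The function does not connect to the SC itself.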
The following is an example of the output from the setdate -t list command, which is helpful in determining the correct time zone to use for your locale:
heslab-16-sc0:SC> setdate -t list
list: is not a valid timezone, valid time zones are:

ACT   GMT+9.5   Central Standard Time (Northern Territory)
AET   GMT+10    Eastern Standard Time (New South Wales)
AGT   GMT-3     Argentine Time
ART   GMT+2     Eastern European Time
AST   GMT-9     Alaska Standard Time
BET   GMT-3     Brazil Time
BST   GMT+6     Bangladesh Time
CAT   GMT+2     Central African Time
CNT   GMT-3.5   Newfoundland Standard Time
CST   GMT-6     Central Standard Time
CTT   GMT+8     China Standard Time
EAT   GMT+3     Eastern African Time
ECT   GMT+1     Central European Time
EET   GMT+2     Eastern European Time
EST   GMT-5     Eastern Standard Time
GMT   GMT+0     Greenwich Mean Time
HST   GMT-10    Hawaii Standard Time
IET   GMT-5     Eastern Standard Time
IST   GMT+5.5   India Standard Time
JST   GMT+9     Japan Standard Time
MET   GMT+3.5   Iran Time
MIT   GMT-11    West Samoa Time
MST   GMT-7     Mountain Standard Time
NET   GMT+4     Armenia Time
NST   GMT+12    New Zealand Standard Time
PLT   GMT+5     Pakistan Time
PNT   GMT-7     Mountain Standard Time
PRT   GMT-4     Atlantic Standard Time
PST   GMT-8     Pacific Standard Time
SST   GMT+11    Solomon Is. Time
UTC   GMT+0     Coordinated Universal Time
VST   GMT+7     Indochina Time

heslab-16-sc0:SC>
Configuring SNTP
With SC firmware versions 5.13.0 and higher, the SC is capable of synchronizing its time-of-day clock with a network time server using SNTP. SNTP usage is encouraged to keep the SCs at an accurate time.
The following shows an example of how to enable SNTP.
heslab-16:A> setupdomain -p sntp

SNTP
----
SNTP server : 10.1.63.251

heslab-16:A>
Changing POST Levels and Other Settings
To provide thorough testing of all components, the power-on self-test (POST) level for both the SC and domains should be set to maximum. Maximum is the default level for all domains. If you cannot always run the maximum POST level, you should at least use it during system installation. You should also use the maximum level of POST in other circumstances, such as when hardware is being replaced or moved, after an unexpected system or power failure, or when the hardware is suspected of causing system problems.
The following shows an example of the platform SCPOST level you should use:
heslab-16-sc0:SC> setupplatform -p sc

SC
--
SC POST diag Level [max]: max
Enable SC Failover? [no]: yes
Logical Hostname or IP Address : heslab-16-sc

heslab-16-sc0:SC>
For SCs running firmware versions lower than 5.13.0, the parameters for controlling SC Failover will not be visible.
The following shows an example of the domain boot parameters we recommend:
heslab-12:B> showdomain -p bootparams

diag-level = max
verbosity-level = off
error-level = max
interleave-scope = within-board
interleave-mode = optimal
reboot-on-error = true
error-policy = diagnose
OBP.use-nvramrc? = true
OBP.auto-boot? = true
OBP.error-reset-recovery = sync

heslab-12:B>
A value of default is equal to max in the case of the domain diag-level parameter that controls the domain POST level.
Of special note is the addition of the error-policy value in firmware versions 5.13.x, or higher. This value determines how the system behaves when the SC detects a hardware error condition: either only the error message is displayed, or the error message is displayed with a diagnostic recommendation. It is recommended that error-policy be set to diagnose.
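On a platform with several domains, these two recommendations are easy to audit from saved output. The following sketch assumes that the showdomain -p bootparams output for a domain has been captured to a file in the "name = value" form shown above. The function name check_bootparams is hypothetical. It checks only the diag-level and error-policy parameters that the text singles out.

```shell
#!/bin/sh
# check_bootparams FILE
# Compares saved "showdomain -p bootparams" output with two recommended
# settings: diag-level (max, or default, which equals max) and
# error-policy (diagnose). Returns non-zero if either differs or is absent.
check_bootparams() {
    awk '
        $2 == "=" && $1 == "diag-level" {
            seen_diag = 1
            if ($3 != "max" && $3 != "default") {
                print "diag-level is " $3 ", recommended max"
                bad = 1
            }
        }
        $2 == "=" && $1 == "error-policy" {
            seen_policy = 1
            if ($3 != "diagnose") {
                print "error-policy is " $3 ", recommended diagnose"
                bad = 1
            }
        }
        END {
            if (!seen_diag)   { print "diag-level not reported";   bad = 1 }
            if (!seen_policy) { print "error-policy not reported"; bad = 1 }
            exit bad + 0
        }
    ' "$1"
}
```

A domain running firmware lower than 5.13.x would be expected to show up as "error-policy not reported", because the parameter was added in that release.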
Configuring the Midframe Service Processor
An external system is helpful for administering Sun Fire servers because the SC must be accessed and monitored on a regular basis (console output or SC platform messages) and because the SC attempts to log messages to an external host (by using SNMP or syslog).
This section contains descriptions of how to configure the Midframe Service Processor (MSP):
Configuring log messages on the MSP
Configuring the Sun MC software
Configuring the Sun Explorer software utility
The MSP provides a centralized and secured access point for logging these messages, and it provides support services that the SC cannot provide. While the Sun Fire platform is theoretically self-contained, for ease of problem diagnosis, accessibility to platform information, and updating system firmware and software, an MSP is strongly recommended to provide a centralized location for these functions.
This article does not recommend any particular type of MSP because each site's needs (for instance, the number of systems to monitor and the requirement for the Sun MC software) are generally different. In addition, the requirements of individual sites might conflict. For example, syslog(3) does not require as many system resources to monitor hosts as does the Sun MC software. However, because of the limited number of syslog(3) logging facilities available per host, it might not be possible to monitor as many systems as a single, larger Sun MC server can, without generating large unmanageable log files.
To Configure Log Messages on the MSP
To be able to log messages sent to the MSP with syslog(3) from an SC, you need to make additions to the default /etc/syslog.conf file on the syslog(3) host, as described in the following steps. The additions should correspond to the settings made on the Sun Fire platform.
Add the following entries to the syslog.conf file:
local0.notice	/var/log/messages.platform
local1.notice	/var/log/messages.domain-A
local2.notice	/var/log/messages.domain-B
local3.notice	/var/log/messages.domain-C
local4.notice	/var/log/messages.domain-D
In each of the above entries, the selector and the file name must be separated by tabs, not spaces; otherwise, syslogd will fail.
Create the log files before restarting syslog(3) by entering the following commands.
You must ensure that the files have the appropriate permissions.
nerm# touch /var/log/messages.platform
nerm# touch /var/log/messages.domain-A
nerm# touch /var/log/messages.domain-B
nerm# touch /var/log/messages.domain-C
nerm# touch /var/log/messages.domain-D
Restart the syslogd(1M) daemon, or reboot the MSP.
Verify that syslog(3) is working correctly by using the logger(1) command:
nerm# logger -p local0.notice "Platform test message"
nerm# logger -p local1.notice "Domain A test message"
nerm# logger -p local2.notice "Domain B test message"
nerm# logger -p local3.notice "Domain C test message"
nerm# logger -p local4.notice "Domain D test message"
Verify that each test message is logged in the appropriate log file.
Verify that the SC is logging properly by entering the setkeyswitch command:
heslab-12:B> setkeyswitch off
heslab-12:B> setkeyswitch on
Verify that the POST results are sent to the log files.
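Two of the checks in this procedure lend themselves to a small script on the MSP: confirming that the new syslog.conf entries use tabs, and confirming that each logger(1) test message reached its file. The following is a sketch under stated assumptions. The function names are hypothetical, the tab check looks only at local0 through local7 entries, and the message check expects the exact test strings and file names used in the steps above.

```shell
#!/bin/sh
# check_syslog_tabs FILE
# Flags localN entries whose selector and action are not separated by one
# or more tab characters. Other lines in the file are ignored.
check_syslog_tabs() {
    awk '
        /^[ \t]*local[0-7]\./ && !/^local[0-7]\.[^ \t]+\t+[^ \t]/ {
            print FILENAME ":" NR ": selector and action must be separated by tabs"
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# verify_test_messages LOGDIR
# Confirms that each logger(1) test message reached the log file that is
# configured for its facility. Returns non-zero if any message is missing.
verify_test_messages() {
    logdir=$1
    rc=0
    for pair in "platform:Platform" "domain-A:Domain A" "domain-B:Domain B" \
                "domain-C:Domain C" "domain-D:Domain D"
    do
        file=$logdir/messages.${pair%%:*}        # text before the colon
        text="${pair#*:} test message"           # text after the colon
        if grep "$text" "$file" >/dev/null 2>&1; then
            echo "ok       $file"
        else
            echo "missing  $file"
            rc=1
        fi
    done
    return $rc
}
```

For example, check_syslog_tabs /etc/syslog.conf followed by verify_test_messages /var/log covers both checks. The ${pair%%:*} expansions need a POSIX shell, so on the Solaris OE the script would have to run under a shell such as ksh rather than the legacy Bourne shell.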
Periodically, you should rotate the log files to prevent them from growing too large. You can do this by running a script such as /usr/lib/newsyslog on a regular basis and modifying its contents so that it also rotates the additional log files. Rotate the files on a monthly basis, and keep archived copies of the information for at least two months.
For the Solaris 8 OE, and earlier releases, add five entries to the existing /usr/lib/newsyslog file to rotate the five log files referenced above.
The following code contains an example of an entry:
#
LOGDIR=/var/log
LOG=messages.platform
if test -d $LOGDIR
then
    cd $LOGDIR
    if test -s $LOG
    then
        test -f $LOG.6 && mv $LOG.6 $LOG.7
        test -f $LOG.5 && mv $LOG.5 $LOG.6
        test -f $LOG.4 && mv $LOG.4 $LOG.5
        test -f $LOG.3 && mv $LOG.3 $LOG.4
        test -f $LOG.2 && mv $LOG.2 $LOG.3
        test -f $LOG.1 && mv $LOG.1 $LOG.2
        test -f $LOG.0 && mv $LOG.0 $LOG.1
        mv $LOG $LOG.0
        cp /dev/null $LOG
        chmod 644 $LOG
        sleep 40
    fi
fi
Be sure to create a separate entry for each of the five log files by changing the name of the file in the LOG value. Also, be sure that the last line in the file remains as follows:
kill -HUP `cat /etc/syslog.pid`
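As an alternative to five nearly identical entries, the same rotation can be written once as a loop. This is a sketch, not the stock newsyslog code. The function name rotate_logs is hypothetical, the sleep between files is omitted, and the $(( )) arithmetic requires a POSIX shell (the legacy Bourne shell does not support it).

```shell
#!/bin/sh
# rotate_logs LOGDIR LOG...
# Applies the rotation shown above (LOG.0 through LOG.7 are kept) to each
# named log file in LOGDIR. Missing or empty log files are left alone.
rotate_logs() {
    logdir=$1
    shift
    test -d "$logdir" || return 1
    for log in "$@"; do
        f=$logdir/$log
        test -s "$f" || continue             # nothing to rotate
        i=6
        while test "$i" -ge 0; do            # shift .6 to .7, then .5 to .6, ...
            if test -f "$f.$i"; then
                mv "$f.$i" "$f.$((i + 1))"
            fi
            i=$((i - 1))
        done
        mv "$f" "$f.0"
        cp /dev/null "$f"                    # recreate an empty log file
        chmod 644 "$f"
    done
}
```

A call such as rotate_logs /var/log messages.platform messages.domain-A messages.domain-B messages.domain-C messages.domain-D would stand in for the five entries. The kill -HUP line must still run afterward so that syslogd reopens the files.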
For the Solaris 9 OE, use the logadm command to set up the rotation of the log files.
The logadm command replaces the functionality of the /usr/lib/newsyslog file.
nerm# logadm -w /var/log/messages.platform -C 8 -a 'kill -HUP `cat /var/run/syslog.pid`'
This command must be entered on one line.
Use the following command for the domains:
nerm# logadm -w /var/log/messages.domain-A -C 8 -a 'kill -HUP `cat /var/run/syslog.pid`'
Be sure to change the message file name for all of the domains on the system.
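To avoid retyping the command for every log file, the five command lines can be generated and reviewed before they are run. The sketch below only prints the commands and does not execute logadm. The function name print_logadm_cmds is hypothetical, and the file names are the ones used earlier in this section.

```shell
#!/bin/sh
# print_logadm_cmds
# Prints one logadm command per log file, using the same options as the
# examples above. Nothing is executed; the output is meant for review.
print_logadm_cmds() {
    for name in platform domain-A domain-B domain-C domain-D; do
        printf "logadm -w /var/log/messages.%s -C 8 -a 'kill -HUP \`cat /var/run/syslog.pid\`'\n" "$name"
    done
}
```

After you have checked the printed lines, you can paste them into a root shell on the MSP or redirect them to a file and run that file.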
As mentioned, syslog(3) facilities are limited, so plan ahead and organize the limited number of resources to effectively enable an administrator to quickly locate data. In addition, you should establish a mechanism to parse and sort the incoming information on a regular basis and to send the administrator an email message of the changes. Further information on the configuration of syslog can be found in the Solaris OE system administration guides.
Configuring the Sun MC Software
A Sun MC software server normally requires a higher level of system resources, such as a correctly configured dual processor system capable of having 1 gigabyte of RAM or more. However, a Sun MC software server also has a greater capability to monitor and administer a large number of systems. Whether or not the Sun MC software proxy agent is running on the same host as the server agent might influence the Sun MC software server configuration.
The Sun MC software should be implemented with two systems. One small system should act as a proxy agent for one or more Sun Fire platforms, and the second system should be a larger Sun MC software server that is tasked with monitoring the entire network. This configuration provides additional monitoring capabilities in case the system containing the Sun MC software server becomes unavailable. It also provides flexibility in the MSP and security configuration.
To be able to monitor SNMP traps generated by the SC, you must install the Sun MC 3.0 Platform Update 1, or higher. This version of Sun MC is available with the Solaris 8 OE 04/01 release. Currently, the Sun MC software is the only package that can understand the SNMP traps generated by the SC. No MIBs are publicly available. Refer to the Sun Management Center 3.0 Supplement for Sun Fire 6800, 4810, 4800, and 3800 Systems for additional installation and setup information.
Preparing for Firmware Updates
For purposes of firmware updates to the SC, you must set up an FTP or HTTP service on the MSP. You can set up an anonymous FTP server by following the instructions in the ftpd(1M) man page, or you can use normal FTP by specifying a user and password in the FTP URL. If the MSP uses the Solaris 8 OE or higher, a version of the Apache Web server is provided with the Solaris OE, which you can use to provide HTTP services. Because the HTTP service is more configurable than the FTP service and because it may be restricted to listen only on certain network interfaces, HTTP can have less of a security impact than FTP. For additional information, refer to "Securing the Sun Fire™ Midframe System Controller Updated for 5.13.x", which is available from: http://www.sun.com/blueprints
You can install the operating system for Sun Fire domains either from an attached DVD-ROM drive or over the network from a Solaris JumpStart™ software server. The function of a JumpStart software server may also be well suited for an MSP. Detailed instructions for setting up a Solaris JumpStart software server can be found in the Solaris OE systems administration guides.
When choosing an appropriate MSP (or MSPs), some additional capabilities need to be considered, such as how to access the serial ports on multiple SCs and how many devices need to be monitored on the same system. For example, the Sun StorEdge™ T3 Array may also need to be monitored by the same MSP.
Configuring the Sun Explorer Software Utility
After completing the initial installation of a Sun Fire server, you should install the Sun Explorer software utility on both the server and the MSP, and you should set it up to periodically collect system configuration information and error messages. Check the following site regularly for updates to the Sun Explorer software: http://sunsolve.sun.com
If possible, the output from the Sun Explorer software should be automatically sent to the Sun Explorer software database at the email address you specified when you set up the software. You should use version 220.127.116.11, or higher, because it has the capability to gather data from the SC. Version 3.5, or higher, can collect data from the SC; however, version 18.104.22.168, or higher, is recommended because it includes the ability to gather information about the components in the system, such as the field-replaceable unit ID (FRUID). The FRUID information is stored in each component in a Sun Fire system. It contains information about the parts, such as the revision level, serial number, and manufacturing information.
The following command gathers information from the SC. You should execute it on the MSP. In addition, use of the following command assumes that the Sun Explorer software has already been installed on the system in the default location: /opt/SUNWexplo
nerm# /opt/SUNWexplo/bin/explorer -w fru,scextended,default
If you execute this command from an MSP, the Explorer software will collect data from the MSP and the SC. To collect data from the SC, the Explorer software uses a telnet connection; therefore, the MSP must be able to establish a telnet session on the SC.
If security considerations prevent the automatic sending of the Sun Explorer software results to the Sun Explorer software database, you should still install the Sun Explorer software utility so that it is available to collect information in the event that service is required on the system and information needs to be collected.
The initial installation is also a good time to record and check the system serial number, hostid, and MAC address provided with the system and to become familiar with how these values are reported by the SC showplatform -v command. Keep this information where it can be easily accessed in case an SC replacement is required.