Sun Fire Midframe Server Best Practices for Firmware Update 5.13.x

This article is an update to the October 2001 Sun BluePrints OnLine article, "Sun Fire Midframe Servers Best Practices for Administration." It includes updated information for connecting and configuring the Sun Fire system controller (SC) and introduces SC administration concepts, platform security, and error analysis and diagnosis. It also introduces new features available with the 5.13.x firmware release for the Sun Fire SC, which further improve the reliability, availability, and serviceability of Sun Fire Midframe Servers.
The Sun Fire 3800, 4800, 4810, and 6800 midframe servers provide new functionality to monitor, diagnose, and administer the system, which can increase overall system reliability, availability, and serviceability (RAS). Much of the new functionality is available through the Sun Fire System Controller (SC), which is a central component of the Sun Fire midframe server.

This article revisits existing best practices that were published in "Sun Fire™ Midframe Server Best Practices for Administration," Sun BluePrints OnLine, October 2001. Because the October 2001 article was used as the basis for this article, reading it is not a prerequisite. However, you should familiarize yourself with its content so that you understand the differences between the firmware updates.

This article introduces new best practices based on the enhancements that have been made to the SC in firmware update 5.13.x. If you are already familiar with the prior update, you can concentrate on the new enhancements, which include:

  • SC failover (see "Configuring the Sun Fire SC Failover")

  • SNTP support (see "Configuring SNTP")

This article also includes discussions on the following topics:

  • Platform configuration

  • Platform administration

  • Platform security

  • Error analysis and diagnosis

  • SC maintenance procedures

While many recommendations made here apply to the majority of cases, not all recommendations may apply to every circumstance.

Platform Configuration

This section contains descriptions of how to configure the Sun Fire Midframe platform. The topics include:

  • "Configuring the RS-232 Serial Port"

  • "Configuring the Ethernet Port"

  • "Configuring a Switched Private Network"

  • "Configuring the Sun Fire SC Failover"

  • "Configuring SNTP"

  • "Setting the Date and Time on the Platform"

  • "Changing POST Levels and Other Settings"

  • "Configuring the Midframe Service Processor"

  • "Configuring the Sun MC Software"

  • "Preparing for Firmware Updates"

  • "Configuring the Sun Explorer Software Utility"

Configuring the RS-232 Serial Port

You can access the SC through the built-in RS-232 serial port or through its 10/100BASE-T Ethernet port. Be sure that access to the serial port is available during the initial setup of the SC because it is the only connection on which the SC power-on self-test (SCPOST) output can be viewed. The port settings should be 9600 bps, 8 data bits, no parity, and 1 stop bit (9600-8-N-1).

You can also access the serial port by using a network terminal server or by using the serial port on a Midframe Service Processor (MSP). For more information about why an MSP is needed and how to configure it, refer to "Configuring the Midframe Service Processor."

After you have set up the SC, serial port access should continue to be available on demand to provide an alternate access path to the SC in the event of a network problem, firmware updates, or SC reboots or resets. Serial port access is also required to monitor certain SC and platform related errors because it is where these errors will be displayed.

Configuring the Ethernet Port

The Ethernet port should be used as the primary connection path for the speed, multisession access, and logging capabilities it provides. Ethernet connections to the SC are accomplished by using a Telnet session. A 100BASE-T link is strongly recommended for the SC Ethernet connection and required for use with Sun™ Management Center (Sun MC) software. The Ethernet port should not be used instead of the RS-232 serial port connection, but should be used in addition to the RS-232 port.

Configuring a Switched Private Network

You should configure the SC on a switched private network. If you are configuring two SCs for the network, assign each SC a separate IP address so that they do not conflict with each other on the network. If the SC failover functionality introduced in firmware 5.13.x is used, a third IP address representing the logical hostname can be assigned.
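As a rough illustration of the addressing plan above, the following shell sketch checks that the two SC addresses and the logical hostname address are all distinct. The function name and the sample addresses are hypothetical placeholders and do not come from any Sun tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every address given is unique.
# Usage: sc_addrs_unique ADDR...
sc_addrs_unique() {
    dups=$(printf '%s\n' "$@" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate address: $dups" >&2
        return 1
    fi
    return 0
}

# Example plan (placeholder addresses): SC0, SC1, and the logical hostname.
if sc_addrs_unique 192.168.100.10 192.168.100.11 192.168.100.12; then
    echo "address plan OK"
fi
```

A check like this could be run on the MSP before the addresses are entered on the SCs.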

FIGURE 1 illustrates a simplified network topology. The MSP is a workstation placed on the private Ethernet network of the Sun Fire platform to provide administrative support functions to the platform and the SCs.

FIGURE 1 Simplified Network Topology

The serial connection in the illustration above can be replaced with a terminal concentrator if the same MSP is monitoring multiple Sun Fire platforms. If the terminal concentrator supports encrypted logins and sessions (for example, by using secure shell), the terminal concentrator may be connected to the public Ethernet network. A terminal concentrator is recommended to improve the ability of a single MSP to monitor multiple platforms.

Configuring the Sun Fire SC Failover

You should use two SCs in a Sun Fire system to provide failover of the SC functionality in case one of the SCs fails and to keep the domains in the system running. Prior to firmware version 5.13.0, if the main SC suffered a failure, administrative capabilities such as the ability to access domain consoles would be lost. With the introduction of firmware 5.13.0, full failover is available, so if the main SC fails, the spare SC can take over administrative functions, in addition to the system clock functionality.

When enabled, the two SCs communicate with each other by using an internal communications link. They also exchange health information and synchronize internal configuration information over the link. The SC that is acting as the main system controller also generates a heartbeat. If the heartbeat unexpectedly disappears, the spare SC takes over the main functionality.

Before enabling the SC failover feature, both SCs and all of the boards in a Sun Fire platform should be at the same firmware version. While it is possible to have mixed versions of firmware under certain circumstances, it is recommended that all of the boards and the SCs use the same version of firmware.

You can determine the firmware version by using the showboards command, as follows:

heslab-16-sc0:SC> showboards -p version

Component   Compatible Version
---------   ---------- -------
SSC0        Reference  5.13.2
/N0/IB6     Yes        5.13.2
/N0/IB8     Yes        5.13.2
/N0/SB0     Yes        5.13.2
/N0/SB2     Yes        5.13.2

heslab-16-sc0:SC>

The above output does not include the version of firmware from the spare SC. To gather that information, you must connect to the spare SC and use the showsc command to determine the ScApp revision, as in the following:

heslab-16-sc1:sc> showsc

SC: SSC1
Spare System Controller
SC Failover: disabled

SC date: Sun Oct 06 14:06:58 PDT 2002
SC uptime: 2 days 6 hours 35 minutes 8 seconds

ScApp version: 5.13.2
RTOS version: 23

heslab-16-sc1:sc>
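The version comparison described above can be sketched in shell on the MSP. This sketch assumes that you have saved the showboards -p version output to a file or can pipe it in; the function names are hypothetical, and the row-matching rule (rows that start with SSC or /N) is inferred only from the sample output shown earlier. It also assumes a POSIX awk (nawk on older Solaris releases).

```shell
#!/bin/sh
# Hypothetical helper: read saved "showboards -p version" output on stdin
# and list every distinct firmware version found in the last column.
board_versions() {
    # Data rows are assumed to start with SSC or a /N path.
    awk '$1 ~ /^(SSC|\/N)/ && NF >= 3 { print $NF }' | sort -u
}

# Succeeds only if all boards report a single version that equals $1
# (the ScApp version read from "showsc" on the spare SC).
versions_match() {
    spare=$1
    found=$(board_versions)
    [ "$found" = "$spare" ]
}

# Example using the output shown above.
versions_match 5.13.2 <<'EOF' && echo "firmware consistent: 5.13.2"
Component   Compatible Version
---------   ---------- -------
SSC0        Reference  5.13.2
/N0/IB6     Yes        5.13.2
/N0/IB8     Yes        5.13.2
/N0/SB0     Yes        5.13.2
/N0/SB2     Yes        5.13.2
EOF
```

If any board reports a different version, board_versions prints more than one line and versions_match fails, which is a cue to resolve the mismatch before enabling SC failover.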

The SC failover software introduces a number of new commands and settings that need attention. You should use the showfailover command to check on failover status. The -v option gives the most information about the configuration.

heslab-16-sc0:SC> showfailover -v

SC: SSC0 
Main System Controller
SC Failover: enabled and active.
Clock failover enabled.

heslab-16-sc0:SC>

The above information shows that the showfailover -v command was run on SSC0, that the SC is currently in the role of main, and that both SC administrative function failover and clock failover are enabled and functioning. You should run the showfailover -v command whenever you reboot a SC to ensure that the SC failover functionality has restarted and is functioning properly.
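If you capture the showfailover -v output after each SC reboot, a small shell check on the MSP can flag a failover problem. The following is a minimal sketch; the function name is hypothetical, and the two strings it looks for are taken only from the sample output above, so other firmware versions might word them differently.

```shell
#!/bin/sh
# Hypothetical helper: read saved "showfailover -v" output on stdin and
# succeed only if both SC failover and clock failover are reported healthy.
failover_ok() {
    out=$(cat)
    printf '%s\n' "$out" | grep -q '^SC Failover: enabled and active' &&
        printf '%s\n' "$out" | grep -q '^Clock failover enabled'
}

# Example using the output shown above.
failover_ok <<'EOF' && echo "failover healthy"
SC: SSC0
Main System Controller
SC Failover: enabled and active.
Clock failover enabled.
EOF
```

Any output that lacks either line causes the function to return a nonzero status, which a wrapper script could turn into an alert.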

You can obtain an additional piece of information about the SC failover status by using the showplatform -p sc command, as in the following example.

heslab-16-sc0:SC> showplatform -p sc

SC POST diag Level: max
SC Failover: enabled and active.
Logical Hostname: heslab-16-sc

heslab-16-sc0:SC>

In the above example, the value for the logical hostname is displayed. Each SC continues to have a unique IP address assigned to it. The logical hostname is a third IP address that always points at whichever SC is currently functioning in the role of main. In the figure below, the logical SC is the logical hostname.

FIGURE 2 Sun Fire SC Logical Hostname

The following is an example of how to set up the Sun Fire SC failover functionality.

heslab-16-sc0:SC> setupplatform -p sc

SC
--
SC POST diag Level [max]: max
Enable SC Failover? [no]: yes
Logical Hostname or IP Address []: heslab-16-sc

heslab-16-sc0:SC>

To force the spare SC to assume the role of main, use the setfailover force command. This should not be necessary under normal operating conditions, but the functionality should be tested during a maintenance window after you enable the failover functionality to verify correct failover operations.

The use of SNTP, as discussed in "Configuring SNTP," is strongly recommended with SC failover. If SNTP is not enabled, the time on the two SCs needs to be checked to ensure that they are the same. Because the domains on a Sun Fire platform derive their time based on the time set on the SCs, the time on the running domains could be affected after an SC failover if the time on the SCs is not synchronized.

Even though SC Failover copies configuration information from the main SC to the spare SC, it is not a replacement for backing up the SC. Users should still perform a dumpconfig of the SC after enabling failover and on a regular basis afterwards.

Setting the Date and Time on the Platform

When a Sun Fire platform is installed, the platform time should be set from the platform shell and in each individual domain using the setdate command. Each domain shell can have a separate time setting, so setting each one individually is required.

The following shows an example of how to set the date and time.

heslab-16-sc0:SC> setdate 09081228
Sun Sep 08 12:28:00 PDT 2002
heslab-16-sc0:SC>

You can set the time and date on the domains in a similar manner. Use the setdate -h command for additional help and options for setting the time.
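The argument in the example above appears to follow a month, day, hour, minute layout (09081228 for September 8 at 12:28). Assuming that layout, the following shell sketch builds the same string from its parts. The function name is hypothetical; check setdate -h for the full set of accepted formats.

```shell
#!/bin/sh
# Hypothetical helper: build an mmddHHMM argument like the one used in the
# setdate example above. Pass plain integers (9, not 09), because printf
# would treat a leading zero as an octal prefix.
sc_setdate_arg() {
    printf '%02d%02d%02d%02d\n' "$1" "$2" "$3" "$4"
}

# September 8 at 12:28, as in the example above.
sc_setdate_arg 9 8 12 28
```

On the MSP, date +%m%d%H%M produces the same layout for the current local time, which can be convenient when setting several shells by hand.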

The following is an example of the output from the setdate -t list command, which is helpful in determining the correct time zone to use for your locale:

heslab-16-sc0:SC> setdate -t list
list: is not a valid timezone, valid time zones are:

ACT  GMT+9.5   Central Standard Time (Northern Territory)
AET  GMT+10    Eastern Standard Time (New South Wales)
AGT  GMT-3     Argentine Time
ART  GMT+2     Eastern European Time
AST  GMT-9     Alaska Standard Time
BET  GMT-3     Brazil Time
BST  GMT+6     Bangladesh Time
CAT  GMT+2     Central African Time
CNT  GMT-3.5   Newfoundland Standard Time
CST  GMT-6     Central Standard Time
CTT  GMT+8     China Standard Time
EAT  GMT+3     Eastern African Time
ECT  GMT+1     Central European Time
EET  GMT+2     Eastern European Time
EST  GMT-5     Eastern Standard Time
GMT  GMT+0     Greenwich Mean Time
HST  GMT-10    Hawaii Standard Time
IET  GMT-5     Eastern Standard Time
IST  GMT+5.5   India Standard Time
JST  GMT+9     Japan Standard Time
MET  GMT+3.5   Iran Time
MIT  GMT-11    West Samoa Time
MST  GMT-7     Mountain Standard Time
NET  GMT+4     Armenia Time
NST  GMT+12    New Zealand Standard Time
PLT  GMT+5     Pakistan Time
PNT  GMT-7     Mountain Standard Time
PRT  GMT-4     Atlantic Standard Time
PST  GMT-8     Pacific Standard Time
SST  GMT+11    Solomon Is. Time
UTC  GMT+0     Coordinated Universal Time
VST  GMT+7     Indochina Time

heslab-16-sc0:SC>

Configuring SNTP

With SC firmware versions 5.13.0 and higher, the SC is capable of synchronizing its time-of-day clock with a network time server using SNTP. SNTP usage is encouraged to keep the SCs at an accurate time.

The following shows an example of how to enable SNTP.

heslab-16:A> setupdomain -p sntp

SNTP
----
SNTP server []: 10.1.63.251

heslab-16:A>

Changing POST Levels and Other Settings

To provide thorough testing of all components, the power-on self-test (POST) level for both the SC and domains should be set to maximum. Maximum is the default level for all domains. If you cannot always run the maximum POST level, you should at least use it during system installation. You should also use the maximum level of POST in other circumstances, such as when hardware is being replaced or moved, after an unexpected system or power failure, or when the hardware is suspected of causing system problems.

The following shows an example of the platform SCPOST level you should use:

heslab-16-sc0:SC> setupplatform -p sc

SC
--
SC POST diag Level [max]: max
Enable SC Failover? [no]: yes
Logical Hostname or IP Address []: heslab-16-sc

heslab-16-sc0:SC>

Note

For SCs running firmware versions lower than 5.13.0, the parameters for controlling SC Failover will not be visible.

The following shows an example of the domain boot parameters we recommend:

heslab-12:B> showdomain -p bootparams

diag-level = max
verbosity-level = off
error-level = max
interleave-scope = within-board
interleave-mode = optimal
reboot-on-error = true
error-policy = diagnose
OBP.use-nvramrc? = true
OBP.auto-boot? = true
OBP.error-reset-recovery = sync

heslab-12:B>

Note

A value of default is equal to max in the case of the domain diag-level parameter that controls the domain POST level.

Of special note is the addition of the error-policy value in firmware versions 5.13.x, or higher. This value affects the behavior of the system if the SC detects a hardware error condition by determining whether the error message only is displayed or whether the error message is displayed with a diagnostic recommendation. It is recommended that error-policy be set to diagnose.
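To compare a domain against the recommended values, you could save the showdomain -p bootparams output and run it through a check like the following sketch. The function name is hypothetical, the expected values are copied from the example output above, and the parsing assumes the "name = value" layout shown there and a POSIX awk (nawk on older Solaris releases).

```shell
#!/bin/sh
# Hypothetical helper: read saved "showdomain -p bootparams" output on
# stdin and print every parameter that differs from the values shown in
# the example above. Prints nothing when all of them match.
check_bootparams() {
    awk -F' = ' '
        BEGIN {
            want["diag-level"] = "max"
            want["verbosity-level"] = "off"
            want["error-level"] = "max"
            want["interleave-scope"] = "within-board"
            want["interleave-mode"] = "optimal"
            want["reboot-on-error"] = "true"
            want["error-policy"] = "diagnose"
            want["OBP.use-nvramrc?"] = "true"
            want["OBP.auto-boot?"] = "true"
            want["OBP.error-reset-recovery"] = "sync"
        }
        ($1 in want) {
            sub(/[ \t\r]+$/, "", $2)      # drop trailing whitespace
            seen[$1] = 1
            # For diag-level, "default" is equal to "max" (see the note above).
            if ($1 == "diag-level" && $2 == "default") next
            if ($2 != want[$1])
                printf "%s: %s (recommended %s)\n", $1, $2, want[$1]
        }
        END {
            # Report parameters that never appeared, such as error-policy
            # on firmware older than 5.13.x.
            for (k in want) if (!(k in seen)) printf "%s: not reported\n", k
        }
    '
}

# Example: one value changed from the recommended set.
check_bootparams <<'EOF'
diag-level = default
verbosity-level = off
error-level = max
interleave-scope = within-board
interleave-mode = optimal
reboot-on-error = false
error-policy = diagnose
OBP.use-nvramrc? = true
OBP.auto-boot? = true
OBP.error-reset-recovery = sync
EOF
```

In this example only reboot-on-error is reported, because diag-level = default is treated as equivalent to max.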

Configuring the Midframe Service Processor

An external system to help with the administration of Sun Fire servers is helpful because of the need to access and monitor the SC on a regular basis (console output or SC platform messages) and because the SC attempts to log messages to an external host (by using SNMP or syslog).

This section contains descriptions of how to configure the Midframe Service Processor (MSP):

  • Configuring log messages on the MSP

  • Configuring the Sun MC software

  • Configuring the Sun Explorer software utility

The MSP provides a centralized and secured access point for logging these messages, and it provides support services that the SC cannot provide. While the Sun Fire platform is theoretically self-contained, for ease of problem diagnosis, accessibility to platform information, and updating system firmware and software, an MSP is strongly recommended to provide a centralized location for these functions.

This article does not recommend any particular type of MSP because each site's needs (for instance, the number of systems to monitor and the requirement for the Sun MC software) are generally different. In addition, the requirements of individual sites might conflict. For example, syslog(3) does not require as many system resources to monitor hosts as does the Sun MC software. However, because of the limited number of syslog(3) logging facilities available per host, it might not be possible to monitor as many systems as a single, larger Sun MC server can, without generating large unmanageable log files.

To Configure Log Messages on the MSP

To be able to log messages sent to the MSP with syslog(3) from a SC, you need to make additions to the default /etc/syslog.conf file on the syslog(3) host, as described in the following steps. The additions should correspond to the settings made on the Sun Fire platform.

  1. Add the following entries to the syslog.conf file:

    local0.notice   /var/log/messages.platform 
    local1.notice   /var/log/messages.domain-A 
    local2.notice   /var/log/messages.domain-B 
    local3.notice   /var/log/messages.domain-C 
    local4.notice   /var/log/messages.domain-D 

    The above entries should be separated by tabs; otherwise, syslogd will fail.

  2. Create the log files before restarting syslog(3) by entering the following commands.

    You must ensure that the files have the appropriate permissions.

    nerm# touch /var/log/messages.platform 
    nerm# touch /var/log/messages.domain-A 
    nerm# touch /var/log/messages.domain-B 
    nerm# touch /var/log/messages.domain-C 
    nerm# touch /var/log/messages.domain-D 
  3. Restart the syslogd(1M) daemon, or reboot the MSP.

  4. Verify that syslog(3) is working correctly by using the logger(1) command:

    nerm# logger -p local0.notice "Platform test message"
    nerm# logger -p local1.notice "Domain A test message"
    nerm# logger -p local2.notice "Domain B test message"
    nerm# logger -p local3.notice "Domain C test message"
    nerm# logger -p local4.notice "Domain D test message"
  5. Verify that the test message is logged in the appropriate log file.

  6. Verify that the SC is logging properly by entering the setkeyswitch command:

    heslab-12:B> setkeyswitch off
    heslab-12:B> setkeyswitch on
  7. Verify that the POST results are sent to the log files.

    Periodically, you should rotate the log files to prevent them from growing too large. You can do this by having a script such as /usr/lib/newsyslog run on a regular basis, with entries added to it to rotate the specified log files. Rotate the files on a monthly basis, and keep archived copies of the information for at least two months.

    1. For the Solaris 8 OE, and earlier releases, add five entries to the existing /usr/lib/newsyslog file to rotate the five log files referenced above.

      The following code contains an example of an entry:

      #
      LOGDIR=/var/log
      LOG=messages.platform
      if test -d $LOGDIR
      then
               cd $LOGDIR
               if test -s $LOG
               then
                       test -f $LOG.6 && mv $LOG.6  $LOG.7
                       test -f $LOG.5 && mv $LOG.5  $LOG.6
                       test -f $LOG.4 && mv $LOG.4  $LOG.5
                       test -f $LOG.3 && mv $LOG.3  $LOG.4
                       test -f $LOG.2 && mv $LOG.2  $LOG.3
                       test -f $LOG.1 && mv $LOG.1  $LOG.2
                       test -f $LOG.0 && mv $LOG.0  $LOG.1
                       mv $LOG    $LOG.0
                       cp /dev/null $LOG
                       chmod 644    $LOG
                       sleep 40
               fi
      fi

      Be sure to create five unique entries for each of the five log files by changing the name of the file in the LOG value. Also, be sure that the last line in the file remains as follows:

      kill -HUP `cat /etc/syslog.pid`
    2. For the Solaris 9 OE, use the logadm command to set up the rotation of the log files.

      The logadm command replaces the functionality of the /usr/lib/newsyslog file.

      nerm# logadm -w /var/log/messages.platform -C 8 -a 'kill -HUP `cat /var/run/syslog.pid`'

      This command must be entered on one line.

    Use the following command for the domains:

    nerm# logadm -w /var/log/messages.domain-A -C 8 -a 'kill -HUP `cat /var/run/syslog.pid`'

    Be sure to change the message file name for all of the domains on the system.

    As mentioned, syslog(3) facilities are limited, so plan ahead and organize the limited number of resources to effectively enable an administrator to quickly locate data. In addition, you should establish a mechanism to parse and sort the incoming information on a regular basis and to send the administrator an email message of the changes. Further information on the configuration of syslog can be found in the Solaris OE system administration guides.
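Steps 1 and 2 of the procedure above are repetitive, so they lend themselves to a short script. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the facility-to-file mapping is the one shown in step 1, and mode 644 is borrowed from the rotation example rather than being a stated requirement. The first function only prints the entries so that you can review them before adding them to /etc/syslog.conf.

```shell
#!/bin/sh
# Log file suffixes in facility order: local0 through local4.
MSP_LOG_NAMES="platform domain-A domain-B domain-C domain-D"

# Hypothetical helper: print the five syslog.conf entries from step 1,
# with a real tab between the selector and the file name.
# Usage: msp_syslog_entries [LOGDIR]
msp_syslog_entries() {
    dir=${1:-/var/log}
    n=0
    for name in $MSP_LOG_NAMES; do
        printf 'local%d.notice\t%s/messages.%s\n' "$n" "$dir" "$name"
        n=$((n + 1))
    done
}

# Hypothetical helper: create each log file before syslogd is restarted
# (step 2). Mode 644 is an assumption taken from the rotation example.
# Usage: msp_create_logs [LOGDIR]
msp_create_logs() {
    dir=${1:-/var/log}
    for name in $MSP_LOG_NAMES; do
        touch "$dir/messages.$name" && chmod 644 "$dir/messages.$name" || return 1
    done
}

# Preview the entries; nothing is written to /etc/syslog.conf here.
msp_syslog_entries
```

Because printf emits the tab itself, this avoids the failure that occurs when an editor silently converts tabs to spaces. Running msp_create_logs as root with no argument would create the files under /var/log.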

Configuring the Sun MC Software

A Sun MC software server normally requires a higher level of system resources, such as a correctly configured dual processor system capable of having 1 gigabyte of RAM or more. However, a Sun MC software server also has a greater capability to monitor and administer a large number of systems. Whether or not the Sun MC software proxy agent is running on the same host as the server agent might influence the Sun MC software server configuration.

The Sun MC software should be implemented with two systems. One small system should act as a proxy agent for one or more Sun Fire platforms, and the second system should be a larger Sun MC software server that is tasked with monitoring the entire network. This configuration provides additional monitoring capabilities in case the system containing the Sun MC software server becomes unavailable. It also provides flexibility in the MSP and security configuration.

To be able to monitor SNMP traps generated by the SC, you must install the Sun MC 3.0 Platform Update 1, or higher. This version of Sun MC is available with the Solaris 8 OE 04/01 release. Currently, the Sun MC software is the only package that can understand the SNMP traps generated by the SC. No MIBs are publicly available. Refer to the Sun Management Center 3.0 Supplement for Sun Fire 6800, 4810, 4800, and 3800 Systems for additional installation and setup information.

Preparing for Firmware Updates

For purposes of firmware updates to the SC, you must set up an FTP or HTTP service on the MSP. You can set up an anonymous FTP server by following the instructions in the ftpd(1M) man page, or you can use normal FTP by specifying a user and password in the FTP URL. If the MSP uses the Solaris 8 OE or higher, a version of the Apache Web server is provided with the Solaris OE, which you can use to provide HTTP services. Because the HTTP service is more configurable than the FTP service and because it may be restricted to listen only on certain network interfaces, the HTTP service can have less of a security impact than FTP. For additional information, refer to "Securing the Sun Fire™ Midframe System Controller Updated for 5.13.x," which is available from: http://www.sun.com/blueprints

You can install the operating system for Sun Fire domains either from an attached DVD-ROM drive or over the network from a Solaris JumpStart™ software server. The function of a JumpStart software server may also be well suited for an MSP. Detailed instructions for setting up a Solaris JumpStart software server can be found in the Solaris OE systems administration guides.

When choosing an appropriate MSP (or MSPs), some additional capabilities need to be considered, such as how to access the serial ports on multiple SCs and how many devices need to be monitored on the same system. For example, the Sun StorEdge™ T3 Array may also need to be monitored by the same MSP.

Configuring the Sun Explorer Software Utility

After completing the initial installation of a Sun Fire server, you should install the Sun Explorer software utility on both the server and the MSP, and you should set it up to periodically collect system configuration information and error messages. Check the following site regularly for updates to the Sun Explorer software: http://sunsolve.sun.com

If possible, the output from the Sun Explorer software should be automatically sent to the Sun Explorer software database at the email address you specified when you set up the software. Version 3.5, or higher, can collect data from the SC; however, version 3.5.3.1, or higher, is recommended because it can also gather information about the components in the system, such as the field-replaceable unit ID (FRUID). The FRUID information is stored in each component in a Sun Fire system and describes the part, including its revision level, serial number, and manufacturing information.

The following command gathers information from the SC. You should execute it on the MSP. In addition, use of the following command assumes that the Sun Explorer software has already been installed on the system in the default location: /opt/SUNWexplo

nerm# /opt/SUNWexplo/bin/explorer -w fru,scextended,default

If you execute this command from an MSP, the Explorer software will collect data from the MSP and the SC. To collect data from the SC, the Explorer software uses a telnet connection; therefore, the MSP must be able to establish a telnet session on the SC.

If security considerations prevent the automatic sending of the Sun Explorer software results to the Sun Explorer software database, you should still install the Sun Explorer software utility so that it is available to collect information in the event that service is required on the system and information needs to be collected.

The initial installation is also a good time to record and check the system serial number, hostid, and MAC address provided with the system and to become familiar with how these values are reported by the SC showplatform -v command. Keep this information where it can be easily accessed in case a SC replacement is required.
