Error Analysis and Diagnosis
Sun Fire servers provide significantly enhanced diagnostic capabilities. When a system fault occurs, the system should provide data on both software and hardware failures that you can use to help determine the source of the fault. Errors can be generated and logged in several places, depending on the type of error. Use a utility such as Explorer to gather data from the system so that all error messages are collected in a central location.
After you have located the appropriate error messages, use the flow charts in the Sun Fire 6800/4810/4800/3800 Systems Troubleshooting Manual to isolate the source of the error as far as possible. If the results point to a suspected hardware problem, attempt to verify the failure using component blacklisting, segmenting, or other reconfiguration before you remove or replace any components.
There are maintenance functions that you must perform on a regular basis. The following functions are described in this section:
"Restoring the Sun Fire SC Configuration"
"Updating the Firmware and Real Time Operating System"
"Removing the SC from Platform Use"
Restoring the Sun Fire SC Configuration
If an SC fails, you might need to manually restore the SC configuration information. After the configuration of the platform has been completed, including setting up domains and segments, create a backup of your SC configuration so that a quick restoration will be possible.
The following shows an example of how to create a backup of the Sun Fire SC configuration.
heslab-12:sc> dumpconfig -f ftp://me:passw0rd@heslab-05/dumps
The following shows an example of how to restore a Sun Fire SC configuration.
heslab-12:sc> restoreconfig -f ftp://me:passw0rd@heslab-05/dumps
You should create a backup of the SC configuration on a routine basis to ensure that the dump file is up-to-date. To help with a quick recovery if a primary SC fails, make sure that the secondary SC has the same configuration information as the primary after the domains have been configured.
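One way to keep routine backups apart is to point each dump at a dated directory on the FTP server. The following Python sketch only assembles the command text to enter at the SC prompt; it does not contact the SC. The helper names and the per-day directory layout are assumptions made for illustration, and the sketch assumes that the target directory already exists on the FTP server and that the user name and password contain no characters that need URL escaping.

```python
from datetime import date


def _config_url(user, password, host, base_dir, day):
    """Build the FTP URL for one day's configuration dump directory."""
    # Hypothetical layout: <base_dir>/<YYYY-MM-DD>, created on the FTP
    # server before the dump is taken.
    target = f"{base_dir.strip('/')}/{day.isoformat()}"
    return f"ftp://{user}:{password}@{host}/{target}"


def dumpconfig_command(user, password, host, base_dir, day):
    """Return the dumpconfig command line for the given day."""
    return f"dumpconfig -f {_config_url(user, password, host, base_dir, day)}"


def restoreconfig_command(user, password, host, base_dir, day):
    """Return the restoreconfig command line for the given day's dump."""
    return f"restoreconfig -f {_config_url(user, password, host, base_dir, day)}"


if __name__ == "__main__":
    # Same example host and credentials as the commands shown above.
    print(dumpconfig_command("me", "passw0rd", "heslab-05", "dumps", date.today()))
```

Keeping one directory per day means that an older dump remains available if the most recent one was taken after an unwanted configuration change.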
Updating the Firmware and Real Time Operating System
Periodically, updates to the SC firmware and RTOS will be made available. These updates often contain critical bug fixes and functionality enhancements to the SC and should be applied as part of a regular patch maintenance routine.
Before applying a firmware update using the flashupdate command, carefully read the release notes and Install.info files in the patch package to familiarize yourself with the procedures. Backing up the SC configuration before updating is also recommended.
Perform firmware updates regularly and, for Sun Fire systems that have two SCs, remember to update the firmware on both SCs.
You must follow the instructions in the Install.info file, included with each patch release, to ensure that both ScApp and RTOS are updated together. ScApp should only be run with the accompanying version of RTOS. Upgrading from some versions of the firmware may require that the upgrade to the SCs be done in a specific order. Be sure to read and follow the instructions carefully.
You can retrieve updates from the SunSolve(SM) program website.
You should also have copies of important SC parameters that are displayed by the showplatform and showboards commands, as well as those displayed by the OpenBoot™ PROM commands printenv and devalias. See "Appendix: SC Parameters" on page 28 for examples of the output.
Refer to the Sun Fire 6800/4810/4800/3800 System Controller Command Reference Manual for more information on the commands discussed here.
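Saved copies of the showplatform, showboards, printenv, and devalias output are most useful when you can compare them against a fresh capture. The following Python sketch compares two saved text captures of the same command using the standard difflib module. It assumes that you have already saved the command output as plain text by some other means; the function name and the sample strings are placeholders, not real command output.

```python
import difflib


def diff_captures(old_text, new_text, label):
    """Return unified-diff lines between two captures of one command's output.

    An empty list means the two captures are identical.
    """
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"{label} (previous)",
            tofile=f"{label} (current)",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Placeholder text only; real captures would come from saved files.
    previous = "setting-one value-a\nsetting-two value-b"
    current = "setting-one value-a\nsetting-two value-c"
    for line in diff_captures(previous, current, "showplatform"):
        print(line)
```

An empty result indicates that the parameters are unchanged since the last capture; any reported lines show what would need to be re-entered or reviewed after an SC is replaced or restored.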
Removing the SC from Platform Use
If an SC needs to be removed for maintenance purposes, you must follow the SC replacement instructions that apply to the version of firmware on the SCs. For these instructions, refer to the Sun Fire 6800/4810/4800/3800 Systems Platform Administration Manual for your version of firmware.
In general, an SC should never be removed from a system unless the SC can be powered off, either by using the poweroff SSCx command or by removing the power to the entire platform.