Error analysis and diagnosis rely on the error data displayed on the system console and on the data captured in the /var/adm/messages file. You can administer the 1280 server using only the serial connection or only the Ethernet connection, but using both is recommended for data collection. As mentioned, the Ethernet connection is fast, but the serial connection is the only one that receives all of the diagnostic data, including the SC POST output. At a minimum, the following data is required to determine the cause of a hardware failure:
Verbatim transcript of all output written to the system console leading up to the failure
Copy of the /var/adm/messages file from the time leading up to the failure
Output of the following LOM commands:
Core dumps and panic messages, if applicable
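To isolate the part of /var/adm/messages that leads up to a failure, a small filter can help. The following is a minimal sketch, assuming each entry starts with the standard syslog timestamp (for example, "Jan  5 10:00:00") with no year. The function name lines_before_failure and the 30-minute default window are illustrative choices, not part of any Sun tool.

```python
from datetime import datetime, timedelta


def lines_before_failure(lines, failure_time, window_minutes=30):
    """Return the syslog lines stamped within window_minutes before failure_time.

    Assumption: timestamps carry no year, so the year of failure_time is used.
    """
    start = failure_time - timedelta(minutes=window_minutes)
    selected = []
    keep = False
    for line in lines:
        try:
            # The first 15 characters hold the "Mon DD HH:MM:SS" prefix.
            stamp = datetime.strptime(
                f"{failure_time.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            # No timestamp: treat as a continuation of the previous entry.
            if keep:
                selected.append(line)
            continue
        keep = start <= stamp <= failure_time
        if keep:
            selected.append(line)
    return selected
```

You could feed it the lines of a copied messages file together with the approximate failure time taken from the console transcript.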
The logs should describe the faults that occurred. If the console is locked, take the following two steps:
Issue the #. command (or whatever character sequence you have configured as the escape sequence). This should drop you immediately to the LOM command line interface.
Issue the break command at the lom> prompt. This should take you directly to the ok prompt.
If typing the break command from the LOM shell does not force control of the system back to the OpenBoot PROM prompt, then the system has stopped responding. In some circumstances, the host watchdog will detect that the Solaris OS has stopped responding and will automatically reset the system.
If the domain is hung, execute the showresetstate -v command to display the CPU registers. Then, execute the reset -x command at the lom> prompt so that the SC sends an externally initiated reset (XIR) to the processors. Using the reset command without an option is equivalent to reset -x: it uses an XIR to reset the processors into the OpenBoot PROM and begins the OpenBoot PROM error reset recovery actions. These recovery actions preserve most of the Solaris OS system state, which enables the collection of the data needed to debug the hardware and software, including a Solaris OS core dump. After the reset, execute the showresetstate -v command again to display the CPU registers.
If the system is still hard hung (that is, you cannot log in to the Solaris OS, and executing the break command does not force control of the system back to the OpenBoot PROM prompt), execute the reset -a command to reset everything. The reset -a command, which is equivalent to the OpenBoot PROM reset-all command, skips XIR data collection, so the extra debugging data is lost. A second reset -a command might be required in some hard hung situations or if the reset -x command fails.
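The escalation order described above can be summarized as a short planning aid. This is only a sketch of one reading of the text: the function name recovery_plan and its two boolean parameters are hypothetical, and the code does not talk to the SC. It only lists the lom> commands in the order you would type them.

```python
def recovery_plan(break_recovers, xir_recovers):
    """List the lom> commands to try, least destructive first.

    break_recovers: True if break returns control to the ok prompt.
    xir_recovers:   True if reset -x brings the domain back.
    Both flags are hypothetical inputs describing what the operator observes.
    """
    plan = ["break"]
    if break_recovers:
        return plan
    # Capture the CPU registers before and after the XIR.
    plan += ["showresetstate -v", "reset -x", "showresetstate -v"]
    if xir_recovers:
        return plan
    # Last resort: the XIR debugging data is not collected.
    plan.append("reset -a")
    return plan


for command in recovery_plan(break_recovers=False, xir_recovers=False):
    print(command)
```

Keeping reset -a at the end of the list reflects the trade-off in the text: it is the most likely to recover the system, but it discards the data that reset -x preserves.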
There is no SC reset button on the 1280 server. Use the resetsc command to reset the SC in the event of a hardware or software problem.
You can use the following LOM commands to check the system status and configuration:
lom -f to check the fans
lom -v to check the status of the supply rails and internal circuit breakers
lom -c to check the LOM configuration
lom -t to check the temperatures
lom -l to check whether the fault LED and alarms are on or off
lom -e n,[x] to check the event log
To view all LOM status and configuration data, execute the lom -a command from the Solaris OS prompt. You can also check the output of commands such as cfgadm(1M), psrinfo(1M), and prtdiag -v.
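If you collect this status output regularly, a small wrapper can run the commands and keep the results together. The following is a minimal sketch, assuming a Python 3.7 or later interpreter is available on the host and that the commands are on the PATH; neither assumption comes from this article. The name collect_status and the placeholder text for failed commands are illustrative.

```python
import subprocess

# Commands named in the text; adjust the paths to match your installation.
STATUS_COMMANDS = [
    ["lom", "-a"],
    ["prtdiag", "-v"],
    ["psrinfo"],
    ["cfgadm"],
]


def collect_status(commands=None, timeout=60):
    """Run each command and return a dict mapping its text to its output."""
    if commands is None:
        commands = STATUS_COMMANDS
    results = {}
    for argv in commands:
        name = " ".join(argv)
        try:
            done = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            results[name] = done.stdout + done.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing or hung command is recorded, not fatal.
            results[name] = f"[not collected: {exc}]"
    return results
```

Recording a placeholder for a command that fails means that one missing tool does not prevent the rest of the data from being saved.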