Error Analysis

Error analysis and diagnosis rely on the error data that is displayed on the system console and captured in the /var/adm/messages file. You can administer the 1280 server over only the serial connection or only the Ethernet connection, but using both is recommended for collecting data. As mentioned, the Ethernet connection is fast, but the serial connection is the only one that receives all of the diagnostic data, including the SC POST output. At a minimum, the following data is required to determine the cause of a hardware failure:

  • Verbatim transcript of all output written to the system console leading up to the failure

  • Copy of the /var/adm/messages file from the time leading up to the failure

  • Output of the following LOM commands:

    • showsc -v

    • showboards -v

    • history

    • date

    • showresetstate

    • showlogs

    • showenvironment

    • showcomponent

  • Core dumps and panic messages, if applicable
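As an illustration of pulling "the time leading up to the failure" out of /var/adm/messages, the following Python sketch keeps only the lines stamped within a chosen window before a known failure time. It is a minimal sketch under stated assumptions: the function name, the 30-minute default window, and the failure time are hypothetical; each line is assumed to begin with a standard syslog stamp such as "Mar 12 14:03:22"; the year (which syslog stamps omit) is borrowed from the failure time; and month names are assumed to be in English.

```python
from datetime import datetime, timedelta


def lines_before_failure(lines, failure_time, window_minutes=30):
    """Return the syslog-style lines stamped within the window before failure_time."""
    start = failure_time - timedelta(minutes=window_minutes)
    selected = []
    for line in lines:
        stamp = line[:15]  # for example "Mar 12 14:03:22"
        try:
            # Syslog stamps carry no year, so borrow it from failure_time (assumption).
            when = datetime.strptime(
                f"{failure_time.year} {stamp}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # continuation line or unexpected format; skipped in this sketch
        if start <= when <= failure_time:
            selected.append(line)
    return selected
```

For example, with the file read into a list of lines, lines_before_failure(lines, datetime(2004, 3, 12, 14, 0, 0)) would keep the entries from 13:30:00 through 14:00:00 on March 12. Send the complete file as well whenever possible, because a filtered extract can drop unstamped continuation lines.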

The logs should describe the faults that occurred. If the console is locked, take the following two steps:

  1. Issue the #. command (or whatever character sequence you have configured as the escape sequence). This should drop you immediately to the LOM command line interface.

  2. Issue the break command at the lom> prompt. This should take you directly to the ok prompt.

If typing the break command from the LOM shell does not force control of the system back to the OpenBoot PROM prompt, then the system has stopped responding. In some circumstances, the host watchdog will detect that the Solaris OS has stopped responding and will automatically reset the system.

If the domain is hung, execute the showresetstate -v command to display the CPU registers. Then, from the lom> prompt, use the reset -x command to have the SC send an XIR to the processors. Using the reset command without an option is equivalent to reset -x: it uses an XIR to reset the processors into the OpenBoot PROM and begins the OpenBoot PROM error recovery actions. These recovery actions preserve most of the Solaris OS system state, which enables you to collect the data needed to debug the hardware and software, including a Solaris OS core dump. After the reset, execute the showresetstate -v command again to display the CPU registers.

If the system is still hard hung (that is, you cannot log in to the Solaris OS, and executing the break command does not force control of the system back to the OpenBoot PROM prompt), execute the reset -a command to reset everything. The reset -a command, which is equivalent to the OpenBoot PROM reset-all command, skips XIR data collection, so the extra debugging data is lost. A second reset -a command might be required in some hard hung situations or if the reset -x command fails.
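The escalation described above can be summarized as an ordered list of lom> commands, moving from the least destructive action to the most. The Python sketch below only encodes that ordering as a memory aid; it does not talk to the SC. The function name and its two flags are hypothetical.

```python
def recovery_plan(break_worked, xir_worked=False):
    """List the lom> commands to try, in order, for a hung domain.

    break_worked: True if break returned control to the ok prompt.
    xir_worked:   True if reset -x brought the domain back to the OpenBoot PROM.
    """
    plan = ["break"]
    if break_worked:
        return plan  # control is back at the OpenBoot PROM; no reset is needed

    # Capture the CPU registers before and after the XIR reset.
    plan += ["showresetstate -v", "reset -x", "showresetstate -v"]
    if xir_worked:
        return plan  # most Solaris OS state is preserved for a core dump

    # Last resort: full reset. XIR data collection is skipped.
    plan.append("reset -a")
    return plan
```

Calling recovery_plan(False, False) returns all five commands, ending with reset -a. As noted above, a second reset -a might still be required in some hard hung situations.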

NOTE

There is no SC reset button on the 1280 server. Use the resetsc command to reset the SC in the event of a hardware or software problem.

You can use the following LOM commands to check the system status and configuration:

  • lom -f to check the fans

  • lom -v to check the status of the supply rails and internal circuit breakers

  • lom -c to check the LOM configuration

  • lom -t to check the temperatures

  • lom -l to check whether the fault LED and alarms are on or off

  • lom -e n, [x] to check the event log

To view all LOM status and configuration data, execute the lom -a command from the Solaris OS prompt. You can also check the output of commands such as cfgadm(1M), psrinfo(1M), and prtdiag -v.
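To keep the output of these Solaris-side commands together for later analysis, you could gather it into a single labeled report. The following Python sketch shows one way to do that. The function names and the report layout are hypothetical, the commands are assumed to be on the PATH of the user running the script (prtdiag in particular often lives in a platform-specific directory), and a command that cannot be started is recorded in the report rather than stopping the collection.

```python
import subprocess

# Commands named in the text; run them from the Solaris OS prompt.
STATUS_COMMANDS = [
    ["lom", "-a"],
    ["prtdiag", "-v"],
    ["psrinfo"],
    ["cfgadm"],
]


def run_command(argv):
    """Run one command and return its combined stdout and stderr as text."""
    try:
        done = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            check=False,
        )
    except OSError as exc:
        # Command missing or not executable: note it and carry on.
        return f"[could not run: {exc}]\n"
    return done.stdout


def collect_status(commands=STATUS_COMMANDS, runner=run_command):
    """Return one report with a labeled section per command."""
    sections = []
    for argv in commands:
        sections.append(f"===== {' '.join(argv)} =====\n{runner(argv)}")
    return "\n".join(sections)
```

Writing the result of collect_status() to a file gives a single snapshot that can accompany the console transcript and the /var/adm/messages file. The runner parameter exists only so that the report logic can be exercised on a machine without these commands.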
