HPC Administration and Usage Tips

This section gives simple recommendations to Sun HPC system users and administrators on various issues that relate to configuration and usage of the Sun HPC software installation. The following is a list that addresses the most frequently encountered issues:

  • Do not mix different versions of the hostname syntax for the cluster nodes. Mixing them can prevent an HPC installation from completing successfully.

  • Use the output of the hostname command in the hpc_config file. Make sure the same syntax is used in the /etc/hpc_system and the /.rhosts or /etc/sunhpc_rhosts files.

  • Provide a superuser (root) readable and writeable directory for synchronization.

  • Usually the hpc_config file is saved in this directory, and all of the SYNC files used by the install script are created in this directory. Check the sync directory for correct permissions before starting the installation.

  • Change the permissions to 0600 and the ownership to root on the /.rhosts and /etc/sunhpc_rhosts authentication files.

  • Make sure that the authentication files contain the hostname of the cluster nodes, including the host on which they reside.

  • Refresh the resource database.

  • The resource database sometimes gets out of sync and needs to be refreshed. The symptom is wrong or unexpected output from the CRE commands. To refresh the database, stop the CRE daemons, remove the /var/rdb-* files, and restart the CRE daemons.

  • Try to avoid NFS-type installations.

  • Use SMP-local or cluster-local installations only. The latter type has generated more clean installations than the NFS type, due to the wide variations of network configurations.

  • Do not remove the /tmp/CRE-ctblfile file because it is needed by the CRE software.

  • A job spawned by the Sun CRE is closely tied to the /tmp/CRE-ctblfile file, which lives as long as the CRE daemons are running. Most computer sites have scripts that regularly clean up the /tmp directory. There have been instances where long-running jobs that take days to complete have failed due to the unexpected disappearance of the /tmp/CRE-ctblfile file.

  • Use the -t scale_factor option with the mprun(1) command to increase the timeout period.

  • Jobs that spawn a large number of processes may, on rare occasions, fail with the following message:

    mprun: tmrte_proc_spawn: select: Operation timed out: Operation timed out

    This is due to the default timeout value used by the Sun CRE to spawn all of the processes of the job.

  • Configure a large /tmp swap partition because the MPI programs running on a particular node use shared memory files that are mapped to the /tmp area.

TABLE 1 contains two examples of shared memory sizes with respect to the number of processes running on the same SMP:

TABLE 1 Shared Memory Sizes Per Processes

Processes per Job    Required Shared Memory
2                    35 Mbytes
16                   85 Mbytes
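
The /tmp sizing advice can be turned into a quick check before a job is started. The following is a minimal POSIX shell sketch and is not part of the Sun HPC ClusterTools software. The function names tmp_free_mb and check_tmp_space are hypothetical. The sketch assumes that each data line of df -k output ends with the available kilobytes, the capacity, and the mount point, which is worth confirming on your system.

```shell
#!/bin/sh
# Sketch only: tmp_free_mb and check_tmp_space are hypothetical helper names.

# Print the free space, in whole Mbytes, of the file system that holds $1.
tmp_free_mb() {
    # Assumption: data lines of "df -k" end with: avail capacity mountpoint.
    # The NF >= 3 guard skips a wrapped line that holds only a device name.
    df -k "$1" | awk 'NF >= 3 { avail = $(NF - 2) }
                      END { printf "%d\n", avail / 1024 }'
}

# Succeed when at least $2 Mbytes are free on the file system that holds $1.
check_tmp_space() {
    dir=$1
    need_mb=$2
    free_mb=$(tmp_free_mb "$dir")
    if [ "$free_mb" -lt "$need_mb" ]; then
        echo "WARNING: $dir has $free_mb Mbytes free, below $need_mb Mbytes" >&2
        return 1
    fi
    echo "OK: $dir has $free_mb Mbytes free"
    return 0
}
```

For example, check_tmp_space /tmp 85 tests for the 85 Mbytes that TABLE 1 lists for 16 processes on one SMP. For other job sizes, the threshold is a value the administrator has to choose.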

  • Keep MPI network traffic separate from administrative and other network traffic to improve MPI application performance.
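
The hostname and authentication-file recommendations earlier in this list lend themselves to a scripted check before an installation. The sketch below is one possible approach in POSIX shell, under stated assumptions. The names host_listed and check_auth_file are hypothetical helpers, each line of an authentication file is assumed to begin with the host name, and the optional second argument to check_auth_file (the expected owner, root by default) is an illustrative convenience.

```shell
#!/bin/sh
# Sketch only: host_listed and check_auth_file are hypothetical helper names.

# Succeed when host name $2 is exactly the first field of some line in file $1.
# An exact match means "node2" does not match "node2.example.com".
host_listed() {
    awk -v h="$2" '$1 == h { found = 1 } END { exit !found }' "$1"
}

# Succeed when file $1 has mode 0600 and is owned by $2 (default: root).
check_auth_file() {
    file=$1
    want_owner=${2:-root}
    listing=$(ls -ld "$file" 2>/dev/null) || {
        echo "MISSING: $file" >&2
        return 1
    }
    perms=$(echo "$listing" | awk '{ print $1 }')
    owner=$(echo "$listing" | awk '{ print $3 }')
    case "$perms" in
        -rw-------*) ;;   # mode 0600; a trailing ACL marker is tolerated
        *) echo "BAD MODE: $file is $perms, expected -rw-------" >&2
           return 1 ;;
    esac
    if [ "$owner" != "$want_owner" ]; then
        echo "BAD OWNER: $file is owned by $owner, expected $want_owner" >&2
        return 1
    fi
    return 0
}
```

For example, host_listed /etc/sunhpc_rhosts "$(hostname)" fails when the file lists the node under a different form of its name, such as a fully qualified name where the hostname command prints the short one. Running check_auth_file /.rhosts reports a file whose mode or owner differs from the recommended 0600 and root.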

The above tips and recommendations frequently reappear on the support forums. See the Frequently Asked Questions at the following site:

http://supportforum.sun.com/clustertools

Read the Sun HPC ClusterTools 4 User's Guide and the Sun HPC ClusterTools 4 Product Notes at the following site:

http://docs.sun.com/

Use the Sun Cluster Support forum at:

http://supportforum.sun.com/clustertools

There are several forums at this site that users and experts use to discuss issues that pertain to the Sun HPC ClusterTools and the Sun Grid Engine products.

Appendix

This appendix contains a copy of the /etc/sudoers file. It includes the necessary changes to administer the Sun HPC ClusterTools 4 software.

-------------Start of /etc/sudoers file-------------------
...
snip
...
#
Host_Alias HPCHOSTS=<hostname>,<hostname>,...
#
User_Alias HPCUSERS=<username>,<username>,...
# Used for HPC CT 3.1
Cmnd_Alias HPCCMNDS=/opt/SUNWhpc/bin/*,/etc/init.d/sunhpc*,/opt/SUNWhpc/etc/*,\
    /opt/SUNWhpc/etc/isa.h,/opt/SUNWhpc/etc/sparc*/*,/opt/SUNWhpc/etc/pfs/sparc*/*,\
    /opt/SUNWhpc/bin/Install_Utilities/*
HPCUSERS HPCHOSTS=HPCCMNDS
#
...
snip
...
-------------End of /etc/sudoers file-------------------

Acknowledgements

I would like to thank all of my colleagues from the many HPC-related groups for reviewing the original SUPerG white paper and offering their valuable feedback and suggestions.

References

This section contains the references used in this article.

Ordering Sun Documents

The SunDocs℠ program provides more than 250 manuals from Sun Microsystems, Inc. If you live in the United States, Canada, Europe, or Japan, you can purchase documentation sets or individual manuals through this program.

Accessing Sun Documentation Online

The docs.sun.com web site enables you to access Sun technical documentation online. You can browse the docs.sun.com archive or search for a specific book title or subject. The URL is http://docs.sun.com/

To reference Sun BluePrints OnLine articles, visit the Sun BluePrints OnLine Web site at: http://www.sun.com/blueprints/online.html
