Enterprise Server Features
Certain hardware and software features available on the Sun platform are beneficial in HPC environments. Using these features provides greater system flexibility and optimal use of hardware resources. This section describes the following:
"Dynamic System Domains"
"Processor Sets Versus System Domains"
Dynamic System Domains
This feature allows a machine to be logically divided into several domains, each of which runs its own copy of the Solaris OE. The effect is like having several machines combined in one hardware box. The latest Sun Fire™ midrange servers and the last two generations of high-end servers include Dynamic System Domains.
This feature provides command-line and GUI interfaces for performing the following operations:
showing status of domains
The following are some example uses of Dynamic System Domains:
Consolidating many servers in one small footprint
Creating a small domain for testing upgrades
Separating development domains and production domains
Partitioning domains to scale and improve performance of applications
The first three uses apply to both business and HPC environments. The third use is especially popular with customers who want two domains: one for developing code and the other for running compute-intensive production code. The size of each domain varies from site to site and depends on the workload of the development tasks and the size of the projects at the site.
The fourth example improves system throughput and performance of applications by optimally managing hardware resources of a machine. Domain partitioning involves either a domain expansion or a domain splitting operation. The next two subsections describe both of these topics.
Domain expansion increases the size of a domain by merging it with another domain or by borrowing hardware resources from another domain. Expanding a domain can improve the performance of parallel applications that have the potential to scale beyond the maximum number of processors in a domain.
Applications that take advantage of domain expansion include parallel applications and other scalable compute-intensive code. Site administrators at large compute sites can merge all or part of idle domains into one larger domain and make it available to candidate applications at appropriate times, such as nights or weekends.
Domain expansion incurs an overhead when a whole domain is merged, because that domain must be shut down and later restored to its original state.
The following figure depicts a domain expansion scenario for scaling HPC applications.
FIGURE 3 Merging Domains to Scale HPC Applications
Domain splitting divides a large domain into two or more smaller domains. Splitting domains can improve the overall performance of a system by configuring domains of the optimal size for running HPC applications.
Splitting a domain is beneficial when parallel applications do not scale beyond a certain number of processors (see the following figure). The example we illustrate is the Mesoscale Model weather program, also known as MM5. MM5 is a public domain program developed by Pennsylvania State University and the National Center for Atmospheric Research (PSU/NCAR).
The MM5 program uses a variable number of cells, which determines how many outer loops are used in the most time-consuming part of the program. If only 25 cells are used, then the program runs optimally on 25 processors, because each cell (outer loop) is distributed to a separate processor.
When the program is run with a smaller number of cells, domain splitting can free up hardware resources by configuring a domain of optimal size, which includes only the needed number of processors. We see cases at customer sites where domain splitting is used to assign domains to groups at intervals to run their critical compute-intensive jobs.
FIGURE 4 Splitting Domains
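The sizing argument in the MM5 example can be sketched with a simplified model. The sketch below assumes that every cell costs the same, that each processor handles one cell per round, and that communication costs are ignored. The function names are hypothetical and are not part of MM5.

```python
import math


def outer_loop_rounds(cells: int, processors: int) -> int:
    """Rounds needed when each processor handles one cell per round."""
    if cells < 1 or processors < 1:
        raise ValueError("cells and processors must be positive")
    return math.ceil(cells / processors)


def smallest_useful_domain(cells: int, available: int) -> int:
    """Smallest processor count that matches the best round count
    reachable with `available` processors (simplified model)."""
    best = outer_loop_rounds(cells, available)
    # The fewest processors that still finish in `best` rounds.
    return math.ceil(cells / best)


if __name__ == "__main__":
    # With 25 cells, processors beyond 25 sit idle in this model.
    print(smallest_useful_domain(cells=25, available=64))
```

Under these assumptions, a 64-processor domain running a 25-cell job could be split so that the job keeps 25 processors and the rest are released to another domain.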
Dynamic reconfiguration allows a system administrator to add system boards to, or delete them from, either an idle domain or a domain running the Solaris OE. This feature was introduced with the Sun Enterprise™ 10000 server release and is integrated with the Dynamic System Domains feature previously discussed.
Dynamic reconfiguration now supports automated dynamic reconfiguration (ADR). This enhancement is attractive because the dynamic reconfiguration operation is performed automatically, without requiring an operator's attendance.
ADR is helpful when workloads on domains vary at different times of the day, because it provides the capability to move system boards between domains to meet the requirements of load constraints.
ADR provides the following commands that you can execute either from a command line or a shell script:
addboard: attach a board to a domain
deleteboard: detach a board from a domain
moveboard: detach a board from a domain and attach it to another domain
showusage: display board and dynamic reconfiguration data
These commands are wrappers for lower-level commands that perform manual dynamic reconfiguration.
The HPC sites that benefit most from dynamic reconfiguration are those whose machines are configured with one domain for application development and another domain for job execution (computing). At these sites, the development domain releases hardware resources to the compute domain, which needs them at night or at other off-peak times to run scheduled batch jobs. The reverse operation happens in the morning or at peak usage times, when the development domain reclaims its original resources to serve the development community.
The following figure illustrates dynamic reconfiguration. This operation can be launched using scripts that monitor the load of domains and move resources according to dynamic needs of the system.
FIGURE 5 Dynamic Reconfiguration Example
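A load-monitoring script of the kind described above might make its decision as in the following sketch. The Domain class, the plan_board_move function, the load metric, and the threshold values are all assumptions made for illustration. The sketch only returns a plan and does not invoke any ADR command, because the command arguments are not given here.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    boards: int   # system boards currently attached
    load: float   # utilization from 0.0 to 1.0 (hypothetical metric)


def plan_board_move(dev, compute, high=0.85, low=0.30, min_boards=1):
    """Return (source, target) domain names for one board move, or None.

    A board moves only when one domain is busy, the other is idle,
    and the idle domain would keep at least `min_boards` boards.
    """
    for busy, idle in ((compute, dev), (dev, compute)):
        if busy.load >= high and idle.load <= low and idle.boards > min_boards:
            return (idle.name, busy.name)
    return None


if __name__ == "__main__":
    # Night: developers are idle and batch jobs saturate the compute domain.
    night = plan_board_move(Domain("dev", 4, 0.10), Domain("compute", 4, 0.95))
    print(night)
```

An administrator would translate each returned pair into the appropriate board-move operation for the site and run the check at intervals.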
The Processor Sets feature allows a multiprocessor system to be divided into two or more logical groups of processors. Processor sets (also called processor partitions) provide a mechanism for scheduling processes to run exclusively on one processor set. This feature was introduced in Solaris OE, version 2.6. Processor sets serve the following uses:
increase performance of a system by dividing a machine into processor sets
assign and dedicate applications to a specific processor set
The next section compares the Processor Sets feature with the Dynamic System Domains feature.
FIGURE 6 Processor Sets Architecture
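The exclusive scheduling that processor sets provide can be illustrated with a toy model. The ProcessorSets class below is hypothetical and does not call any Solaris interface. It only captures the rule that a process bound to a set runs on that set's processors, while unbound processes run on the remaining processors.

```python
class ProcessorSets:
    """Toy model of processor sets within one operating system instance."""

    def __init__(self, cpu_ids):
        self.unassigned = set(cpu_ids)  # processors not in any set
        self.sets = {}                  # set name -> processors in the set
        self.bindings = {}              # process ID -> set name

    def create(self, name, cpus):
        """Move unassigned processors into a new named set."""
        cpus = set(cpus)
        if name in self.sets:
            raise ValueError(f"set {name!r} already exists")
        if not cpus <= self.unassigned:
            raise ValueError("processors must be unassigned before joining a set")
        self.unassigned -= cpus
        self.sets[name] = cpus

    def bind(self, pid, name):
        """Bind a process to an existing set."""
        if name not in self.sets:
            raise KeyError(name)
        self.bindings[pid] = name

    def allowed_cpus(self, pid):
        """Processors on which the given process may be scheduled."""
        name = self.bindings.get(pid)
        return set(self.sets[name]) if name else set(self.unassigned)


if __name__ == "__main__":
    ps = ProcessorSets(range(8))
    ps.create("hpc", [0, 1, 2, 3])
    ps.bind(1234, "hpc")                 # 1234 is a made-up process ID
    print(sorted(ps.allowed_cpus(1234)))
```

In this model, the bound process and every unbound process always see disjoint groups of processors, which is the dedication of applications to a set described above.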
Processor Sets Versus System Domains
The Processor Sets feature is somewhat similar to the Dynamic System Domains feature: a machine can be divided into groups of processors on which applications run exclusively. The Processor Sets feature is generally less robust than the Dynamic System Domains feature. The following information provides a brief comparison of the two features.
Dynamic system domains divide a machine into two or more virtual systems. For example, a machine with two domains runs two copies of the Solaris OE, whereas processor sets operate within a single operating system instance. Both features permit isolation of one application from another, provided that the two applications run in different processor sets or different domains.
All processor sets within a single Solaris OE instance share the same pool of memory, virtual memory, and I/O resources. Dynamic system domains, by contrast, carve out their own memory and I/O, which are used exclusively by the domain that owns them. If an application in one processor set stumbles on a bug and consumes all available memory, applications in other processor sets are affected.
Applications that consume all available CPU affect only their own processor sets. The same also applies to dynamic system domains.
From an administrative standpoint, it is much easier to create, destroy, and handle processor sets than to maintain dynamic system domains. To create a dynamic system domain, an administrator has to make sure that the required hardware is available, install the Solaris OE, and install additional software such as the Sun HPC ClusterTools software suite and the Sun Grid Engine software.
The Dynamic System Domains feature allows an administrator to test a new version of software without affecting other domains. The Processor Sets feature does not provide the same capability. It is possible to test a new version of an application in a separate processor set, but there is always a risk that the application might affect hardware common to all processor sets.
Both processor sets and dynamic system domains provide the system administrator the capability to set up a separate environment for HPC applications.
The Extended Accounting feature was introduced in the Solaris™ 8 Operating Environment, Update 1. This feature extends the traditional system accounting of the Solaris OE with task and project ID concepts. The task and project IDs provide a way to tag a program so that it belongs to a job, which in turn can belong to a project.
In business environments, this feature is used by the third-party accounting tool PerfAcct 3.1, from Instrumental Inc. This product provides sophisticated accounting reports by gathering and processing data from machines within a network. Unfortunately, this accounting tool does not currently support parallel distributed environments.
A typical HPC site is characterized by multiple users who execute long-running programs that compete for finite machine cycles, which are strictly assigned to projects. HPC sites that lease computer time to external users need an advanced accounting infrastructure that provides efficient charge-back accounting for a wide range of jobs, and especially for parallel jobs that span the nodes of computer clusters. Typical HPC configurations therefore need a sophisticated accounting tool that provides this functionality.
The majority of HPC sites deploy a job management system product that regulates the use of resources by jobs and users. The underlying operating system provides the required hooks, such as the project and task IDs, which allow the job management system to provide more comprehensive job accounting.
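The charge-back idea can be sketched as a simple aggregation over usage records tagged with project and task IDs. The record fields, project names, and rate below are hypothetical, and the records do not reflect the actual extended accounting file format.

```python
from collections import defaultdict


def cpu_seconds_by_task(records):
    """Sum CPU seconds per (project, task), across all nodes."""
    totals = defaultdict(float)
    for rec in records:
        totals[(rec["project"], rec["task"])] += rec["cpu_seconds"]
    return dict(totals)


def charge_back(records, rate_per_cpu_hour):
    """Return the charge per project at a flat rate per CPU hour."""
    by_project = defaultdict(float)
    for rec in records:
        by_project[rec["project"]] += rec["cpu_seconds"]
    return {
        project: round(seconds / 3600 * rate_per_cpu_hour, 2)
        for project, seconds in by_project.items()
    }


if __name__ == "__main__":
    # One parallel job (task 101) spans two nodes; task 202 runs on one node.
    records = [
        {"project": "climate", "task": 101, "node": "n1", "cpu_seconds": 3600},
        {"project": "climate", "task": 101, "node": "n2", "cpu_seconds": 3600},
        {"project": "cfd", "task": 202, "node": "n1", "cpu_seconds": 1800},
    ]
    print(charge_back(records, rate_per_cpu_hour=2.0))
```

Because every record carries the same project and task tags regardless of the node that produced it, a parallel job that spans several nodes is charged as a single job.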
The Sun Grid Engine (SGE) software currently does not take advantage of extended accounting, and there is no accounting tool that provides sophisticated reports and charge-back reports for SGE jobs in the Solaris OE.
Currently, the only product that satisfies the full compute-intensive job accounting requirements on a Sun platform is the LSF Analyzer tool, which reports accounting only for jobs that are submitted using the LSF job management system.
It is clear that the underlying API is available for future implementation of distributed job accounting on Sun platforms.