Our implementation of the tight integration architecture offers several additional features that improve the usability of the system:
- Job integration
- Support for thread placement
- Verbose output
With our implementation, commands from both Sun CRE and the RM can be used to monitor and control a job, so users can choose the command set they prefer. Further, we ensure that the job identifier shown by Sun CRE matches the identifier chosen by the RM, so that users can see the correspondence between jobs in the two systems. FIGURE 4 shows that both the PBS qstat command and the Sun CRE mpps command report the same running job, named 53.hpc4.
FIGURE 4 Example of a Job Visible in Both Sun CRE and the Resource Manager
Developers of parallel applications are increasingly interested in programming models that mix multithreaded parallelism with multiprocess MPI parallelism. However, to date there has been no easy way to run such an application under an RM and guarantee that the MPI processes will be placed on hosts with enough spare CPUs to handle the spawned threads. To solve this problem, we have extended the syntax of the mprun -np n option to allow the form -np PxT, where P is the number of MPI processes to launch and T is the number of threads that will run in each process. When mprun is invoked under an RM, it takes the first T resources from the host list and starts a single MPI process there, then takes the next T resources and starts a single MPI process there, and so on until P processes have been started.
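The placement rule described above can be illustrated with a short Python sketch. This is not the mprun source code; it only models the stated behavior under several assumptions: the host list is a flat sequence with one entry per allocated CPU (similar in spirit to a PBS node file), a plain -np n is treated as n single-threaded processes, and each process is started on the host named by the first entry of its group of T resources. The function names parse_np and place_processes and the host names node1 and node2 are hypothetical.

```python
from typing import List, Tuple


def parse_np(spec: str) -> Tuple[int, int]:
    """Split an -np argument of the form 'n' or 'PxT' into (P, T)."""
    procs, sep, threads = spec.lower().partition("x")
    p = int(procs)
    # Assumption: a plain 'n' means n processes with one thread each.
    t = int(threads) if sep else 1
    if p < 1 or t < 1:
        raise ValueError(f"invalid -np value: {spec!r}")
    return p, t


def place_processes(slots: List[str], spec: str) -> List[Tuple[int, str]]:
    """Assign each MPI rank to a host by consuming T slots per process.

    'slots' is the RM-provided host list, one entry per allocated CPU.
    Returns a list of (rank, host) pairs.
    """
    p, t = parse_np(spec)
    if len(slots) < p * t:
        raise ValueError(f"need {p * t} resources, host list has {len(slots)}")
    placement = []
    for rank in range(p):
        # Take the next T resources from the host list for this process.
        group = slots[rank * t:(rank + 1) * t]
        # Assumption: the process runs on the host of the group's first slot.
        placement.append((rank, group[0]))
    return placement


if __name__ == "__main__":
    # Hypothetical allocation: four CPUs on each of two hosts.
    slots = ["node1"] * 4 + ["node2"] * 4
    for rank, host in place_processes(slots, "4x2"):
        print(rank, host)
```

With the hypothetical eight-slot allocation shown, a request of 4x2 consumes two slots per process, so two processes land on each host and every process has a spare CPU for its second thread.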
A system administrator must perform some manual configuration steps to enable the integration between a resource manager and Sun CRE. To help debug configuration problems, we have added a -v (verbose) option to mprun, which causes it to print extra information about its interactions with the RM.