Process Management for Scalable Parallel Programs
Large-scale parallel programs present multiple problems in process management, from scalable process startup, through runtime monitoring and signal delivery, to rundown and cleanup. Interactive parallel jobs present special problems in the management of standard I/O.
In this talk we will present an approach that addresses these issues. The key concept is that of a process management interface, which is used by application layers such as MPI implementations or the runtime systems for languages like UPC or Co-Array Fortran, and implemented by vendor-supplied or publicly available process management systems. We will describe multiple implementations of such a process management interface, focusing on MPD, which is distributed with the MPICH implementation of MPI but is independent of it. MPD provides process management support for the MPICH-2 implementation of MPI, described elsewhere in this conference, as well as scalable process startup of large interactive jobs. It cooperates with various types of parallel tools such as monitors and debuggers.
Finally, we will describe how this or any other process management system can be integrated into a complete scalable systems software solution, using the interfaces under development by a broadly based group attempting to define a cluster system software architecture.
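
The separation between an application layer and a process manager can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the names ProcessManagementInterface, InProcessManager, and exchange_addresses are hypothetical, the operations (rank, size, put, get, barrier) are one plausible minimal set rather than the interface presented in the talk, and the "processes" are threads in a single Python interpreter rather than processes started by MPD or any other real process manager.

```python
import threading
from abc import ABC, abstractmethod


class ProcessManagementInterface(ABC):
    """Hypothetical interface that an application layer (e.g. an MPI
    library) would code against, independent of who implements it."""

    @abstractmethod
    def rank(self) -> int: ...

    @abstractmethod
    def size(self) -> int: ...

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...

    @abstractmethod
    def get(self, key: str) -> str: ...

    @abstractmethod
    def barrier(self) -> None: ...


class InProcessManager:
    """Toy process manager: each 'process' is a thread, and all of them
    share one key-value store and one barrier."""

    def __init__(self, size: int):
        self.size = size
        self.kvs = {}
        self.lock = threading.Lock()
        self.sync = threading.Barrier(size)

    def launch(self, target):
        """Start `size` workers, hand each a handle, collect their results."""
        results = [None] * self.size

        def run(rank):
            results[rank] = target(_Handle(self, rank))

        threads = [threading.Thread(target=run, args=(r,))
                   for r in range(self.size)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results


class _Handle(ProcessManagementInterface):
    """Per-worker view of the toy manager."""

    def __init__(self, manager: InProcessManager, rank: int):
        self._manager = manager
        self._rank = rank

    def rank(self) -> int:
        return self._rank

    def size(self) -> int:
        return self._manager.size

    def put(self, key: str, value: str) -> None:
        with self._manager.lock:
            self._manager.kvs[key] = value

    def get(self, key: str) -> str:
        with self._manager.lock:
            return self._manager.kvs[key]

    def barrier(self) -> None:
        self._manager.sync.wait()


def exchange_addresses(pm: ProcessManagementInterface):
    """Application-layer code: publish a (made-up) contact address, wait
    until every worker has published, then read all of them."""
    pm.put(f"addr-{pm.rank()}", f"host{pm.rank()}:{5000 + pm.rank()}")
    pm.barrier()
    return [pm.get(f"addr-{r}") for r in range(pm.size())]


if __name__ == "__main__":
    for rank, table in enumerate(InProcessManager(4).launch(exchange_addresses)):
        print(rank, table)
```

In this sketch, exchange_addresses depends only on the abstract interface, so the toy manager could in principle be replaced by another implementation without changing the application-layer code.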