A Scalable Process-Management Environment for Parallel Programs

  • Ralph Butler
  • William Gropp
  • Ewing Lusk
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1908)

Abstract

We present a process management system for parallel programs such as those written using MPI. A primary goal of the system, which we call MPD (for multipurpose daemon), is to be scalable. By this we mean that startup of interactive parallel jobs comprising a thousand processes is quick, that signals can be quickly delivered to processes, and that stdin, stdout, and stderr are managed intuitively. Our primary target is parallel machines made up of clusters of SMPs, but the system is also useful in more tightly integrated environments. We describe how MPD enables much faster startup and better runtime management of MPICH jobs. We show how close control of stdio can support the easy implementation of a number of convenient system utilities, even a parallel debugger. MPD is implemented and freely distributed with MPICH.
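One of the goals named above, managing the stdout of many processes so that a user can tell which process wrote what, can be illustrated with a small single-host sketch. This is not MPD's implementation: the names `run_job` and `forward` are hypothetical, the children are ordinary local Python processes rather than MPI ranks started by daemons on different machines, and the rank-prefix format is an assumption chosen for illustration.

```python
import subprocess
import sys
import threading


def run_job(nprocs, child_code):
    """Start nprocs copies of a Python snippet and merge their stdout.

    Each output line is prefixed with the rank of the process that
    wrote it. Hypothetical helper, not part of MPD or MPICH.
    """
    merged = []
    lock = threading.Lock()

    def forward(rank, stream):
        # Read one child's stdout until EOF and label every line.
        for line in stream:
            with lock:
                merged.append(f"{rank}: {line.rstrip()}")

    procs, threads = [], []
    for rank in range(nprocs):
        # The rank is passed as the first command-line argument.
        p = subprocess.Popen(
            [sys.executable, "-c", child_code, str(rank)],
            stdout=subprocess.PIPE,
            text=True,
        )
        t = threading.Thread(target=forward, args=(rank, p.stdout))
        t.start()
        procs.append(p)
        threads.append(t)

    # Wait for all output to be drained, then collect exit codes.
    for t in threads:
        t.join()
    codes = [p.wait() for p in procs]
    return merged, codes


if __name__ == "__main__":
    child = "import sys; print('hello from rank', sys.argv[1])"
    lines, codes = run_job(4, child)
    for line in sorted(lines):
        print(line)
```

Lines from different ranks may arrive in any order, so the usage example sorts them before printing. A real process manager would also have to forward stdin and stderr and deliver signals to every process, which this sketch leaves out.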

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Ralph Butler (1)
  • William Gropp (2)
  • Ewing Lusk (2)

  1. University of North Florida, USA
  2. Argonne National Laboratory, USA
