Abstract
This paper reports on the architecture and design of Starfish, an environment for executing dynamic (and static) MPI-2 programs on a cluster of workstations. Starfish is unique in that the system itself is efficient, fault-tolerant, highly available, and dynamic, and it also supports fault tolerance and dynamicity for its application programs. Starfish achieves these goals by combining group communication technology with checkpoint/restart. Its novel architecture is both flexible and portable, and it keeps group communication outside the critical data path for maximum performance.
Cite this article
Agbaria, A., Friedman, R. Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations. Cluster Computing 6, 227–236 (2003). https://doi.org/10.1023/A:1023540604208