© 2007 Springer Science+Business Media, LLC
About this paper
Cite this paper
Angskun, T., Fagg, G.E., Bosilca, G., Pješivac-Grbović, J., Dongarra, J.J. (2007). Self-Healing Network for Scalable Fault Tolerant Runtime Environments. In: Kacsuk, P., Fahringer, T., Németh, Z. (eds) Distributed and Parallel Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69858-8_8
DOI: https://doi.org/10.1007/978-0-387-69858-8_8
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-69857-1
Online ISBN: 978-0-387-69858-8
eBook Packages: Computer Science (R0)