Self-Healing Network for Scalable Fault Tolerant Runtime Environments

  • Conference paper
Distributed and Parallel Systems

Copyright information

© 2007 Springer Science+Business Media, LLC

About this paper

Cite this paper

Angskun, T., Fagg, G.E., Bosilca, G., Pješivac-Grbović, J., Dongarra, J.J. (2007). Self-Healing Network for Scalable Fault Tolerant Runtime Environments. In: Kacsuk, P., Fahringer, T., Németh, Z. (eds) Distributed and Parallel Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69858-8_8

  • DOI: https://doi.org/10.1007/978-0-387-69858-8_8

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-69857-1

  • Online ISBN: 978-0-387-69858-8

  • eBook Packages: Computer Science, Computer Science (R0)
