Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4806)

Abstract

The number of processors embedded in high-performance computing platforms is growing daily to solve larger and more complex problems. However, as the number of components increases, so does the probability of failure. Logical network topologies must therefore also be fault tolerant in such dynamic environments. This paper presents a self-healing mechanism that improves the fault tolerance of a Binomial graph (BMG) network. The mechanism protects the BMG from network bisection and helps maintain optimal routing even when failures occur. Experimental results show that self-healing with an adaptive method significantly reduces the overhead of reconstructing the network.
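
The abstract does not define the BMG or the healing protocol, so the following is only a minimal sketch under stated assumptions. It assumes the usual BMG construction in which node i is linked to nodes (i ± 2^k) mod n for every power of two below n, and it models "healing" as simply rebuilding a BMG over the surviving nodes. The function names (`bmg_neighbors`, `build_bmg`, `is_connected`, `heal`) are hypothetical and are not taken from the paper; the paper's adaptive method is not reproduced here.

```python
from collections import deque


def bmg_neighbors(rank, n):
    """Neighbors of `rank` in an n-node BMG (assumed definition:
    links to (rank +/- 2^k) mod n for every power of two below n)."""
    neighbors = set()
    step = 1
    while step < n:
        neighbors.add((rank + step) % n)
        neighbors.add((rank - step) % n)
        step *= 2
    neighbors.discard(rank)  # no self-loops
    return neighbors


def build_bmg(members):
    """Build a BMG over `members`, using list position as logical rank."""
    n = len(members)
    return {
        members[i]: {members[j] for j in bmg_neighbors(i, n)}
        for i in range(n)
    }


def is_connected(graph, alive):
    """Breadth-first search restricted to the nodes in `alive`."""
    alive = set(alive)
    if not alive:
        return True
    start = next(iter(alive))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt in alive and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == alive


def heal(members, failed):
    """Hypothetical healing step: re-rank the survivors and rebuild."""
    survivors = [m for m in members if m not in failed]
    return survivors, build_bmg(survivors)


if __name__ == "__main__":
    nodes = list(range(8))
    graph = build_bmg(nodes)
    failed = graph[0]  # every neighbor of node 0 fails at once
    alive = [m for m in nodes if m not in failed]
    print("connected before healing:", is_connected(graph, alive))
    survivors, healed = heal(nodes, failed)
    print("connected after healing:", is_connected(healed, survivors))
```

In this toy scenario, losing every neighbor of node 0 cuts it off from the other survivors, and rebuilding the graph over the survivors restores connectivity. A full rebuild is the simplest possible policy; the abstract indicates that the paper's adaptive method is aimed at reducing exactly this reconstruction overhead.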

Editor information

Robert Meersman, Zahir Tari, Pilar Herrero

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Angskun, T., Bosilca, G., Dongarra, J. (2007). Self-healing in Binomial Graph Networks. In: Meersman, R., Tari, Z., Herrero, P. (eds) On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops. OTM 2007. Lecture Notes in Computer Science, vol 4806. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76890-6_30

  • DOI: https://doi.org/10.1007/978-3-540-76890-6_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76889-0

  • Online ISBN: 978-3-540-76890-6

  • eBook Packages: Computer Science, Computer Science (R0)
