Handling multiple faults in wormhole mesh networks

  • Tor Skeie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1470)


We present a fault-tolerant method tailored for n-dimensional mesh networks that is able to handle multiple faults, even in two-dimensional meshes. The method does not require virtual channels. The traditional way of achieving fault tolerance, with adaptivity and added virtual channels as the main mechanisms, has not shown the ability to handle multiple faults in wormhole mesh networks. In this paper we propose another strategy for providing a high degree of fault tolerance: we describe a technique that alters the routing function on the fly. The alteration action is always taken locally and distributed to a limited number of non-neighbor nodes.



Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Tor Skeie (1)
  1. Department of Informatics, University of Oslo, Oslo, Norway
