Fault-tolerant message routing for multiprocessors

  • Lev Zakrevski
  • Mark Karpovsky
Workshop on Fault-Tolerant Parallel and Distributed Systems. Dimiter Avresky, Boston University; David B. Kaeli, Northeastern University
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1388)


This paper investigates fault-tolerant message routing in two-dimensional meshes in which each inner node has 4 neighbors. Some nodes or links are assumed to be faulty, so messages must be routed using local information at each step. A new and efficient algorithm is proposed to solve this problem. The algorithm is local and consists of a pre-routing stage and a routing stage. The pre-routing algorithm is implemented off-line. The complexity of the pre-routing stage is O(W), where N is the number of nodes in the system, and t is the number of faulty nodes. The complexity of the on-line routing stage (the size of the routing table stored in the local memory) is O(t). The pre-routing algorithm is performed only once after each new fault is detected. The algorithm delivers 100% of deliverable messages in the presence of faulty nodes, with no deadlocks or livelocks. No nodes are declared unsafe. The main idea is to construct fault-free rectangular clusters during the pre-routing stage and to store the information about their boundaries in local memories. At the routing stage, the direction in which a node sends a message is determined by the cluster to which the destination node belongs. The algorithm is generalized to the case of multidimensional meshes.
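The two-stage idea in the abstract can be illustrated with a small sketch. This is not the authors' construction: the abstract does not give the clustering procedure, so the breadth-first search used for pre-routing, the greedy merging of nodes into rectangles, and every name below (`first_hops`, `build_table`, `lookup`, `deliver`) are assumptions made for illustration. In particular, the sketch makes no attempt to reach the O(t) table size stated above. It only shows a per-node table of fault-free rectangles and a routing step that picks the output direction from the rectangle containing the destination.

```python
from collections import deque

# Hypothetical direction names and coordinate offsets on a 2D mesh.
DIRS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

def first_hops(width, height, faulty, src):
    """Assumed pre-routing helper: BFS over fault-free nodes from src.
    Maps each reachable node to the first direction of a shortest path."""
    hop = {src: None}
    queue = deque([src])
    while queue:
        x, y = queue.popleft()
        for name, (dx, dy) in DIRS.items():
            nxt = (x + dx, y + dy)
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and nxt not in faulty and nxt not in hop:
                # Neighbors of src start a path; others inherit its first hop.
                hop[nxt] = hop[(x, y)] or name
                queue.append(nxt)
    return hop

def build_table(width, height, faulty, src):
    """Assumed pre-routing stage: greedily merge reachable nodes that share
    a first-hop direction into rectangles (x0, y0, x1, y1)."""
    hop = first_hops(width, height, faulty, src)
    todo = {n for n in hop if n != src}
    same = lambda n, d: n in todo and hop[n] == d
    table = []
    for y in range(height):
        for x in range(width):
            if (x, y) not in todo:
                continue
            d, x1, y1 = hop[(x, y)], x, y
            while same((x1 + 1, y), d):          # grow along the row
                x1 += 1
            while all(same((cx, y1 + 1), d) for cx in range(x, x1 + 1)):
                y1 += 1                          # grow over full rows only
            for cx in range(x, x1 + 1):
                for cy in range(y, y1 + 1):
                    todo.discard((cx, cy))
            table.append(((x, y, x1, y1), d))
    return table

def lookup(table, dst):
    """Routing stage at one node: direction of the cluster containing dst,
    or None if dst is faulty or unreachable."""
    for (x0, y0, x1, y1), d in table:
        if x0 <= dst[0] <= x1 and y0 <= dst[1] <= y1:
            return d
    return None

def deliver(width, height, faulty, src, dst):
    """Simulate hop-by-hop delivery. Tables are built lazily here; in the
    scheme described above they would be precomputed off-line."""
    tables, node, path = {}, src, [src]
    while node != dst:
        if node not in tables:
            tables[node] = build_table(width, height, faulty, node)
        d = lookup(tables[node], dst)
        if d is None:
            return None                          # undeliverable message
        node = (node[0] + DIRS[d][0], node[1] + DIRS[d][1])
        path.append(node)
    return path
```

For example, `deliver(4, 4, {(1, 1), (2, 1)}, (0, 0), (3, 3))` returns a list of fault-free nodes from source to destination. Because each hop in this sketch follows the first step of a shortest fault-free path, the remaining distance shrinks at every node, so the simulated message cannot loop.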

Key Words

Fault-tolerant network computing, multiprocessors, meshes, routing algorithms, adaptive routing


Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Lev Zakrevski (1)
  • Mark Karpovsky (1)
  1. Research Lab. on Reliable Computing, Boston University, Department of Computer Engineering, Boston, USA
