Fault-tolerant message routing for multiprocessors
In this paper, the problem of fault-tolerant message routing in two-dimensional meshes, in which each inner node has four neighbors, is investigated. Some nodes or links are assumed to be faulty, so messages must be routed using only local information at each step. A new and efficient algorithm is proposed to solve this problem. The algorithm is local and consists of a pre-routing stage and a routing stage. The pre-routing stage is performed off-line, and its complexity is O(W), where N is the number of nodes in the system and t is the number of faulty nodes. The complexity of the on-line routing stage (the size of the routing table stored in the local memory of each node) is O(t). The pre-routing algorithm is run only once each time a new fault is detected. The algorithm delivers 100% of deliverable messages in the presence of faulty nodes, with no deadlocks or livelocks, and no nodes are declared unsafe. The main idea is to construct fault-free rectangular clusters during the pre-routing stage and to store information about their boundaries in the local memories. At the routing stage, the direction in which a node forwards a message is determined by the cluster to which the destination node belongs. The algorithm is generalized to the case of multidimensional meshes.
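The two stages can be illustrated with a small sketch. The Python code below is not the authors' algorithm: the greedy row-by-row rectangle decomposition, the names build_clusters, cluster_of, xy_step and route_step, and the coordinate convention (y increasing toward "S") are all hypothetical. It only shows the general shape of the idea: off-line, cover every fault-free node with disjoint fault-free rectangles; on-line, look up the destination's cluster and, when the current node lies in the same rectangle, take a dimension-order (XY) step, which can never leave a fault-free rectangle. The abstract does not describe the rule for moving between clusters, so the sketch leaves it out.

```python
from typing import List, Optional, Set, Tuple

Node = Tuple[int, int]            # (x, y) coordinates in the mesh
Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive corners


def build_clusters(width: int, height: int, faulty: Set[Node]) -> List[Rect]:
    """Hypothetical pre-routing step: cover every fault-free node with
    disjoint fault-free rectangles, scanning the mesh row by row."""
    covered: Set[Node] = set()
    clusters: List[Rect] = []

    def free(node: Node) -> bool:
        # A node can join a new rectangle if it is healthy and not yet covered.
        return node not in faulty and node not in covered

    for y in range(height):
        for x in range(width):
            if not free((x, y)):
                continue
            # Grow the rectangle to the right along row y.
            x1 = x
            while x1 + 1 < width and free((x1 + 1, y)):
                x1 += 1
            # Grow it downward while the whole segment [x, x1] stays usable.
            y1 = y
            while y1 + 1 < height and all(free((i, y1 + 1)) for i in range(x, x1 + 1)):
                y1 += 1
            for j in range(y, y1 + 1):
                for i in range(x, x1 + 1):
                    covered.add((i, j))
            clusters.append((x, y, x1, y1))
    return clusters


def cluster_of(clusters: List[Rect], node: Node) -> Optional[int]:
    """Index of the rectangle containing node, or None if node is faulty."""
    x, y = node
    for k, (x0, y0, x1, y1) in enumerate(clusters):
        if x0 <= x <= x1 and y0 <= y <= y1:
            return k
    return None


def xy_step(cur: Node, dst: Node) -> Optional[str]:
    """Dimension-order step: correct X first, then Y; None at the destination."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        return "E" if dx > cx else "W"
    if cy != dy:
        return "S" if dy > cy else "N"
    return None


def route_step(clusters: List[Rect], cur: Node, dst: Node) -> Optional[str]:
    """Local decision keyed on the destination's cluster (illustrative only)."""
    k = cluster_of(clusters, dst)
    if k is None:
        raise ValueError("destination is faulty, so the message is undeliverable")
    if cluster_of(clusters, cur) == k:
        # Both endpoints lie in one fault-free rectangle, so every XY hop
        # stays inside it and never touches a faulty node.
        return xy_step(cur, dst)
    return None  # inter-cluster rule not described in the abstract
```

For a 4 x 3 mesh with faulty nodes (1, 1) and (2, 1), build_clusters returns four rectangles covering the ten healthy nodes. Storing only rectangle corners, rather than per-node entries, mirrors the abstract's point that each node keeps boundary information about clusters in its local memory.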
Key Words: Fault-tolerant network computing, multiprocessors, meshes, routing algorithms, adaptive routing