Handling multiple faults in wormhole mesh networks
We present a fault-tolerant method tailored for n-dimensional mesh networks that is able to handle multiple faults, even in two-dimensional meshes. The method does not require the existence of virtual channels. The traditional approach to fault tolerance, which relies on adaptivity and added virtual channels as its main mechanisms, has not shown the ability to handle multiple faults in wormhole mesh networks. In this paper we propose a different strategy for providing a high degree of fault tolerance: a technique that alters the routing function on the fly. The alteration is always decided locally and is distributed to a limited number of non-neighbor nodes.
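Without virtual channels, any on-the-fly change to the routing function must keep the network free of deadlock. For deterministic routing, Dally and Seitz showed that an acyclic channel dependency graph (CDG) guarantees this. The sketch below is not the authors' technique. It is a minimal illustration, assuming a 2D mesh and source-determined dimension-order routes, of the kind of check such an alteration has to pass. It builds the CDG for a given routing function and tests it for cycles. XY routing passes, while a hypothetical mix of XY and YX routes, of the sort a careless local reroute could introduce, closes a turn cycle. The names `dimension_order_path`, `channel_dependency_graph`, `has_cycle`, `xy` and `mixed` are illustrative only.

```python
from collections.abc import Callable
from itertools import product

Node = tuple[int, int]
Channel = tuple[Node, Node]          # directed link (from_node, to_node)
Route = Callable[[Node, Node], list[Node]]


def dimension_order_path(src: Node, dst: Node, x_first: bool = True) -> list[Node]:
    """Nodes visited from src to dst, correcting one dimension fully before the other."""
    x, y = src
    path = [src]
    for dim in (("x", "y") if x_first else ("y", "x")):
        if dim == "x":
            while x != dst[0]:
                x += 1 if dst[0] > x else -1
                path.append((x, y))
        else:
            while y != dst[1]:
                y += 1 if dst[1] > y else -1
                path.append((x, y))
    return path


def channel_dependency_graph(width: int, height: int, route: Route) -> dict[Channel, set[Channel]]:
    """Edge c1 -> c2 whenever some route holds channel c1 and then requests c2."""
    deps: dict[Channel, set[Channel]] = {}
    nodes = list(product(range(width), range(height)))
    for src, dst in product(nodes, nodes):
        if src == dst:
            continue
        path = route(src, dst)
        channels = list(zip(path, path[1:]))
        for c in channels:
            deps.setdefault(c, set())
        for c1, c2 in zip(channels, channels[1:]):
            deps[c1].add(c2)
    return deps


def has_cycle(deps: dict[Channel, set[Channel]]) -> bool:
    """Kahn's topological sort: a cycle exists iff some channel is never removed."""
    indegree = {c: 0 for c in deps}
    for succs in deps.values():
        for s in succs:
            indegree[s] += 1
    ready = [c for c, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        c = ready.pop()
        removed += 1
        for s in deps[c]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return removed != len(deps)


def xy(src: Node, dst: Node) -> list[Node]:
    """Plain XY (dimension-order) routing."""
    return dimension_order_path(src, dst, x_first=True)


def mixed(src: Node, dst: Node) -> list[Node]:
    """Hypothetical altered routing: source parity picks XY or YX order."""
    return dimension_order_path(src, dst, x_first=(src[0] + src[1]) % 2 == 0)


print(has_cycle(channel_dependency_graph(4, 4, xy)))     # False
print(has_cycle(channel_dependency_graph(4, 4, mixed)))  # True
```

In the mixed case the four routes between the corners of any 2×2 square supply the four turns east→north, north→west, west→south and south→east, which together close a dependency cycle. An acyclic CDG is sufficient for deadlock freedom; a cycle only indicates that a deadlock may be possible, which is why a routing alteration that avoids such cycles can work without virtual channels.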