Abstract
Torus networks have been widely used in modern commercial supercomputers because of their low node degree and linearly scaling cost. However, the ring in each dimension of a torus creates cyclic channel dependencies, which complicate the design of deadlock-free routing and virtual channel allocation schemes. Existing virtual channel allocation schemes such as dateline employ two virtual channels to avoid deadlock in a ring, but the unbalanced virtual channel utilization caused by these deadlock avoidance schemes leads to many performance pathologies. This paper focuses on improving deadlock-free routing algorithms in torus networks by balancing virtual channel utilization. First, we propose BVCU, a novel virtual channel allocation scheme that balances the utilization of the two virtual channels used to prevent deadlock. The technique is deadlock-free and achieves almost perfectly balanced utilization. Second, we propose balanced dimension-order routing (BDOR), which provides deadlock-free deterministic routing for \(n\)-D tori by combining BVCU with dimension-order routing. We also propose a balanced fully adaptive routing algorithm (BFAR). These routing algorithms yield higher performance than existing solutions because they achieve properly balanced utilization of virtual channels. Finally, simulation results show that the proposed methods achieve up to 16% throughput improvement.
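To make the imbalance attributed to dateline concrete, the following minimal sketch counts virtual channel usage in a single unidirectional ring. It is an illustration under stated assumptions, not the paper's BVCU scheme or its simulation setup: the function name `dateline_vc_usage`, the placement of the dateline on the link from node k-1 to node 0, the convention that the crossing hop itself already uses the second virtual channel, and the all-pairs traffic pattern are all hypothetical choices.

```python
def dateline_vc_usage(k):
    """Count hops taken on VC0 and VC1 over all source/destination pairs
    in a unidirectional ring of k nodes under a dateline scheme.

    Assumptions (not from the paper): the dateline is the link from node
    k-1 to node 0, every packet starts on VC0, and it moves to VC1 on the
    hop that crosses the dateline and stays there afterwards.
    """
    usage = [0, 0]  # usage[0] = hops on VC0, usage[1] = hops on VC1
    for src in range(k):
        for dst in range(k):
            if src == dst:
                continue
            node, vc = src, 0
            while node != dst:
                nxt = (node + 1) % k
                if nxt == 0:
                    # Crossing the dateline: switch to the second VC.
                    vc = 1
                usage[vc] += 1
                node = nxt
    return usage


if __name__ == "__main__":
    for k in (4, 8, 16):
        vc0, vc1 = dateline_vc_usage(k)
        share = vc0 / (vc0 + vc1)
        print(f"k={k}: VC0 hops={vc0}, VC1 hops={vc1}, VC0 share={share:.3f}")
```

Under these assumptions the two counts differ for every ring size, which is the kind of uneven utilization the abstract refers to; the exact ratio depends on the dateline convention and the traffic pattern chosen.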
Notes
See Sect. 5 for the experimental configuration and its description.
The saturation point is measured as the injection rate at which the average latency is three times the zero-load latency.
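The saturation-point definition above can be applied to a measured latency curve roughly as follows. This is a hedged sketch: the function name `saturation_point`, the `(injection_rate, average_latency)` sample format, and the linear interpolation between neighbouring samples are assumptions made for illustration, and the numbers in the usage example are invented, not results from the paper.

```python
def saturation_point(samples, zero_load_latency, factor=3.0):
    """Return the injection rate at which the average latency reaches
    `factor` times the zero-load latency, or None if it never does.

    `samples` is a list of (injection_rate, average_latency) pairs sorted
    by increasing injection rate. Linear interpolation between the two
    samples that bracket the threshold is an assumption of this sketch.
    """
    threshold = factor * zero_load_latency
    prev_rate = prev_latency = None
    for rate, latency in samples:
        if latency >= threshold:
            if prev_rate is None:
                # The very first sample is already past the threshold.
                return rate
            # prev_latency < threshold <= latency, so the divisor is positive.
            frac = (threshold - prev_latency) / (latency - prev_latency)
            return prev_rate + frac * (rate - prev_rate)
        prev_rate, prev_latency = rate, latency
    return None


if __name__ == "__main__":
    # Hypothetical latency curve (rate in flits/node/cycle, latency in cycles).
    curve = [(0.1, 20.0), (0.2, 30.0), (0.3, 90.0)]
    print(saturation_point(curve, zero_load_latency=20.0))
```

With a zero-load latency of 20 cycles the threshold is 60 cycles, which in this made-up curve falls halfway between the samples at 0.2 and 0.3.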
Acknowledgments
We sincerely thank the anonymous reviewers for their helpful comments and suggestions. This work is supported in part by the National Science Foundation of China under Grants 60910003, 61170063, 61373021 and 61402086, and by a research grant from the Education Ministry under Grant 20111081042.
Cite this article
Yu, Z., Xiang, D. & Wang, X. Balancing virtual channel utilization for deadlock-free routing in torus networks. J Supercomput 71, 3094–3115 (2015). https://doi.org/10.1007/s11227-015-1428-6
DOI: https://doi.org/10.1007/s11227-015-1428-6