
Balancing virtual channel utilization for deadlock-free routing in torus networks

The Journal of Supercomputing

Abstract

Torus networks are widely used in modern commercial supercomputers because of their low node degree and a cost that scales linearly. However, the ring in each dimension of a torus creates cyclic channel dependencies, which complicate the design of deadlock-free routing and virtual channel allocation schemes. Existing virtual channel allocation schemes such as dateline employ two virtual channels to avoid deadlock in a ring, but the unbalanced virtual channel utilization these schemes cause leads to many performance pathologies. This paper focuses on improving deadlock-free routing algorithms in torus networks by balancing virtual channel utilization. First, we propose BVCU, a novel virtual channel allocation scheme that balances the utilization of the two virtual channels used to prevent deadlock. The technique is deadlock-free and achieves almost perfectly balanced utilization. Second, we propose balanced dimension-order routing (BDOR), which provides deadlock-free deterministic routing for \(n\)-D tori by combining BVCU with dimension-order routing. We also propose a balanced fully adaptive routing algorithm (BFAR). These routing algorithms yield higher performance than existing solutions because they balance virtual channel utilization. Finally, simulation results show that the proposed methods achieve up to 16% throughput improvement.
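The imbalance the abstract attributes to dateline can be illustrated with a small counting sketch. This is not BVCU, whose rules are not given in this abstract; it models only the classic dateline idea under simplifying assumptions: a unidirectional ring of `k` nodes, one packet for every source and destination pair, and a dateline on the wraparound link from node `k - 1` to node 0. Packets start on VC0 and move to VC1 once they have crossed the dateline. The function name `dateline_vc_hops` is hypothetical.

```python
def dateline_vc_hops(k: int) -> tuple[int, int]:
    """Count hops taken on VC0 and VC1 in a k-node unidirectional ring.

    Assumptions (illustrative only): one packet per (src, dst) pair with
    src != dst, routing always goes node -> node + 1, and the dateline is
    the wraparound link (k - 1) -> 0. A packet uses VC0 up to and including
    the dateline link, and VC1 for every hop after crossing it.
    """
    hops = [0, 0]
    for src in range(k):
        for dst in range(k):
            if src == dst:
                continue
            node, vc = src, 0
            while node != dst:
                hops[vc] += 1        # traverse channel node -> node + 1 on the current VC
                if node == k - 1:    # the channel just used was the dateline link
                    vc = 1           # all remaining hops use VC1
                node = (node + 1) % k
    return hops[0], hops[1]


if __name__ == "__main__":
    vc0, vc1 = dateline_vc_hops(16)
    print(f"VC0 hops: {vc0}, VC1 hops: {vc1}")
```

Under these assumptions, a 16-node ring carries 1360 hops on VC0 and only 560 on VC1, so VC0 handles about 71% of the traffic. A real torus is bidirectional and uses minimal routing, so the exact ratio differs, but the skew toward the pre-dateline channel has the same origin.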


Notes

  1. See Sect. 5 for the experimental configuration and description.

  2. The saturation point is measured as the injection rate at which the average latency is three times the zero-load latency.

    Fig. 6 The performance comparison of different virtual channel allocation schemes for a 16-node 1-D torus under different traffic patterns: a uniform, b neighbor, c transpose, d bit complement
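The saturation-point definition in Note 2 can be applied to a measured latency curve roughly as follows. This is a minimal sketch, not the authors' measurement code: the function name `saturation_point`, the linear interpolation between samples, and the sample values in the usage example are all assumptions made for illustration.

```python
def saturation_point(rates, latencies, zero_load_latency, factor=3.0):
    """Estimate the injection rate at which latency reaches factor * zero-load latency.

    rates and latencies are equal-length sequences sorted by increasing
    injection rate. Linear interpolation between neighbouring samples is an
    assumption of this sketch. Returns None if the threshold is never reached.
    """
    threshold = factor * zero_load_latency
    if latencies[0] >= threshold:
        return rates[0]              # already saturated at the first sample
    for i in range(1, len(rates)):
        l0, l1 = latencies[i - 1], latencies[i]
        if l0 < threshold <= l1:
            r0, r1 = rates[i - 1], rates[i]
            # Interpolate linearly between the two samples that bracket the threshold.
            return r0 + (threshold - l0) * (r1 - r0) / (l1 - l0)
    return None


if __name__ == "__main__":
    # Made-up sample curve: injection rate vs. average latency in cycles.
    rates = [0.1, 0.2, 0.3, 0.4]
    latencies = [10.0, 12.0, 20.0, 60.0]
    print(saturation_point(rates, latencies, zero_load_latency=10.0))
```

With the made-up curve above, the threshold is 30 cycles, which falls between the samples at rates 0.3 and 0.4, giving an estimate of about 0.325.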


Acknowledgments

We sincerely thank the anonymous reviewers for their helpful comments and suggestions. This work is supported in part by the National Science Foundation of China under Grants 60910003, 61170063, 61373021 and 61402086 and the research grant from Education Ministry under Grant 20111081042.

Author information

Corresponding author

Correspondence to Zhigang Yu.

About this article

Cite this article

Yu, Z., Xiang, D. & Wang, X. Balancing virtual channel utilization for deadlock-free routing in torus networks. J Supercomput 71, 3094–3115 (2015). https://doi.org/10.1007/s11227-015-1428-6


  • DOI: https://doi.org/10.1007/s11227-015-1428-6
