Skip to main content
Log in

Optimal circulant graphs as low-latency network topologies

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

Communication latency has become one of the determining factors for the performance of parallel clusters. To design low-latency network topologies for high-performance computing clusters, we optimize the diameters, mean path lengths, and bisection widths of circulant topologies. We obtain a series of optimal circulant topologies of size \(2^5\) through \(2^{10}\) and compare them with torus and hypercube of the same sizes and degrees. We further benchmark on a broad variety of applications including effective bandwidth, FFTE, Graph 500 and NAS parallel benchmarks to compare the optimal circulant topologies and Cartesian products of optimal circulant topologies and fully connected topologies with corresponding torus and hypercube. Simulation results demonstrate superior potentials of the optimal circulant topologies for communication-intensive applications. We also find the strengths of the Cartesian products in exploiting global communication with data traffic patterns of specific applications or internal algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

References

  1. Top 500 supercomputer site (2021) http://www.top500.org

  2. Moore GE (1965) Cramming more components onto integrated circuits. Electronics 38(8):114–117

    Google Scholar 

  3. Deng Y, Ramos AF, Hornos JEM (2012) Symmetry insights for design of supercomputer network topologies: roots and weights lattices. Int J Mod Phys B 26(31):1250169. https://doi.org/10.1142/s021797921250169x

    Article  Google Scholar 

  4. Dally W, Towles B (2003) Principles and practices of interconnection networks. Elsevier Science & Technology, Amsterdam

    Google Scholar 

  5. Brightwell R, Pedretti K, Underwood K et al (2006) SeaStar interconnect: balanced bandwidth for scalable performance. IEEE Micro 26(3):41–57. https://doi.org/10.1109/mm.2006.65

    Article  Google Scholar 

  6. IBM Blue Gene Team (2008) Overview of the IBM Blue Gene/P project. IBM J Res Dev 52(1.2):199–220. https://doi.org/10.1147/rd.521.0199

    Article  Google Scholar 

  7. Alverson R, Roweth D, Kaplan L (2010) The gemini system interconnect. In: (2010) 18th IEEE Symposium on High Performance Interconnects. IEEE, Mountain View, CA, USA,. https://doi.org/10.1109/hoti.2010.23

  8. Chen D, Parker JJ, Eisley NA et al (2011) The IBM Blue Gene/Q interconnection network and message unit. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC\(^{\prime }\)11. ACM Press, Seattle, WA, USA, https://doi.org/10.1145/2063384.2063419

  9. Ajima Y, Sumimoto S, Shimizu T (2009) Tofu: a 6D mesh/torus interconnect for exascale computers. Computer 42(11):36–40. https://doi.org/10.1109/mc.2009.370

    Article  Google Scholar 

  10. Ajima Y, Kawashima T, Okamoto T et al (2018) The tofu interconnect d. In: 2018 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, Belfast, UK, https://doi.org/10.1109/cluster.2018.00090

  11. Hayes J, Mudge T (1989) Hypercube supercomputers. Proc IEEE 77(12):1829–1841. https://doi.org/10.1109/5.48826

    Article  Google Scholar 

  12. Leiserson CE (1985) Fat-trees: universal networks for hardware-efficient supercomputing. IEEE Trans Comput C 34(10):892–901. https://doi.org/10.1109/tc.1985.6312192

    Article  Google Scholar 

  13. Fu H, Liao J, Yang J et al (2016) The sunway TaihuLight supercomputer: system and applications. Sci China Inf Sci. https://doi.org/10.1007/s11432-016-5588-7

    Article  Google Scholar 

  14. Wu CL, Feng TY (1980) On a class of multistage interconnection networks. IEEE Trans Comput C 29(8):694–702. https://doi.org/10.1109/tc.1980.1675651

    Article  MathSciNet  MATH  Google Scholar 

  15. Dally W (1990) Performance analysis of k-ary n-cube interconnection networks. IEEE Trans Comput 39(6):775–785. https://doi.org/10.1109/12.53599

    Article  MathSciNet  Google Scholar 

  16. Kim J, Dally WJ, Scott S, (2008) Technology-driven, highly-scalable dragonfly topology. In: (2008) International Symposium on Computer Architecture. IEEE, Beijing, China. https://doi.org/10.1109/isca.2008.19

  17. Faanes G, Bataineh A, Roweth D, (2012) Cray cascade: a scalable HPC system based on a dragonfly network. In: (2012) International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, Salt Lake City, UT, USA. https://doi.org/10.1109/sc.2012.39

  18. Sensi DD, Girolamo SD, McMahon KH et al (2020) An in-depth analysis of the slingshot interconnect. In: SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, Atlanta, GA, USA, https://doi.org/10.1109/sc41405.2020.00039

  19. Cerf VG, Cowan DD, Mullin RC et al (1974) A lower bound on the average shortest path length in regular graphs. Networks 4(4):335–342. https://doi.org/10.1002/net.3230040405

    Article  MathSciNet  MATH  Google Scholar 

  20. Zhang P, Powell R, Deng Y (2011) Interlacing bypass rings to torus networks for more efficient networks. IEEE Trans Parallel Distrib Syst 22(2):287–295. https://doi.org/10.1109/tpds.2010.89

    Article  Google Scholar 

  21. Zhang P, Deng Y (2012) An analysis of the topological properties of the interlaced bypass torus (iBT) networks. Appl Math Lett 25(12):2147–2155. https://doi.org/10.1016/j.aml.2012.05.013

    Article  MathSciNet  MATH  Google Scholar 

  22. Zhang P, Deng Y (2012) Design and analysis of pipelined broadcast algorithms for the all-port interlaced bypass torus networks. IEEE Trans Parallel Distrib Syst 23(12):2245–2253. https://doi.org/10.1109/tpds.2012.93

    Article  Google Scholar 

  23. Zhang P, Deng Y, Feng R et al (2015) Evaluation of various networks configurated by adding bypass or torus links. IEEE Trans Parallel Distrib Syst 26(4):984–996. https://doi.org/10.1109/tpds.2014.2315201

    Article  Google Scholar 

  24. Feng R, Zhang P, Deng Y (2013) Deadlock-free routing algorithms for 6d mesh/iBT interconnection networks. In: 2013 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. IEEE, Honolulu, HI, USA, https://doi.org/10.1109/snpd.2013.43

  25. Feng R, Zhang P, Deng Y (2012) Simulated performance evaluation of a 6d mesh/iBT interconnect. In: 2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. IEEE, Kyoto, Japan, https://doi.org/10.1109/snpd.2012.19

  26. Xu Z, Huang X, Jimenez F et al (2019) A new record of graph enumeration enabled by parallel processing. Mathematics 7(12):1214. https://doi.org/10.3390/math7121214

    Article  Google Scholar 

  27. Deng Y, Guo M, Ramos AF et al (2020) Optimal low-latency network topologies for cluster performance enhancement. J Supercomput 76(12):9558–9584. https://doi.org/10.1007/s11227-020-03216-y

    Article  Google Scholar 

  28. Zhang Y, Huang X, Xu Z et al (2019) A structured table of graphs with symmetries and other special properties. Symmetry 12(1):2. https://doi.org/10.3390/sym12010002

    Article  Google Scholar 

  29. Truong NT, Fujiwara I, Koibuchi M et al (2017) Distributed shortcut networks: low-latency low-degree non-random topologies targeting the diameter and cable length trade-off. IEEE Trans Parallel Distrib Syst 28(4):989–1001. https://doi.org/10.1109/tpds.2016.2613043

    Article  Google Scholar 

  30. Yasudo R, Koibuchi M, Nakano K et al (2019) Designing high-performance interconnection networks with host-switch graphs. IEEE Trans Parallel Distrib Syst 30(2):315–330. https://doi.org/10.1109/tpds.2018.2864286

    Article  Google Scholar 

  31. Sabino AU, Vasconcelos MFS, Deng Y et al (2018) Symmetry-guided design of topologies for supercomputer networks. Int J Mod Phys C 29(07):1850048. https://doi.org/10.1142/s0129183118500481

    Article  MathSciNet  Google Scholar 

  32. Nakao M, Sakai M, Hanada Y et al (2021) Graph optimization algorithm for low-latency interconnection networks. Parallel Comput 106(102):805. https://doi.org/10.1016/j.parco.2021.102805

    Article  MathSciNet  Google Scholar 

  33. Bermond J, Comellas F, Hsu D (1995) Distributed loop computer-networks: a survey. J Parallel Distrib Comput 24(1):2–10. https://doi.org/10.1006/jpdc.1995.1002

    Article  Google Scholar 

  34. Hwang F (2003) A survey on multi-loop networks. Theor Comput Sci 299(1–3):107–121. https://doi.org/10.1016/s0304-3975(01)00341-3

    Article  MathSciNet  MATH  Google Scholar 

  35. Monakhova EA (2012) A survey on undirected circulant graphs. Discret Math Algorithms Appl 04(01):1250002. https://doi.org/10.1142/s1793830912500024

    Article  MathSciNet  MATH  Google Scholar 

  36. Boesch F, Wang JF (1985) Reliable circulant networks with minimum transmission delay. IEEE Trans Circuits Syst 32(12):1286–1291. https://doi.org/10.1109/tcs.1985.1085667

    Article  MathSciNet  MATH  Google Scholar 

  37. Beivide R, Herrada E, Balcazar J et al (1991) Optimal distance networks of low degree for parallel computers. IEEE Trans Comput 40(10):1109–1124. https://doi.org/10.1109/12.93744

    Article  MathSciNet  MATH  Google Scholar 

  38. Lau F, Chen G (1996) Optimal layouts of midimew networks. IEEE Trans Parallel Distrib Syst 7(9):954–961. https://doi.org/10.1109/71.536939

    Article  Google Scholar 

  39. Mukhopadhyaya K, Sinha B (1995) Fault-tolerant routing in distributed loop networks. IEEE Trans Comput 44(12):1452–1456. https://doi.org/10.1109/12.477250

    Article  MATH  Google Scholar 

  40. Gómez D, Gutierrez J, Ibeas Á (2007) Optimal routing in double loop networks. Theor Comput Sci 381(1–3):68–85. https://doi.org/10.1016/j.tcs.2007.04.002

    Article  MathSciNet  MATH  Google Scholar 

  41. Obradoviç N, Peters J, Ružić G (2005) Reliable broadcasting in double loop networks. Networks 46(2):88–97. https://doi.org/10.1002/net.20076

    Article  MathSciNet  MATH  Google Scholar 

  42. Monakhov O, Monakhova E, (2019) A comparative analysis of bioinspired algorithms for solving the problem of optimization of circulant and hypercirculant networks. In: (2019) 15th International Asian School-Seminar Optimization Problems of Complex Systems (OPCS), IEEE. https://doi.org/10.1109/opcs.2019.8880247

  43. Feria-Purón R, Ryan J, Pérez-Rosés H (2014) Searching for large multi-loop networks. Electron Notes Discret Math 46:233–240. https://doi.org/10.1016/j.endm.2014.08.031

    Article  MathSciNet  MATH  Google Scholar 

  44. Bevan D, Erskine G, Lewis R (2017) Large circulant graphs of fixed diameter and arbitrary degree. Ars Math Contemp 13(2):275–291. https://doi.org/10.26493/1855-3974.969.659,

  45. Lewis RR (2018) The degree-diameter problem for circulant graphs of degrees 10 and 11. Discret Math 341(9):2553–2566. https://doi.org/10.1016/j.disc.2018.05.024

    Article  MathSciNet  MATH  Google Scholar 

  46. Gross JL, Yellen J, Zhang P (2013) Handbook of graph theory. CRC Press, Boca Raton

    Book  Google Scholar 

  47. Boesch F, Tindell R (1984) Circulants and their connectivities. J Graph Theory 8(4):487–499. https://doi.org/10.1002/jgt.3190080406

    Article  MathSciNet  MATH  Google Scholar 

  48. Arndt J (2010) Matters computational: ideas, algorithms, source code. Springer Science & Business Media, Berlin

    MATH  Google Scholar 

  49. Fxt: a library of algorithms (2021) https://www.jjj.de/fxt/

  50. Ruskey F (2003) Combinatorial generation. Preliminary working draft University of Victoria, Victoria, BC, Canada 11:20

  51. Sanders P, Schulz C (2013) Think locally, act globally: highly balanced graph partitioning. In: Experimental Algorithms. Lecture notes in computer science, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 164–175, https://doi.org/10.1007/978-3-642-38527-8_16

  52. McKay BD, Piperno A (2014) Practical graph isomorphism, II. J Symb Comput 60:94–112. https://doi.org/10.1016/j.jsc.2013.09.003

    Article  MathSciNet  MATH  Google Scholar 

  53. Casanova H, Giersch A, Legrand A et al (2014) Versatile, scalable, and accurate simulation of distributed applications and platforms. J Parallel Distrib Comput 74(10):2899–2917. https://doi.org/10.1016/j.jpdc.2014.06.008

    Article  Google Scholar 

  54. Effective Bandwidth (b_eff) Benchmark (2021) https://fs.hlrs.de/projects/par/mpi/b_eff/

  55. Koniges A, Rabenseifner R, Solchenbach K (2001) Benchmark design for characterization of balanced high-performance architectures. In: Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001. IEEE Comput. Soc, San Francisco, CA, USA. https://doi.org/10.1109/ipdps.2001.925208

  56. FFTE : A fast fourier transform package (2021) http://www.ffte.jp/

  57. Takahashi D, Kanada Y (2000) High-performance radix-2, 3 and 5 parallel 1-D complex FFT algorithms for distributed-memory parallel computers. J Supercomput 15(2):207–228. https://doi.org/10.1023/a:1008160021085

    Article  MATH  Google Scholar 

  58. Graph 500 (2021) http://graph500.org/

  59. Murphy RC, Wheeler KB, Barrett BW et al (2010) Introducing the graph 500. Cray Users Group (CUG) 19:45–74

    Google Scholar 

  60. NPB: NAS parallel benchmarks (2021) http://www.nas.nasa.gov/publications/npb.html

  61. Bailey D, Barszcz E, Barton J et al (1991) The NAS parallel benchmarks. Int J Supercomput Appl 5(3):63–73. https://doi.org/10.1177/109434209100500306

    Article  Google Scholar 

  62. Thakur R, Rabenseifner R, Gropp W (2005) Optimization of collective communication operations in MPICH. Int J High Perform Comput Appl 19(1):49–66. https://doi.org/10.1177/1094342005051521

    Article  Google Scholar 

  63. Deveci M, Kaya K, Ucar B (2015) Fast and high quality topology-aware task mapping. In: (2015) IEEE International Parallel and Distributed Processing Symposium. IEEE. https://doi.org/10.1109/ipdps.2015.93

  64. Ma T, Bosilca G, Bouteiller A (2012) HierKNEM: an adaptive framework for kernel-assisted and topology-aware collective communications on many-core clusters. In: (2012) IEEE 26th International Parallel and Distributed Processing Symposium, IEEE. https://doi.org/10.1109/ipdps.2012.91

  65. Wolf W (2003) A decade of hardware/ software codesign. Computer 36(4):38–43. https://doi.org/10.1109/mc.2003.1193227

    Article  Google Scholar 

  66. Teich J (2012) Hardware/software codesign: the past, the present, and predicting the future. Proc IEEE 100(Special Centennial Issue):1411–1430. https://doi.org/10.1109/jproc.2011.2182009

  67. Monakhov OG, Monakhova EA, Romanov AY et al (2021) Adaptive dynamic shortest path search algorithm in networks-on-chip based on circulant topologies. IEEE Access 9:160836–160846. https://doi.org/10.1109/access.2021.3131635

    Article  Google Scholar 

  68. Mirsadeghi SH, Afsahi A (2016) PTRAM: a parallel topology-and routing-aware mapping framework for large-scale HPC systems. In: (2016) IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE. https://doi.org/10.1109/ipdpsw.2016.146

  69. Parsonage E, Nguyen HX, Bowden R et al (2011) Generalized graph products for network design and analysis. In: 2011 19th IEEE International Conference on Network Protocols. IEEE, https://doi.org/10.1109/icnp.2011.6089084,

  70. Hammack RH, Imrich W, Klavžar S et al (2011) Handbook of product graphs, vol 2. CRC Press, Boca Raton

    Book  Google Scholar 

  71. Dandamudi S, Eager D (1990) Hierarchical interconnection networks for multicomputer systems. IEEE Trans Comput 39(6):786–797. https://doi.org/10.1109/12.53600

    Article  Google Scholar 

  72. Abd-El-Barr M, Al-Somani TF (2011) Topological properties of hierarchical interconnection networks: a review and comparison. J Electr Comput Eng 2011:1–12. https://doi.org/10.1155/2011/189434

    Article  Google Scholar 

Download references

Acknowledgements

The authors thank Stony Brook Research Computing and Cyberinfrastructure, and the Institute for Advanced Computational Science at Stony Brook University for access to the high-performance SeaWulf computing system, which was made possible by a $1.4M National Science Foundation grant (#1531492). The authors also thank Dr. A. Y. Romanov for beneficial suggestions on the manuscript via e-mails.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yuefan Deng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Huang, X., F. Ramos, A. & Deng, Y. Optimal circulant graphs as low-latency network topologies. J Supercomput 78, 13491–13510 (2022). https://doi.org/10.1007/s11227-022-04396-5

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-022-04396-5

Keywords

Navigation