Optimal circulant graphs as low-latency network topologies

Huang, Xiaolong; F. Ramos, Alexandre; Deng, Yuefan

doi:10.1007/s11227-022-04396-5

Optimal circulant graphs as low-latency network topologies

Published: 21 March 2022

Volume 78, pages 13491–13510, (2022)
Cite this article

The Journal of Supercomputing Aims and scope Submit manuscript

327 Accesses
17 Citations
Explore all metrics

Abstract

Communication latency has become one of the determining factors for the performance of parallel clusters. To design low-latency network topologies for high-performance computing clusters, we optimize the diameters, mean path lengths, and bisection widths of circulant topologies. We obtain a series of optimal circulant topologies of size $2^5$ through $2^{10}$ and compare them with torus and hypercube of the same sizes and degrees. We further benchmark on a broad variety of applications including effective bandwidth, FFTE, Graph 500 and NAS parallel benchmarks to compare the optimal circulant topologies and Cartesian products of optimal circulant topologies and fully connected topologies with corresponding torus and hypercube. Simulation results demonstrate superior potentials of the optimal circulant topologies for communication-intensive applications. We also find the strengths of the Cartesian products in exploiting global communication with data traffic patterns of specific applications or internal algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Optimal low-latency network topologies for cluster performance enhancement

Article 02 March 2020

Torus-Connected Cycles: An Implementation-Friendly Topology for Interconnection Networks of Massively Parallel Systems

Hierarchical Tori Connected Mesh Network

References

Top 500 supercomputer site (2021) http://www.top500.org
Moore GE (1965) Cramming more components onto integrated circuits. Electronics 38(8):114–117
Google Scholar
Deng Y, Ramos AF, Hornos JEM (2012) Symmetry insights for design of supercomputer network topologies: roots and weights lattices. Int J Mod Phys B 26(31):1250169. https://doi.org/10.1142/s021797921250169x
Article Google Scholar
Dally W, Towles B (2003) Principles and practices of interconnection networks. Elsevier Science & Technology, Amsterdam
Google Scholar
Brightwell R, Pedretti K, Underwood K et al (2006) SeaStar interconnect: balanced bandwidth for scalable performance. IEEE Micro 26(3):41–57. https://doi.org/10.1109/mm.2006.65
Article Google Scholar
IBM Blue Gene Team (2008) Overview of the IBM Blue Gene/P project. IBM J Res Dev 52(1.2):199–220. https://doi.org/10.1147/rd.521.0199
Article Google Scholar
Alverson R, Roweth D, Kaplan L (2010) The gemini system interconnect. In: (2010) 18th IEEE Symposium on High Performance Interconnects. IEEE, Mountain View, CA, USA,. https://doi.org/10.1109/hoti.2010.23
Chen D, Parker JJ, Eisley NA et al (2011) The IBM Blue Gene/Q interconnection network and message unit. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC$^{\prime }$11. ACM Press, Seattle, WA, USA, https://doi.org/10.1145/2063384.2063419
Ajima Y, Sumimoto S, Shimizu T (2009) Tofu: a 6D mesh/torus interconnect for exascale computers. Computer 42(11):36–40. https://doi.org/10.1109/mc.2009.370
Article Google Scholar
Ajima Y, Kawashima T, Okamoto T et al (2018) The tofu interconnect d. In: 2018 IEEE International Conference on Cluster Computing (CLUSTER). IEEE, Belfast, UK, https://doi.org/10.1109/cluster.2018.00090
Hayes J, Mudge T (1989) Hypercube supercomputers. Proc IEEE 77(12):1829–1841. https://doi.org/10.1109/5.48826
Article Google Scholar
Leiserson CE (1985) Fat-trees: universal networks for hardware-efficient supercomputing. IEEE Trans Comput C 34(10):892–901. https://doi.org/10.1109/tc.1985.6312192
Article Google Scholar
Fu H, Liao J, Yang J et al (2016) The sunway TaihuLight supercomputer: system and applications. Sci China Inf Sci. https://doi.org/10.1007/s11432-016-5588-7
Article Google Scholar
Wu CL, Feng TY (1980) On a class of multistage interconnection networks. IEEE Trans Comput C 29(8):694–702. https://doi.org/10.1109/tc.1980.1675651
Article MathSciNet MATH Google Scholar
Dally W (1990) Performance analysis of k-ary n-cube interconnection networks. IEEE Trans Comput 39(6):775–785. https://doi.org/10.1109/12.53599
Article MathSciNet Google Scholar
Kim J, Dally WJ, Scott S, (2008) Technology-driven, highly-scalable dragonfly topology. In: (2008) International Symposium on Computer Architecture. IEEE, Beijing, China. https://doi.org/10.1109/isca.2008.19
Faanes G, Bataineh A, Roweth D, (2012) Cray cascade: a scalable HPC system based on a dragonfly network. In: (2012) International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, Salt Lake City, UT, USA. https://doi.org/10.1109/sc.2012.39
Sensi DD, Girolamo SD, McMahon KH et al (2020) An in-depth analysis of the slingshot interconnect. In: SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE, Atlanta, GA, USA, https://doi.org/10.1109/sc41405.2020.00039
Cerf VG, Cowan DD, Mullin RC et al (1974) A lower bound on the average shortest path length in regular graphs. Networks 4(4):335–342. https://doi.org/10.1002/net.3230040405
Article MathSciNet MATH Google Scholar
Zhang P, Powell R, Deng Y (2011) Interlacing bypass rings to torus networks for more efficient networks. IEEE Trans Parallel Distrib Syst 22(2):287–295. https://doi.org/10.1109/tpds.2010.89
Article Google Scholar
Zhang P, Deng Y (2012) An analysis of the topological properties of the interlaced bypass torus (iBT) networks. Appl Math Lett 25(12):2147–2155. https://doi.org/10.1016/j.aml.2012.05.013
Article MathSciNet MATH Google Scholar
Zhang P, Deng Y (2012) Design and analysis of pipelined broadcast algorithms for the all-port interlaced bypass torus networks. IEEE Trans Parallel Distrib Syst 23(12):2245–2253. https://doi.org/10.1109/tpds.2012.93
Article Google Scholar
Zhang P, Deng Y, Feng R et al (2015) Evaluation of various networks configurated by adding bypass or torus links. IEEE Trans Parallel Distrib Syst 26(4):984–996. https://doi.org/10.1109/tpds.2014.2315201
Article Google Scholar
Feng R, Zhang P, Deng Y (2013) Deadlock-free routing algorithms for 6d mesh/iBT interconnection networks. In: 2013 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. IEEE, Honolulu, HI, USA, https://doi.org/10.1109/snpd.2013.43
Feng R, Zhang P, Deng Y (2012) Simulated performance evaluation of a 6d mesh/iBT interconnect. In: 2012 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. IEEE, Kyoto, Japan, https://doi.org/10.1109/snpd.2012.19
Xu Z, Huang X, Jimenez F et al (2019) A new record of graph enumeration enabled by parallel processing. Mathematics 7(12):1214. https://doi.org/10.3390/math7121214
Article Google Scholar
Deng Y, Guo M, Ramos AF et al (2020) Optimal low-latency network topologies for cluster performance enhancement. J Supercomput 76(12):9558–9584. https://doi.org/10.1007/s11227-020-03216-y
Article Google Scholar
Zhang Y, Huang X, Xu Z et al (2019) A structured table of graphs with symmetries and other special properties. Symmetry 12(1):2. https://doi.org/10.3390/sym12010002
Article Google Scholar
Truong NT, Fujiwara I, Koibuchi M et al (2017) Distributed shortcut networks: low-latency low-degree non-random topologies targeting the diameter and cable length trade-off. IEEE Trans Parallel Distrib Syst 28(4):989–1001. https://doi.org/10.1109/tpds.2016.2613043
Article Google Scholar
Yasudo R, Koibuchi M, Nakano K et al (2019) Designing high-performance interconnection networks with host-switch graphs. IEEE Trans Parallel Distrib Syst 30(2):315–330. https://doi.org/10.1109/tpds.2018.2864286
Article Google Scholar
Sabino AU, Vasconcelos MFS, Deng Y et al (2018) Symmetry-guided design of topologies for supercomputer networks. Int J Mod Phys C 29(07):1850048. https://doi.org/10.1142/s0129183118500481
Article MathSciNet Google Scholar
Nakao M, Sakai M, Hanada Y et al (2021) Graph optimization algorithm for low-latency interconnection networks. Parallel Comput 106(102):805. https://doi.org/10.1016/j.parco.2021.102805
Article MathSciNet Google Scholar
Bermond J, Comellas F, Hsu D (1995) Distributed loop computer-networks: a survey. J Parallel Distrib Comput 24(1):2–10. https://doi.org/10.1006/jpdc.1995.1002
Article Google Scholar
Hwang F (2003) A survey on multi-loop networks. Theor Comput Sci 299(1–3):107–121. https://doi.org/10.1016/s0304-3975(01)00341-3
Article MathSciNet MATH Google Scholar
Monakhova EA (2012) A survey on undirected circulant graphs. Discret Math Algorithms Appl 04(01):1250002. https://doi.org/10.1142/s1793830912500024
Article MathSciNet MATH Google Scholar
Boesch F, Wang JF (1985) Reliable circulant networks with minimum transmission delay. IEEE Trans Circuits Syst 32(12):1286–1291. https://doi.org/10.1109/tcs.1985.1085667
Article MathSciNet MATH Google Scholar
Beivide R, Herrada E, Balcazar J et al (1991) Optimal distance networks of low degree for parallel computers. IEEE Trans Comput 40(10):1109–1124. https://doi.org/10.1109/12.93744
Article MathSciNet MATH Google Scholar
Lau F, Chen G (1996) Optimal layouts of midimew networks. IEEE Trans Parallel Distrib Syst 7(9):954–961. https://doi.org/10.1109/71.536939
Article Google Scholar
Mukhopadhyaya K, Sinha B (1995) Fault-tolerant routing in distributed loop networks. IEEE Trans Comput 44(12):1452–1456. https://doi.org/10.1109/12.477250
Article MATH Google Scholar
Gómez D, Gutierrez J, Ibeas Á (2007) Optimal routing in double loop networks. Theor Comput Sci 381(1–3):68–85. https://doi.org/10.1016/j.tcs.2007.04.002
Article MathSciNet MATH Google Scholar
Obradoviç N, Peters J, Ružić G (2005) Reliable broadcasting in double loop networks. Networks 46(2):88–97. https://doi.org/10.1002/net.20076
Article MathSciNet MATH Google Scholar
Monakhov O, Monakhova E, (2019) A comparative analysis of bioinspired algorithms for solving the problem of optimization of circulant and hypercirculant networks. In: (2019) 15th International Asian School-Seminar Optimization Problems of Complex Systems (OPCS), IEEE. https://doi.org/10.1109/opcs.2019.8880247
Feria-Purón R, Ryan J, Pérez-Rosés H (2014) Searching for large multi-loop networks. Electron Notes Discret Math 46:233–240. https://doi.org/10.1016/j.endm.2014.08.031
Article MathSciNet MATH Google Scholar
Bevan D, Erskine G, Lewis R (2017) Large circulant graphs of fixed diameter and arbitrary degree. Ars Math Contemp 13(2):275–291. https://doi.org/10.26493/1855-3974.969.659,
Lewis RR (2018) The degree-diameter problem for circulant graphs of degrees 10 and 11. Discret Math 341(9):2553–2566. https://doi.org/10.1016/j.disc.2018.05.024
Article MathSciNet MATH Google Scholar
Gross JL, Yellen J, Zhang P (2013) Handbook of graph theory. CRC Press, Boca Raton
Book Google Scholar
Boesch F, Tindell R (1984) Circulants and their connectivities. J Graph Theory 8(4):487–499. https://doi.org/10.1002/jgt.3190080406
Article MathSciNet MATH Google Scholar
Arndt J (2010) Matters computational: ideas, algorithms, source code. Springer Science & Business Media, Berlin
MATH Google Scholar
Fxt: a library of algorithms (2021) https://www.jjj.de/fxt/
Ruskey F (2003) Combinatorial generation. Preliminary working draft University of Victoria, Victoria, BC, Canada 11:20
Sanders P, Schulz C (2013) Think locally, act globally: highly balanced graph partitioning. In: Experimental Algorithms. Lecture notes in computer science, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 164–175, https://doi.org/10.1007/978-3-642-38527-8_16
McKay BD, Piperno A (2014) Practical graph isomorphism, II. J Symb Comput 60:94–112. https://doi.org/10.1016/j.jsc.2013.09.003
Article MathSciNet MATH Google Scholar
Casanova H, Giersch A, Legrand A et al (2014) Versatile, scalable, and accurate simulation of distributed applications and platforms. J Parallel Distrib Comput 74(10):2899–2917. https://doi.org/10.1016/j.jpdc.2014.06.008
Article Google Scholar
Effective Bandwidth (b_eff) Benchmark (2021) https://fs.hlrs.de/projects/par/mpi/b_eff/
Koniges A, Rabenseifner R, Solchenbach K (2001) Benchmark design for characterization of balanced high-performance architectures. In: Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001. IEEE Comput. Soc, San Francisco, CA, USA. https://doi.org/10.1109/ipdps.2001.925208
FFTE : A fast fourier transform package (2021) http://www.ffte.jp/
Takahashi D, Kanada Y (2000) High-performance radix-2, 3 and 5 parallel 1-D complex FFT algorithms for distributed-memory parallel computers. J Supercomput 15(2):207–228. https://doi.org/10.1023/a:1008160021085
Article MATH Google Scholar
Graph 500 (2021) http://graph500.org/
Murphy RC, Wheeler KB, Barrett BW et al (2010) Introducing the graph 500. Cray Users Group (CUG) 19:45–74
Google Scholar
NPB: NAS parallel benchmarks (2021) http://www.nas.nasa.gov/publications/npb.html
Bailey D, Barszcz E, Barton J et al (1991) The NAS parallel benchmarks. Int J Supercomput Appl 5(3):63–73. https://doi.org/10.1177/109434209100500306
Article Google Scholar
Thakur R, Rabenseifner R, Gropp W (2005) Optimization of collective communication operations in MPICH. Int J High Perform Comput Appl 19(1):49–66. https://doi.org/10.1177/1094342005051521
Article Google Scholar
Deveci M, Kaya K, Ucar B (2015) Fast and high quality topology-aware task mapping. In: (2015) IEEE International Parallel and Distributed Processing Symposium. IEEE. https://doi.org/10.1109/ipdps.2015.93
Ma T, Bosilca G, Bouteiller A (2012) HierKNEM: an adaptive framework for kernel-assisted and topology-aware collective communications on many-core clusters. In: (2012) IEEE 26th International Parallel and Distributed Processing Symposium, IEEE. https://doi.org/10.1109/ipdps.2012.91
Wolf W (2003) A decade of hardware/ software codesign. Computer 36(4):38–43. https://doi.org/10.1109/mc.2003.1193227
Article Google Scholar
Teich J (2012) Hardware/software codesign: the past, the present, and predicting the future. Proc IEEE 100(Special Centennial Issue):1411–1430. https://doi.org/10.1109/jproc.2011.2182009
Monakhov OG, Monakhova EA, Romanov AY et al (2021) Adaptive dynamic shortest path search algorithm in networks-on-chip based on circulant topologies. IEEE Access 9:160836–160846. https://doi.org/10.1109/access.2021.3131635
Article Google Scholar
Mirsadeghi SH, Afsahi A (2016) PTRAM: a parallel topology-and routing-aware mapping framework for large-scale HPC systems. In: (2016) IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE. https://doi.org/10.1109/ipdpsw.2016.146
Parsonage E, Nguyen HX, Bowden R et al (2011) Generalized graph products for network design and analysis. In: 2011 19th IEEE International Conference on Network Protocols. IEEE, https://doi.org/10.1109/icnp.2011.6089084,
Hammack RH, Imrich W, Klavžar S et al (2011) Handbook of product graphs, vol 2. CRC Press, Boca Raton
Book Google Scholar
Dandamudi S, Eager D (1990) Hierarchical interconnection networks for multicomputer systems. IEEE Trans Comput 39(6):786–797. https://doi.org/10.1109/12.53600
Article Google Scholar
Abd-El-Barr M, Al-Somani TF (2011) Topological properties of hierarchical interconnection networks: a review and comparison. J Electr Comput Eng 2011:1–12. https://doi.org/10.1155/2011/189434
Article Google Scholar

Download references

Acknowledgements

The authors thank Stony Brook Research Computing and Cyberinfrastructure, and the Institute for Advanced Computational Science at Stony Brook University for access to the high-performance SeaWulf computing system, which was made possible by a $1.4M National Science Foundation grant (#1531492). The authors also thank Dr. A. Y. Romanov for beneficial suggestions on the manuscript via e-mails.

Author information

Authors and Affiliations

Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, 11794, USA
Xiaolong Huang & Yuefan Deng
Escola de Artes, Ciências e Humanidades, Universidade de São Paulo, Avenida Arlindo Béttio 1000, São Paulo, SP, CEP 03828-000, Brazil
Alexandre F. Ramos
Mathematics, Division of Science, New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
Yuefan Deng

Authors

Xiaolong Huang
View author publications
You can also search for this author in PubMed Google Scholar
Alexandre F. Ramos
View author publications
You can also search for this author in PubMed Google Scholar
Yuefan Deng
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yuefan Deng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Huang, X., F. Ramos, A. & Deng, Y. Optimal circulant graphs as low-latency network topologies. J Supercomput 78, 13491–13510 (2022). https://doi.org/10.1007/s11227-022-04396-5

Download citation

Accepted: 22 February 2022
Published: 21 March 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s11227-022-04396-5

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Optimal circulant graphs as low-latency network topologies

Abstract

Access this article

Similar content being viewed by others

Optimal low-latency network topologies for cluster performance enhancement

Torus-Connected Cycles: An Implementation-Friendly Topology for Interconnection Networks of Massively Parallel Systems

Hierarchical Tori Connected Mesh Network

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Optimal circulant graphs as low-latency network topologies

Abstract

Access this article

Similar content being viewed by others

Optimal low-latency network topologies for cluster performance enhancement

Torus-Connected Cycles: An Implementation-Friendly Topology for Interconnection Networks of Massively Parallel Systems

Hierarchical Tori Connected Mesh Network

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation