The Journal of Supercomputing, Volume 72, Issue 12, pp 4438–4467

Compact network reconfiguration in fat-trees

  • Feroz Zahid
  • Ernst Gunnar Gran
  • Bartosz Bogdański
  • Bjørn Dag Johnsen
  • Tor Skeie
  • Evangelos Tasoulas


In large high-performance computing systems, the probability of component failure is high. At the same time, reconfiguration is often needed to sustain system performance and ensure high utilization of available resources. Reconfiguration in interconnection networks, such as InfiniBand (IB), typically involves computing and distributing a new set of routes in order to maintain connectivity and performance. In general, current routing algorithms do not consider the existing routes in a network when calculating new ones. Such configuration-oblivious routing might result in substantial modifications to the existing paths, and the reconfiguration becomes costly because it potentially involves a large number of source–destination pairs. In this paper, we propose SlimUpdate, a novel routing algorithm for IB-based fat-tree topologies. SlimUpdate employs path preservation techniques to reduce the total number of path modifications by up to 80 %, compared with OpenSM's fat-tree routing algorithm, in most reconfiguration scenarios. Furthermore, we present a metabase-aided re-routing method for fat-trees, based on destination leaf-switch multipathing. Our proposed method significantly reduces network reconfiguration overhead, while providing greater routing flexibility. On successive runs, our proposed method saves up to 85 % of the total routing time over the traditional re-routing scheme. Based on the metabase-aided routing, we also present a modified SlimUpdate routing algorithm that dynamically optimizes routes for a given MPI node order.
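The notion of counting path modifications between an existing and a newly computed routing can be illustrated with a minimal sketch. It assumes, purely for illustration, that each switch's forwarding table is a mapping from destination identifier to output port and that a modification is any entry whose port differs. The function name `count_path_modifications` and the toy tables are hypothetical; they are not taken from the paper or from OpenSM, and the paper's own metric may be defined differently.

```python
from typing import Dict

# Hypothetical representation: switch name -> {destination id -> output port}.
ForwardingTables = Dict[str, Dict[int, int]]


def count_path_modifications(old: ForwardingTables, new: ForwardingTables) -> int:
    """Count (switch, destination) entries whose output port differs.

    Entries present in only one of the two configurations also count as modified.
    """
    changed = 0
    for switch in old.keys() | new.keys():
        old_lft = old.get(switch, {})
        new_lft = new.get(switch, {})
        for dest in old_lft.keys() | new_lft.keys():
            if old_lft.get(dest) != new_lft.get(dest):
                changed += 1
    return changed


# Toy two-switch example (made-up port numbers, not measured data).
old_routes = {"leaf0": {1: 1, 2: 2, 3: 5, 4: 6}, "leaf1": {1: 5, 2: 6, 3: 1, 4: 2}}
# A configuration-oblivious re-route that reshuffles the upward ports.
oblivious = {"leaf0": {1: 1, 2: 2, 3: 6, 4: 5}, "leaf1": {1: 6, 2: 5, 3: 1, 4: 2}}
# A path-preserving re-route that moves only one destination.
preserving = {"leaf0": {1: 1, 2: 2, 3: 5, 4: 5}, "leaf1": {1: 5, 2: 6, 3: 1, 4: 2}}

print(count_path_modifications(old_routes, oblivious))
print(count_path_modifications(old_routes, preserving))
```

In this toy setting the path-preserving configuration changes fewer entries than the oblivious one, which is the kind of difference the abstract refers to.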


  • Routing algorithms
  • Interconnection networks
  • Network reconfiguration
  • Fat-trees
  • InfiniBand



The authors would like to thank Mellanox Technologies for providing some of the hardware we use in our experiments.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Feroz Zahid (1, 3)
  • Ernst Gunnar Gran (1)
  • Bartosz Bogdański (2)
  • Bjørn Dag Johnsen (2)
  • Tor Skeie (1, 3)
  • Evangelos Tasoulas (1)

  1. Simula Research Laboratory, Fornebu, Norway
  2. Oracle Corporation, Oslo, Norway
  3. Department of Informatics, University of Oslo, Oslo, Norway
