The Journal of Supercomputing

Volume 72, Issue 12, pp 4418–4437

The BXI routing architecture for exascale supercomputer

  • Pierre Vignéras
  • Jean-Noël Quintin


BXI (Bull eXascale Interconnect) is the new interconnection network developed by Atos for high-performance computing. It has been designed to meet the requirements of exascale supercomputers. At such scale, faults must be expected and handled transparently so that applications remain unaffected by them. BXI provides several mechanisms for this purpose, one of which is a clear separation between two modes of routing table computation: an offline mode, used during bring-up, and an online mode, used to handle link failures and recoveries. This paper presents the new architecture along with several offline and online routing algorithms and their measured performance: in offline mode, the full routing tables for a 64k-node fat-tree can be computed in a few minutes, and in online mode, the system can withstand numerous inter-router link failures without any noticeable impact on running applications.
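To make the offline/online separation concrete, here is a minimal Python sketch on a toy two-level (leaf/spine) fat-tree. It is not the BXI implementation and not one of the paper's algorithms: the topology class, the function names (`FatTree2`, `offline_compute`, `online_link_down`) and the table layout are hypothetical, and the spine choice is a D-mod-K style rule swapped in for illustration. Offline mode fills every table from scratch; online mode, on a single inter-router link failure, rewrites only the entries that crossed the failed link.

```python
"""Hypothetical sketch: offline vs. online routing-table computation on a toy
two-level (leaf/spine) fat-tree. Illustrative only; not the BXI algorithms."""
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

# Leaf entry: ("down", host_port) or ("up", spine); None means unreachable.
Entry = Optional[Tuple[str, int]]


@dataclass
class FatTree2:
    """Assumed topology: every leaf switch has one link to every spine switch."""
    num_leaves: int
    num_spines: int
    hosts_per_leaf: int
    failed: Set[Tuple[int, int]] = field(default_factory=set)  # (leaf, spine)

    @property
    def num_hosts(self) -> int:
        return self.num_leaves * self.hosts_per_leaf

    def leaf_of(self, host: int) -> int:
        return host // self.hosts_per_leaf

    def link_ok(self, leaf: int, spine: int) -> bool:
        return (leaf, spine) not in self.failed


def pick_spine(t: FatTree2, src_leaf: int, dst: int) -> Optional[int]:
    """D-mod-K style choice: try spine dst mod K first, then the following
    ones, keeping the first spine linked to both source and destination leaf."""
    dst_leaf = t.leaf_of(dst)
    for i in range(t.num_spines):
        s = (dst + i) % t.num_spines
        if t.link_ok(src_leaf, s) and t.link_ok(dst_leaf, s):
            return s
    return None


def leaf_entry(t: FatTree2, leaf: int, dst: int) -> Entry:
    if t.leaf_of(dst) == leaf:
        return ("down", dst % t.hosts_per_leaf)  # local host port
    s = pick_spine(t, leaf, dst)
    return None if s is None else ("up", s)


def spine_entry(t: FatTree2, spine: int, dst: int) -> Optional[int]:
    dst_leaf = t.leaf_of(dst)
    return dst_leaf if t.link_ok(dst_leaf, spine) else None  # down port = leaf id


def offline_compute(t: FatTree2):
    """Offline mode: compute every entry of every table from scratch."""
    leaves = {l: {d: leaf_entry(t, l, d) for d in range(t.num_hosts)}
              for l in range(t.num_leaves)}
    spines = {s: {d: spine_entry(t, s, d) for d in range(t.num_hosts)}
              for s in range(t.num_spines)}
    return leaves, spines


def online_link_down(t: FatTree2, leaves, spines, leaf: int, spine: int) -> int:
    """Online mode: record one failed leaf-spine link and rewrite only the
    entries that used it. Returns the number of entries rewritten."""
    t.failed.add((leaf, spine))
    changed = 0
    for d in range(t.num_hosts):
        if t.leaf_of(d) == leaf and spines[spine][d] is not None:
            spines[spine][d] = None  # this spine can no longer reach d
            changed += 1
    for l, table in leaves.items():
        for d, entry in table.items():
            crosses_link = l == leaf or t.leaf_of(d) == leaf
            if entry == ("up", spine) and crosses_link:
                table[d] = leaf_entry(t, l, d)  # values only; keys unchanged
                changed += 1
    return changed


if __name__ == "__main__":
    t = FatTree2(num_leaves=4, num_spines=4, hosts_per_leaf=2)
    leaves, spines = offline_compute(t)
    print(leaves[0][5])                               # ('up', 1)
    print(online_link_down(t, leaves, spines, 0, 1))  # 6
    print(leaves[0][5])                               # ('up', 2)
    print((leaves, spines) == offline_compute(t))     # True
```

Because the spine choice in this toy model is deterministic, patching only the entries that crossed the failed link gives the same tables as a full recomputation with that failure recorded, which the last check confirms; here the online path rewrites 6 of the 64 table entries instead of rebuilding them all.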


Fabric management; Routing; Fault-tolerant routing; BXI; Interconnect management; High-performance computing



We are thankful to the Portals team at Sandia National Laboratories for their unconditional support, particularly Ron Brightwell, Brian Barrett (now at Amazon) and Ryan Grant. We also acknowledge the passionate discussions we had with Keith Underwood from Intel during the early stages of this project. Finally, we would like to thank our colleagues Jean-Pierre Panziera, Ben Bratu, Anne-Marie Fourel and Pascale Bernier-Bruna for their reviews and valuable comments.


Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Campus Ter@tec, Bruyères-le-Châtel, France
