Network unfairness in dragonfly topologies

The Journal of Supercomputing

Abstract

Dragonfly networks arrange network routers in a two-level hierarchy, providing a competitive cost-performance solution for large systems. Non-minimal adaptive routing (adaptive misrouting) is employed to fully exploit the path diversity and increase performance under adversarial traffic patterns. Network fairness issues arise in the dragonfly for several combinations of traffic pattern, global misrouting policy and traffic prioritization policy. Such unfairness prevents a balanced use of resources across the network nodes and severely degrades the performance of any application running on an affected node. This paper reviews the main causes of network unfairness in dragonflies, including a new adversarial traffic pattern that can easily occur in actual systems and congests all the global output links of a single router. Age-based arbitration is evaluated as a solution for the observed unfairness. Results show that it mitigates fairness issues, especially when combined with in-transit adaptive routing. With source adaptive routing, however, the saturation caused by the new traffic pattern interferes with the mechanisms used to detect remote congestion, and the problem grows with network size. This makes source adaptive routing based on remote notifications prone to reduced performance in dragonflies, even when age-based arbitration is used.
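
The general idea behind age-based arbitration can be sketched as follows. This is a minimal illustration, not the authors' router model: the `Packet` fields, the `age_based_arbiter` function and the tie-breaking rule are all hypothetical names and choices introduced here. It assumes each packet carries the cycle at which it was injected and that the arbiter grants the contended output to the oldest candidate.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Packet:
    source: int       # hypothetical id of the injecting node
    injected_at: int  # cycle at which the packet entered the network


def age_based_arbiter(candidates: Sequence[Packet], now: int) -> Optional[Packet]:
    """Grant the output to the oldest packet among the competing candidates.

    Age is measured as (now - injected_at). Ties are broken by the lowest
    source id, which is an arbitrary choice made for this sketch.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda p: (now - p.injected_at, -p.source))


if __name__ == "__main__":
    # Three packets compete for one output port at cycle 100.
    contenders = [Packet(0, 90), Packet(1, 10), Packet(2, 50)]
    winner = age_based_arbiter(contenders, now=100)
    print(winner.source)
```

Under this policy, a packet that has been delayed for a long time eventually becomes the oldest contender wherever it competes, which is the intuition for why such arbitration can reduce starvation of individual nodes.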

Acknowledgments

This work has been supported by the Spanish Ministry of Education, FPU Grant FPU13/00337, the Spanish Science and Technology Commission (CICYT) under contracts TIN2012-34557 and TIN2013-46957-C2-2-P, and the European HiPEAC Network of Excellence.

Author information

Correspondence to Enrique Vallejo.

About this article

Cite this article

Fuentes, P., Vallejo, E., Camarero, C. et al. Network unfairness in dragonfly topologies. J Supercomput 72, 4468–4496 (2016). https://doi.org/10.1007/s11227-016-1758-z

  • DOI: https://doi.org/10.1007/s11227-016-1758-z
