Randomizing task placement and route selection do not randomize traffic (enough)

Abstract

Dragonflies are among the most promising topologies for the Exascale effort because of their scalability and cost. They achieve very high throughput under uniform traffic but behave pathologically under other regular traffic patterns, some of which are very common in HPC applications, such as the multi-dimensional stencil communication pattern or certain permutation patterns. A recent study showed that randomizing task placement greatly improves the performance of these pathological traffic patterns by making the load they induce more similar to a uniformly distributed load. In this work we provide a theoretical model that predicts the expected performance of a generic dragonfly network under uniform traffic, and we characterize performance-optimal, minimal-cost dragonflies. We then compare the predictions of this model with the performance obtained through detailed simulation of a wide range of dragonfly configurations. In these same scenarios, we explore the performance of other non-uniform traffic patterns and investigate the impact of randomization techniques based on both task placement and indirect routing. For these previously unexplored traffic patterns, our results are similar to those reported in previous works for the multi-dimensional stencil communication pattern: randomizing task placement and/or path choice is effective in improving the performance of pathological workloads. However, we also show that neither uniformization technique closes the gap between the performance of these traffic patterns and the ideal performance of uniform random traffic, leaving significant room for improvement: the best achieved performance is only roughly 50 % of uniform performance.

Notes

  1. For topologies with a number of nodes that is not a power of two, we exercise the bit reversal pattern in the remaining nodes by recursively partitioning them in decreasing powers of two.
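
The partitioning described in this note can be sketched as follows. This is only one plausible reading of the note, not the authors' implementation: it assumes nodes are numbered 0 to N-1, that each block is the largest power of two that still fits in the remaining nodes, and that bit reversal is applied to the node offset within its block. The function names `bit_reverse` and `bit_reversal_destinations` are hypothetical.

```python
def bit_reverse(value, bits):
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bit_reversal_destinations(num_nodes):
    """Return dest[i], the destination of node i.

    Assumption (not stated in the source): nodes are split greedily into
    consecutive blocks whose sizes are decreasing powers of two, and the
    bit reversal permutation is applied inside each block.
    """
    dest = [0] * num_nodes
    base = 0                # first node index of the current block
    remaining = num_nodes   # nodes not yet assigned to a block
    while remaining > 0:
        # Largest power of two that fits in the remaining nodes.
        block = 1 << (remaining.bit_length() - 1)
        bits = block.bit_length() - 1  # log2(block)
        for offset in range(block):
            dest[base + offset] = base + bit_reverse(offset, bits)
        base += block
        remaining -= block
    return dest


if __name__ == "__main__":
    print(bit_reversal_destinations(8))  # one block of 8 nodes
    print(bit_reversal_destinations(6))  # a block of 4, then a block of 2
```

Under these assumptions the result is always a permutation of the nodes, so every node sends to and receives from exactly one node, as in the power-of-two case.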

Acknowledgments

This work is an extension of a previous work entitled “Randomizing task placement does not randomize traffic (enough)”, published in the Proceedings of the 2013 Interconnection Network Architecture: On-Chip, Multi-Chip (INA-OCMC), ACM, New York, US. It was partially supported by the Spanish Government through an FPI scholarship.

Author information

Corresponding author

Correspondence to German Rodriguez.

About this article

Cite this article

Prisacari, B., Rodriguez, G., Jokanovic, A. et al. Randomizing task placement and route selection do not randomize traffic (enough). Des Autom Embed Syst 18, 171–182 (2014). https://doi.org/10.1007/s10617-014-9133-x

Keywords

  • Dragonfly networks
  • Network throughput
  • Uniform traffic
  • Random task placement
  • High performance computing