Skip to main content

Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact

  • Conference paper
  • First Online:
High Performance Computing (ISC High Performance 2021)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 12728))

Included in the following conference series:

Abstract

Most distributed-memory bulk-synchronous parallel programs in HPC assume that compute resources are available continuously and homogeneously across the allocated set of compute nodes. However, long one-off delays on individual processes can cause global disturbances, so-called idle waves, by rippling through the system. This process is mainly governed by the communication topology of the underlying parallel code. This paper makes significant contributions to the understanding of idle wave dynamics. We study the propagation mechanisms of idle waves across the processes of MPI-parallel programs. We present a validated analytic model for their propagation velocity with respect to communication parameters and topology, with a special emphasis on sparse communication patterns. We study the interaction of idle waves with MPI collectives and show that, depending on the implementation, a collective may be permeable to the wave. Finally we analyze two mechanisms of idle wave decay: topological decay, which is rooted in differences in communication characteristics among parts of the system, and noise-induced decay, which is caused by system or application noise. We show that noise-induced decay is largely independent of noise characteristics but depends only on the overall noise power. An analytic expression for idle wave decay rate with respect to noise power is derived. For model validation we use microbenchmarks and stencil algorithms on three different supercomputing platforms.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    https://software.intel.com/en-us/trace-analyzer.

  2. 2.

    http://tiny.cc/intel-autotuning.

  3. 3.

    http://tiny.cc/intel-i-mpi-adjust-family.

  4. 4.

    https://www.hpcg-benchmark.org/.

References

  1. Afzal, A., et al.: An analytic performance model for overlapping execution of memory-bound loop kernels on multicore CPUs. In arXiv (2020). arXiv:2011.00243 [cs.DC]. Submitted

  2. Afzal, A., et al.: Delay flow mechanisms on clusters. Poster at EuroMPI 2019, 10–13 September 2019, Zurich, Switzerland. https://hpc.fau.de/ files/2019/09/EuroMPI2019_AHW-Poster.pdf

  3. Afzal, A., Hager, G., Wellein, G.: Desynchronization and wave pattern formation in mpi-parallel and hybrid memory-bound programs. In: Sadayappan, P., Chamberlain, B.L., Juckeland, G., Ltaief, H. (eds.) ISC High Performance 2020. LNCS, vol. 12151, pp. 391–411. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-50743-5_20

    Chapter  Google Scholar 

  4. Afzal, A., et al.: Propagation and decay of injected one-off delays on clusters: a case study. In 2019 IEEE International Conference on Cluster Computing, CLUSTER 2019, Albuquerque, NM, USA, 23–26 September 2019, pp. 1–10 (2019). https://doi.org/10.1109/CLUSTER.2019.8890995

  5. Agarwal, S., Garg, R., Vishnoi, N.K.: The impact of noise on the scaling of collectives: a theoretical approach. In: Bader, D.A., Parashar, M., Sridhar, V., Prasanna, V.K. (eds.) HiPC 2005. LNCS, vol. 3769, pp. 280–289. Springer, Heidelberg (2005). https://doi.org/10.1007/11602569_31

    Chapter  Google Scholar 

  6. Ferreira, K.B., et al.: Characterizing application sensitivity to OS interference using kernel-level noise injection. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, p. 19. IEEE Press (2008). https://doi.org/10.1109/SC.2008.5219920

  7. Gamell, M., et al.: Local recovery and failure masking for stencil-based applications at extreme scales. In: SC 2015: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12, November 2015. https://doi.org/10.1145/2807591.2807672

  8. Hager, A.G., et al.: Introduction to High Performance Computing for Scientists and Engineers. CRC Press (2010). ISBN: 978-1-4398-1192-4

    Google Scholar 

  9. Hoefler, T., et al.: LogGOPSim - simulating large-scale applications in the log- GOPS model. In: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (2010). https://doi.org/10.1145/1851476.1851564

  10. Hoefler, T., et al.: Characterizing the influence of system noise on large-scale applications by simulation. In: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–11. IEEE Computer Society (2010). https://doi.org/10.1109/SC.2010.12

  11. Hunold, S., et al.: Predicting MPI collective communication performance using machine learning. In: 2020 IEEE International Conference on Cluster Computing CLUSTER. IEEE (2020). https://doi.org/10.1109/CLUSTER49012.2020.00036

  12. Markidis, S., et al.: Idle waves in high-performance computing. Phys. Rev. E 91(1) (2015). https://doi.org/10.1103/PhysRevE.91.013306

  13. Nataraj, A., et al.: The ghost in the machine: observing the effects of kernel operation on parallel application performance. In: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (2007). https://doi.org/10.1145/1362622.1362662

  14. Vadhiyar, S.S., et al.: Automatically tuned collective communications. In: SC 2000: Proceedings of the 2000 ACM/IEEE Conference on Supercomputing, pp. 3–3. IEEE (2000). https://doi.org/10.1109/SC.2000.10024

Download references

Acknowledgments

This work was supported by KONWIHR, the Bavarian Competence Network for Scientific High Performance Computing in Bavaria, under project name “OMI4papps,” and by the BMBF under projects “Metacca” and “SeASiTe.” We are indebted to LRZ Garching and to HLRS Stuttgart for granting CPU hours on their “SuperMUC-NG” and “Hawk” systems.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ayesha Afzal .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Afzal, A., Hager, G., Wellein, G. (2021). Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact. In: Chamberlain, B.L., Varbanescu, AL., Ltaief, H., Luszczek, P. (eds) High Performance Computing. ISC High Performance 2021. Lecture Notes in Computer Science(), vol 12728. Springer, Cham. https://doi.org/10.1007/978-3-030-78713-4_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-78713-4_19

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78712-7

  • Online ISBN: 978-3-030-78713-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics