Abstract
Most distributed-memory bulk-synchronous parallel programs in HPC assume that compute resources are available continuously and homogeneously across the allocated set of compute nodes. However, long one-off delays on individual processes can ripple through the system and cause global disturbances, so-called idle waves. This process is mainly governed by the communication topology of the underlying parallel code. This paper advances the understanding of idle wave dynamics. We study the propagation mechanisms of idle waves across the processes of MPI-parallel programs and present a validated analytic model for their propagation velocity with respect to communication parameters and topology, with special emphasis on sparse communication patterns. We also study the interaction of idle waves with MPI collectives and show that, depending on the implementation, a collective may be permeable to the wave. Finally, we analyze two mechanisms of idle wave decay: topological decay, which is rooted in differences in communication characteristics among parts of the system, and noise-induced decay, which is caused by system or application noise. We show that noise-induced decay is largely independent of the noise characteristics and depends essentially only on the overall noise power, and we derive an analytic expression for the idle wave decay rate with respect to noise power. For model validation, we use microbenchmarks and stencil algorithms on three different supercomputing platforms.
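To make the propagation mechanism concrete, here is a minimal toy simulation, not the authors' analytic model. It assumes a lockstep program in which each rank computes for a fixed time per iteration and may only start the next iteration once all of its communication partners (given as rank offsets) have finished the previous one; communication cost is taken to be zero. The function names `idle_wave` and `wave_extent` and all parameters are hypothetical. In this sketch the wave front advances by the largest partner distance per iteration, and a one-sided partner list makes the wave travel in one direction only.

```python
from typing import List, Sequence


def idle_wave(num_ranks: int, iters: int, t_exec: float, offsets: Sequence[int],
              delay_rank: int, delay: float) -> List[List[float]]:
    """Toy lockstep model (hypothetical) of a one-off delay rippling through ranks.

    Rank i starts iteration k only after it and every partner i + o
    (o in offsets; partners outside [0, num_ranks) are ignored) have finished
    iteration k - 1. Communication is assumed to cost nothing.
    Returns idle[k][i]: the time rank i waits at the start of iteration k.
    """
    finish = [0.0] * num_ranks
    idle_history = []
    for k in range(iters):
        new_finish, idle_row = [], []
        for i in range(num_ranks):
            partners = [i + o for o in offsets if 0 <= i + o < num_ranks]
            # Wait for the latest of own and partners' previous-iteration finish.
            start = max([finish[i]] + [finish[j] for j in partners])
            idle_row.append(start - finish[i])
            # The one-off delay is injected on a single rank in iteration 0.
            work = t_exec + (delay if (k == 0 and i == delay_rank) else 0.0)
            new_finish.append(start + work)
        finish = new_finish
        idle_history.append(idle_row)
    return idle_history


def wave_extent(idle_row: List[float], delay_rank: int) -> List[int]:
    """Signed distances from the delayed rank of all ranks that idled."""
    return [i - delay_rank for i, t in enumerate(idle_row) if t > 0.0]


if __name__ == "__main__":
    # Bidirectional exchange with partners up to distance 2 (assumed pattern).
    idle = idle_wave(41, 6, 1.0, offsets=(-2, -1, 1, 2), delay_rank=20, delay=5.0)
    for k, row in enumerate(idle):
        print(k, wave_extent(row, 20))
```

With partner offsets up to distance d, the idle front in this sketch moves d ranks per iteration, i.e. roughly d/T ranks per unit time for an iteration time T. The idle period also keeps its full length as it travels, which fits the point that decay requires either topological inhomogeneity or noise, neither of which this toy model includes.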
Acknowledgments
This work was supported by KONWIHR, the Competence Network for Scientific High Performance Computing in Bavaria, under project name “OMI4papps,” and by the BMBF under projects “Metacca” and “SeASiTe.” We are indebted to LRZ Garching and to HLRS Stuttgart for granting CPU hours on their “SuperMUC-NG” and “Hawk” systems.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Afzal, A., Hager, G., Wellein, G. (2021). Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact. In: Chamberlain, B.L., Varbanescu, AL., Ltaief, H., Luszczek, P. (eds) High Performance Computing. ISC High Performance 2021. Lecture Notes in Computer Science, vol 12728. Springer, Cham. https://doi.org/10.1007/978-3-030-78713-4_19
DOI: https://doi.org/10.1007/978-3-030-78713-4_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78712-7
Online ISBN: 978-3-030-78713-4
eBook Packages: Computer Science, Computer Science (R0)