Abstract
MPI collective communications play an important role in coordinating and exchanging data among parallel processes in high-performance computing. Various algorithms exist for implementing each MPI collective, and they differ in characteristics such as message overhead, latency, and scalability that can significantly impact overall system performance. Choosing a suitable algorithm for each collective operation is therefore crucial to achieving optimal performance. In this paper, we present our experience with MPI collective algorithm selection on a large-scale supercomputer and highlight the impact of network traffic and system workload, in addition to previously investigated parameters such as message size, communicator size, and network topology. Our analysis shows that network traffic and system workload can make the performance of MPI collectives highly variable and, accordingly, should inform the algorithm selection strategy.
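For illustration, offline tuning of the kind discussed here typically produces a static decision table that maps message size and communicator size to an algorithm choice. The sketch below is hypothetical: the algorithm names mirror common Open MPI `coll/tuned` choices, and the thresholds are illustrative, not measured values from the paper.

```python
# Hypothetical selection table for MPI_Allreduce, as produced by
# offline collective tuning. Names and thresholds are illustrative.

def select_allreduce_algorithm(msg_bytes: int, comm_size: int) -> str:
    """Pick an allreduce algorithm from a static decision table."""
    if msg_bytes <= 4096:
        # Small messages are latency-bound: tree/recursive schemes win.
        return "recursive_doubling"
    if comm_size <= 16:
        # Small communicators tolerate the extra latency of rings poorly.
        return "recursive_doubling"
    # Large messages on large communicators are bandwidth-bound:
    # ring-style algorithms amortize per-link usage better.
    return "ring"

if __name__ == "__main__":
    for size, procs in [(1024, 64), (1 << 20, 8), (1 << 20, 256)]:
        print(size, procs, select_allreduce_algorithm(size, procs))
```

A purely static table like this is exactly what performance variability undermines: under heavy network traffic or system load, the crossover points shift, which is why the paper argues that system utilization should be considered in the selection strategy.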
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Salimi Beni, M., Hunold, S., Cosenza, B. (2024). Algorithm Selection of MPI Collectives Considering System Utilization. In: Zeinalipour, D., et al. Euro-Par 2023: Parallel Processing Workshops. Euro-Par 2023. Lecture Notes in Computer Science, vol 14352. Springer, Cham. https://doi.org/10.1007/978-3-031-48803-0_37
Print ISBN: 978-3-031-48802-3
Online ISBN: 978-3-031-48803-0