
Hierarchical redesign of classic MPI reduction algorithms

The Journal of Supercomputing

Abstract

Optimization of MPI collective communication operations has been an active research topic since the advent of MPI in the 1990s. Many general and architecture-specific collective algorithms have been proposed and implemented in state-of-the-art MPI implementations. Hierarchical, topology-oblivious transformation of existing communication algorithms has recently been proposed as a promising new approach to the optimization of MPI collective communication algorithms and MPI-based applications. The approach has been successfully applied to the most popular parallel matrix multiplication algorithm, SUMMA, and to state-of-the-art MPI broadcast algorithms, demonstrating significant, multifold performance gains, especially on large-scale HPC systems. In this paper, we apply the approach to the optimization of the MPI Reduce and Allreduce operations. Theoretical analysis and experimental results on a cluster of the Grid'5000 platform are presented.
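As a rough illustration of the hierarchical idea (a minimal sketch, not the authors' implementation), the C/MPI fragment below builds a two-level MPI_Reduce out of standard MPI primitives: the processes of a communicator are split into groups, each group first reduces to its leader, and the leaders then reduce to the root. The group size G, the helper name hierarchical_reduce, the restriction to MPI_SUM on doubles, and the assumption that the root is a group leader are all simplifications introduced here for brevity.

```c
#include <mpi.h>
#include <stdlib.h>

/* Minimal two-level (hierarchical) reduce sketch: processes are split into
 * groups of at most G ranks, each group reduces to its local leader, and the
 * leaders then reduce to the global root. G is an illustrative tuning
 * parameter, not a value prescribed by the paper. */
static int hierarchical_reduce(const double *sendbuf, double *recvbuf,
                               int count, int root, MPI_Comm comm, int G)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* First level: split the communicator into groups of (at most) G ranks. */
    int color = rank / G;                  /* group index */
    MPI_Comm group_comm;
    MPI_Comm_split(comm, color, rank, &group_comm);

    /* Reduce inside each group to the group leader (local rank 0). */
    double *partial = malloc((size_t)count * sizeof *partial);
    MPI_Reduce(sendbuf, partial, count, MPI_DOUBLE, MPI_SUM, 0, group_comm);

    /* Second level: a communicator containing only the group leaders. */
    int group_rank;
    MPI_Comm_rank(group_comm, &group_rank);
    MPI_Comm leaders_comm;
    MPI_Comm_split(comm, group_rank == 0 ? 0 : MPI_UNDEFINED, rank,
                   &leaders_comm);

    int err = MPI_SUCCESS;
    if (leaders_comm != MPI_COMM_NULL) {
        /* Reduce the partial results among the leaders to the global root.
         * For simplicity this sketch assumes the root is a group leader
         * (root % G == 0); the general case needs an extra point-to-point
         * transfer of the final result. */
        err = MPI_Reduce(partial, recvbuf, count, MPI_DOUBLE, MPI_SUM,
                         root / G, leaders_comm);
        MPI_Comm_free(&leaders_comm);
    }

    free(partial);
    MPI_Comm_free(&group_comm);
    return err;
}
```

An Allreduce can be assembled in the same spirit, for example by broadcasting the result back down the same two-level hierarchy; the paper analyses when such hierarchical variants of Reduce and Allreduce outperform the flat algorithms and reports measurements on Grid'5000.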



Acknowledgments

The experiments presented in this publication were carried out using the Grid'5000 experimental testbed, which is being developed under the INRIA ALADDIN development action with support from CNRS, RENATER and several universities as well as other funding bodies (see https://www.grid5000.fr). This work was also supported by Science Foundation Ireland under Grant Number 14/IA/2474.

Author information

Corresponding author: Alexey Lastovetsky.


About this article


Cite this article

Hasanov, K., Lastovetsky, A. Hierarchical redesign of classic MPI reduction algorithms. J Supercomput 73, 713–725 (2017). https://doi.org/10.1007/s11227-016-1779-7
