An Investigation into the Performance of Reduction Algorithms under Load Imbalance

  • Petar Marendić
  • Jan Lemeire
  • Tom Haber
  • Dean Vučinić
  • Peter Schelkens
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7484)

Abstract

Most reduction algorithms today are optimized for balanced workloads: they assume that all processes start the reduction at roughly the same time. In practice this is not always the case; significant load imbalances occur and degrade the performance of these algorithms. In this paper we investigate the impact of such imbalances on the most commonly used reduction algorithms and propose a new algorithm specifically adapted to imbalanced workloads. First, we analyze the optimistic case in which we have a priori knowledge of all imbalances, and propose a near-optimal solution. For the general case, where no foreknowledge of the imbalances is available, we propose a dynamically rebalanced tree reduction algorithm. We show experimentally that this algorithm outperforms the default Open MPI and MVAPICH2 implementations.
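The dynamically rebalanced tree itself is not reproduced on this page, but the baseline it improves on is the classic static binomial-tree reduction. The sketch below is a minimal MPI/C illustration of that baseline, not the authors' implementation; the function name tree_reduce_sum and the sum-of-ranks payload are invented for the example. It shows why a fixed tree suffers under process skew: every interior node blocks in MPI_Recv until its statically assigned partner delivers a partial result, so one late process stalls the whole path to the root.

```c
/* Hedged sketch: a static binomial-tree reduction in MPI.
 * This is the fixed-tree pattern whose weakness under load
 * imbalance the paper investigates; it is NOT the paper's
 * dynamically rebalanced algorithm. */
#include <mpi.h>
#include <stdio.h>

/* Reduce one double per process onto rank 0 along a binomial tree.
   Assumes a commutative operation (here: sum). */
static double tree_reduce_sum(double local, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    double acc = local;
    for (int mask = 1; mask < size; mask <<= 1) {
        if (rank & mask) {
            /* Pass the partial result one level up the tree, then leave. */
            MPI_Send(&acc, 1, MPI_DOUBLE, rank - mask, 0, comm);
            break;
        } else if (rank + mask < size) {
            /* Wait for the partner one level down. Under process skew
               this blocks until the (possibly late) partner arrives --
               the effect the paper quantifies. */
            double incoming;
            MPI_Recv(&incoming, 1, MPI_DOUBLE, rank + mask, 0, comm,
                     MPI_STATUS_IGNORE);
            acc += incoming;
        }
    }
    return acc; /* only meaningful on rank 0 */
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double result = tree_reduce_sum((double)(rank + 1), MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum = %.0f\n", result); /* expect n(n+1)/2 for n ranks */

    MPI_Finalize();
    return 0;
}
```

A dynamically rebalanced scheme, as the abstract describes it, would instead adapt the communication tree at run time so that processes that finish their work early pair up first, rather than waiting on a statically assigned and possibly late partner.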

Keywords

MPI · imbalance · collective reduction · process skew · benchmarking

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Petar Marendić (1, 2)
  • Jan Lemeire (1, 2)
  • Tom Haber (3)
  • Dean Vučinić (1, 2)
  • Peter Schelkens (1, 2)

  1. ETRO Dept., Vrije Universiteit Brussel (VUB), Brussels, Belgium
  2. FMI Dept., Interdisciplinary Institute for Broadband Technology (IBBT), Ghent, Belgium
  3. EDM, UHasselt, Diepenbeek, Belgium
