An Investigation into the Performance of Reduction Algorithms under Load Imbalance
Most reduction algorithms today are optimized for balanced workloads: they assume that all processes start the reduction at about the same time. In practice this is not always the case, and significant load imbalances may occur and degrade the performance of these algorithms. In this paper we investigate the impact of such imbalances on the most commonly employed reduction algorithms and propose a new algorithm specifically adapted to this context. First, we analyze the optimistic case in which all imbalances are known a priori and propose a near-optimal solution. For the general case, in which the imbalances are not known in advance, we propose a dynamically rebalanced tree reduction algorithm. We show experimentally that this algorithm performs better than the default OpenMPI and MVAPICH2 implementations.
Keywords: MPI, imbalance, collective, reduction, process skew, benchmarking
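To make the effect described in the abstract concrete, the following is a minimal sketch of how arrival-time skew can delay a statically ordered binomial tree reduction. It is an illustration only, not the algorithm or the cost model of the paper: the function name `binomial_reduce_time`, the uniform per-step cost, the fixed receive order, and the choice of rank 0 as root are all assumptions made for this example.

```python
def binomial_reduce_time(arrival, cost):
    """Completion time at rank 0 of a static binomial tree reduction.

    Assumed toy model (not taken from the paper):
      - arrival[i] is the time at which rank i enters the reduction;
      - each receive-and-combine step takes `cost` time units;
      - a parent handles its children in a fixed order, nearest child first.
    """
    p = len(arrival)

    def finish(rank, span):
        # Time at which `rank` has combined every value in [rank, rank + span).
        t = arrival[rank]
        step = 1
        while step < span:
            child = rank + step
            if child < p:
                # The parent waits for the child's partial result, then combines it.
                t = max(t, finish(child, step)) + cost
            step *= 2
        return t

    # Round the tree span up to the next power of two.
    span = 1
    while span < p:
        span *= 2
    return finish(0, span)


balanced = binomial_reduce_time([0, 0, 0, 0], cost=1)
skewed = binomial_reduce_time([0, 0, 0, 5], cost=1)
print(balanced, skewed)
```

In this toy model the balanced case finishes after two steps, while a single process arriving five time units late pushes completion to time 7. One combine step after the late arrival would be enough if the other partial results were already merged, which is the kind of slack that a reduction order adapted to the imbalance could recover.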