Pattern-Independent Detection of Manual Collectives in MPI Programs
In parallel applications, a significant share of communication is collective, performing, for example, broadcasts, reductions, or complete exchanges. Although the MPI standard defines many convenience functions for this purpose, which not only improve code readability and maintainability but are usually also highly efficient, many application programmers still hand-craft their own implementations from point-to-point messages. We show how instances of such manual collectives can be detected automatically. Because it matches pre- and post-conditions of hashed message exchanges recorded in event traces, our method is independent of the specific communication pattern employed. We demonstrate that replacing detected broadcasts in the HPL benchmark can yield significant performance improvements.
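The core idea of pattern-independent detection can be illustrated with a small sketch. The following Python code is a hypothetical simplification, not the paper's actual trace format or algorithm: it replays a list of recorded point-to-point messages and checks the broadcast *post-condition*, namely that after the exchange every rank holds data whose hash equals the hash of the root's initial buffer. Because only the pre- and post-conditions are checked, the same test accepts a linear as well as a binomial-tree broadcast.

```python
# Illustrative sketch (assumed trace format, not the paper's implementation):
# detect a hand-crafted broadcast by matching hashed pre-/post-conditions,
# independent of the point-to-point pattern used.
import hashlib

def payload_hash(data: bytes) -> str:
    """Hash a message payload, as a stand-in for hashes stored in a trace."""
    return hashlib.sha256(data).hexdigest()

def detect_broadcast(trace, n_ranks, root, initial):
    """trace: list of (sender, receiver, payload) point-to-point messages.
    initial: the root's buffer content before the exchange (pre-condition).
    Returns True iff, after replaying the trace, every rank holds data
    hashing to the root's initial hash (the broadcast post-condition)."""
    root_hash = payload_hash(initial)
    held = {root: root_hash}  # rank -> hash of the data it currently holds
    for sender, receiver, payload in trace:
        # A rank may only forward data it already holds.
        if held.get(sender) != payload_hash(payload):
            return False
        held[receiver] = payload_hash(payload)
    # Post-condition: all ranks hold the root's data.
    return all(held.get(r) == root_hash for r in range(n_ranks))

# Two different manual broadcasts of the same data among 4 ranks, root 0:
data = b"hello"
linear = [(0, 1, data), (0, 2, data), (0, 3, data)]
binomial = [(0, 2, data), (0, 1, data), (2, 3, data)]
```

Both traces satisfy the post-condition and are therefore recognized as broadcasts, while an incomplete exchange (e.g., one that never reaches rank 3) is rejected; this is precisely why the detection needs no knowledge of the specific communication pattern.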
Keywords: MPI, collective operations, performance optimization, HPL