Measuring Execution Times of Collective Communications in an Empirical Optimization Framework
An essential part of an empirical optimization library is the set of timing procedures with which the performance of different codelets is determined. In this paper, we present four different timing methods to optimize collective MPI communications and compare their accuracy for the FFT NAS Parallel Benchmarks on a variety of systems with different MPI implementations. We find that timing larger code portions with infrequent synchronizations performs well on all systems.
Keywords: Empirical Optimization; Abstract Data and Communication Library (ADCL); Collective Communication; NAS Parallel Benchmark