Performance Evaluation of Compiler-Assisted OpenMP Codes on Various HPC Systems
Because automatic parallelization capabilities differ among compilers, a serial code is often modified so that a particular target compiler can easily recognize its code structure and data dependencies, enabling effective automatic optimization. However, such modifications may not help a different compiler, which cannot always parallelize the modified code. In this paper, to achieve effective parallelization on various HPC systems, compiler messages obtained from various compilers on different HPC systems are exploited for OpenMP parallelization. Because a message obtained on one system may help identify key loop nests even on other systems, performance-portable OpenMP parallelization can be achieved. This paper evaluates the performance of compiler-assisted OpenMP codes developed using compiler messages from various compilers. The evaluation results clarify that, when a code has been modified for a particular target compiler, the message given by that target compiler is the most helpful for achieving appropriate OpenMP parallelization.
Keywords: Loop nest · Performance portability · Automatic optimization · Parallelization method · Automatic parallelization
This research was partially supported by Core Research for Evolutional Science and Technology of the Japan Science and Technology Agency (JST CREST), "An Evolutionary Approach to Construction of a Software Development Environment for Massively-Parallel Heterogeneous Systems".