A Framework for Enabling OpenMP Autotuning

  • Vinu Sreenivasan
  • Rajath Javali
  • Mary Hall
  • Prasanna Balaprakash
  • Thomas R. W. Scogland
  • Bronis R. de Supinski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11718)

Abstract

This paper describes a lightweight framework that enables autotuning of OpenMP pragmas to ease performance tuning of OpenMP codes across platforms. It presents a prototype of the framework and demonstrates its use in identifying the best-performing parallel loop schedules and thread counts for five codes from the PolyBench benchmark suite. The process is facilitated by a tool that takes a compact search-space description of the pragmas to apply to a loop nest and chooses the best solution using model-based search. This tool offers the potential to achieve performance portability of OpenMP across platforms without burdening the programmer with exploring the search space manually. Performance results show that the tool identifies different selections for schedule and thread count applied to parallel loops across benchmarks, data set sizes and architectures. Performance gains over the baseline with default settings reach up to \(1.17{\times }\), but slowdowns of \(0.5{\times }\) show the importance of preserving default settings. More importantly, this experiment sets the stage for more elaborate experiments to map new OpenMP features such as GPU offloading and the new loop pragma.
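
As a rough illustration of the idea in the abstract, the sketch below expands a compact search-space description into concrete OpenMP pragmas and picks the best one. Everything in it is an assumption made for illustration: the `SEARCH_SPACE` format, the function names, and the `toy_cost` function are hypothetical and are not the framework's actual interface or measured data. It also uses plain exhaustive search for simplicity, whereas the paper's tool uses model-based search.

```python
from itertools import product

# Hypothetical compact search-space description (illustrative only,
# not the framework's actual input format).
SEARCH_SPACE = {
    "schedule": ["static", "dynamic", "guided"],
    "num_threads": [1, 2, 4, 8],
}


def make_pragma(schedule, num_threads):
    """Render one configuration as an OpenMP parallel-loop pragma."""
    return (
        f"#pragma omp parallel for schedule({schedule}) "
        f"num_threads({num_threads})"
    )


def enumerate_configs(space):
    """Expand the compact description into every concrete configuration."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def pick_best(space, measure):
    """Exhaustive search: return the configuration with the lowest cost.

    `measure` stands in for compiling and timing one variant. A model-based
    search would sample the space instead of visiting every point.
    """
    return min(enumerate_configs(space), key=lambda cfg: measure(**cfg))


def toy_cost(schedule, num_threads):
    """Made-up cost function so the sketch runs; not measured data."""
    overhead = {"static": 0.0, "dynamic": 0.3, "guided": 0.1}[schedule]
    return 8.0 / num_threads + 0.2 * num_threads + overhead


best = pick_best(SEARCH_SPACE, toy_cost)
print(make_pragma(**best))
```

In a real setting, `toy_cost` would be replaced by a routine that inserts the pragma into the loop nest, compiles the variant, and returns its measured run time on the target platform.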

Keywords

Autotuning, Loop scheduling, Performance portability

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Vinu Sreenivasan (1)
  • Rajath Javali (1)
  • Mary Hall (1, corresponding author)
  • Prasanna Balaprakash (2)
  • Thomas R. W. Scogland (3)
  • Bronis R. de Supinski (3)

  1. University of Utah, Salt Lake City, USA
  2. Argonne National Laboratory, Argonne, USA
  3. Lawrence Livermore National Laboratory, Livermore, USA