Abstract
Parallelization schemes are essential in order to exploit the full benefits of multi-core architectures, which have become widespread in recent years, especially for scientific applications. In shared-memory architectures, the most common parallelization API is OpenMP. However, introducing correct and optimal OpenMP parallelization into an application is not always simple, owing to common shared-memory management pitfalls and to architecture heterogeneity. To ease this process, many automatic parallelization compilers have been created. In this paper we focus on three source-to-source compilers, AutoPar, Par4All and Cetus, which we found most suitable for the task: we point out their strengths and weaknesses, analyze their performance, inspect their capabilities and suggest new paths for enhancement. We analyze and compare the compilers' performance on several exemplary test cases, each exposing a different pitfall, and suggest several new ways to overcome these pitfalls while yielding excellent results in practice. Moreover, we note that all of these source-to-source parallelization compilers are confined to OpenMP 2.5, an outdated version of the API that is no longer well matched to today's complex heterogeneous architectures. We therefore suggest a path that exploits the new features of OpenMP 4.5, whose directives allow heterogeneous architectures to be fully utilized, specifically those with strong collaboration between CPUs and GPGPUs, and which outperforms the previous results by an order of magnitude.
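To illustrate the gap between the two API versions discussed above, the following minimal sketch (not taken from the paper's benchmarks; array sizes and variable names are illustrative only) parallelizes a dot product first with an OpenMP 2.5-style worksharing directive, the kind the surveyed source-to-source compilers emit, and then with an OpenMP 4.5 target directive that offloads the same loop to an accelerator such as a GPGPU:

```c
#include <stdio.h>
#include <stdlib.h>

#define N 1000000

int main(void) {
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double sum25 = 0.0, sum45 = 0.0;

    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    /* OpenMP 2.5 style: shared-memory worksharing on the host CPU.
     * The reduction clause avoids a data race on sum25, a classic
     * pitfall that automatic parallelizers must recognize. */
    #pragma omp parallel for reduction(+:sum25)
    for (int i = 0; i < N; i++)
        sum25 += a[i] * b[i];

    /* OpenMP 4.5 style: offload the same loop to a target device with
     * explicit data mapping; execution falls back to the host if no
     * device is available. */
    #pragma omp target teams distribute parallel for \
            map(to: a[0:N], b[0:N]) map(tofrom: sum45) reduction(+:sum45)
    for (int i = 0; i < N; i++)
        sum45 += a[i] * b[i];

    printf("host: %f, device: %f\n", sum25, sum45);
    free(a);
    free(b);
    return 0;
}
```

Both loops compute the same result; the difference lies in where the work runs and in the explicit data-mapping clauses that the 4.5 target construct requires.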
Notes
Source code of relevant sections can be found at: github.com/reemharel22/AutomaticParallelization_NAS
References
Geer, D.: Chip makers turn to multicore processors. Computer 38(5), 11–13 (2005)
Blake, G., Dreslinski, R.G., Mudge, T.: A survey of multicore processors. IEEE Signal Process. Mag. 26(6), 26–37 (2009)
Pacheco, P.: An Introduction to Parallel Programming. Elsevier, Amsterdam (2011)
Leopold, C.: Parallel and Distributed Computing: A Survey of Models, Paradigms and Approaches. Wiley, Hoboken (2001)
Dagum, L., Menon, R.: OpenMP: an industry-standard API for shared-memory programming. IEEE Comput. Sci. Eng. 5(1), 46–55 (1998)
Gropp, W., Thakur, R., Lusk, E.: Using MPI-2: Advanced Features of the Message Passing Interface. MIT Press, Cambridge (1999)
Snir, M., Otto, S., Huss-Lederman, S., Dongarra, J., Walker, D.: MPI-the Complete Reference: The MPI Core, vol. 1. MIT press, Cambridge (1998)
Boku, T., Sato, M., Matsubara, M., Takahashi, D.: OpenMPI: OpenMP-like tool for easy programming in MPI. In: Sixth European Workshop on OpenMP, pp. 83–88 (2004)
Nickolls, J., Buck, I., Garland, M., Skadron, K.: Scalable parallel programming with CUDA. In: ACM SIGGRAPH 2008 Classes, p. 16. ACM (2008)
Oren, G., Ganan, Y., Malamud, G.: AutoMP: an automatic OpenMP parallelization generator for variable-oriented high-performance scientific codes. Int. J. Comb. Optim. Probl. Inform. 9(1), 46–53 (2018)
Neamtiu, I., Foster, J.S., Hicks, M.: Understanding source code evolution using abstract syntax tree matching. ACM SIGSOFT Softw. Eng. Notes 30(4), 1–5 (2005)
AutoPar documentations. http://rosecompiler.org/ROSE_HTML_Reference/auto_par.html. Accessed 8 Aug 2019
ROSE homepage. http://rosecompiler.org. Accessed 8 Aug 2019
Dever, M.: AutoPar: automating the parallelization of functional programs. PhD thesis, Dublin City University (2015)
Par4All homepage. http://par4all.github.io/. Accessed 8 Aug 2019
PIPS homepage. https://pips4u.org/. Accessed 8 Aug 2019
Ventroux, N., Sassolas, T., Guerre, A., Creusillet, B., Keryell, R.: SESAM/Par4all: a tool for joint exploration of MPSoC architectures and dynamic dataflow code generation. In: Proceedings of the 2012 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, pp. 9–16. ACM (2012)
Cetus homepage. https://engineering.purdue.edu/Cetus/. Accessed 8 Aug 2019
Dave, C., Bae, H., Min, S.-J., Lee, S., Eigenmann, R., Midkiff, S.: Cetus: a source-to-source compiler infrastructure for multicores. Computer 42(12), 36–42 (2009)
Hall, M.W., Anderson, J.M., Amarasinghe, S.P., Murphy, B.R., Liao, S.-W., Bugnion, E., Lam, M.S.: Maximizing multiprocessor performance with the suif compiler. Computer 29(12), 84–89 (1996)
Pottenger, B., Eigenmann, R.: Idiom recognition in the Polaris parallelizing compiler. In: Proceedings of the 9th International Conference on Supercomputing, pp. 444–448. ACM (1995)
Tian, X., Bik, A., Girkar, M., Grey, P., Saito, H., Su, E.: Intel® OpenMP C++/Fortran compiler for hyper-threading technology: implementation and performance. Intel Technol. J. 6(1), 36–46 (2002)
Bondhugula, U., Hartono, A., Ramanujam, J., Sadayappan, P.: PLUTO: a practical and fully automatic polyhedral program optimization system. In: Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation (PLDI 08), Tucson, AZ (June 2008). Citeseer (2008)
Prema, S., Jehadeesan, R., Panigrahi, B.K.: Identifying pitfalls in automatic parallelization of NAS parallel benchmarks. In: 2017 National Conference on Parallel Computing Technologies (PARCOMPTECH), pp. 1–6. IEEE (2017)
Arenaz, M., Hernandez, O., Pleiter, D.: The technological roadmap of parallware and its alignment with the openpower ecosystem. In: International Conference on High Performance Computing, pp. 237–253. Springer (2017)
Bailey, D.H., Barszcz, E., Barton, J.T., Browning, D.S., Carter, R.L., Dagum, L., Fatoohi, R.A., Frederickson, P.O., Lasinski, T.A., Schreiber, R.S., et al.: The NAS parallel benchmarks. Int. J. Supercomput. Appl. 5(3), 63–73 (1991)
Graham, S.L., Kessler, P.B., McKusick, M.K.: Gprof: a call graph execution profiler. ACM SIGPLAN Not. 39(4), 49–57 (2004)
Prema, S., Jehadeesan, R.: Analysis of parallelization techniques and tools. Int. J. Inf. Comput. Technol. 3(5), 471–478 (2013)
Sohal, M., Kaur, R.: Automatic parallelization: a review. Int. J. Comput. Sci. Mob. Comput. 5(5), 17–21 (2016)
Quinlan, D.: ROSE: compiler support for object-oriented frameworks. Parallel Process. Lett. 10(02n03), 215–226 (2000)
Amini, M., Creusillet, B., Even, S., Keryell, R., Goubier, O., Guelton, S., McMahon, J.O., Pasquier, F.-X., Péan, G., Villalon, P.: Par4all: from convex array regions to heterogeneous computing. In: IMPACT 2012: Second International Workshop on Polyhedral Compilation Techniques HiPEAC 2012 (2012)
Lee, S.-I., Johnson, T.A., Eigenmann, R.: Cetus—an extensible compiler infrastructure for source-to-source transformation. In: International Workshop on Languages and Compilers for Parallel Computing, pp. 539–553. Springer (2003)
Liang, X., Humos, A.A., Pei, T.: Vectorization and parallelization of loops in C/C++ code. In: Proceedings of the International Conference on Frontiers in Education: Computer Science and Computer Engineering (FECS). The Steering Committee of The World Congress in Computer Science, Computer, pp. 203–206 (2017)
Jubb, C.: Loop optimizations in modern C compilers (2014)
Jeffers, J., Reinders, J.: Intel Xeon Phi Coprocessor High Performance Programming. Newnes, Oxford (2013)
Lu, M., Zhang, L., Huynh, H.P., Ong, Z., Liang, Y., He, B., Goh, R.S.M., Huynh, R.: Optimizing the MapReduce framework on Intel Xeon Phi coprocessor. In: 2013 IEEE International Conference on Big Data, pp. 125–130. IEEE (2013)
Heinecke, A., Vaidyanathan, K., Smelyanskiy, M., Kobotov, A., Dubtsov, R., Henry, G., Shet, A.G., Chrysos, G., Dubey, P.: Design and implementation of the linpack benchmark for single and multi-node systems based on Intel® Xeon Phi coprocessor. In: 2013 IEEE 27th International Symposium on Parallel & Distributed Processing (IPDPS), pp. 126–137. IEEE (2013)
Bailey, D.H.: NAS parallel benchmarks. In: Padua, D. (ed.) Encyclopedia of Parallel Computing, pp. 1254–1259. Springer, Heidelberg (2011)
NPB in C homepage. http://aces.snu.ac.kr/software/snu-npb/. Accessed 8 Aug 2019
Geimer, M., Wolf, F., Wylie, B.J.N., Ábrahám, E., Becker, D., Mohr, B.: The scalasca performance toolset architecture. Concurr. Comput. Pract. Exp. 22(6), 702–719 (2010)
Van der Pas, R., Stotzer, E., Terboven, C.: Using OpenMP The Next Step: Affinity, Accelerators, Tasking, and SIMD. MIT Press, Cambridge (2017)
Sui, Y., Fan, X.I., Zhou, H., Xue, J.: Loop-oriented array-and field-sensitive pointer analysis for automatic SIMD vectorization. In: ACM SIGPLAN Notices, vol. 51, pp. 41–51. ACM (2016)
Zhou, H.: Compiler techniques for improving SIMD parallelism. PhD thesis, University of New South Wales, Sydney, Australia (2016)
NegevHPC Project. https://www.negevhpc.com. Accessed 8 Aug 2019
Acknowledgements
This work was supported by the Lynn and William Frankel Center for Computer Science. Computational support was provided by the NegevHPC project [44].
Cite this article
Harel, R., Mosseri, I., Levin, H. et al. Source-to-Source Parallelization Compilers for Scientific Shared-Memory Multi-core and Accelerated Multiprocessing: Analysis, Pitfalls, Enhancement and Potential. Int J Parallel Prog 48, 1–31 (2020). https://doi.org/10.1007/s10766-019-00640-3
Keywords
- Parallel programming
- Automatic parallelism
- Cetus
- AutoPar
- Par4All