
Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime

Published in: International Journal of Parallel Programming

Abstract

There are billions of lines of sequential code inside today's software that do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to make efficient use of the available parallelism, has been a research goal for some time now. This work proposes a new approach for achieving this goal. We created a new parallelizing compiler that analyses read and write instructions, as well as control-flow modifications, to identify a set of dependencies between the instructions in a program. Based on the resulting dependency graph, the compiler then rewrites and organizes the program in a task-oriented structure. Each parallel task is composed of instructions that cannot be executed in parallel with one another. A work-stealing-based parallel runtime is responsible for scheduling and managing the granularity of the generated tasks. Furthermore, a compile-time granularity control mechanism avoids creating unnecessary data structures. This work focuses on the Java language, but the techniques are general enough to be applied to other programming languages. We have evaluated our approach on 8 benchmark programs against OoOJava, achieving higher speedups; in some cases, values were close to those of a manual parallelization. The resulting parallel code also has the advantage of being readable and easily configurable to further improve its performance manually.
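The abstract describes three ingredients: a task-oriented rewrite of the program, a work-stealing runtime, and a granularity cut-off that avoids creating task objects for small work units. As a rough illustration (not the paper's actual generated code), here is what such a structure might look like for a recursive Fibonacci, sketched with Java's standard fork/join framework; the class name and `THRESHOLD` value are illustrative assumptions.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch of the task-oriented structure such a compiler
// might emit. Below THRESHOLD no task objects are created and the code
// runs sequentially, mirroring the compile-time granularity control
// described in the abstract.
public class FibTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 12; // assumed cut-off value
    private final int n;

    public FibTask(int n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n < THRESHOLD) {
            return fibSeq(n); // granularity control: stay sequential
        }
        FibTask left = new FibTask(n - 1);
        left.fork();                               // eligible for work-stealing
        long right = new FibTask(n - 2).compute(); // computed by this worker
        return left.join() + right;                // dependency: both results needed
    }

    private static long fibSeq(int n) {
        return n < 2 ? n : fibSeq(n - 1) + fibSeq(n - 2);
    }

    public static void main(String[] args) {
        long result = ForkJoinPool.commonPool().invoke(new FibTask(30));
        System.out.println(result); // prints 832040
    }
}
```

The cut-off is the key design point: without it, the overhead of allocating and scheduling one task per recursive call would dominate the useful work.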


References

  1. Amini, M., Creusillet, B., Even, S., Keryell, R., Goubier, O., Guelton, S., McMahon, J.O., Pasquier, F.X., Péan, G., Villalon, P.: Par4All: from convex array regions to heterogeneous computing. In: IMPACT 2012: Second International Workshop on Polyhedral Compilation Techniques, HiPEAC 2012 (2012)

  2. Ayguadé, E., Copty, N., Duran, A., Hoeflinger, J., Lin, Y., Massaioli, F., Teruel, X., Unnikrishnan, P., Zhang, G.: The design of OpenMP tasks. IEEE Trans. Parallel Distrib. Syst. 20(3), 404–418 (2009)


  3. Ayguadé, E., Duran, A., Hoeflinger, J., Massaioli, F., Teruel, X.: An experimental evaluation of the new OpenMP tasking model. In: Adve, V., Garzarán, M.J., Petersen, P. (eds.) Languages and Compilers for Parallel Computing, pp. 63–77. Springer, Berlin (2008)

  4. Banerjee, U., Eigenmann, R., Nicolau, A., Padua, D.A., et al.: Automatic program parallelization. Proc. IEEE 81(2), 211–243 (1993)


  5. Bik, A.J., Gannon, D.B.: Automatically exploiting implicit parallelism in Java. Concurr. Pract. Exp. 9(6), 579–619 (1997)


  6. Bondhugula, U., Baskaran, M., Krishnamoorthy, S., Ramanujam, J., Rountev, A., Sadayappan, P.: Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In: Hendren, L. (ed.) Compiler Construction, pp. 132–146. Springer, Berlin (2008)

  7. Chamberlain, B., Callahan, D., Zima, H.: Parallel programmability and the Chapel language. Int. J. High Perform. Comput. Appl. 21(3), 291–312 (2007)


  8. Chan, B., Abdelrahman, T.S.: Run-time support for the automatic parallelization of Java programs. J. Supercomput. 28(1), 91–117 (2004)


  9. Charles, P., Grothoff, C., Saraswat, V., Donawa, C., Kielstra, A., Ebcioglu, K., Von Praun, C., Sarkar, V.: X10: an object-oriented approach to non-uniform cluster computing. In: ACM SIGPLAN Notices, vol. 40, pp. 519–538. ACM (2005)

  10. Chen, M.K., Olukotun, K.: The Jrpm system for dynamically parallelizing Java programs. In: Proceedings of the 30th Annual International Symposium on Computer Architecture, 2003, pp. 434–445. IEEE (2003)

  11. Dagum, L., Menon, R.: OpenMP: an industry-standard API for shared-memory programming. IEEE Comput. Sci. Eng. 5(1), 46–55 (1998)


  12. Dave, C., Bae, H., Min, S.J., Lee, S., Eigenmann, R., Midkiff, S.: Cetus: a source-to-source compiler infrastructure for multicores. Computer 12, 36–42 (2009)


  13. Dominguez, R.M.: Evaluating Different Java Bindings for OpenCL. Master thesis at Universidad Carlos III de Madrid. https://e-archivo.uc3m.es/handle/10016/17183 (2013)

  14. Duran, A., Corbalán, J., Ayguadé, E.: An adaptive cut-off for task parallelism. In: International Conference for High Performance Computing, Networking, Storage and Analysis, 2008. SC 2008, pp. 1–11. IEEE (2008)

  15. Duran, A., Corbalán, J., Ayguadé, E.: Evaluation of OpenMP task scheduling strategies. In: Eigenmann, R., de Supinski, B.R. (eds.) OpenMP in a New Era of Parallelism, pp. 100–110. Springer, Berlin (2008)

  16. Feautrier, P.: Automatic parallelization in the polytope model. In: Perrin, G.-R., Darte, A. (eds.) The Data Parallel Programming Model, pp. 79–103. Springer, Berlin (1996)

  17. Fonseca, A.: Æminium Benchmark Suite. https://github.com/AEminium/AeminiumBenchmarks (2013). Accessed 23 Oct 2013

  18. Fonseca, A., Cabral, B.: ÆminiumGPU: an intelligent framework for GPU programming. In: Keller, R., Kramer, D., Weiss, J.-P. (eds.) Facing the Multicore-Challenge III, pp. 96–107. Springer, Berlin (2013)

  19. Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. In: Michael Berman, A. (ed.) ACM SIGPLAN Notices, vol. 33, pp. 212–223. ACM, New York, NY (1998)

  20. Hogen, G., Kindler, A., Loogen, R.: Automatic parallelization of lazy functional programs. In: ESOP’92, pp. 254–268. Springer, Berlin (1992)

  21. Jenista, J.C., Demsky, B.C., et al.: OoOJava: software out-of-order execution. In: Gill, A. (ed.) ACM SIGPLAN Notices, vol. 46, pp. 57–68. ACM, New York, NY (2011)

  22. Lea, D.: A Java fork/join framework. In: Proceedings of the ACM 2000 Conference on Java Grande, pp. 36–43. ACM (2000)

  23. Lee, S., Min, S.J., Eigenmann, R.: OpenMP to GPGPU: a compiler framework for automatic translation and optimization. ACM SIGPLAN Not. 44(4), 101–110 (2009)


  24. Leino, K., Poetzsch-Heffter, A., Zhou, Y.: Using data groups to specify and check side effects. ACM SIGPLAN Not. 37(5), 246–257 (2002)


  25. Marlow, S., Peyton Jones, S., Singh, S.: Runtime support for multicore Haskell. In: Tolmach, A. (ed.) ACM SIGPLAN Notices, vol. 44, pp. 65–78. ACM, New York, NY (2009)

  26. Pawlak, R., Monperrus, M., Petitprez, N., Noguera, C., Seinturier, L.: Spoon v2: large-scale source code analysis and transformation for Java. Tech. Rep. hal-01078532, INRIA (2006). https://hal.inria.fr/hal-01078532

  27. Rafael, J., Correia, I., Fonseca, A., Cabral, B.: Dependency-based automatic parallelization of Java applications. In: Euro-Par 2014: Parallel Processing Workshops, pp. 182–193. Springer, Berlin (2014)

  28. Senghor, A., Konate, K.: FJComp, a Java parallelizing compiler for dealing with divide-and-conquer algorithms. In: 2013 International Conference on Computer Applications Technology (ICCAT), pp. 1–5. IEEE (2013)

  29. Steele, G.: Parallel programming and parallel abstractions in Fortress. Lect. Notes Comput. Sci. 3945, 1 (2006)


  30. Stork, S., Naden, K., Sunshine, J., Mohr, M., Fonseca, A., Marques, P., Aldrich, J.: Æminium: a permission-based concurrent-by-default programming language approach. ACM Trans. Program. Lang. Syst. 36(1), 2 (2014)


  31. Swaine, J., Tew, K., Dinda, P., Findler, R.B., Flatt, M.: Back to the futures: incremental parallelization of existing sequential runtime systems. In: Rinard, M., Sullivan, K.J., Steinberg, D.H. (eds.) ACM SIGPLAN Notices, vol. 45, pp. 583–597. ACM, New York, NY (2010)

  32. Tzannes, A., Caragea, G.C., Barua, R., Vishkin, U.: Lazy binary-splitting: a run-time adaptive work-stealing scheduler. In: Hall, M. (ed.) ACM SIGPLAN Notices, vol. 45, pp. 179–190. ACM, New York, NY (2010)

  33. Wang, C., Li, X., Zhang, J., Zhou, X., Nie, X.: MP-Tomasulo: a dependency-aware automatic parallel execution engine for sequential programs. ACM Trans. Archit. Code Optim. 10(2), 9 (2013)


  34. Zhao, J., Rogers, I., Kirkham, C., Watson, I.: Loop parallelisation for the Jikes RVM. In: Sixth International Conference on Parallel and Distributed Computing, Applications and Technologies, 2005. PDCAT 2005, pp. 35–39. IEEE (2005)


Acknowledgments

This work would not have been possible without the contributions to the Æminium language and runtime from Sven Stork, Paulo Marques and Jonathan Aldrich.

Author information


Corresponding author

Correspondence to Alcides Fonseca.

Additional information

This work was partially supported by the Portuguese Research Agency FCT, through CISUC (R&D Unit 326/97) and the CMU|Portugal program (R&D Project Æminium CMU-PT/SE/0038/2008).

Alcides Fonseca: Supported by the Portuguese National Foundation for Science and Technology (FCT) through a Doctoral Grant (SFRH/BD/84448/2012).


Cite this article

Fonseca, A., Cabral, B., Rafael, J. et al. Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime. Int J Parallel Prog 44, 1337–1358 (2016). https://doi.org/10.1007/s10766-016-0426-5
