
Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime

Published in: International Journal of Parallel Programming

Abstract

There are billions of lines of sequential code inside today's software that do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to make efficient use of the available parallelism, has been a research goal for some time now. This work proposes a new approach for achieving this goal. We created a new parallelizing compiler that analyses read and write instructions, as well as control-flow modifications, to identify a set of dependencies between the instructions in a program. Based on the resulting dependency graph, the compiler then rewrites and organizes the program in a task-oriented structure. Each parallel task is composed of instructions that cannot be executed in parallel with one another. A work-stealing-based parallel runtime is responsible for scheduling and managing the granularity of the generated tasks. Furthermore, a compile-time granularity control mechanism avoids creating unnecessary data structures. This work focuses on the Java language, but the techniques are general enough to be applied to other programming languages. We have evaluated our approach on 8 benchmark programs against OoOJava, achieving higher speedups; in some cases, values were close to those of a manual parallelization. The resulting parallel code also has the advantage of being readable and easily configurable to further improve its performance manually.
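The abstract describes three ingredients: a task-oriented rewrite of the program, a work-stealing runtime, and a granularity cut-off that avoids creating task objects for small work units. As a rough illustration (not the paper's actual generated code), here is what such a structure might look like for a recursive Fibonacci, sketched with Java's standard fork/join framework; the class name and `THRESHOLD` value are illustrative assumptions.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch of the task-oriented structure such a compiler
// might emit. Below THRESHOLD no task objects are created and the code
// runs sequentially, mirroring the compile-time granularity control
// described in the abstract.
public class FibTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 12; // assumed cut-off value
    private final int n;

    public FibTask(int n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n < THRESHOLD) {
            return fibSeq(n); // granularity control: stay sequential
        }
        FibTask left = new FibTask(n - 1);
        left.fork();                               // eligible for work-stealing
        long right = new FibTask(n - 2).compute(); // computed by this worker
        return left.join() + right;                // dependency: both results needed
    }

    private static long fibSeq(int n) {
        return n < 2 ? n : fibSeq(n - 1) + fibSeq(n - 2);
    }

    public static void main(String[] args) {
        long result = ForkJoinPool.commonPool().invoke(new FibTask(30));
        System.out.println(result); // prints 832040
    }
}
```

The cut-off is the key design point: without it, the overhead of allocating and scheduling one task per recursive call would dominate the useful work.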


References

  1. Amini, M., Creusillet, B., Even, S., Keryell, R., Goubier, O., Guelton, S., McMahon, J.O., Pasquier, F.X., Péan, G., Villalon, P.: Par4All: from convex array regions to heterogeneous computing. In: IMPACT 2012: Second International Workshop on Polyhedral Compilation Techniques, HiPEAC 2012 (2012)

  2. Ayguadé, E., Copty, N., Duran, A., Hoeflinger, J., Lin, Y., Massaioli, F., Teruel, X., Unnikrishnan, P., Zhang, G.: The design of OpenMP tasks. IEEE Trans. Parallel Distrib. Syst. 20(3), 404–418 (2009)


  3. Ayguadé, E., Duran, A., Hoeflinger, J., Massaioli, F., Teruel, X.: An experimental evaluation of the new OpenMP tasking model. In: Adve, V., Garzarán, M.J., Petersen, P. (eds.) Languages and Compilers for Parallel Computing, pp. 63–77. Springer, Berlin (2008)

  4. Banerjee, U., Eigenmann, R., Nicolau, A., Padua, D.A., et al.: Automatic program parallelization. Proc. IEEE 81(2), 211–243 (1993)


  5. Bik, A.J., Gannon, D.B.: Automatically exploiting implicit parallelism in Java. Concurr. Pract. Exp. 9(6), 579–619 (1997)


  6. Bondhugula, U., Baskaran, M., Krishnamoorthy, S., Ramanujam, J., Rountev, A., Sadayappan, P.: Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In: Hendren, L. (ed.) Compiler Construction, pp. 132–146. Springer, Berlin (2008)

  7. Chamberlain, B., Callahan, D., Zima, H.: Parallel programmability and the Chapel language. Int. J. High Perform. Comput. Appl. 21(3), 291–312 (2007)


  8. Chan, B., Abdelrahman, T.S.: Run-time support for the automatic parallelization of Java programs. J. Supercomput. 28(1), 91–117 (2004)


  9. Charles, P., Grothoff, C., Saraswat, V., Donawa, C., Kielstra, A., Ebcioglu, K., Von Praun, C., Sarkar, V.: X10: an object-oriented approach to non-uniform cluster computing. In: ACM SIGPLAN Notices, vol. 40, pp. 519–538. ACM (2005)

  10. Chen, M.K., Olukotun, K.: The Jrpm system for dynamically parallelizing Java programs. In: Proceedings of the 30th Annual International Symposium on Computer Architecture, 2003, pp. 434–445. IEEE (2003)

  11. Dagum, L., Menon, R.: OpenMP: an industry-standard API for shared-memory programming. IEEE Comput. Sci. Eng. 5(1), 46–55 (1998)


  12. Dave, C., Bae, H., Min, S.J., Lee, S., Eigenmann, R., Midkiff, S.: Cetus: a source-to-source compiler infrastructure for multicores. Computer 12, 36–42 (2009)


  13. Dominguez, R.M.: Evaluating Different Java Bindings for OpenCL. Master thesis at Universidad Carlos III de Madrid. https://e-archivo.uc3m.es/handle/10016/17183 (2013)

  14. Duran, A., Corbalán, J., Ayguadé, E.: An adaptive cut-off for task parallelism. In: International Conference for High Performance Computing, Networking, Storage and Analysis, 2008. SC 2008, pp. 1–11. IEEE (2008)

  15. Duran, A., Corbalán, J., Ayguadé, E.: Evaluation of OpenMP task scheduling strategies. In: Eigenmann, R., de Supinski, B.R. (eds.) OpenMP in a New Era of Parallelism, pp. 100–110. Springer, Berlin (2008)

  16. Feautrier, P.: Automatic parallelization in the polytope model. In: Perrin, G.-R., Darte, A. (eds.) The Data Parallel Programming Model, pp. 79–103. Springer, Berlin (1996)

  17. Fonseca, A.: Æminium Benchmark Suite. https://github.com/AEminium/AeminiumBenchmarks (2013). Accessed 23 Oct 2013

  18. Fonseca, A., Cabral, B.: ÆminiumGPU: an intelligent framework for GPU programming. In: Keller, R., Kramer, D., Weiss, J.-P. (eds.) Facing the Multicore-Challenge III, pp. 96–107. Springer, Berlin (2013)

  19. Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. In: Michael Berman, A. (ed.) ACM SIGPLAN Notices, vol. 33, pp. 212–223. ACM, New York, NY (1998)

  20. Hogen, G., Kindler, A., Loogen, R.: Automatic parallelization of lazy functional programs. In: ESOP’92, pp. 254–268. Springer, Berlin (1992)

  21. Jenista, J.C., Demsky, B.C., et al.: OoOJava: software out-of-order execution. In: Gill, A. (ed.) ACM SIGPLAN Notices, vol. 46, pp. 57–68. ACM, New York, NY (2011)

  22. Lea, D.: A Java fork/join framework. In: Proceedings of the ACM 2000 Conference on Java Grande, pp. 36–43. ACM (2000)

  23. Lee, S., Min, S.J., Eigenmann, R.: OpenMP to GPGPU: a compiler framework for automatic translation and optimization. ACM SIGPLAN Not. 44(4), 101–110 (2009)


  24. Leino, K., Poetzsch-Heffter, A., Zhou, Y.: Using data groups to specify and check side effects. ACM SIGPLAN Not. 37(5), 246–257 (2002)


  25. Marlow, S., Peyton Jones, S., Singh, S.: Runtime support for multicore Haskell. In: Tolmach, A. (ed.) ACM SIGPLAN Notices, vol. 44, pp. 65–78. ACM, New York, NY (2009)

  26. Pawlak, R., Monperrus, M., Petitprez, N., Noguera, C., Seinturier, L.: Spoon v2: large-scale source code analysis and transformation for Java. Tech. Rep. hal-01078532, INRIA (2006). https://hal.inria.fr/hal-01078532

  27. Rafael, J., Correia, I., Fonseca, A., Cabral, B.: Dependency-based automatic parallelization of Java applications. In: Euro-Par 2014: Parallel Processing Workshops, pp. 182–193. Springer, Berlin (2014)

  28. Senghor, A., Konate, K.: FJComp, a Java parallelizing compiler for dealing with divide-and-conquer algorithms. In: 2013 International Conference on Computer Applications Technology (ICCAT), pp. 1–5. IEEE (2013)

  29. Steele, G.: Parallel programming and parallel abstractions in Fortress. Lect. Notes Comput. Sci. 3945, 1 (2006)


  30. Stork, S., Naden, K., Sunshine, J., Mohr, M., Fonseca, A., Marques, P., Aldrich, J.: Æminium: a permission-based concurrent-by-default programming language approach. ACM Trans. Program. Lang. Syst. 36(1), 2 (2014)


  31. Swaine, J., Tew, K., Dinda, P., Findler, R.B., Flatt, M.: Back to the futures: incremental parallelization of existing sequential runtime systems. In: Rinard, M., Sullivan, K.J., Steinberg, D.H. (eds.) ACM SIGPLAN Notices, vol. 45, pp. 583–597. ACM, New York, NY (2010)

  32. Tzannes, A., Caragea, G.C., Barua, R., Vishkin, U.: Lazy binary-splitting: a run-time adaptive work-stealing scheduler. In: Hall, M. (ed.) ACM SIGPLAN Notices, vol. 45, pp. 179–190. ACM, New York, NY (2010)

  33. Wang, C., Li, X., Zhang, J., Zhou, X., Nie, X.: MP-Tomasulo: a dependency-aware automatic parallel execution engine for sequential programs. ACM Trans. Archit. Code Optim. 10(2), 9 (2013)


  34. Zhao, J., Rogers, I., Kirkham, C., Watson, I.: Loop parallelisation for the Jikes RVM. In: Sixth International Conference on Parallel and Distributed Computing, Applications and Technologies, 2005. PDCAT 2005, pp. 35–39. IEEE (2005)


Acknowledgments

This work would not have been possible without the contributions to the Æminium language and runtime from Sven Stork, Paulo Marques and Jonathan Aldrich.

Author information


Corresponding author

Correspondence to Alcides Fonseca.

Additional information

This work was partially supported by the Portuguese Research Agency FCT, through CISUC (R&D Unit 326/97) and the CMU|Portugal program (R&D Project Æminium CMU-PT/SE/0038/2008).

Alcides Fonseca: Supported by the Portuguese National Foundation for Science and Technology (FCT) through a Doctoral Grant (SFRH/BD/84448/2012).


Cite this article

Fonseca, A., Cabral, B., Rafael, J. et al. Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime. Int J Parallel Prog 44, 1337–1358 (2016). https://doi.org/10.1007/s10766-016-0426-5
