Abstract
There are billions of lines of sequential code inside nowadays’ software which do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote an efficient use of the available parallelism, has been a research goal for some time now. This work proposes a new approach for achieving such goal. We created a new parallelizing compiler that analyses the read and write instructions, and control-flow modifications in programs to identify a set of dependencies between the instructions in the program. Afterwards, the compiler, based on the generated dependencies graph, rewrites and organizes the program in a task-oriented structure. Parallel tasks are composed by instructions that cannot be executed in parallel. A work-stealing-based parallel runtime is responsible for scheduling and managing the granularity of the generated tasks. Furthermore, a compile-time granularity control mechanism also avoids creating unnecessary data-structures. This work focuses on the Java language, but the techniques are general enough to be applied to other programming languages. We have evaluated our approach on 8 benchmark programs against OoOJava, achieving higher speedups. In some cases, values were close to those of a manual parallelization. The resulting parallel code also has the advantage of being readable and easily configured to improve further its performance manually.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Amini, M., Creusillet, B., Even, S., Keryell, R., Goubier, O., Guelton, S., McMahon, J.O., Pasquier, F.X., Péan, G., Villalon, P.: Par4all: from convex array regions to heterogeneous computing. In: IMPACT 2012: Second International Workshop on Polyhedral Compilation Techniques HiPEAC 2012 (2012)
Ayguadé, E., Copty, N., Duran, A., Hoeflinger, J., Lin, Y., Massaioli, F., Teruel, X., Unnikrishnan, P., Zhang, G.: The design of openmp tasks. IEEE Trans. Parallel Distrib. Syst. 20(3), 404–418 (2009)
Ayguadé, E., Duran, A., Hoeflinger, J., Massaioli, F., Teruel, X.: An experimental evaluation of the new openmp tasking model. In: Adve, V., Garzarán, M.J., Petersen, P. (eds.) Languages and Compilers for Parallel Computing, pp. 63–77. Springer, Berlin (2008)
Banerjee, U., Eigenmann, R., Nicolau, A., Padua, D.A., et al.: Automatic program parallelization. Proc. IEEE 81(2), 211–243 (1993)
Bik, A.J., Gannon, D.B.: Automatically exploiting implicit parallelism in Java. Concurr. Pract. Exp. 9(6), 579–619 (1997)
Bondhugula, U., Baskaran, M., Krishnamoorthy, S., Ramanujam, J., Rountev, A., Sadayappan, P.: Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model. In: Hendren, L. (ed.) Compiler Construction, pp. 132–146. Springer, Berlin (2008)
Chamberlain, B., Callahan, D., Zima, H.: Parallel programmability and the chapel language. Int. J. High Perform. Comput. Appl. 21(3), 291–312 (2007)
Chan, B., Abdelrahman, T.S.: Run-time support for the automatic parallelization of Java programs. J. Supercomput. 28(1), 91–117 (2004)
Charles, P., Grothoff, C., Saraswat, V., Donawa, C., Kielstra, A., Ebcioglu, K., Von Praun, C., Sarkar, V.: X10: an object-oriented approach to non-uniform cluster computing. In ACM SIGPLAN Notices, vol. 40, pp. 519–538. ACM (2005)
Chen, M.K., Olukotun, K.: The Jrpm system for dynamically parallelizing java programs. In: Proceedings of the 30th Annual International Symposium on Computer Architecture, 2003, pp. 434–445. IEEE (2003)
Dagum, L., Enon, R.: Openmp: an industry standard api for shared-memory programming. IEEE Comput. Sci. Eng. 5(1), 46–55 (1998)
Dave, C., Bae, H., Min, S.J., Lee, S., Eigenmann, R., Midkiff, S.: Cetus: a source-to-source compiler infrastructure for multicores. Computer 12, 36–42 (2009)
Dominguez, R.M.: Evaluating Different Java Bindings for OpenCL. Master thesis at Universidad Carlos III de Madrid. https://e-archivo.uc3m.es/handle/10016/17183 (2013)
Duran, A., Corbalán, J., Ayguadé, E.: An adaptive cut-off for task parallelism. In: International Conference for High Performance Computing, Networking, Storage and Analysis, 2008. SC 2008, pp. 1–11. IEEE (2008)
Duran, A., Corbalán, J., Ayguadé, E.: Evaluation of openmp task scheduling strategies. In: Eigenmann, R., de Supinski, B.R. (eds.) OpenMP in a New Era of Parallelism, pp. 100–110. Springer, Berlin (2008)
Feautrier, P.: Automatic parallelization in the polytope model. In: Perrin, G.-R., Darte, A. (eds.) The Data Parallel Programming Model, pp. 79–103. Springer, Berlin (1996)
Fonseca, A.: Æminium Benchmark Suite. https://github.com/AEminium/AeminiumBenchmarks (2013). Accessed 23 Oct 2013
Fonseca, A., Cabral, B.: Æminiumgpu: an intelligent framework for gpu programming. In: Keller, R., Kramer, D., Weiss, J.-P. (eds.) Facing the Multicore-Challenge III, pp. 96–107. Springer, Berlin (2013)
Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the cilk-5 multithreaded language. In: Michael Berman, A. (ed.) ACM SIGPLAN Notices, vol. 33, pp. 212–223. ACM, New York, NY (1998)
Hogen, G., Kindler, A., Loogen, R.: Automatic parallelization of lazy functional programs. In: ESOP’92, pp. 254–268. Springer, Berlin (1992)
Jenista, J.C., Demsky, B.C., et al.: Ooojava: software out-of-order execution. In: Gill, A. (ed.) ACM SIGPLAN Notices, vol. 46, pp. 57–68. ACM, New York, NY (2011)
Lea, D.: A Java fork/join framework. In: Proceedings of the ACM 2000 Conference on Java Grande, pp. 36–43. ACM (2000)
Lee, S., Min, S.J., Eigenmann, R.: OpenMP to GPGPU: a compiler framework for automatic translation and optimization. ACM SIGPLAN Not. 44(4), 101–110 (2009)
Leino, K., Poetzsch-Heffter, A., Zhou, Y.: Using data groups to specify and check side effects. ACM SIGPLAN Not. 37(5), 246–257 (2002)
Marlow, S., Peyton Jones, S., Singh, S.: Runtime support for multicore haskell. In: Tolmach, A. (ed.) ACM SIGPLAN Notices, vol. 44, pp. 65–78. ACM, New York, NY (2009)
Pawlak, R., Monperrus, M., Petitprez, N., Noguera, C., Seinturier, L.: Spoon v2: Large scale source code analysis and transformation for Java. Tech. Rep. hal-01078532, INRIA (2006). https://hal.inria.fr/hal-01078532
Rafael, J., Correia, I., Fonseca, A., Cabral, B.: Dependency-based automatic parallelization of java applications. In: Euro-Par 2014: Parallel Processing Workshops, pp. 182–193. Springer, Berlin (2014)
Senghor, A., Konate, K.: Fjcomp, a Java parallelizing compiler for dealing with divide-and-conquer algorithm. In: 2013 International Conference on Computer Applications Technology (ICCAT), pp. 1–5. IEEE (2013)
Steele, G.: Parallel programming and parallel abstractions in fortress. Lect. Not. Comput. Sci. 3945, 1 (2006)
Stork, S., Naden, K., Sunshine, J., Mohr, M., Fonseca, A., Marques, P., Aldrich, J.: Æminium: a permission-based concurrent-by-default programming language approach. ACM Trans. Program. Lang. Syst. 36(1), 2 (2014)
Swaine, J., Tew, K., Dinda, P., Findler, R.B., Flatt, M.: Back to the futures: incremental parallelization of existing sequential runtime systems. In: Rinard, M., Sullivan, K.J., Steinberg, D.H. (eds.) ACM SIGPLAN Notices, vol. 45, pp. 583–597. ACM, New York, NY (2010)
Tzannes, A., Caragea, G.C., Barua, R., Vishkin, U.: Lazy binary-splitting: a run-time adaptive work-stealing scheduler. In: Hall, M. (ed.) ACM SIGPLAN Notices, vol. 45, pp. 179–190. ACM, New York, NY (2010)
Wang, C., Li, X., Zhang, J., Zhou, X., Nie, X.: Mp-tomasulo: a dependency-aware automatic parallel execution engine for sequential programs. ACM Trans. Archit. Code Optim. 10(2), 9 (2013)
Zhao, J., Rogers, I., Kirkham, C., Watson, I.: Loop parallelisation for the jikes rvm. In: Sixth International Conference on Parallel and Distributed Computing, Applications and Technologies, 2005. PDCAT 2005, pp. 35–39. IEEE (2005)
Acknowledgments
This work would not have been possible without the contributions to the Æminium language and runtime from Sven Stork, Paulo Marques and Jonathan Aldrich.
Author information
Authors and Affiliations
Corresponding author
Additional information
This work was partially supported by the Portuguese Research Agency FCT, through CISUC (R&D Unit 326/97) and the CMU|Portugal program (R&D Project Æminium CMU-PT/SE/0038/2008).
Alcides Fonseca: Supported by the Portuguese National Foundation for Science and Technology (FCT) through a Doctoral Grant (SFRH/BD/84448/2012).
Rights and permissions
About this article
Cite this article
Fonseca, A., Cabral, B., Rafael, J. et al. Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime. Int J Parallel Prog 44, 1337–1358 (2016). https://doi.org/10.1007/s10766-016-0426-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10766-016-0426-5