Abstract
Task parallelism is a programming technique that has been shown to be applicable in a wide variety of problem domains. A central parameter that needs to be controlled to ensure efficient execution of task-parallel programs is the granularity of tasks. When tasks are too coarse-grained, scalability and load balance suffer, while very fine-grained tasks introduce execution overheads.
We present a combined compiler and runtime approach that enables automatic granularity control. Starting from recursive task-parallel programs, our compiler generates multiple versions of each task, increasing granularity by task unrolling and subsequent removal of superfluous synchronization primitives. A runtime system then selects among these task versions of varying granularity by tracking task demand.
Benchmarking on a set of task-parallel programs using a work-stealing scheduler demonstrates that our approach is generally effective. For fine-grained tasks, we can achieve reductions in execution time exceeding a factor of 6, compared to state-of-the-art implementations.
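The idea of selecting among task versions of varying granularity can be illustrated with a small sketch. The following is not the authors' compiler or runtime: it is a toy, single-threaded model in Python, and every name in it (`Runtime`, `pick_version`, `expand`, `queue_target`) is hypothetical. It assumes that a task version is characterized by its unroll depth, that task demand can be approximated by the length of a single task queue, and that results are accumulated in a shared counter so that no per-task synchronization is needed. The example computes a Fibonacci number by summing the base cases of the recursion.

```python
from collections import deque


class Runtime:
    """Toy single-threaded stand-in for a task runtime (hypothetical API)."""

    def __init__(self, versions, queue_target):
        self.versions = versions          # unroll depths, finest version first
        self.queue_target = queue_target  # queue length treated as "enough work"
        self.queue = deque()              # pending tasks (arguments n)
        self.total = 0                    # sum of all base-case values reached
        self.tasks_run = 0                # number of tasks actually executed

    def pick_version(self):
        # Assumption: demand for new tasks is high while the queue is short,
        # so the finest version is used; as the queue fills up, coarser
        # (more deeply unrolled) versions are selected.
        fill = min(len(self.queue) / self.queue_target, 1.0)
        return self.versions[int(fill * (len(self.versions) - 1))]

    def expand(self, n, levels):
        # Body of a task version unrolled `levels` times: recursive calls are
        # inlined until the unroll depth is used up, then spawned as tasks.
        if n < 2:
            self.total += n          # base case: fib(0) = 0, fib(1) = 1
        elif levels == 0:
            self.queue.append(n)     # spawn a new task
        else:
            self.expand(n - 1, levels - 1)
            self.expand(n - 2, levels - 1)

    def run(self, n):
        self.queue.append(n)
        while self.queue:            # draining the queue acts as the final sync
            task = self.queue.pop()
            self.tasks_run += 1
            self.expand(task, self.pick_version())
        return self.total
```

Under these assumptions, `Runtime([1, 3, 8], queue_target=4).run(20)` and `Runtime([1], queue_target=4).run(20)` compute the same value, but the first executes fewer tasks because the coarser versions inline several recursion levels once the queue holds enough work. A real implementation would track demand per worker in a work-stealing scheduler rather than through one shared queue.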
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thoman, P., Jordan, H., Fahringer, T. (2013). Adaptive Granularity Control in Task Parallel Programs Using Multiversioning. In: Wolf, F., Mohr, B., an Mey, D. (eds) Euro-Par 2013 Parallel Processing. Euro-Par 2013. Lecture Notes in Computer Science, vol 8097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40047-6_19
DOI: https://doi.org/10.1007/978-3-642-40047-6_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40046-9
Online ISBN: 978-3-642-40047-6
eBook Packages: Computer Science (R0)