Automatically Tuning Parallel and Parallelized Programs

Conference paper
Languages and Compilers for Parallel Computing (LCPC 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5898)

Abstract

In today’s multicore era, parallelizing serial code is essential to exploit the performance potential of the architecture. Parallelization, especially of legacy code, is challenging, however: manual effort must go either into algorithmic modifications or into analyzing computationally intensive code sections for the best possible parallel performance, both of which are difficult and time-consuming. Automatic parallelization applies sophisticated compile-time techniques to identify parallelism in serial programs, reducing the burden on the program developer. Similar sophistication is needed to improve the performance of hand-parallelized programs. A key difficulty is that optimizing compilers generally cannot estimate the performance of an application, or even of a program section, at compile time, so the task of performance improvement invariably falls to the developer. Automatic tuning combines static analysis with runtime performance measurements to determine the compile-time approach that yields the best application performance. This paper describes an offline tuning approach that uses a source-to-source parallelizing compiler, Cetus, together with a tuning framework to tune parallel application performance. The implementation uses an existing, generic tuning algorithm, Combined Elimination, to study the effect of serializing parallelizable loops based on measured whole-program execution time, and it produces a combination of parallel loops that equals or improves the performance of the original program. We evaluated our algorithm on a suite of hand-parallelized C benchmarks from SPEC OMP2001 and the NAS Parallel Benchmarks, and we present two sets of results. The first ignores hand-parallelized loops and tunes application performance based on Cetus-parallelized loops alone. The second considers the tuning of additional parallelism in hand-parallelized code. We show that our implementation always performs close to or better than the serial code when tuning only Cetus-parallelized loops, and equal to or better than the hand-parallelized code when tuning additional parallelism.
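
To make the tuning loop concrete, the following is a minimal Python sketch of a Combined-Elimination-style search over a program's parallelizable loops. It is an illustration, not the paper's implementation: the measure() helper is hypothetical and stands in for the framework's build-and-run step, which recompiles the program with the given set of loops kept parallel (serializing all others) and returns whole-program execution time.

    def combined_elimination(loops, measure):
        # Start with every parallelizable loop enabled (parallel).
        parallel = set(loops)
        base_time = measure(parallel)
        while True:
            # Relative improvement (RIP) from serializing each loop on its
            # own: a negative value means the program ran faster with that
            # loop serialized, i.e., its parallel form hurts performance.
            rips = {}
            for loop in parallel:
                t = measure(parallel - {loop})
                rips[loop] = (t - base_time) / base_time
            harmful = [l for l in parallel if rips[l] < 0]
            if not harmful:
                # No remaining loop hurts performance; keep the rest parallel.
                return parallel, base_time
            # Serialize the loop whose parallelization hurts the most, then
            # re-measure the new baseline and repeat.
            worst = min(harmful, key=rips.get)
            parallel.discard(worst)
            base_time = measure(parallel)

Because a loop is only ever serialized when doing so measurably reduces execution time, the final configuration can be no slower than the all-parallel starting point. The full Combined Elimination algorithm of Pan and Eigenmann additionally re-measures the remaining negative-RIP candidates within each round before starting the next; the sketch keeps only the greedy core.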

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dave, C., Eigenmann, R. (2010). Automatically Tuning Parallel and Parallelized Programs. In: Gao, G.R., Pollock, L.L., Cavazos, J., Li, X. (eds) Languages and Compilers for Parallel Computing. LCPC 2009. Lecture Notes in Computer Science, vol 5898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13374-9_9

  • DOI: https://doi.org/10.1007/978-3-642-13374-9_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13373-2

  • Online ISBN: 978-3-642-13374-9

  • eBook Packages: Computer Science (R0)
