A Language-Based Tuning Mechanism for Task and Pipeline Parallelism

Otto, Frank; Schaefer, Christoph A.; Dempe, Matthias; Tichy, Walter F.

doi:10.1007/978-3-642-15291-7_30



Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6272)

Abstract

Current multicore computers differ in many hardware aspects. Tuning parallel applications is indispensable to achieve the best performance on a particular hardware platform. Auto-tuners represent a promising approach to systematically optimize a program’s tuning parameters, such as the number of threads, the size of data partitions, or the number of pipeline stages. However, auto-tuners require several tuning runs to find optimal values for all parameters. In addition, a program optimized for execution on one machine usually has to be re-tuned on other machines.

Our approach tackles this problem by introducing a language-based tuning mechanism. The key idea is the inference of essential tuning parameters from high-level parallel language constructs. Instead of identifying and adjusting tuning parameters manually, we exploit the compiler’s context knowledge about the program’s parallel structure to configure the tuning parameters at runtime. Consequently, our approach significantly reduces the need for platform-specific tuning runs.

We implemented the approach as an integral part of XJava, a Java language extension to express task and pipeline parallelism. Several benchmark programs executed on different hardware platforms demonstrate the effectiveness of our approach. On average, our mechanism sets over 90% of the relevant tuning parameters automatically and achieves 93% of the optimal performance.
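
The idea described in the abstract, deriving tuning parameters from the program's parallel structure at runtime instead of searching for them in tuning runs, can be illustrated with a minimal plain-Java sketch. It is not XJava's actual mechanism, which the abstract does not specify. The class `TuningSketch`, the `Stage` descriptor, its `replicable` and `relativeCost` fields, and the proportional heuristic are all assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TuningSketch {

    /** Hypothetical description of one pipeline stage, as a compiler might record it. */
    public static final class Stage {
        final String name;
        final boolean replicable;   // assumed: stage keeps no state between items
        final double relativeCost;  // assumed cost estimate in arbitrary units

        public Stage(String name, boolean replicable, double relativeCost) {
            this.name = name;
            this.replicable = replicable;
            this.relativeCost = relativeCost;
        }
    }

    /**
     * Infers a thread count per stage from the core count observed at runtime.
     * Every stage gets one thread; spare cores are split among replicable
     * stages in proportion to their relative cost (rounded down).
     */
    public static Map<String, Integer> inferReplication(List<Stage> stages, int cores) {
        Map<String, Integer> threads = new LinkedHashMap<>();
        double replicableCost = 0.0;
        for (Stage s : stages) {
            threads.put(s.name, 1);
            if (s.replicable) {
                replicableCost += s.relativeCost;
            }
        }
        int spare = Math.max(0, cores - stages.size());
        if (replicableCost > 0.0) {
            for (Stage s : stages) {
                if (s.replicable) {
                    int extra = (int) Math.floor(spare * s.relativeCost / replicableCost);
                    threads.put(s.name, 1 + extra);
                }
            }
        }
        return threads;
    }

    public static void main(String[] args) {
        List<Stage> pipeline = Arrays.asList(
                new Stage("read", false, 1.0),
                new Stage("compress", true, 3.0),
                new Stage("write", false, 1.0));
        // The core count is read at runtime, so no platform-specific tuning run is needed.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(inferReplication(pipeline, cores));
    }
}
```

Under these assumptions, a machine with 8 cores would give the `compress` stage 6 threads and the two sequential stages one thread each, while a 2-core machine would leave every stage at a single thread. The same program therefore adapts to either machine without a separate tuning run.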



Author information

Authors and Affiliations

Karlsruhe Institute of Technology, 76131, Karlsruhe, Germany
Frank Otto, Christoph A. Schaefer, Matthias Dempe & Walter F. Tichy


Editor information

Editors and Affiliations

ICAR-CNR, Via P. Castellino, 111, 80131, Napoli, Italy
Pasqua D’Ambra & Mario Guarracino
ICAR-CNR, Via P. Bucci 41c, 87036, Rende, Italy
Domenico Talia


About this paper

Cite this paper

Otto, F., Schaefer, C.A., Dempe, M., Tichy, W.F. (2010). A Language-Based Tuning Mechanism for Task and Pipeline Parallelism. In: D’Ambra, P., Guarracino, M., Talia, D. (eds) Euro-Par 2010 - Parallel Processing. Euro-Par 2010. Lecture Notes in Computer Science, vol 6272. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15291-7_30


DOI: https://doi.org/10.1007/978-3-642-15291-7_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15290-0
Online ISBN: 978-3-642-15291-7
eBook Packages: Computer Science, Computer Science (R0)
