Abstract
Current multicore computers differ in many hardware aspects, so tuning parallel applications is indispensable for achieving the best performance on a particular hardware platform. Auto-tuners represent a promising approach to systematically optimizing a program’s tuning parameters, such as the number of threads, the size of data partitions, or the number of pipeline stages. However, auto-tuners require several tuning runs to find optimal values for all parameters. In addition, a program optimized for execution on one machine usually has to be re-tuned on other machines.
Our approach tackles this problem by introducing a language-based tuning mechanism. The key idea is the inference of essential tuning parameters from high-level parallel language constructs. Instead of identifying and adjusting tuning parameters manually, we exploit the compiler’s context knowledge about the program’s parallel structure to configure the tuning parameters at runtime. Consequently, our approach significantly reduces the need for platform-specific tuning runs.
We implemented the approach as an integral part of XJava, a Java language extension for expressing task and pipeline parallelism. Several benchmark programs executed on different hardware platforms demonstrate the effectiveness of our approach. On average, our mechanism sets over 90% of the relevant tuning parameters automatically and achieves 93% of the optimal performance.
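To illustrate the general idea of setting tuning parameters at runtime instead of hard-coding them per machine, the following is a minimal sketch in plain Java. It is not XJava syntax and not the mechanism described in the paper: the class `RuntimeTuningSketch`, the helpers `inferThreadCount` and `inferPartitionSize`, and both heuristics are assumptions made purely for illustration. The sketch derives two of the parameters named above, the number of threads and the size of data partitions, from the core count reported by the JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RuntimeTuningSketch {

    // Hypothetical heuristic: one worker per core, but never more workers
    // than work items and never fewer than one.
    static int inferThreadCount(int availableCores, int workItems) {
        return Math.max(1, Math.min(availableCores, workItems));
    }

    // Hypothetical heuristic: split the work evenly, rounding up.
    static int inferPartitionSize(int workItems, int threads) {
        return (workItems + threads - 1) / threads;
    }

    // Sums the array in parallel using the inferred partition size.
    static long parallelSum(int[] data, int threads) {
        int partition = inferPartitionSize(data.length, threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < data.length; start += partition) {
                final int from = start;
                final int to = Math.min(data.length, start + partition);
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel sum failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        // The thread count is decided on the machine the program runs on.
        int cores = Runtime.getRuntime().availableProcessors();
        int threads = inferThreadCount(cores, data.length);
        System.out.println("threads=" + threads + " sum=" + parallelSum(data, threads));
    }
}
```

The printed thread count depends on the machine, while the sum is 500500 regardless of how many threads are used. A language-based mechanism as described in the abstract would draw on the compiler’s knowledge of the program’s parallel structure rather than on a fixed rule like the one assumed here.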
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Otto, F., Schaefer, C.A., Dempe, M., Tichy, W.F. (2010). A Language-Based Tuning Mechanism for Task and Pipeline Parallelism. In: D’Ambra, P., Guarracino, M., Talia, D. (eds) Euro-Par 2010 - Parallel Processing. Euro-Par 2010. Lecture Notes in Computer Science, vol 6272. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15291-7_30
DOI: https://doi.org/10.1007/978-3-642-15291-7_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15290-0
Online ISBN: 978-3-642-15291-7
eBook Packages: Computer Science, Computer Science (R0)