Abstract
We present a new technique to automatically optimize parallel software for multicore processors. We have implemented the technique for Bamboo, a task-based extension to Java. Optimizing applications for multicore processors requires balancing the competing concerns of parallelism and communication costs. Bamboo uses high-level simulation to explore how to best trade off these competing concerns for an application. The compiler begins by generating several initial candidate implementations. The compiler then uses high-level simulation with profile statistics to evaluate these candidate implementations. It uses an as-built critical path analysis to automatically identify opportunities to improve the candidate implementation and then uses directed simulated annealing to evaluate possible optimizations.
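The loop described above (simulate a candidate, find its critical path, then perturb the tasks on that path) can be illustrated with a small sketch. This is not Bamboo's implementation: the task names, the profile numbers, the `COMM_COST` constant, and the functions `simulate` and `anneal` are all hypothetical, and the "simulation" is a simple in-order schedule estimate that assumes the tasks are listed in topological order.

```python
import math
import random

# Hypothetical profile data: task -> (cost, predecessor tasks).
# Tasks are assumed to be listed in topological order.
PROFILE = {
    "load":  (2.0, []),
    "split": (1.0, ["load"]),
    "workA": (5.0, ["split"]),
    "workB": (5.0, ["split"]),
    "workC": (5.0, ["split"]),
    "merge": (1.0, ["workA", "workB", "workC"]),
}
COMM_COST = 0.5  # assumed delay when a dependency crosses cores


def simulate(mapping, profile=PROFILE, comm=COMM_COST):
    """Estimate the makespan of a task-to-core mapping.

    Returns (makespan, critical_path). The critical path is recovered
    from the simulated schedule by following, for each task, whichever
    earlier task delayed its start.
    """
    core_free = {}     # core -> time at which it becomes idle
    last_on_core = {}  # core -> most recent task scheduled there
    finish = {}        # task -> finish time
    blocker = {}       # task -> task that delayed its start, or None
    for task, (cost, preds) in profile.items():
        core = mapping[task]
        start, why = core_free.get(core, 0.0), last_on_core.get(core)
        for p in preds:
            ready = finish[p] + (comm if mapping[p] != core else 0.0)
            if ready > start:
                start, why = ready, p
        finish[task] = start + cost
        blocker[task] = why
        core_free[core] = finish[task]
        last_on_core[core] = task
    # Walk backwards from the task that finishes last.
    task = max(finish, key=finish.get)
    makespan, path = finish[task], []
    while task is not None:
        path.append(task)
        task = blocker[task]
    return makespan, path[::-1]


def anneal(num_cores, steps=2000, temp=2.0, cooling=0.995, seed=0):
    """Simulated annealing whose moves are biased toward critical-path tasks."""
    rng = random.Random(seed)
    mapping = {t: 0 for t in PROFILE}  # initial candidate: everything on core 0
    cost, path = simulate(mapping)
    best, best_cost = dict(mapping), cost
    for _ in range(steps):
        # "Directed" move: most of the time, relocate a critical-path task.
        pool = path if rng.random() < 0.8 else list(PROFILE)
        task = rng.choice(pool)
        candidate = dict(mapping)
        candidate[task] = rng.randrange(num_cores)
        new_cost, new_path = simulate(candidate)
        delta = new_cost - cost
        # Always accept improvements; accept regressions with a
        # probability that shrinks as the temperature cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            mapping, cost, path = candidate, new_cost, new_path
            if cost < best_cost:
                best, best_cost = dict(mapping), cost
        temp *= cooling
    return best, best_cost
```

Calling `anneal(num_cores=3)` returns the best mapping found and its estimated makespan. Placing dependent tasks on the same core avoids `COMM_COST` but serializes them, which is the parallelism-versus-communication trade-off the abstract refers to. The 0.8 bias toward critical-path tasks is an arbitrary choice for this sketch.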
Acknowledgements
This research was supported by the National Science Foundation under grants CCF-0846195 and CCF-0725350.
Copyright information
© 2011 Springer New York
About this chapter
Cite this chapter
Zhou, J., Demsky, B. (2011). Automatically Tuning Task-Based Programs for Multicore Processors. In: Naono, K., Teranishi, K., Cavazos, J., Suda, R. (eds) Software Automatic Tuning. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6935-4_18
DOI: https://doi.org/10.1007/978-1-4419-6935-4_18
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6934-7
Online ISBN: 978-1-4419-6935-4
eBook Packages: Engineering, Engineering (R0)