Automatically Tuning Task-Based Programs for Multicore Processors
We present a new technique to automatically optimize parallel software for multicore processors. We have implemented the technique for Bamboo, a task-based extension to Java. Optimizing applications for multicore processors requires balancing the competing concerns of parallelism and communication costs. Bamboo uses high-level simulation to explore how best to trade off these competing concerns for an application. The compiler begins by generating several initial candidate implementations. It then uses high-level simulation with profile statistics to evaluate these candidates. An as-built critical path analysis automatically identifies opportunities to improve each candidate implementation, and directed simulated annealing evaluates the possible optimizations.
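To make the search loop concrete, here is a minimal Java sketch of critical-path-directed simulated annealing over task-to-core mappings. It is a toy model, not Bamboo's simulator or data structures. Everything in it is a hypothetical assumption: the class `DirectedAnnealing`, the five-task graph with fixed costs (`COST`, `PREDS`), a uniform cross-core communication delay `COMM`, two cores, and the rule that tasks sharing a core run in index order. The "as-built" critical path is traced through whichever dependence actually delayed each task in the simulated schedule, either a data predecessor or the previous task on the same core. The directed move only reassigns tasks on that path.

```java
import java.util.*;

/** Hypothetical toy model of directed simulated annealing; not Bamboo's implementation. */
public class DirectedAnnealing {
    // Assumed task graph, indexed in topological order: COST[i] is task i's run time,
    // PREDS[i] lists the tasks whose output task i consumes.
    static final int[] COST = {4, 3, 3, 5, 2};
    static final int[][] PREDS = {{}, {0}, {0}, {1, 2}, {3}};
    static final int COMM = 2;  // assumed delay for sending data between cores
    static final int CORES = 2;

    /** Simulates a mapping, recording in crit[t] what delayed task t (-1 if nothing). Returns makespan. */
    static int simulate(int[] core, int[] finish, int[] crit) {
        int[] coreFree = new int[CORES];
        int[] coreLast = new int[CORES];
        Arrays.fill(coreLast, -1);
        for (int t = 0; t < COST.length; t++) {
            int start = coreFree[core[t]];
            crit[t] = coreLast[core[t]];  // by default, t waits for the previous task on its core
            for (int p : PREDS[t]) {
                int ready = finish[p] + (core[p] == core[t] ? 0 : COMM);
                if (ready > start) { start = ready; crit[t] = p; }  // data dependence dominates
            }
            finish[t] = start + COST[t];
            coreFree[core[t]] = finish[t];
            coreLast[core[t]] = t;
        }
        int makespan = 0;
        for (int f : finish) makespan = Math.max(makespan, f);
        return makespan;
    }

    /** Walks the recorded delays back from the last task to finish. */
    static List<Integer> criticalPath(int[] finish, int[] crit) {
        int last = 0;
        for (int t = 1; t < finish.length; t++) if (finish[t] > finish[last]) last = t;
        LinkedList<Integer> path = new LinkedList<>();
        for (int t = last; t != -1; t = crit[t]) path.addFirst(t);
        return path;
    }

    static int[] anneal(long seed) {
        Random rng = new Random(seed);
        int n = COST.length;
        int[] cur = new int[n];  // initial candidate: every task on core 0
        int[] finish = new int[n], crit = new int[n];
        int curCost = simulate(cur, finish, crit);
        int[] best = cur.clone();
        int bestCost = curCost;
        for (double temp = 5.0; temp > 0.01; temp *= 0.95) {
            // Directed move: only reassign a task that lies on the current critical path.
            List<Integer> path = criticalPath(finish, crit);
            int t = path.get(rng.nextInt(path.size()));
            int[] cand = cur.clone();
            cand[t] = (cand[t] + 1 + rng.nextInt(CORES - 1)) % CORES;
            int[] cf = new int[n], cc = new int[n];
            int candCost = simulate(cand, cf, cc);
            int delta = candCost - curCost;
            // Metropolis rule: always accept improvements, sometimes accept regressions.
            if (delta <= 0 || rng.nextDouble() < Math.exp(-delta / temp)) {
                cur = cand; curCost = candCost; finish = cf; crit = cc;
                if (curCost < bestCost) { best = cur.clone(); bestCost = curCost; }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] best = anneal(42);
        int[] f = new int[COST.length], c = new int[COST.length];
        System.out.println(Arrays.toString(best) + " makespan " + simulate(best, f, c));
    }
}
```

In this toy graph the all-on-one-core mapping has a makespan of 17. Splitting tasks 1 and 2 across cores, for example with the mapping `{0, 0, 1, 1, 1}`, reaches 16 despite the communication delay. Restricting moves to the critical path means the search never spends an evaluation on a task whose placement cannot shorten the current schedule.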
Keywords: Core Group, Multicore Processor, Flag State, Task Instance, Real Execution Time
This research was supported by the National Science Foundation under grants CCF-0846195 and CCF-0725350.