Automatically Tuning Task-Based Programs for Multicore Processors

  • Jin Zhou
  • Brian Demsky


We present a new technique to automatically optimize parallel software for multicore processors. We have implemented the technique for Bamboo, a task-based extension to Java. Optimizing applications for multicore processors requires balancing the competing concerns of parallelism and communication costs. Bamboo uses high-level simulation to explore how to best trade off these competing concerns for an application. The compiler begins by generating several initial candidate implementations. The compiler then uses high-level simulation with profile statistics to evaluate these candidate implementations. It uses an as-built critical path analysis to automatically identify opportunities to improve the candidate implementation and then uses directed simulated annealing to evaluate possible optimizations.
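The directed simulated annealing step can be illustrated with a small sketch. This is not Bamboo's implementation: the task costs, the four-core makespan model (which ignores communication costs that Bamboo's high-level simulator would charge), and all class and method names here are hypothetical, chosen only to show the shape of an annealing search over task-to-core placements.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: simulated annealing over task-to-core placements.
// Costs and the makespan model are illustrative assumptions, not Bamboo's
// actual profile statistics or simulator.
public class AnnealPlacement {
    static final int CORES = 4;

    // Estimated makespan: the maximum per-core sum of assigned task costs.
    // (A realistic model would also charge cross-core communication.)
    static double makespan(int[] placement, double[] cost) {
        double[] load = new double[CORES];
        for (int t = 0; t < placement.length; t++) load[placement[t]] += cost[t];
        double max = 0;
        for (double l : load) max = Math.max(max, l);
        return max;
    }

    // Annealing loop: perturb one task's core assignment, always accept
    // improvements, and accept uphill moves with probability exp(-delta/T)
    // so the search can escape local minima while the temperature is high.
    static int[] anneal(double[] cost, long seed) {
        Random rnd = new Random(seed);
        int[] placement = new int[cost.length];
        for (int t = 0; t < cost.length; t++) placement[t] = rnd.nextInt(CORES);
        double cur = makespan(placement, cost);
        for (double temp = 1.0; temp > 1e-3; temp *= 0.995) {
            int task = rnd.nextInt(cost.length);
            int old = placement[task];
            placement[task] = rnd.nextInt(CORES);  // candidate move
            double next = makespan(placement, cost);
            if (next <= cur || rnd.nextDouble() < Math.exp((cur - next) / temp)) {
                cur = next;            // accept the move
            } else {
                placement[task] = old; // revert the move
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        double[] cost = {3, 1, 4, 1, 5, 9, 2, 6};
        int[] p = anneal(cost, 42);
        System.out.println(Arrays.toString(p) + " makespan=" + makespan(p, cost));
    }
}
```

In the paper's setting, the cost of a candidate placement would come from the high-level simulation driven by profile statistics, and the perturbations would be directed by the as-built critical path analysis rather than chosen uniformly at random as above.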


Keywords: Core Group · Multicore Processor · Flag State · Task Instance · Real Execution Time



This research was supported by the National Science Foundation under grants CCF-0846195 and CCF-0725350.



Copyright information

© Springer New York 2011

Authors and Affiliations

  1. University of California, Irvine, USA
