Effective Source-to-Source Outlining to Support Whole Program Empirical Optimization

  • Chunhua Liao
  • Daniel J. Quinlan
  • Richard Vuduc
  • Thomas Panas
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5898)

Abstract

Although automated empirical performance optimization and tuning is well studied for kernels and domain-specific libraries, a current research grand challenge is how to extend these methodologies and tools to significantly larger sequential and parallel applications. In this context, we present the ROSE source-to-source outliner, which addresses the problem of extracting tunable kernels out of whole programs, thereby helping to convert the challenging whole-program tuning problem into a set of more manageable kernel tuning tasks. Our outliner aims to handle large-scale C/C++, Fortran, and OpenMP applications. A set of program analysis and transformation techniques is used to enhance the portability, scalability, and interoperability of source-to-source outlining. More importantly, the generated kernels preserve the performance characteristics of the tuning targets and can be easily handled by other tools. Preliminary evaluations have shown that the ROSE outliner serves as a key component within an end-to-end empirical optimization system and enables a wide range of sequential and parallel optimization opportunities.
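As a rough illustration of what outlining means here, the hand-written C sketch below moves a loop out of a larger routine into its own function, so that the loop could be compiled, timed, and tuned separately. It is not output of the ROSE outliner: the function names (`relax_original`, `OUT_1_relax`, `relax_outlined`), the relaxation loop itself, and the choice to pass the accumulated scalar by pointer are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Before outlining: the candidate kernel (the loop) sits inside a larger routine. */
double relax_original(double *u, const double *f, size_t n, double h)
{
    double h2 = h * h;
    double residual = 0.0;
    for (size_t i = 1; i + 1 < n; ++i) {
        double r = f[i] * h2 - (2.0 * u[i] - u[i - 1] - u[i + 1]);
        u[i] += 0.5 * r;
        residual += r * r;
    }
    return residual;
}

/* After outlining: the loop lives in its own function (hypothetical name).
 * Variables the loop only reads are passed by value or as const pointers;
 * the scalar it updates is passed by pointer so the caller sees the result. */
void OUT_1_relax(double *u, const double *f, size_t n, double h2,
                 double *residual)
{
    assert(residual != NULL);
    for (size_t i = 1; i + 1 < n; ++i) {
        double r = f[i] * h2 - (2.0 * u[i] - u[i - 1] - u[i + 1]);
        u[i] += 0.5 * r;
        *residual += r * r;
    }
}

/* The original routine now only sets up the arguments and calls the kernel. */
double relax_outlined(double *u, const double *f, size_t n, double h)
{
    double h2 = h * h;
    double residual = 0.0;
    OUT_1_relax(u, f, n, h2, &residual);
    return residual;
}
```

Both versions compute the same result. The difference is that `OUT_1_relax` can be placed in a separate file and handed to a tuning tool without the rest of the program.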

Keywords

Variable Clone; Reference Type; Parallel Region; Chunk Size; Code Segment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Chunhua Liao (1)
  • Daniel J. Quinlan (1)
  • Richard Vuduc (2)
  • Thomas Panas (1)
  1. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore
  2. College of Computing, Georgia Institute of Technology, Atlanta
