A Holistic Approach towards Automated Performance Analysis and Tuning

  • Guogjing Cong
  • I-Hsin Chung
  • Huifang Wen
  • David Klepacki
  • Hiroki Murata
  • Yasushi Negishi
  • Takao Moriyama
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5704)

Abstract

High productivity for the end user is critical in harnessing the power of high performance computing systems to solve science and engineering problems. Bridging the gap between hardware complexity and software limitations remains a challenge. Despite significant progress in languages, compilers, and performance tools, tuning an application remains largely a manual task done mostly by experts. In this paper we propose a holistic approach to automated performance analysis and tuning that we expect to greatly improve the productivity of performance debugging. Our approach seeks to build a framework that combines expert knowledge, compiler techniques, and performance research for performance diagnosis and solution discovery. Once a diagnosis and tuning strategy has been developed within the framework, it can be stored in an open and extensible database and reused in the future. We demonstrate the effectiveness of our approach through the automated performance analysis and tuning of two scientific applications. We show that the tuning process is highly automated and that the performance improvement is significant.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Guogjing Cong (1)
  • I-Hsin Chung (1)
  • Huifang Wen (1)
  • David Klepacki (1)
  • Hiroki Murata (1)
  • Yasushi Negishi (1)
  • Takao Moriyama (1)

  1. IBM Research, USA
