Software Quality Journal, Volume 26, Issue 3, pp 1063–1096

A multi-aspect online tuning framework for HPC applications

  • Michael Gerndt
  • Siegfried Benkner
  • Eduardo César
  • Carmen Navarrete
  • Enes Bajrovic
  • Jiri Dokulil
  • Carla Guillén
  • Robert Mijakovic
  • Anna Sikora


Developing software applications for high-performance computing (HPC) requires careful optimizations targeting a myriad of increasingly complex, highly interrelated software, hardware and system components. The demand to minimize energy consumption on extreme-scale HPC systems, and the associated shift towards heterogeneous architectures, add yet another level of complexity to program development and optimization. As a result, software developers who wish to fully exploit HPC resources often see the optimization process as daunting, cumbersome and time-consuming. To address these challenges, we have developed the Periscope Tuning Framework (PTF), an integrated online automatic tuning framework that combines performance analysis and performance tuning across the wide range of tuning parameters available to software developers on modern HPC systems. This work introduces the architecture, tuning model and main infrastructure components of PTF, as well as its main tuning plugins and their evaluation.
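The abstract describes online tuning only at a high level. As a minimal, hypothetical sketch of the general idea (not PTF's actual interfaces, tuning model or search strategies), the following Python code enumerates every combination of tuning-parameter values, evaluates each against a pluggable objective such as the wall-clock time of one iteration of a repeated program phase, and keeps the best. The names `tuning_space`, `wall_time` and `online_tune`, and the parameters used below, are illustrative assumptions.

```python
import itertools
import time
from typing import Callable, Dict, List, Optional, Tuple

# A configuration maps each (hypothetical) tuning parameter to one chosen value.
Config = Dict[str, int]


def tuning_space(params: Dict[str, List[int]]) -> List[Config]:
    """Enumerate every combination of parameter values (the full search space)."""
    names = list(params)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(params[n] for n in names))]


def wall_time(phase: Callable[[Config], None]) -> Callable[[Config], float]:
    """Build an objective that times one run of `phase` under a configuration."""
    def measure(cfg: Config) -> float:
        start = time.perf_counter()
        phase(cfg)  # e.g. one iteration of the application's main loop
        return time.perf_counter() - start
    return measure


def online_tune(measure: Callable[[Config], float],
                params: Dict[str, List[int]]) -> Tuple[Optional[Config], float]:
    """Evaluate each configuration once and return the one with the lowest cost."""
    best_cfg: Optional[Config] = None
    best_cost = float("inf")
    for cfg in tuning_space(params):
        cost = measure(cfg)  # time, energy, or any other scalar objective
        if cost < best_cost:
            best_cfg, best_cost = cfg, cost
    return best_cfg, best_cost
```

For example, `online_tune(lambda c: abs(c["threads"] - 8) + abs(c["block"] - 64) / 32, {"threads": [1, 2, 4, 8, 16], "block": [32, 64, 128]})` returns `({"threads": 8, "block": 64}, 0.0)`, where the lambda stands in for a real time or energy measurement. Exhaustive search is used here only for brevity; the space grows multiplicatively with the number of parameters, which is why practical tuners typically prune it or search it heuristically.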


Keywords: Automatic performance tuning · High-performance computing · Performance optimization · Parallel architectures · Energy tuning · OpenCL



This work was supported by the European Commission FP7 project AutoTune under grant no. 288038.



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Technical University of Munich, Munich, Germany
  2. University of Vienna, Vienna, Austria
  3. Autonomous University of Barcelona, Barcelona, Spain
  4. Leibniz Supercomputing Centre, Garching bei München, Germany
