Integrating performance analysis and energy efficiency optimizations in a unified environment

  • Robert Schöne
  • Daniel Molka
Special Issue Paper


Performance analysis tools have been available for decades. They help developers speed up their applications and pinpoint scalability bottlenecks. They are widespread, well understood, and sophisticated. Since the growing power consumption of HPC systems has become a major cost factor, support for energy efficiency evaluation has been added to various performance analysis tools. Furthermore, the beneficial as well as detrimental effects of power saving strategies on energy efficiency are already well understood. However, appropriate tools to directly exploit the detected potential are not yet available. We therefore present a library that reuses the highly sophisticated instrumentation mechanisms of VampirTrace to dynamically change hardware and software parameters that influence energy efficiency. We also present a library that wraps the OpenMP runtimes of several x86_64 compilers in order to provide low-overhead instrumentation at the level of parallel regions. This enhances VampirTrace’s ability to handle OpenMP programs without the recompilation that is typically required.
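The abstract gives no interface details, so the following is only a minimal sketch of the general idea of changing parameters at instrumented region boundaries. It is not the authors' library and does not use any VampirTrace interface. The names `Setting`, `RegionTuner`, `configure`, `enter`, and `leave`, the region names, and the frequency and thread values are all hypothetical. The `apply` callback stands in for whatever mechanism would actually set a core frequency or an OpenMP thread count.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Setting:
    """Hypothetical per-region tuning parameters (not from the paper)."""
    freq_khz: int  # requested core frequency
    threads: int   # requested number of OpenMP threads


class RegionTuner:
    """Applies a setting on region entry and restores the outer one on exit."""

    def __init__(self, default: Setting, apply: Callable[[Setting], None]):
        self._table: Dict[str, Setting] = {}
        self._stack: List[Setting] = [default]
        self._apply = apply  # assumed backend hook, e.g. a frequency setter
        apply(default)

    def configure(self, region: str, setting: Setting) -> None:
        self._table[region] = setting

    def enter(self, region: str) -> None:
        # Regions without an entry inherit the currently active setting.
        setting = self._table.get(region, self._stack[-1])
        if setting != self._stack[-1]:
            self._apply(setting)  # only touch the backend on a real change
        self._stack.append(setting)

    def leave(self) -> None:
        if len(self._stack) == 1:
            raise RuntimeError("leave() called without a matching enter()")
        left = self._stack.pop()
        if left != self._stack[-1]:
            self._apply(self._stack[-1])  # restore the enclosing setting


# Usage: record what would be applied instead of changing real hardware.
applied: List[Setting] = []
tuner = RegionTuner(Setting(2_600_000, 8), applied.append)
tuner.configure("hypothetical_region", Setting(1_600_000, 8))
tuner.enter("main")
tuner.enter("hypothetical_region")
tuner.leave()
tuner.leave()
```

Keeping a stack of active settings means nested regions restore the enclosing setting on exit, and comparing against the top of the stack avoids redundant parameter changes, which matters if each change is expensive.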


Energy efficiency, Benchmarking, Tuning, Optimization, Performance analysis, Frequency scaling, Dynamic concurrency throttling



This work has been funded by the Bundesministerium für Bildung und Forschung via the research project CoolSilicon (BMBF 16N10186).



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, Dresden, Germany
