Abstract
Performance analysis tools have been available for decades. They help developers speed up their applications and pinpoint scalability bottlenecks. They are widespread, well understood, and sophisticated. Since the growing power consumption of HPC systems has become a major cost factor, support for energy efficiency evaluation has been added to various performance analysis tools. Furthermore, the beneficial as well as detrimental effects of power saving strategies on energy efficiency are already well understood. However, appropriate tools to directly exploit the detected potential are not yet available. We therefore present a library that reuses the highly sophisticated instrumentation mechanisms of VampirTrace to dynamically change hardware and software parameters that influence energy efficiency. We also present a library that wraps the OpenMP runtimes of several x86_64 compilers in order to provide low-overhead instrumentation at the parallel region level. This enhances VampirTrace’s ability to handle OpenMP programs without the typically required recompilation.
Notes
The threshold and reduced frequencies used are estimates based on the authors' experience and are not meant to yield the maximum possible energy saving. Still, they give a good impression of how effective a first optimization can be.
In this paper we focus on limited scalability due to architectural, not algorithmic, reasons. The latter would include regions with a large number of barriers and locks, which would also benefit from a reduced number of threads.
Acknowledgements
This work has been funded by the Bundesministerium für Bildung und Forschung via the research project CoolSilicon (BMBF 16N10186).
Cite this article
Schöne, R., Molka, D. Integrating performance analysis and energy efficiency optimizations in a unified environment. Comput Sci Res Dev 29, 231–239 (2014). https://doi.org/10.1007/s00450-013-0243-7
DOI: https://doi.org/10.1007/s00450-013-0243-7