Integrating performance analysis and energy efficiency optimizations in a unified environment

Schöne, Robert; Molka, Daniel

doi:10.1007/s00450-013-0243-7

Special Issue Paper
Published: 25 July 2013

Volume 29, pages 231–239 (2014)

Computer Science - Research and Development

Robert Schöne¹ &
Daniel Molka¹

Abstract

Performance analysis tools have been available for decades. They help developers speed up their applications and pinpoint scalability bottlenecks. They are widespread, well understood, and sophisticated. Since the growing power consumption of HPC systems has become a major cost factor, support for energy efficiency evaluation has been added to various performance analysis tools. Furthermore, the beneficial as well as detrimental effects of power saving strategies on energy efficiency are already well understood. However, appropriate tools to directly exploit the detected potential are not yet available. We therefore present a library that reuses the highly sophisticated instrumentation mechanisms of VampirTrace to dynamically change hardware and software parameters that influence energy efficiency. We also present a library that wraps the OpenMP runtimes of several x86_64 compilers in order to provide low-overhead instrumentation at the parallel-region level. This enhances VampirTrace’s ability to handle OpenMP programs without the typically required recompilation.

Notes

The threshold and reduced frequencies used are estimates based on the authors' experience and are not meant to fit perfectly for maximal energy savings. Still, they give a good impression of how effective a first optimization can be.
In this paper we focus on limited scalability due to architectural, not algorithmic, reasons. The latter would include regions with a large number of barriers and locks, which would also benefit from a reduced number of threads.
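
The two notes above refer to a per-region choice of processor frequency and thread count. The following is a minimal Python sketch of such a decision rule, given only as an illustration. It is not the authors' library and does not use the VampirTrace API. The names `RegionProfile`, `Setting`, and `choose_setting`, the `memory_bound_ratio` metric, and all numeric values (threshold, frequencies, thread counts) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionProfile:
    """Hypothetical summary of one instrumented parallel region."""
    name: str
    memory_bound_ratio: float      # assumed metric in [0, 1]; higher means more memory-bound
    scales_to_all_threads: bool    # False if the region stops scaling for architectural reasons


@dataclass(frozen=True)
class Setting:
    """Parameters that would be applied when the region is entered."""
    frequency_mhz: int
    threads: int


def choose_setting(
    profile: RegionProfile,
    *,
    threshold: float = 0.5,        # illustrative threshold, not a value from the paper
    nominal_mhz: int = 2600,       # illustrative nominal frequency
    reduced_mhz: int = 1600,       # illustrative reduced frequency
    max_threads: int = 16,
    reduced_threads: int = 8,
) -> Setting:
    """Pick a frequency and thread count for a region from its profile."""
    # Memory-bound regions are assumed to lose little performance at a lower clock.
    frequency = reduced_mhz if profile.memory_bound_ratio >= threshold else nominal_mhz
    # Regions with limited scalability are run with fewer threads.
    threads = max_threads if profile.scales_to_all_threads else reduced_threads
    return Setting(frequency_mhz=frequency, threads=threads)


if __name__ == "__main__":
    for region in (
        RegionProfile("compute_kernel", 0.1, True),
        RegionProfile("stream_copy", 0.9, False),
    ):
        print(region.name, choose_setting(region))
```

In a real tool the returned setting would be applied in the enter hook of the instrumented region and reverted on exit; how that is done depends on the platform and is left out of this sketch.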

Acknowledgements

This work has been funded by the Bundesministerium für Bildung und Forschung via the research project CoolSilicon (BMBF 16N10186).

Author information

Authors and Affiliations

Center for Information Services and High Performance Computing (ZIH), Technische Universität Dresden, 01062, Dresden, Germany
Robert Schöne & Daniel Molka

Corresponding author

Correspondence to Robert Schöne.

About this article

Cite this article

Schöne, R., Molka, D. Integrating performance analysis and energy efficiency optimizations in a unified environment. Comput Sci Res Dev 29, 231–239 (2014). https://doi.org/10.1007/s00450-013-0243-7

Published: 25 July 2013
Issue Date: August 2014
DOI: https://doi.org/10.1007/s00450-013-0243-7
