
Operation-Aware Power Capping

  • Conference paper

Published in: Euro-Par 2020: Parallel Processing (Euro-Par 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12247)

Abstract

Once the peak power draw of a large-scale high-performance-computing (HPC) cluster exceeds the capacity of its surrounding infrastructure, the cluster’s power consumption must be capped to avoid hardware damage. However, power capping often incurs a loss of computational performance because the underlying processors are clocked down. In this work, we developed an operation-aware power management strategy, called OAPM, to mitigate this performance loss. OAPM manages performance under a power cap dynamically at runtime by adjusting the core and uncore clock rates. In this way, the limited power budget can be shifted effectively among the components within a processor: components with high computational activity are powered up while the others are throttled, improving overall execution performance. Applying OAPM to diverse HPC benchmarks and real-world applications, we observed that the hardware settings it selects achieve near-optimal performance compared to the best setting found by a static approach. The achieved speedup amounts to up to 6.3%.
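To make the abstract’s idea concrete, the following is a minimal sketch (not the authors’ implementation) of an operation-aware control loop of the kind OAPM describes: it samples an activity metric such as CPI and shifts the frequency budget between the cores and the uncore accordingly. The helpers read_cpi(), set_core_ratio(), and set_uncore_ratio(), as well as all thresholds and ratio values, are hypothetical placeholders; a real runtime would back them with hardware performance counters and writes to the processor’s frequency-control registers (e.g. via the Linux msr driver).

```c
/*
 * Illustrative sketch of an OAPM-style power-shifting loop.
 * All helpers and constants are placeholders, not values from the paper.
 */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helpers: a real runtime would read hardware counters and
 * program the core/uncore frequency ratios, e.g. through the msr driver. */
static double read_cpi(void)               { return 5.0; /* placeholder sample */ }
static void   set_core_ratio(unsigned r)   { printf("core ratio   -> %u\n", r); }
static void   set_uncore_ratio(unsigned r) { printf("uncore ratio -> %u\n", r); }

int main(void)
{
    const double cpi_compute_bound = 1.0;   /* FLOPS-like kernels (low CPI)  */
    const double cpi_memory_bound  = 10.0;  /* TRIAD-like kernels (high CPI) */

    for (int step = 0; step < 10; ++step) {
        double cpi = read_cpi();

        if (cpi < cpi_compute_bound) {
            /* Compute-bound phase: spend the power budget on the cores. */
            set_core_ratio(24);
            set_uncore_ratio(12);
        } else if (cpi > cpi_memory_bound) {
            /* Memory-bound phase: shift power towards the uncore. */
            set_core_ratio(12);
            set_uncore_ratio(27);
        } else {
            /* Mixed phase: keep a balanced setting. */
            set_core_ratio(18);
            set_uncore_ratio(20);
        }
        usleep(100000); /* re-evaluate the phase every 100 ms */
    }
    return 0;
}
```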


Notes

  1. https://doc.itc.rwth-aachen.de/display/CC/Home.

  2. The FLOPS benchmark has a cycles-per-instruction (CPI) value of 0.8, while the CPI of the TRIAD benchmark amounts to 15.3 on our platform with 12 threads.
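Since note 2 characterises the benchmarks by their CPI, a brief sketch of how such a value can be obtained on Linux may be helpful. The code below is an illustration under our own assumptions, not taken from the paper: it counts retired instructions and CPU cycles around a placeholder workload using the perf_event_open(2) interface and reports their ratio.

```c
/* Sketch: measure CPI of a code region with Linux perf_event_open(2). */
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <stdint.h>

/* Open one hardware counter for the calling thread (user space only). */
static int perf_open(uint32_t type, uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = type;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
    int cyc = perf_open(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES);
    int ins = perf_open(PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS);
    if (cyc < 0 || ins < 0) { perror("perf_event_open"); return 1; }

    ioctl(cyc, PERF_EVENT_IOC_RESET, 0);  ioctl(ins, PERF_EVENT_IOC_RESET, 0);
    ioctl(cyc, PERF_EVENT_IOC_ENABLE, 0); ioctl(ins, PERF_EVENT_IOC_ENABLE, 0);

    /* Placeholder workload; in practice this is the kernel to characterise. */
    volatile double x = 0.0;
    for (long i = 0; i < 100000000L; ++i) x += 1e-9;

    ioctl(cyc, PERF_EVENT_IOC_DISABLE, 0); ioctl(ins, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t cycles = 0, instructions = 0;
    if (read(cyc, &cycles, sizeof(cycles)) != sizeof(cycles) ||
        read(ins, &instructions, sizeof(instructions)) != sizeof(instructions)) {
        perror("read");
        return 1;
    }

    printf("CPI = %.2f\n", (double)cycles / (double)instructions);
    close(cyc);
    close(ins);
    return 0;
}
```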



Author information

Corresponding author

Correspondence to Bo Wang.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, B., Miller, J., Terboven, C., Müller, M. (2020). Operation-Aware Power Capping. In: Malawski, M., Rzadca, K. (eds.) Euro-Par 2020: Parallel Processing. Euro-Par 2020. Lecture Notes in Computer Science, vol. 12247. Springer, Cham. https://doi.org/10.1007/978-3-030-57675-2_5


  • DOI: https://doi.org/10.1007/978-3-030-57675-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-57674-5

  • Online ISBN: 978-3-030-57675-2

  • eBook Packages: Computer Science, Computer Science (R0)
