Abstract
Once the peak power draw of a large-scale high-performance computing (HPC) cluster exceeds the capacity of its surrounding infrastructure, the cluster's power consumption must be capped to avoid hardware damage. However, power capping often causes a loss of computational performance because the underlying processors are clocked down. In this work, we developed an operation-aware power management strategy, called OAPM, to mitigate this performance loss. OAPM manages performance under a power cap dynamically at runtime by modifying the core and uncore clock rates. Using this approach, the limited power budget can be shifted effectively among the components within a processor: components with high computational activity are powered up while the others are throttled, which improves overall execution performance. Applying OAPM to diverse HPC benchmarks and real-world applications, we observed that the hardware settings chosen by OAPM achieve results close to those of the optimal setting of a static approach. The speedup achieved in our work is up to 6.3%.
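The idea of shifting a fixed power budget between core and uncore can be illustrated with a minimal sketch. This is a toy model and not the OAPM algorithm, which the abstract does not specify: the frequency lists, the linear power model, the runtime model, and the names `estimated_power`, `estimated_runtime`, and `best_setting` are all assumptions made for illustration.

```python
from itertools import product

# All constants are illustrative assumptions, not measurements from the paper.
CORE_FREQS_GHZ = [1.2, 1.6, 2.0, 2.4]
UNCORE_FREQS_GHZ = [1.2, 1.6, 2.0, 2.4]
CORE_WATTS_PER_GHZ = 30.0
UNCORE_WATTS_PER_GHZ = 15.0


def estimated_power(core_ghz, uncore_ghz):
    """Toy linear power model for one processor package (hypothetical)."""
    return CORE_WATTS_PER_GHZ * core_ghz + UNCORE_WATTS_PER_GHZ * uncore_ghz


def estimated_runtime(core_ghz, uncore_ghz, memory_fraction):
    """Toy runtime model (hypothetical): the compute share of the work
    scales with 1/core frequency, the memory share with 1/uncore frequency."""
    return (1.0 - memory_fraction) / core_ghz + memory_fraction / uncore_ghz


def best_setting(power_cap_watts, memory_fraction):
    """Return the (core, uncore) frequency pair with the lowest estimated
    runtime among all pairs whose estimated power stays within the cap."""
    feasible = [
        (core, uncore)
        for core, uncore in product(CORE_FREQS_GHZ, UNCORE_FREQS_GHZ)
        if estimated_power(core, uncore) <= power_cap_watts
    ]
    if not feasible:
        raise ValueError("no frequency pair fits under the power cap")
    return min(feasible, key=lambda s: estimated_runtime(s[0], s[1], memory_fraction))


if __name__ == "__main__":
    # Compute-bound operation: the budget goes to the cores.
    print(best_setting(95.0, memory_fraction=0.05))  # (2.4, 1.2)
    # Memory-bound operation: the budget shifts to the uncore.
    print(best_setting(95.0, memory_fraction=0.9))   # (1.6, 2.4)
```

Under the same 95 W cap, the toy model picks a high core clock for the compute-bound case and a high uncore clock for the memory-bound case, which is the kind of per-operation shift the abstract describes.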
Notes
- 1.
- 2.
The FLOPS benchmark has a cycles-per-instruction (CPI) value of 0.8, whereas the CPI of the TRIAD benchmark is 15.3 on our platform with 12 threads.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, B., Miller, J., Terboven, C., Müller, M. (2020). Operation-Aware Power Capping. In: Malawski, M., Rzadca, K. (eds) Euro-Par 2020: Parallel Processing. Euro-Par 2020. Lecture Notes in Computer Science(), vol 12247. Springer, Cham. https://doi.org/10.1007/978-3-030-57675-2_5
DOI: https://doi.org/10.1007/978-3-030-57675-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-57674-5
Online ISBN: 978-3-030-57675-2
eBook Packages: Computer Science (R0)