
Operation-Aware Power Capping

  • Conference paper

Published in: Euro-Par 2020: Parallel Processing (Euro-Par 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12247)

Abstract

Once the peak power draw of a large-scale high-performance-computing (HPC) cluster exceeds the capacity of its surrounding infrastructure, the cluster’s power consumption must be capped to avoid hardware damage. However, power capping often incurs a loss of computational performance because the underlying processors are clocked down. In this work, we developed an operation-aware power management strategy, called OAPM, to mitigate this performance loss. OAPM manages performance under a power cap dynamically at runtime by adjusting the core and uncore clock rates. In this way, the limited power budget can be shifted effectively among the components within a processor: components with high computational activity are powered up while the others are throttled, improving overall execution performance. Applying OAPM to diverse HPC benchmarks and real-world applications, we observed that the hardware settings it selects achieve near-optimal performance compared to the best setting found by a static approach. The achieved speedup amounts to up to 6.3%.
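To make the abstract’s idea concrete, the following is a minimal sketch (not the authors’ implementation) of an operation-aware control loop of the kind OAPM describes: it samples an activity metric such as CPI and shifts the frequency budget between the cores and the uncore accordingly. The helpers read_cpi(), set_core_ratio(), and set_uncore_ratio(), as well as all thresholds and ratio values, are hypothetical placeholders; a real runtime would back them with hardware performance counters and writes to the processor’s frequency-control registers (e.g. via the Linux msr driver).

```c
/*
 * Illustrative sketch of an OAPM-style power-shifting loop.
 * All helpers and constants are placeholders, not values from the paper.
 */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helpers: a real runtime would read hardware counters and
 * program the core/uncore frequency ratios, e.g. through the msr driver. */
static double read_cpi(void)               { return 5.0; /* placeholder sample */ }
static void   set_core_ratio(unsigned r)   { printf("core ratio   -> %u\n", r); }
static void   set_uncore_ratio(unsigned r) { printf("uncore ratio -> %u\n", r); }

int main(void)
{
    const double cpi_compute_bound = 1.0;   /* FLOPS-like kernels (low CPI)  */
    const double cpi_memory_bound  = 10.0;  /* TRIAD-like kernels (high CPI) */

    for (int step = 0; step < 10; ++step) {
        double cpi = read_cpi();

        if (cpi < cpi_compute_bound) {
            /* Compute-bound phase: spend the power budget on the cores. */
            set_core_ratio(24);
            set_uncore_ratio(12);
        } else if (cpi > cpi_memory_bound) {
            /* Memory-bound phase: shift power towards the uncore. */
            set_core_ratio(12);
            set_uncore_ratio(27);
        } else {
            /* Mixed phase: keep a balanced setting. */
            set_core_ratio(18);
            set_uncore_ratio(20);
        }
        usleep(100000); /* re-evaluate the phase every 100 ms */
    }
    return 0;
}
```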


Notes

  1. https://doc.itc.rwth-aachen.de/display/CC/Home.

  2. The FLOPS benchmark has a cycles-per-instruction (CPI) value of 0.8, while the CPI of the TRIAD benchmark amounts to 15.3 on our platform with 12 threads.
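Since note 2 characterises the benchmarks by their CPI, a brief sketch of how such a value can be obtained on Linux may be helpful. The code below is an illustration under our own assumptions, not taken from the paper: it counts retired instructions and CPU cycles around a placeholder workload using the perf_event_open(2) interface and reports their ratio.

```c
/* Sketch: measure CPI of a code region with Linux perf_event_open(2). */
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <stdint.h>

/* Open one hardware counter for the calling thread (user space only). */
static int perf_open(uint32_t type, uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = type;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

int main(void)
{
    int cyc = perf_open(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES);
    int ins = perf_open(PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS);
    if (cyc < 0 || ins < 0) { perror("perf_event_open"); return 1; }

    ioctl(cyc, PERF_EVENT_IOC_RESET, 0);  ioctl(ins, PERF_EVENT_IOC_RESET, 0);
    ioctl(cyc, PERF_EVENT_IOC_ENABLE, 0); ioctl(ins, PERF_EVENT_IOC_ENABLE, 0);

    /* Placeholder workload; in practice this is the kernel to characterise. */
    volatile double x = 0.0;
    for (long i = 0; i < 100000000L; ++i) x += 1e-9;

    ioctl(cyc, PERF_EVENT_IOC_DISABLE, 0); ioctl(ins, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t cycles = 0, instructions = 0;
    if (read(cyc, &cycles, sizeof(cycles)) != sizeof(cycles) ||
        read(ins, &instructions, sizeof(instructions)) != sizeof(instructions)) {
        perror("read");
        return 1;
    }

    printf("CPI = %.2f\n", (double)cycles / (double)instructions);
    close(cyc);
    close(ins);
    return 0;
}
```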



Author information

Corresponding author

Correspondence to Bo Wang.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, B., Miller, J., Terboven, C., Müller, M. (2020). Operation-Aware Power Capping. In: Malawski, M., Rzadca, K. (eds.) Euro-Par 2020: Parallel Processing. Euro-Par 2020. Lecture Notes in Computer Science, vol. 12247. Springer, Cham. https://doi.org/10.1007/978-3-030-57675-2_5


  • DOI: https://doi.org/10.1007/978-3-030-57675-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-57674-5

  • Online ISBN: 978-3-030-57675-2

  • eBook Packages: Computer Science, Computer Science (R0)
