Abstract
Production high-performance computing systems continue to grow in complexity and size. As applications struggle to make use of increasingly heterogeneous compute nodes, maintaining high efficiency (performance per watt) for the whole platform becomes a challenge. Alongside the growing complexity of scientific workloads, this extreme heterogeneity is also an opportunity: as applications dynamically undergo variations in workload, due to phases or data/compute movement between devices, one can dynamically adjust power across compute elements to save energy without impacting performance. With an aim toward an autonomous and dynamic power management strategy for current and future HPC architectures, this paper explores the use of control theory for the design of a dynamic power regulation method. Structured as a feedback loop, our approach—which is novel in computing resource management—consists of periodically monitoring application progress and choosing at runtime a suitable power cap for processors. Thanks to a preliminary offline identification process, we derive a model of the dynamics of the system and a proportional-integral (PI) controller. We evaluate our approach on top of an existing resource management framework, the Argo Node Resource Manager, deployed on several clusters of Grid’5000, using a standard memory-bound HPC benchmark.
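The feedback loop described above can be illustrated with a minimal sketch of a discrete-time PI controller that picks a power cap from measured application progress. This is not the controller evaluated in the paper: the gains, the power-cap bounds, the setpoint, and the toy first-order plant in `simulate` are all hypothetical values chosen for illustration, whereas the paper derives its model and gains from an offline identification step.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PIController:
    """Discrete-time PI controller with output clamping (illustrative only)."""
    kp: float      # proportional gain (hypothetical value)
    ki: float      # integral gain (hypothetical value)
    u_min: float   # lowest allowed power cap, in watts (assumed bound)
    u_max: float   # highest allowed power cap, in watts (assumed bound)
    integral: float = 0.0

    def step(self, setpoint: float, measured: float, dt: float) -> float:
        """Return the power cap to apply for the next control period."""
        error = setpoint - measured
        candidate = self.integral + error * dt
        u = self.kp * error + self.ki * candidate
        clamped = min(max(u, self.u_min), self.u_max)
        # Simple anti-windup: accumulate the integral only when not saturated.
        if u == clamped:
            self.integral = candidate
        return clamped


def simulate(steps: int = 30) -> List[Tuple[float, float]]:
    """Run the loop against a toy plant; returns (powercap, progress) pairs."""
    controller = PIController(kp=1.0, ki=2.0, u_min=40.0, u_max=120.0)
    setpoint = 25.0   # target progress rate (assumed, arbitrary units)
    progress = 0.0    # measured progress, starts at rest
    history = []
    for _ in range(steps):
        powercap = controller.step(setpoint, progress, dt=1.0)
        # Toy first-order plant (assumption): progress relaxes toward
        # 0.25 * powercap, covering half the remaining gap each period.
        progress = 0.5 * progress + 0.5 * 0.25 * powercap
        history.append((powercap, progress))
    return history


if __name__ == "__main__":
    for powercap, progress in simulate()[-3:]:
        print(f"powercap={powercap:.2f} W  progress={progress:.2f}")
```

In this toy setting the integral term is what removes the steady-state error: the proportional term alone would settle below the target progress, since a nonzero power cap would then require a nonzero error.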
Notes
1. https://www.grid5000.fr/w/Hardware with reference API version 9925e0598.
2. Available at https://xgitlab.cels.anl.gov/argo/hnrm.
References
Abdelzaher, T., et al.: Introduction to control theory and its application to computing systems. In: Performance Modeling and Engineering, pp. 185–215. Springer (2008). https://doi.org/10.1007/978-0-387-79361-0_7
Albers, S.: Algorithms for dynamic speed scaling. In: STACS. LIPIcs, vol. 9, pp. 1–11. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2011). https://doi.org/10.4230/LIPIcs.STACS.2011.1
Åström, K.J., Hägglund, T.: PID Controllers: Theory, Design, and Tuning. International Society of Automation, second edn. (1995)
Bhalachandra, S., et al.: Using dynamic duty cycle modulation to improve energy efficiency in high performance computing. In: IPDPS Workshops, pp. 911–918. IEEE, May 2015. https://doi.org/10.1109/IPDPSW.2015.144
Cerf, S., et al.: Artifact and instructions to generate experimental results for the Euro-Par 2021 paper: Sustaining Performance While Reducing Energy Consumption: A Control Theory Approach, August 2021. https://doi.org/10.6084/m9.figshare.14754468
David, H., et al.: RAPL: memory power estimation and capping. In: ISLPED, pp. 189–194. ACM (2010). https://doi.org/10.1145/1840845.1840883
Desrochers, S., et al.: A validation of DRAM RAPL power measurements. In: MEMSYS, pp. 455–470. ACM, October 2016. https://doi.org/10.1145/2989081.2989088
Dolstra, E., et al.: Nix: a safe and policy-free system for software deployment. In: LISA, pp. 79–92. USENIX (2004). http://www.usenix.org/publications/library/proceedings/lisa04/tech/dolstra.html
Dutot, P., et al.: Towards energy budget control in HPC. In: CCGrid, pp. 381–390. IEEE/ACM, May 2017. https://doi.org/10.1109/CCGRID.2017.16
Eastep, J., et al.: Global extensible open power manager: a vehicle for HPC community collaboration on co-designed energy management solutions. In: ISC. Lecture Notes in Computer Science, vol. 10266, pp. 394–412. Springer, June 2017. https://doi.org/10.1007/978-3-319-58667-0_21
Filieri, A., et al.: Control strategies for self-adaptive software systems. ACM Trans. Auton. Adapt. Syst. 11(4), 24:1–24:31, February 2017. https://doi.org/10.1145/3024188
Hellerstein, J.L., et al.: Feedback control of computing systems. Wiley, Hoboken (2004). https://doi.org/10.1002/047166880X
Imes, C., et al.: POET: a portable approach to minimizing energy under soft real-time constraints. In: RTAS, pp. 75–86. IEEE, April 2015. https://doi.org/10.1109/RTAS.2015.7108419
Imes, C., et al.: CoPPer: soft real-time application performance using hardware power capping. In: ICAC, pp. 31–41. IEEE, June 2019. https://doi.org/10.1109/ICAC.2019.00015
Levine, W.S.: The Control Handbook (three volume set). CRC Press, Boca Raton, second edn. (2011). https://doi.org/10.1201/9781315218694
Lo, D., et al.: Towards energy proportionality for large-scale latency-critical workloads. In: ISCA, pp. 301–312. IEEE, June 2014. https://doi.org/10.1109/ISCA.2014.6853237
McCalpin, J.D.: Memory bandwidth and machine balance in current high performance computers. IEEE Comput. Soc. Tech. Committee Comput. Archit. (TCCA) Newsl. 2, 19–25 (1995)
Montgomery, D.C., Runger, G.C.: Applied Statistics and Probability for Engineers. Wiley, Hoboken, seventh edn. (January 2018)
Orgerie, A., et al.: Save watts in your grid: green strategies for energy-aware framework in large scale distributed systems. In: ICPADS, pp. 171–178. IEEE, December 2008. https://doi.org/10.1109/ICPADS.2008.97
Petoumenos, P., et al.: Power capping: what works, what does not. In: ICPADS, pp. 525–534. IEEE, December 2015. https://doi.org/10.1109/ICPADS.2015.72
Ramesh, S., et al.: Understanding the impact of dynamic power capping on application progress. In: IPDPS, pp. 793–804. IEEE, May 2019. https://doi.org/10.1109/IPDPS.2019.00088
Reis, V., et al.: Argo Node Resource Manager. https://www.mcs.anl.gov/research/projects/argo/overview/nrm/ (2021)
Rotem, E., et al.: Power-management architecture of the Intel microarchitecture code-named Sandy Bridge. IEEE Micro 32(2), 20–27 (2012). https://doi.org/10.1109/MM.2012.12
Rountree, B., et al.: Beyond DVFS: a first look at performance under a hardware-enforced power bound. In: IPDPS Workshops, pp. 947–953. IEEE (2012). https://doi.org/10.1109/IPDPSW.2012.116
Rutten, É., et al.: Feedback control as MAPE-K loop in autonomic computing. In: Software Engineering for Self-Adaptive Systems. Lecture Notes in Computer Science, vol. 9640, pp. 349–373. Springer (2017). https://doi.org/10.1007/978-3-319-74183-3_12
Santriaji, M.H., Hoffmann, H.: GRAPE: minimizing energy for GPU applications with performance requirements. In: MICRO, pp. 16:1–16:13. IEEE, October 2016. https://doi.org/10.1109/MICRO.2016.7783719
Stahl, E., et al.: Towards a control-theory approach for minimizing unused grid resources. In: AI-Science@HPDC, pp. 4:1–4:8. ACM (2018). https://doi.org/10.1145/3217197.3217201
Zhou, Y., et al.: CASH: supporting IaaS customers with a sub-core configurable architecture. In: ISCA, pp. 682–694. IEEE, June 2016. https://doi.org/10.1109/ISCA.2016.65
Acknowledgments and Data Availability Statement
Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several universities as well as other organizations (see https://www.grid5000.fr). Argonne National Laboratory’s work was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research is partially supported by the NCSA-Inria-ANL-BSC-JSC-Riken-UTK Joint-Laboratory for Extreme Scale Computing (JLESC, https://jlesc.github.io/).
The datasets and code generated and analyzed during the current study are available in the Figshare repository: https://doi.org/10.6084/m9.figshare.14754468 [5].
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Cerf, S., Bleuse, R., Reis, V., Perarnau, S., Rutten, É. (2021). Sustaining Performance While Reducing Energy Consumption: A Control Theory Approach. In: Sousa, L., Roma, N., Tomás, P. (eds) Euro-Par 2021: Parallel Processing. Euro-Par 2021. Lecture Notes in Computer Science, vol 12820. Springer, Cham. https://doi.org/10.1007/978-3-030-85665-6_21
DOI: https://doi.org/10.1007/978-3-030-85665-6_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85664-9
Online ISBN: 978-3-030-85665-6
eBook Packages: Computer Science, Computer Science (R0)