Sustaining Performance While Reducing Energy Consumption: A Control Theory Approach

  • Conference paper
Euro-Par 2021: Parallel Processing (Euro-Par 2021)

Abstract

Production high-performance computing systems continue to grow in complexity and size. As applications struggle to make use of increasingly heterogeneous compute nodes, maintaining high efficiency (performance per watt) for the whole platform becomes a challenge. Alongside the growing complexity of scientific workloads, this extreme heterogeneity is also an opportunity: as applications dynamically undergo variations in workload, due to phases or data/compute movement between devices, one can dynamically adjust power across compute elements to save energy without impacting performance. With an aim toward an autonomous and dynamic power management strategy for current and future HPC architectures, this paper explores the use of control theory for the design of a dynamic power regulation method. Structured as a feedback loop, our approach—which is novel in computing resource management—consists of periodically monitoring application progress and choosing at runtime a suitable power cap for processors. Thanks to a preliminary offline identification process, we derive a model of the dynamics of the system and a proportional-integral (PI) controller. We evaluate our approach on top of an existing resource management framework, the Argo Node Resource Manager, deployed on several clusters of Grid’5000, using a standard memory-bound HPC benchmark.
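The feedback loop described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the gains `kp` and `ki`, the power cap bounds, the sampling period, the progress setpoint, and the first-order linear `toy_plant` model are all hypothetical values chosen for the example, whereas the paper derives its model from an offline identification step. The names `PIController`, `toy_plant`, and `simulate` are likewise introduced here for illustration only.

```python
from dataclasses import dataclass


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass
class PIController:
    """Discrete-time PI controller that outputs a power cap in watts."""
    kp: float        # proportional gain (assumed, not taken from the paper)
    ki: float        # integral gain (assumed, not taken from the paper)
    cap_min: float   # lowest allowed power cap in watts (assumed)
    cap_max: float   # highest allowed power cap in watts (assumed)
    integral: float = 0.0

    def step(self, setpoint: float, progress: float, dt: float) -> float:
        """Compute the next power cap from the measured progress."""
        error = setpoint - progress
        # Anti-windup: keep the integral contribution inside the actuator range.
        self.integral = clamp(self.integral + error * dt,
                              self.cap_min / self.ki, self.cap_max / self.ki)
        return clamp(self.kp * error + self.ki * self.integral,
                     self.cap_min, self.cap_max)


def toy_plant(progress: float, cap: float, dt: float,
              gain: float = 0.25, tau: float = 2.0) -> float:
    """Hypothetical first-order model: progress relaxes toward gain * cap."""
    return progress + (dt / tau) * (gain * cap - progress)


def simulate(setpoint: float = 20.0, steps: int = 60, dt: float = 1.0):
    """Run the closed loop and return a list of (cap, progress) samples."""
    controller = PIController(kp=1.0, ki=2.0, cap_min=40.0, cap_max=120.0)
    progress = 0.0
    trace = []
    for _ in range(steps):
        cap = controller.step(setpoint, progress, dt)  # monitor, then decide
        progress = toy_plant(progress, cap, dt)        # apply cap to the system
        trace.append((cap, progress))
    return trace
```

Calling `simulate()` returns the sequence of power caps and progress values. In this toy setup the cap settles at the level where the modeled progress matches the setpoint, which is the behavior a PI controller is meant to provide; on real hardware the measurement and actuation would go through the node resource manager rather than a simulated plant.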

Notes

  1. https://www.grid5000.fr/w/Hardware with reference API version 9925e0598.

  2. Available at https://xgitlab.cels.anl.gov/argo/hnrm.

References

  1. Abdelzaher, T., et al.: Introduction to control theory and its application to computing systems. In: Performance Modeling and Engineering, pp. 185–215. Springer (2008). https://doi.org/10.1007/978-0-387-79361-0_7

  2. Albers, S.: Algorithms for dynamic speed scaling. In: STACS. LIPIcs, vol. 9, pp. 1–11. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2011). https://doi.org/10.4230/LIPIcs.STACS.2011.1

  3. Åström, K.J., Hägglund, T.: PID Controllers: Theory, Design, and Tuning. International Society of Automation, second edn. (1995)

  4. Bhalachandra, S., et al.: Using dynamic duty cycle modulation to improve energy efficiency in high performance computing. In: IPDPS Workshops, pp. 911–918. IEEE, May 2015. https://doi.org/10.1109/IPDPSW.2015.144

  5. Cerf, S., et al.: Artifact and instructions to generate experimental results for the Euro-Par 2021 paper: Sustaining Performance While Reducing Energy Consumption: A Control Theory Approach, August 2021. https://doi.org/10.6084/m9.figshare.14754468

  6. David, H., et al.: RAPL: memory power estimation and capping. In: ISLPED, pp. 189–194. ACM (2010). https://doi.org/10.1145/1840845.1840883

  7. Desrochers, S., et al.: A validation of DRAM RAPL power measurements. In: MEMSYS, pp. 455–470. ACM, October 2016. https://doi.org/10.1145/2989081.2989088

  8. Dolstra, E., et al.: Nix: a safe and policy-free system for software deployment. In: LISA, pp. 79–92. USENIX (2004). http://www.usenix.org/publications/library/proceedings/lisa04/tech/dolstra.html

  9. Dutot, P., et al.: Towards energy budget control in HPC. In: CCGrid, pp. 381–390. IEEE/ACM, May 2017. https://doi.org/10.1109/CCGRID.2017.16

  10. Eastep, J., et al.: Global extensible open power manager: a vehicle for HPC community collaboration on co-designed energy management solutions. In: ISC. Lecture Notes in Computer Science, vol. 10266, pp. 394–412. Springer, June 2017. https://doi.org/10.1007/978-3-319-58667-0_21

  11. Filieri, A., et al.: Control strategies for self-adaptive software systems. ACM Trans. Auton. Adapt. Syst. 11(4), 24:1–24:31, February 2017. https://doi.org/10.1145/3024188

  12. Hellerstein, J.L., et al.: Feedback control of computing systems. Wiley, Hoboken (2004). https://doi.org/10.1002/047166880X

  13. Imes, C., et al.: POET: a portable approach to minimizing energy under soft real-time constraints. In: RTAS, pp. 75–86. IEEE, April 2015. https://doi.org/10.1109/RTAS.2015.7108419

  14. Imes, C., et al.: CoPPer: soft real-time application performance using hardware power capping. In: ICAC, pp. 31–41. IEEE, June 2019. https://doi.org/10.1109/ICAC.2019.00015

  15. Levine, W.S.: The Control Handbook (three volume set). CRC Press, Boca Raton, second edn. (2011). https://doi.org/10.1201/9781315218694

  16. Lo, D., et al.: Towards energy proportionality for large-scale latency-critical workloads. In: ISCA, pp. 301–312. IEEE, June 2014. https://doi.org/10.1109/ISCA.2014.6853237

  17. McCalpin, J.D.: Memory bandwidth and machine balance in current high performance computers. IEEE Comput. Soc. Tech. Committee Comput. Archit. (TCCA) Newsl. 2, 19–25 (1995)

  18. Montgomery, D.C., Runger, G.C.: Applied Statistics and Probability for Engineers. Wiley, Hoboken, seventh edn. January 2018

  19. Orgerie, A., et al.: Save watts in your grid: green strategies for energy-aware framework in large scale distributed systems. In: ICPADS, pp. 171–178. IEEE, December 2008. https://doi.org/10.1109/ICPADS.2008.97

  20. Petoumenos, P., et al.: Power capping: what works, what does not. In: ICPADS, pp. 525–534. IEEE, December 2015. https://doi.org/10.1109/ICPADS.2015.72

  21. Ramesh, S., et al.: Understanding the impact of dynamic power capping on application progress. In: IPDPS, pp. 793–804. IEEE, May 2019. https://doi.org/10.1109/IPDPS.2019.00088

  22. Reis, V., et al.: Argo Node Resource Manager. https://www.mcs.anl.gov/research/projects/argo/overview/nrm/ (2021)

  23. Rotem, E., et al.: Power-management architecture of the Intel microarchitecture code-named Sandy Bridge. IEEE Micro 32(2), 20–27 (2012). https://doi.org/10.1109/MM.2012.12

  24. Rountree, B., et al.: Beyond DVFS: a first look at performance under a hardware-enforced power bound. In: IPDPS Workshops, pp. 947–953. IEEE (2012). https://doi.org/10.1109/IPDPSW.2012.116

  25. Rutten, É., et al.: Feedback control as MAPE-K loop in autonomic computing. In: Software Engineering for Self-Adaptive Systems. Lecture Notes in Computer Science, vol. 9640, pp. 349–373. Springer (2017). https://doi.org/10.1007/978-3-319-74183-3_12

  26. Santriaji, M.H., Hoffmann, H.: GRAPE: minimizing energy for GPU applications with performance requirements. In: MICRO, pp. 16:1–16:13. IEEE, October 2016. https://doi.org/10.1109/MICRO.2016.7783719

  27. Stahl, E., et al.: Towards a control-theory approach for minimizing unused grid resources. In: AI-Science@HPDC, pp. 4:1–4:8. ACM (2018). https://doi.org/10.1145/3217197.3217201

  28. Zhou, Y., et al.: CASH: supporting IaaS customers with a sub-core configurable architecture. In: ISCA, pp. 682–694. IEEE, June 2016. https://doi.org/10.1109/ISCA.2016.65

Acknowledgments and Data Availability Statement

Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr). Argonne National Laboratory’s work was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research is partially supported by the NCSA-Inria-ANL-BSC-JSC-Riken-UTK Joint-Laboratory for Extreme Scale Computing (JLESC, https://jlesc.github.io/).

The datasets and code generated and analyzed during the current study are available in the Figshare repository: https://doi.org/10.6084/m9.figshare.14754468 [5].

Author information

Corresponding author

Correspondence to Sophie Cerf.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Cerf, S., Bleuse, R., Reis, V., Perarnau, S., Rutten, É. (2021). Sustaining Performance While Reducing Energy Consumption: A Control Theory Approach. In: Sousa, L., Roma, N., Tomás, P. (eds) Euro-Par 2021: Parallel Processing. Euro-Par 2021. Lecture Notes in Computer Science, vol 12820. Springer, Cham. https://doi.org/10.1007/978-3-030-85665-6_21

  • DOI: https://doi.org/10.1007/978-3-030-85665-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85664-9

  • Online ISBN: 978-3-030-85665-6

  • eBook Packages: Computer Science (R0)
