
Energy Accounting and Control with SLURM Resource and Job Management System

  • Conference paper
Distributed Computing and Networking (ICDCN 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8314)


Abstract

Energy consumption has gradually become a very important parameter in High Performance Computing platforms. The Resource and Job Management System (RJMS) is the HPC middleware responsible for distributing computing power to applications, and it has knowledge of both the underlying resources and the needs of the jobs. It is therefore the best candidate for monitoring and controlling the energy consumption of computations according to the job specifications. Integrating energy measurement mechanisms into the RJMS and treating energy consumption as a new accounting characteristic has become essential now that energy is a bottleneck to scalability. Since power meters would be too expensive, other existing measurement mechanisms such as IPMI and RAPL can be exploited by the RJMS to track energy consumption and to enrich the monitoring of executions with energy considerations.

In this paper we present the design and implementation of a new framework, built on the SLURM Resource and Job Management System, which provides per-job energy accounting with power profiling capabilities, along with parameters for energy control based on static frequency scaling of the CPUs. Since the goal of this work is to deploy the framework on large petaflop-scale clusters such as CURIE, its cost and reliability are important issues. We evaluate the overhead of the design choices and the precision of the monitoring modes using different HPC benchmarks (Linpack, IMB, Stream) on a real-scale platform with integrated power meters. Our experiments show that the overhead is less than 0.6% in energy consumption and less than 0.2% in execution time, while the error deviation compared to the power meters is less than 2% in most cases.
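To make the notion of per-job energy accounting concrete, the sketch below shows two generic ways of turning raw readings into a job's energy: integrating periodic power samples (the kind of reading an IPMI-style sensor might report) and summing the deltas of a cumulative counter that wraps around (the style of counter RAPL exposes). It is an illustration under stated assumptions, not the framework's actual code: all function names, the 32-bit counter width, and the units parameter are hypothetical choices made for this example.

```python
from typing import Sequence


def energy_from_power_samples(times_s: Sequence[float],
                              power_w: Sequence[float]) -> float:
    """Integrate power samples (watts) over time (seconds) into joules.

    Uses the trapezoidal rule between consecutive samples. Assumes the
    timestamps are sorted in increasing order.
    """
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    energy_j = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy_j += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy_j


def energy_from_counter(readings: Sequence[int],
                        joules_per_unit: float,
                        counter_bits: int = 32) -> float:
    """Sum the deltas of a cumulative energy counter that may wrap around.

    Assumptions (hypothetical for this sketch): the counter is
    `counter_bits` wide and wraps at most once between two readings.
    """
    modulus = 1 << counter_bits
    total_units = 0
    for prev, cur in zip(readings, readings[1:]):
        # The modulo makes the delta correct across a single wraparound.
        total_units += (cur - prev) % modulus
    return total_units * joules_per_unit


def relative_overhead(with_monitoring: float, baseline: float) -> float:
    """Relative overhead of a monitored run compared to an unmonitored one."""
    return (with_monitoring - baseline) / baseline
```

For example, three samples of 100 W taken one second apart integrate to 200 J, and a monitored run that consumes 100.6 units against a baseline of 100 corresponds to a relative overhead of 0.6%, the order of magnitude reported above. The polling interval trades accuracy against overhead: sampling more often tracks short power spikes better but costs more to collect.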




Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Georgiou, Y., Cadeau, T., Glesser, D., Auble, D., Jette, M., Hautreux, M. (2014). Energy Accounting and Control with SLURM Resource and Job Management System. In: Chatterjee, M., Cao, Jn., Kothapalli, K., Rajsbaum, S. (eds) Distributed Computing and Networking. ICDCN 2014. Lecture Notes in Computer Science, vol 8314. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45249-9_7


  • DOI: https://doi.org/10.1007/978-3-642-45249-9_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45248-2

  • Online ISBN: 978-3-642-45249-9

  • eBook Packages: Computer Science, Computer Science (R0)
