Reducing Energy Costs for IBM Blue Gene/P via Power-Aware Job Scheduling

  • Zhou Zhou
  • Zhiling Lan
  • Wei Tang
  • Narayan Desai
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8429)

Abstract

Energy expense is becoming increasingly dominant in the operating costs of high-performance computing (HPC) systems. At the same time, electricity prices vary significantly over the course of a day, and job power profiles differ greatly from one job to another, especially on HPC systems. In this paper, we propose a smart, power-aware job scheduling approach for HPC systems based on variable energy prices and job power profiles. In particular, we propose a 0-1 knapsack model and demonstrate its flexibility and effectiveness for scheduling jobs, with the goal of reducing energy cost without degrading system utilization. We design scheduling strategies for Blue Gene/P, a typical partition-based system. Experiments with both synthetic data and real job traces from production systems show that our power-aware job scheduling approach can reduce the energy cost significantly, by up to 25%, with only a slight impact on system utilization.
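The 0-1 knapsack idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual formulation, so the mapping used here is an assumption made only for illustration: each queued job is an item whose weight is its node request and whose value is a scheduler-assigned score (for example, estimated power draw during a low-price period, so that power-hungry jobs run when electricity is cheap). The function name `knapsack_select` and the job tuples are hypothetical.

```python
def knapsack_select(jobs, free_nodes):
    """Pick a subset of jobs maximizing total value within free_nodes.

    jobs: list of (job_id, nodes, value) tuples; nodes is a positive int.
    The meaning of `value` is an assumption of this sketch, not the
    paper's definition.
    Returns (best_total_value, list_of_chosen_job_ids).
    """
    n = len(jobs)
    # best[i][cap] = best value using the first i jobs with cap nodes free
    best = [[0] * (free_nodes + 1) for _ in range(n + 1)]
    for i, (_, nodes, value) in enumerate(jobs, start=1):
        for cap in range(free_nodes + 1):
            best[i][cap] = best[i - 1][cap]          # skip job i
            if nodes <= cap:                         # or take job i
                candidate = best[i - 1][cap - nodes] + value
                if candidate > best[i][cap]:
                    best[i][cap] = candidate

    # Walk the table backwards to recover which jobs were taken.
    chosen = []
    cap = free_nodes
    for i in range(n, 0, -1):
        if best[i][cap] != best[i - 1][cap]:
            job_id, nodes, _ = jobs[i - 1]
            chosen.append(job_id)
            cap -= nodes
    chosen.reverse()
    return best[n][free_nodes], chosen


# Hypothetical queue: (job id, nodes requested, score for this price period)
queue = [("A", 4, 30), ("B", 3, 20), ("C", 2, 18), ("D", 5, 28)]
total, selected = knapsack_select(queue, free_nodes=8)
```

During a high-price period, the same routine could be reused with a different scoring rule (for instance, one that favors low-power jobs); that choice is likewise an assumption of this sketch rather than something stated in the abstract.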

Keywords

Energy, Power-aware job scheduling, Resource management, Blue Gene, HPC system

Notes

Acknowledgment

This work was supported in part by the U.S. National Science Foundation grants CNS-0834514 and CNS-0720549 and in part by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research under contract DE-AC02-06CH1135. We thank Dr. Ioan Raicu for generously providing high-performance servers for our experiments.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Zhou Zhou (1)
  • Zhiling Lan (1)
  • Wei Tang (2)
  • Narayan Desai (2)

  1. Department of Computer Science, Illinois Institute of Technology, Chicago, USA
  2. Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA