A Power-Aware Autonomic Approach for Performance Management of Scientific Applications in a Data Center Environment

  • Rajat Mehrotra
  • Ioana Banicescu
  • Srishti Srivastava
  • Sherif Abdelwahed


In recent years, computer servers and data center facilities that provide high performance computing (HPC) for scientific applications have grown considerably in number and have become major consumers of electrical power. Supercomputers often run at peak performance to execute scientific applications efficiently, and therefore consume an enormous amount of power, which increases operational cost. Furthermore, higher power consumption raises the temperature of the physical HPC systems, which in turn translates into higher failure rates and lower reliability. Slowing down these HPC systems by reducing the speed of individual processors results in a loss of execution performance for the scientific application, due to the resulting variation in processing speed. Another cause of degraded execution performance is variation in computational resource availability when other applications use the same computing node in a space-shared manner. Variations in processor availability can cause severe performance degradation through load imbalance and can lead to violations of performance objectives, such as meeting a deadline, which may in turn result in high penalties, in terms of revenue loss, for service providers. This chapter presents a utility-based, power-aware approach that uses a model-based, control-theoretic framework for executing scientific applications. The approach and related simulations indicate that the performance and power requirements of the system can be adjusted dynamically while maintaining predefined quality of service (QoS) goals, in terms of execution deadline and HPC system power consumption, even in the presence of perturbations in computational resource availability.
This approach is autonomic, performance-directed, dynamically controlled, and independent of (i.e., it does not interfere with) the execution of the application.
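The abstract does not specify the system model, the utility function, or the controller itself. Purely as an illustration, the following Python sketch shows how a model-based, limited-lookahead controller could trade power against deadline slack when processor availability varies. Everything in it is a hypothetical assumption rather than the chapter's implementation: the frequency levels, the cubic dynamic-power model (a common first-order approximation when voltage scales with frequency), progress proportional to frequency times availability, the penalty constant, and the names `choose_frequency`, `cost_to_go`, and `dyn_power`. The cost being minimized (energy plus a deadline penalty) plays the role of a negative utility.

```python
from itertools import product

# Hypothetical DVFS levels (GHz) and model constants; illustrative only.
FREQ_LEVELS = (1.0, 1.4, 1.8, 2.2, 2.6)
K_DYN = 1.0       # assumed dynamic-power coefficient (W per GHz^3)
PENALTY = 1e6     # assumed cost per unit of work left unfinished at the deadline


def dyn_power(f):
    """First-order CMOS model: P ~ C*V^2*f, and with V scaled roughly with f, P ~ f^3.
    Static power is assumed constant over the run, so it is left out of the decision."""
    return K_DYN * f ** 3


def cost_to_go(work, time_left, avail):
    """Idealized terminal cost: energy to finish `work` at the constant (continuous)
    frequency that just meets the deadline, clipped to the hardware range, plus a
    penalty for work that cannot be finished even at the top frequency."""
    if work <= 0:
        return 0.0
    if time_left <= 0:
        return PENALTY * work
    f_min, f_max = FREQ_LEVELS[0], FREQ_LEVELS[-1]
    f_req = max(f_min, work / (avail * time_left))
    if f_req <= f_max:
        return dyn_power(f_req) * work / (f_req * avail)
    unfinished = work - f_max * avail * time_left
    return dyn_power(f_max) * time_left + PENALTY * unfinished


def choose_frequency(work, now, deadline, avail, dt=1.0, horizon=3):
    """Limited-lookahead control: enumerate every frequency sequence over the horizon,
    predict remaining work and energy with the model, and apply only the first
    frequency of the cheapest sequence (receding horizon).
    Assumes the deadline lies beyond now + horizon*dt and 0 < avail <= 1."""
    best_cost, best_f = float("inf"), FREQ_LEVELS[-1]
    for plan in product(FREQ_LEVELS, repeat=horizon):
        w, energy = work, 0.0
        for f in plan:
            if w <= 0:
                break
            busy = min(dt, w / (f * avail))   # time spent computing in this interval
            w -= f * avail * busy             # progress scales with frequency and availability
            energy += dyn_power(f) * busy
        cost = energy + cost_to_go(w, deadline - (now + horizon * dt), avail)
        if cost < best_cost:
            best_cost, best_f = cost, plan[0]
    return best_f


if __name__ == "__main__":
    # Full availability: the slowest level already meets the deadline.
    print(choose_frequency(work=10, now=0, deadline=10, avail=1.0))  # prints 1.0
    # Another application takes 60% of the CPU: the controller raises the frequency.
    print(choose_frequency(work=10, now=0, deadline=10, avail=0.4))  # prints 2.6
```

Enumerating all sequences costs `len(FREQ_LEVELS) ** horizon` model evaluations, so a short horizon keeps each control step cheap. Because the sketch only observes remaining work and availability and only sets the processor frequency, it leaves the application code itself untouched.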



The authors would like to thank the National Science Foundation (NSF) for its support of this work through the grant NSF IIP-1034897.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Rajat Mehrotra (1)
  • Ioana Banicescu (2)
  • Srishti Srivastava (2)
  • Sherif Abdelwahed (1)
  1. Department of Electrical and Computer Engineering, NSF Center for Cloud and Autonomic Computing, Mississippi State University, MS, USA
  2. Department of Computer Science and Engineering, NSF Center for Cloud and Autonomic Computing, Mississippi State University, MS, USA
