
Machine Learning-Based Energy Optimization for Parallel Program Execution on Multicore Chips

  • Research Article - Computer Engineering and Computer Science
Arabian Journal for Science and Engineering

Abstract

Energy is increasingly becoming the major constraint in the design of multicore chips. Energy is determined mainly by power and performance, which trade off against each other: higher performance usually costs more power. In this paper, we study the energy optimization of multicore chips that process parallel workloads, using either power or performance optimization. To do so, we propose a novel machine learning-based global and dynamic power management controller. The controller can be used either to maximize performance within a fixed power budget or to minimize the consumed power while achieving the same performance as the baseline. The controller is also scalable, as it does not incur significant overhead as the number of cores or demands increases. The technique was evaluated using the PARSEC benchmark suite on a full-system simulator. The experimental results show that our global power controller outperforms the non-DVFS baseline in terms of energy-delay product (EDP) by 28% and 35.5% when optimized for performance and power, respectively. This suggests that optimizing power contributes more to energy efficiency than optimizing performance.
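The results are reported in terms of EDP, so it may help to spell out the metric and the two controller modes. The Python sketch below is a minimal illustration under several assumptions:

- EDP is taken as energy multiplied by execution time (E·T = P·T²).
- The operating points form a small hypothetical table of (power, time) pairs, not measured data.
- The helper names (`OperatingPoint`, `maximize_performance`, `minimize_power`, `edp_improvement`) are our own.

The sketch picks a point by exhaustive search over the table. It does not model the paper's machine-learning controller.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OperatingPoint:
    """A hypothetical chip-wide setting: average power (W) and run time (s)."""
    name: str
    power_w: float
    time_s: float

    @property
    def energy_j(self) -> float:
        # Energy = average power x execution time.
        return self.power_w * self.time_s

    @property
    def edp(self) -> float:
        # Energy-delay product = energy x execution time (J*s).
        return self.energy_j * self.time_s


def edp_improvement(baseline: OperatingPoint, candidate: OperatingPoint) -> float:
    """Percentage EDP reduction of `candidate` relative to `baseline`."""
    return 100.0 * (baseline.edp - candidate.edp) / baseline.edp


def maximize_performance(points: Sequence[OperatingPoint],
                         power_budget_w: float) -> Optional[OperatingPoint]:
    """Performance mode: the fastest point whose power fits within the budget."""
    feasible = [p for p in points if p.power_w <= power_budget_w]
    return min(feasible, key=lambda p: p.time_s) if feasible else None


def minimize_power(points: Sequence[OperatingPoint],
                   baseline_time_s: float) -> Optional[OperatingPoint]:
    """Power mode: the lowest-power point that is no slower than the baseline."""
    feasible = [p for p in points if p.time_s <= baseline_time_s]
    return min(feasible, key=lambda p: p.power_w) if feasible else None


# Hypothetical numbers for illustration only; they are not results from the paper.
baseline = OperatingPoint("baseline", power_w=100.0, time_s=10.0)  # fixed frequency, no DVFS
points = [
    OperatingPoint("high", power_w=120.0, time_s=8.0),
    OperatingPoint("mid", power_w=90.0, time_s=9.0),
    OperatingPoint("low", power_w=70.0, time_s=10.0),
    OperatingPoint("lowest", power_w=50.0, time_s=13.0),
]

if __name__ == "__main__":
    perf = maximize_performance(points, power_budget_w=baseline.power_w)
    power = minimize_power(points, baseline_time_s=baseline.time_s)
    print(f"{perf.name}: {edp_improvement(baseline, perf):.1f}% lower EDP")    # mid: 27.1% lower EDP
    print(f"{power.name}: {edp_improvement(baseline, power):.1f}% lower EDP")  # low: 30.0% lower EDP
```

With these made-up numbers, the two modes behave as follows:

- **Performance mode** (within the baseline's 100 W budget) selects `mid`.
- **Power mode** (matching the baseline's 10 s run time) selects `low`.

The printed values are the corresponding EDP reductions relative to the fixed-frequency baseline. A real controller would choose per-core settings at run time from observed behavior rather than searching a static table.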



Author information

Corresponding author

Correspondence to Mwaffaq Otoom.


About this article


Cite this article

Otoom, M., Trancoso, P., Alzubaidi, M.A. et al. Machine Learning-Based Energy Optimization for Parallel Program Execution on Multicore Chips. Arab J Sci Eng 43, 7343–7358 (2018). https://doi.org/10.1007/s13369-018-3079-4


  • DOI: https://doi.org/10.1007/s13369-018-3079-4
