Machine Learning in VLSI Computer-Aided Design, pp. 571–608

# Multicore Power and Thermal Proxies Using Least-Angle Regression

## Abstract

The use of performance counters (PCs) to develop per-core power and thermal proxies for multicore processors is now well established. These proxies are typically obtained using traditional linear regression techniques, which have the disadvantage of requiring the full PC set regardless of the workload run by the multicore processor. Typically, a computationally expensive principal component analysis is conducted to find the PCs most correlated with each workload. In this chapter, we use the more recent least-angle regression algorithm to efficiently develop power and thermal proxies that include only the PCs most relevant to the workload. Such PCs are considered workload signatures in the PC space and are used to categorize the workload and to trigger specific power and thermal management actions. The workload signatures at both the core and the thread level are also used to decide thread migration policies that maximize per-core utilization and reduce the number of active cores. Our new power and thermal proxies are trained and tested on workloads from the PARSEC and SPEC CPU 2006 benchmarks, with an average error of less than 3%. Power-, thermal-, and performance-aware autoscaling policies are presented, and extensive numerical experiments illustrate the advantages of our algorithm for real-time multicore power and performance management.
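To make the idea of a sparse, workload-specific proxy concrete, the following is a minimal sketch of plain least-angle regression (without the lasso modification) applied to synthetic counter traces. It is not the chapter's implementation: the helper names `dot`, `standardize`, `solve`, and `lars`, the six made-up counter traces, and the synthetic power model are all assumptions chosen for illustration. Only the Python standard library is used so that the sketch stays self-contained. The returned coefficients refer to the standardized (centered, unit-norm) counters.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def standardize(col):
    """Center a trace and scale it to unit Euclidean norm."""
    mean = sum(col) / len(col)
    centered = [v - mean for v in col]
    norm = math.sqrt(dot(centered, centered))
    return [v / norm for v in centered]


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def lars(columns, y, max_active):
    """Plain LARS. columns holds one trace per counter; y is the measured target.
    Returns (counters in order of entry, coefficients on standardized counters)."""
    cols = [standardize(c) for c in columns]
    y_mean = sum(y) / len(y)
    resid = [v - y_mean for v in y]
    p = len(cols)
    beta = [0.0] * p
    active = []
    for _ in range(min(max_active, p)):
        corr = [dot(c, resid) for c in cols]
        inactive = [j for j in range(p) if j not in active]
        # The inactive counter most correlated with the residual enters next.
        j_new = max(inactive, key=lambda j: abs(corr[j]))
        active.append(j_new)
        c_max = abs(corr[j_new])
        signs = [1.0 if corr[j] >= 0 else -1.0 for j in active]
        gram = [[signs[a] * signs[b] * dot(cols[i], cols[j])
                 for b, j in enumerate(active)]
                for a, i in enumerate(active)]
        g_inv_one = solve(gram, [1.0] * len(active))
        a_norm = 1.0 / math.sqrt(sum(g_inv_one))
        w = [a_norm * g for g in g_inv_one]
        # Equiangular direction: equal angle with every signed active counter.
        u = [sum(w[k] * signs[k] * cols[j][t] for k, j in enumerate(active))
             for t in range(len(y))]
        # Default step reaches the least-squares fit on the active set;
        # shorten it to the point where another counter would tie.
        gamma = c_max / a_norm
        for j in range(p):
            if j in active:
                continue
            a_j = dot(cols[j], u)
            for num, den in ((c_max - corr[j], a_norm - a_j),
                             (c_max + corr[j], a_norm + a_j)):
                if den > 1e-12 and 0.0 < num / den < gamma:
                    gamma = num / den
        resid = [r - gamma * v for r, v in zip(resid, u)]
        for k, j in enumerate(active):
            beta[j] += gamma * signs[k] * w[k]
    return active, beta


# Synthetic demo (made-up data): power depends only on counters 1 and 4.
rng = random.Random(0)
n_samples = 200
traces = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(6)]
power = [10.0 + 3.0 * traces[1][t] - 2.0 * traces[4][t] + rng.gauss(0.0, 0.1)
         for t in range(n_samples)]
order, beta = lars(traces, power, max_active=2)
print("counters selected (in entry order):", order)
```

In this sketch, the counters that enter the active set play the role of the workload signature: stopping after a small number of steps yields a proxy that uses only those counters, while the remaining coefficients stay exactly zero.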

## Notes

### Acknowledgements

The authors would like to acknowledge very helpful discussions with Andrew Henroid from Intel, and with Pradip Bose, Alper Buyuktosunoglu, Canturk Isci, Prabhakar Kudva, and Charles Lefurgy from IBM. This work was supported by SRC under Contract 2011-TJ-2192 with customized funding from Mubadala, Abu Dhabi, UAE.
