Efficient inter-core power and thermal balancing for multicore processors

Computing

Abstract

Today's market is dominated by processor architectures that employ multiple cores per chip. These architectures behave differently depending on the applications running on the processor (parallel, multiprogrammed, sequential), but all of them run into the so-called power and temperature wall. For future technologies (below 22 nm) and a fixed die size, it remains uncertain what percentage of the processor can be powered on simultaneously. Power saving and power budget mechanisms can be used to precisely control the amount of power being dissipated by the processor. An initial analysis shows that legacy power saving techniques work properly for matching a power budget in thread-independent and multiprogrammed workloads, but not in parallel workloads. When running parallel shared-memory applications, sacrificing some performance in a single core (thread) in order to be more energy-efficient can unintentionally delay the remaining cores (threads) at synchronization points (locks/barriers), with a negative impact on global performance.
To solve this problem we propose power token balancing (PTB), which aims to accurately match an external power constraint by balancing the power consumed among the different cores. Experimental results show that PTB matches a predefined power budget (50 % of the original peak power) more accurately than other mechanisms such as DVFS. The total energy consumed over the budget is reduced to only 8 % for a 16-core CMP with only a 3 % energy increase (overhead). We also introduce a novel mechanism named “Nitro”. Nitro overclocks the core that enters a critical section (delimited by locks) in order to free the lock as soon as possible. Experimental results show that Nitro reduces the execution time of lock-intensive applications by more than 4 % by raising the frequency by 15 % in selected program phases that together represent 22 % of the total execution time.
We conclude the work with an analysis of the thermal effects of PTB in different CMP configurations using realistic power numbers and heatsink/fan configurations. Results show that PTB not only balances temperature between the different cores, reducing the temperature gradient and increasing signal reliability, but also reduces both average and peak temperatures by 28–30 % for the studied benchmarks when a peak power budget of 50 % is exceeded.
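
The balancing idea behind PTB can be illustrated with a minimal sketch. It is not the mechanism evaluated in the article: the names `Core`, `demand`, and `balance_tokens`, the equal per-core share, and the greedy "largest shortfall first" policy are all assumptions made for illustration. The sketch only shows how cores running under their share of a global budget can leave spare tokens that cores over their share may use, while the total never exceeds the budget.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    demand: int  # hypothetical: tokens (power units) the core would use if unconstrained


def balance_tokens(cores, global_budget):
    """Return per-core token grants whose sum never exceeds global_budget.

    Illustrative policy (an assumption, not the article's): every core first
    gets at most an equal share of the budget; whatever is left over is then
    handed to the cores with the largest shortfall.
    """
    local_share = global_budget // len(cores)

    # Step 1: each core is granted its demand, capped at the equal share.
    grants = {c.core_id: min(c.demand, local_share) for c in cores}

    # Step 2: spare tokens are the part of the budget not yet granted.
    spare = global_budget - sum(grants.values())

    # Step 3: give spare tokens to cores that wanted more than their share.
    needy = sorted(
        (c for c in cores if c.demand > local_share),
        key=lambda c: c.demand - local_share,
        reverse=True,
    )
    for c in needy:
        extra = min(c.demand - local_share, spare)
        grants[c.core_id] += extra
        spare -= extra

    return grants
```

With four cores demanding 10, 2, 6, and 1 tokens and a global budget of 16, this sketch grants 9, 2, 4, and 1 tokens: the two lightly loaded cores leave spare tokens, the most constrained core absorbs them, and the grants sum to exactly the budget.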


Notes

  1. The difference between the maximum and minimum die temperatures at a given time.

  2. These spare tokens represent the amount of power that the core can dissipate while remaining under the power budget; they are defined in Sect. 3.2.

  3. Group of instructions delimited by branches.

  4. Reorder Buffer.

  5. As mentioned before, we can store information about past power behavior at a basic block level to predict future power trends and increase power matching accuracy.

  6. We only report results for a 50 % power budget, owing to space limitations and for the sake of readability. PTB also works properly for less restrictive power budgets.

  7. DVFS and DFS are applied at the core level to increase their accuracy when matching the power budget.

    Fig. 8: Normalized energy (top) and AoPB (bottom) for a varying number of cores and PTB policies

Acknowledgments

This work was supported by the Spanish MEC, MICINN and EU Commission FEDER funds under Grants CSD2006-00046 and TIN2009-14475-C04. It was also supported by the EU-FP7 ICT Project “Embedded Reconfigurable Architecture (ERA)”, contract No. 249059.

Author information

Corresponding author

Correspondence to Juan M. Cebrián.

About this article

Cite this article

Cebrián, J.M., Sánchez, D., Aragón, J.L. et al. Efficient inter-core power and thermal balancing for multicore processors. Computing 95, 537–566 (2013). https://doi.org/10.1007/s00607-012-0236-6

  • DOI: https://doi.org/10.1007/s00607-012-0236-6
