Volume 95, Issue 7, pp 537–566

Efficient inter-core power and thermal balancing for multicore processors

  • Juan M. Cebrián
  • Daniel Sánchez
  • Juan L. Aragón
  • Stefanos Kaxiras


The market is now dominated by processor architectures that employ multiple cores per chip. These architectures behave differently depending on the applications running on the processor (parallel, multiprogrammed, sequential), but all of them run into what is called the power and temperature wall. For future technologies (below 22 nm) and a fixed die size, it is still uncertain what percentage of the processor can be powered on simultaneously. Power saving and power budget mechanisms can be useful to precisely control the amount of power being dissipated by the processor. An initial analysis shows that legacy power saving techniques work properly for matching a power budget in thread-independent and multiprogrammed workloads, but not in parallel workloads. When running parallel shared-memory applications, sacrificing some performance in a single core (thread) in order to be more energy-efficient can unintentionally delay the remaining cores (threads) because of synchronization points (locks/barriers), with a negative impact on global performance. To solve this problem we propose power token balancing (PTB), which aims to accurately match an external power constraint by balancing the power consumed among the different cores. Experimental results show that PTB matches a predefined power budget (50 % of the original peak power) more accurately than other mechanisms such as DVFS. The total energy consumed over the budget is reduced to only 8 % for a 16-core CMP with only a 3 % energy increase (overhead). We also introduce a novel mechanism named “Nitro”. Nitro overclocks the core that enters a critical section (delimited by locks) in order to free the lock as soon as possible. Experimental results show that Nitro is able to reduce the execution time of lock-intensive applications by more than 4 % by raising the frequency by 15 % in selected program phases that together represent 22 % of the total execution time.
We conclude the work with an analysis of the thermal effects of PTB in different CMP configurations using realistic power numbers and heatsink/fan configurations. Results show that PTB not only balances temperature among the cores, reducing the temperature gradient and increasing signal reliability, but also reduces both average and peak temperatures by 28–30 % for the studied benchmarks when a peak power budget of 50 % is exceeded.
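
The abstract describes PTB only at a high level: power is balanced among cores so that the chip as a whole matches an external budget. The sketch below illustrates that general idea and is not the authors' mechanism. The function name `balance_power_tokens`, the abstract "token" unit, and the proportional redistribution policy are all assumptions made for illustration.

```python
def balance_power_tokens(demand, global_budget):
    """Split a global power budget among cores (illustrative sketch only).

    demand: tokens each core would like to spend this interval (hypothetical unit).
    global_budget: total tokens the chip may spend this interval.
    Returns the tokens granted to each core; the sum never exceeds the budget.
    """
    fair_share = global_budget / len(demand)

    # Cores that need less than an equal share keep only what they need.
    grants = [min(d, fair_share) for d in demand]
    spare = global_budget - sum(grants)

    # Unmet demand of the cores that want more than an equal share.
    deficit = [d - g for d, g in zip(demand, grants)]
    total_deficit = sum(deficit)

    # Assumed policy: hand spare tokens to over-budget cores in proportion
    # to their unmet demand, never granting more than was requested.
    if total_deficit > 0 and spare > 0:
        scale = min(1.0, spare / total_deficit)
        grants = [g + scale * x for g, x in zip(grants, deficit)]
    return grants


# Four cores, budget of 80 tokens: the two light cores donate their slack.
print(balance_power_tokens([10, 30, 50, 10], 80))  # [10.0, 25.0, 35.0, 10.0]
```

In this example an equal split would cap every core at 20 tokens. The two cores that need only 10 leave 20 tokens unused, and those are shared between the two busier cores instead of being wasted, which is the kind of inter-core balancing the abstract refers to.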


Power consumption, Power budget, Power tokens, Chip multiprocessor



This work was supported by the Spanish MEC, MICINN and EU Commission FEDER funds under Grants CSD2006-00046 and TIN2009-14475-C04. It was also supported by the EU-FP7 ICT Project “Embedded Reconfigurable Architecture (ERA)”, contract No. 249059.



Copyright information

© Springer-Verlag Wien 2012

Authors and Affiliations

  • Juan M. Cebrián (1)
  • Daniel Sánchez (1)
  • Juan L. Aragón (1)
  • Stefanos Kaxiras (2)

  1. Department of Computer Architecture, University of Murcia, Murcia, Spain
  2. Department of Information Technology, University of Uppsala, Uppsala, Sweden
