Thermal Management in Many Core Systems

  • Dhireesha Kudithipudi
  • Qinru Qiu
  • Ayse K. Coskun
Part of the Studies in Computational Intelligence book series (SCI, volume 432)

Abstract

High power densities and operating temperatures in multi-processor systems cause a number of undesirable effects, including performance degradation, high operational and cooling costs, and reliability deterioration that leads to system failures. Many-core systems offer exciting opportunities in design and system management owing to their ample hardware parallelism, but they also introduce novel challenges because of their complexity and the highly variable workloads expected to run on them. Efficient thermal monitoring and management, thermally-aware architecture design, and multi-level parameter optimization can alleviate some of the undesirable thermal effects while maintaining the desired performance and energy levels. In particular, these techniques aid the evolution of green computing systems. This chapter provides a qualitative discussion of thermal management techniques for many-core systems. We address the following questions in detail: What are the specific design challenges in monitoring the temperature of large-scale systems? How can multi-level optimizations be exploited at runtime in response to the dynamic behavior of the processor’s workload? How do emerging workloads affect the thermal distribution on many-core systems?

Keywords

Thermal Sensor; Thermal Management; Sensor Placement; Core System; Sequential Probability Ratio Test

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Dhireesha Kudithipudi (1)
  • Qinru Qiu (2)
  • Ayse K. Coskun (3)
  1. Computer Engineering Department, Rochester Institute of Technology, Rochester, USA
  2. Electrical and Computer Engineering Department, Syracuse University, Syracuse, USA
  3. Electrical and Computer Engineering Department, Boston University, Boston, USA
