Design Space Exploration and Run-Time Adaptation for Multicore Resource Management Under Performance and Power Constraints

Reference work entry

Abstract

This chapter focuses on resource management techniques for performance or energy optimization in multi-/many-core systems. First, it gives a broad overview of resource management. Second, it discusses the possible optimization goals and constraints of resource management techniques: computational performance, power consumption, energy consumption, and temperature. Finally, it details state-of-the-art resource management techniques for performance optimization under power and thermal constraints, as well as for energy optimization under performance constraints.

Acronyms

DPM: Dynamic Power Management
DSE: Design Space Exploration
DSP: Digital Signal Processor
DTM: Dynamic Thermal Management
DVFS: Dynamic Voltage and Frequency Scaling
EOH: Extremal Optimization meta-Heuristic
EWFD: Equally-Worst-Fit-Decreasing
GIPS: Giga-Instructions Per Second
GPP: General-Purpose Processor
ILP: Instruction-Level Parallelism
IPC: Instructions Per Cycle
IPS: Instructions Per Second
ISA: Instruction-Set Architecture
ITRS: International Technology Roadmap for Semiconductors
LTF: Largest Task First
MPSoC: Multi-Processor System-on-Chip
NoC: Network-on-Chip
QoS: Quality of Service
SCC: Single-Chip Cloud Computer
SFA: Single Frequency Approximation
SoC: System-on-Chip
TDP: Thermal Design Power
TLP: Thread-Level Parallelism
TSP: Thermal Safe Power

Notes

Acknowledgements

This work was partly supported by the German Research Foundation (DFG) as part of the Transregional Collaborative Research Centre Invasive Computing (SFB/TR 89, http://invasic.de).

Copyright information

© Springer Science+Business Media Dordrecht 2017

Authors and Affiliations

  • Santiago Pagani (1)
  • Muhammad Shafique (1)
  • Jörg Henkel (1)

  1. Chair for Embedded Systems (CES), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany