An energy consumption characterization of on-chip interconnection networks for tiled CMP architectures

Abstract

Continuous improvements in integration scale have made it possible to include several processor cores on the same chip. Such designs, known as chip-multiprocessors (CMPs), are a good alternative to traditional monolithic designs for several reasons, including better performance, scalability, and performance/energy ratio. On the other hand, higher clock frequencies and the increasing number of transistors available on a single chip have made energy consumption a critical design issue in current and future microarchitectures. In these architectures, the design of the on-chip interconnection network has proven to have a significant impact on overall system performance and energy consumption, and the wires used in such an interconnect can be designed with varying latency, bandwidth, and power characteristics.

In this work, we present a detailed characterization of the energy efficiency of a CMP for parallel scientific applications using Sim-PowerCMP, a detailed architectural-level power-performance simulation tool for CMP architectures. Sim-PowerCMP integrates several well-known contemporary simulators (RSIM, HotLeakage and Orion) into a single framework that allows precise analysis and optimization of power dissipation (both dynamic and static) while taking performance into account. In this characterization, we pay special attention to the energy consumed in the interconnection network. Results for 8- and 16-core CMPs show that the most energy-consuming messages are the replies that carry data: they account for almost 70% on average of the total energy consumed in the interconnect, although they represent only 30% of the total number of messages. Furthermore, we show that using on-chip wires with varying latency, bandwidth, and energy characteristics can reduce the energy dissipated by the links of the interconnection network by about 65%, with an average impact of 10% on the execution time.
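To put the quoted figures in perspective, the following back-of-the-envelope sketch derives two quantities that the abstract does not state explicitly: the average energy of a data reply relative to any other message, and the relative energy-delay product of the links. It rests on assumptions: the inputs are the rounded averages quoted above, the 10% "impact" is read as an increase in execution time, and all variable names are illustrative rather than taken from Sim-PowerCMP.

```python
# Illustrative arithmetic only: inputs are the rounded averages from the abstract.

data_reply_energy_share = 0.70  # "almost 70%" of interconnect energy (approximate)
data_reply_msg_share = 0.30     # 30% of all messages

# Average energy per message of each class, in units of
# (total interconnect energy / total number of messages).
energy_per_data_reply = data_reply_energy_share / data_reply_msg_share
energy_per_other_msg = (1 - data_reply_energy_share) / (1 - data_reply_msg_share)
per_message_ratio = energy_per_data_reply / energy_per_other_msg

link_energy_reduction = 0.65    # "about 65%" less energy dissipated by the links
exec_time_increase = 0.10       # assumption: the 10% impact is a slowdown

# Link energy-delay product relative to the baseline (baseline = 1.0).
relative_link_edp = (1 - link_energy_reduction) * (1 + exec_time_increase)

print(f"data reply vs. other message energy: {per_message_ratio:.2f}x")
print(f"relative link energy-delay product: {relative_link_edp:.3f}")
```

Under these assumptions, a data reply costs on average roughly five times the interconnect energy of any other message, and the link energy-delay product drops to a little under 40% of its baseline value, so the energy saving clearly outweighs the slowdown for the links.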

Author information

Correspondence to Antonio Flores.

About this article

Cite this article

Flores, A., Aragón, J.L. & Acacio, M.E. An energy consumption characterization of on-chip interconnection networks for tiled CMP architectures. J Supercomput 45, 341–364 (2008). https://doi.org/10.1007/s11227-008-0178-0

Keywords

  • Chip-multiprocessor
  • Power dissipation model
  • Microarchitectural level simulator
  • Heterogeneous on-chip interconnection network
  • Parallel scientific applications