Abstract
Continuous improvements in integration scale have made it possible to include several processor cores on the same chip. Such designs, known as chip multiprocessors (CMPs), are a good alternative to traditional monolithic designs for several reasons, including better performance, scalability, and performance/energy ratio. On the other hand, higher clock frequencies and the increasing number of transistors available on a single chip have revealed energy consumption to be a critical design issue in current and future microarchitectures. In these architectures, the design of the on-chip interconnection network has been shown to have a significant impact on overall system performance and energy consumption, and the wires used in such interconnects can be designed with varying latency, bandwidth, and power characteristics.
In this work, we present a detailed characterization of the energy efficiency of a CMP running parallel scientific applications. We use Sim-PowerCMP, a detailed architectural-level power-performance simulation tool for CMP architectures that integrates several well-known contemporary simulators (RSIM, HotLeakage, and Orion) into a single framework, allowing precise analysis and optimization of power dissipation (both dynamic and static) while taking performance into account. In this characterization, we pay special attention to the energy consumed in the interconnection network. Results for 8- and 16-core CMPs show that the most power-consuming messages are the replies that carry data: on average they account for almost 70% of the total energy consumed in the interconnect, although they represent only 30% of the total number of messages. Furthermore, we show that using on-chip wires with varying latency, bandwidth, and energy characteristics can reduce the energy dissipated by the links of the interconnection network by about 65%, with an average impact of 10% on execution time.
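The two headline results can be related with some simple arithmetic. The sketch below is only an illustration. It assumes that the rounded averages quoted above (70% of interconnect energy from 30% of messages; a 65% link-energy reduction at a 10% slowdown) can be combined as if they described a single run, which averages taken across applications do not strictly guarantee. The helper names are hypothetical. Under these assumptions, an average data reply costs roughly 5.4 times the energy of an average non-data message. Measured as link energy multiplied by execution time, the heterogeneous-wire configuration would land at roughly 0.39 of the baseline.

```python
def energy_per_message_ratio(energy_share: float, message_share: float) -> float:
    """Average energy of a message in one class relative to the average message
    outside it, given that class's share of total energy and of message count.
    (Hypothetical helper for illustration only.)"""
    per_msg_class = energy_share / message_share
    per_msg_rest = (1.0 - energy_share) / (1.0 - message_share)
    return per_msg_class / per_msg_rest


def relative_link_edp(energy_reduction: float, slowdown: float) -> float:
    """Link energy times execution time, relative to the baseline (1.0 = unchanged).
    Assumes the average energy reduction and average slowdown apply to the same run."""
    return (1.0 - energy_reduction) * (1.0 + slowdown)


# Rounded averages quoted in the abstract.
ratio = energy_per_message_ratio(0.70, 0.30)
edp = relative_link_edp(0.65, 0.10)
print(f"data reply vs. other message: {ratio:.2f}x energy")
print(f"link energy x execution time: {edp:.3f} of baseline")
```

Because the inputs are rounded averages, the outputs should be read as orders of magnitude rather than as measured values.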
References
Austin T, Larson E, Ernst D (2002) SimpleScalar: an infrastructure for computer system modeling. Computer 35(2):59–67
Balasubramonian R, Muralimanohar N, Ramani K, Venkatachalapathy V (2005) Microarchitectural wire management for performance and power in partitioned architectures. In: Proc of the 11th int’l symp on high-performance computer architecture (HPCA-11), pp 28–39
Banerjee K, Mehrotra A (2002) A power-optimal repeater insertion methodology for global interconnects in nanometer designs. IEEE Trans Electron Devices 49(11):2001–2007
Binkert NL, Dreslinski RG, Hsu LR, Lim KT, Saidi AG, Reinhardt SK (2006) The M5 simulator: modeling networked systems. IEEE Micro 26(4):52–60
Brooks D, Tiwari V, Martonosi M (2000) Wattch: a framework for architectural-level power analysis and optimizations. In: Proc of the 27th int’l symp on computer architecture (ISCA-27), pp 83–94
Chen J, Dubois M, Stenström P (2003) Integrating complete-system and user-level performance/power simulators: the SimWattch approach. In: Proc of the 2003 IEEE int’l symp on performance analysis of systems and software
Cheng L, Muralimanohar N, Ramani K, Balasubramonian R, Carter J (2006) Interconnect-aware coherence protocols for chip multiprocessors. In: Proc of the 33rd int’l symp on computer architecture (ISCA-33), pp 339–351
Fernandez-Pascual R, Garcia JM (2005) RSIM x86: a cost effective performance simulator. In: Proc of the 19th European conf on modelling and simulation
Flores A, Aragón JL, Acacio ME (2007) Sim-PowerCMP: a detailed simulator for energy consumption analysis in future embedded CMP architectures. In: Proc of the 4th int’l symp on embedded computing (SEC-4)
Hughes CJ, Pai VS, Ranganathan P, Adve SV (2002) RSIM: simulating shared-memory multiprocessors with ILP processors. Computer 35(2):40–49
Kahle JA, Day MN, Hofstee HP, Johns CR, Maeurer TR, Shippy D (2005) Introduction to the Cell multiprocessor. IBM J Res Dev 49(4/5):589–604
Kim C, Burger D, Keckler SW (2002) An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. In: Proc of the 10th int’l conf on architectural support for programming languages and operating systems (ASPLOS-10), pp 211–222
Kumar R, Zyuban V, Tullsen DM (2005) Interconnections in multi-core architectures: understanding mechanisms, overheads and scaling. In: Proc of the 32nd int’l symp on computer architecture (ISCA-32), pp 408–419
Liu C, Sivasubramaniam A, Kandemir M (2004) Organizing the last line of defense before hitting the memory wall for CMPs. In: 10th int’l symp on high performance computer architecture (HPCA-10), pp 176–185
Magen N, Kolodny A, Weiser U, Shamir N (2004) Interconnect-power dissipation in a microprocessor. In: Proc of the 2004 int’l workshop on system level interconnect prediction (SLIP’04), pp 7–13
Martin MMK, Sorin DJ, Beckmann BM, Marty MR, Xu M, Alameldeen AR, Moore KE, Hill MD, Wood DA (2005) Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset. SIGARCH Comput Archit News 33(4):92–99
Monchiero M, Canal R, Gonzalez A (2006) Design space exploration for multicore architectures: a power/performance/thermal view. In: ICS ’06: proc of the 20th int’l conf on supercomputing, pp 177–186
Renau J, SESC website. http://sourceforge.net/projects/sesc
Shang L, Peh L, Jha N (2003) Dynamic voltage scaling with links for power optimization of interconnection networks. In: Proc of the 9th int’l symp on high-performance computer architecture (HPCA-9), pp 91–102
Shivakumar P, Jouppi NP (2001) CACTI 3.0: an integrated cache timing, power and area model. Technical report, Western Research Lab (WRL)
Singh J, Weber W-D, Gupta A (1992) SPLASH: Stanford parallel applications for shared-memory. Comput Archit News 20(1):5–44
Sohi GS (1990) Instruction issue logic for high-performance, interruptible, multiple functional unit, pipelined computers. IEEE Trans Comput 39(3):349–359
Taylor MB, Kim J, Miller J, Wentzlaff D, Ghodrat F, Greenwald B, Hoffman H, Johnson P, Lee J-W, Lee W, Ma A, Saraf A, Seneski M, Shnidman N, Strumpen V, Frank M, Amarasinghe S, Agarwal A (2002) The Raw microprocessor: a computational fabric for software circuits and general-purpose programs. IEEE Micro 22(2):25–35
Wang H, Peh L-S, Malik S (2003) Power-driven design of router microarchitectures in on-chip networks. In: Proc of the 36th int’l symp on microarchitecture (MICRO-36), pp 105–111
Wang H-S, Peh L-S, Malik S (2003) A power model for routers: modeling Alpha 21364 and InfiniBand routers. IEEE Micro 23(1):26–35
Wang H-S, Zhu X, Peh L-S, Malik S (2002) Orion: a power-performance simulator for interconnection networks. In: Proc of the 35th int’l symp on microarchitecture (MICRO-35), pp 294–305
Woo SC, Ohara M, Torrie E, Singh JP, Gupta A (1995) The SPLASH-2 programs: characterization and methodological considerations. In: Proc of the 22nd int’l symp on computer architecture (ISCA-22), pp 24–36
Zhang M, Asanovic K (2005) Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors. In: Proc of the 32nd int’l symp on computer architecture (ISCA-32), pp 336–345
Zhang Y, Parikh D, Sankaranarayanan K, Skadron K, Stan M (2003) HotLeakage: a temperature-aware model of subthreshold and gate leakage for architects. Technical report, University of Virginia
Zhao L, Iyer R, Makineni S, Moses J, Illikkal R, Newell D (2007) Performance, area and bandwidth implications on large-scale CMP cache design. In: Proc of the 1st workshop on chip multiprocessor memory systems and interconnects (CMP-MSI’07). In conjunction with HPCA-13
Cite this article
Flores, A., Aragón, J.L. & Acacio, M.E. An energy consumption characterization of on-chip interconnection networks for tiled CMP architectures. J Supercomput 45, 341–364 (2008). https://doi.org/10.1007/s11227-008-0178-0