Improving performance of multi-core NUCA coherent systems using NoC-assisted mechanisms

Chang, Kuei-Chung; Liao, Ing-Ming; Liao, Chiu-Han

doi:10.1007/s11227-012-0793-7


Published: 09 June 2012

Volume 62, pages 1318–1337, (2012)

The Journal of Supercomputing


Abstract

The significant speed gap between processor and memory makes last-level cache performance crucial for multi-core architectures (MCA). Non-uniform cache architecture (NUCA) has been proposed to overcome the performance limitations of MCA for many embedded applications. The cache is partitioned into sub-banks, each an independently accessible entity, and the sub-banks are connected by a fast on-chip network (NoC). This paper presents two NoC-assisted mechanisms to improve the performance and power consumption of NUCA coherence. The first mechanism provides priority-based communication, built on the wormhole routing architecture, to support NUCA coherence: high-priority coherence packets are transmitted first to save time. The second mechanism offers multicast communication on top of the proposed priority-based NoC to provide efficient cache coherency for NUCA. Coherence packets are dispatched and collected at collecting nodes (CN) to further decrease the number of coherence messages flowing in the NoC. Experimental results show that priority-based transmission improves performance by approximately 10%. The proposed multicasting mechanism further improves performance and decreases the power consumption of the NoC in NUCA by approximately 15%. Together, the two proposed mechanisms improve performance by 25% on average.
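The priority-based transmission described in the abstract can be illustrated with a minimal sketch. This is not the authors' router design: it assumes a single output port, models whole packets rather than the flits of a wormhole router, and uses hypothetical names (`PriorityPort`, `COHERENCE`, `DATA`) and a hypothetical two-level priority encoding. It only shows the ordering idea, namely that queued coherence packets leave before ordinary data packets, with first-come-first-served order inside each priority class.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Assumed encoding: a lower value means a higher priority.
COHERENCE, DATA = 0, 1


@dataclass(order=True)
class Packet:
    priority: int
    seq: int                       # arrival order, used as FIFO tie-break
    name: str = field(compare=False)


class PriorityPort:
    """Hypothetical output port that always sends the highest-priority packet."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def enqueue(self, name, priority):
        heapq.heappush(self._heap, Packet(priority, next(self._seq), name))

    def drain(self):
        # Pop packets in (priority, arrival) order until the port is empty.
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap).name)
        return order


def demo_order():
    port = PriorityPort()
    port.enqueue("data-0", DATA)
    port.enqueue("inv-0", COHERENCE)
    port.enqueue("data-1", DATA)
    port.enqueue("ack-0", COHERENCE)
    return port.drain()
```

Calling `demo_order()` returns the two coherence packets ahead of the two data packets, even though a data packet arrived first. A real wormhole router would arbitrate per flit and per virtual channel, which this sketch deliberately leaves out.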



Author information

Authors and Affiliations

Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan
Kuei-Chung Chang, Ing-Ming Liao & Chiu-Han Liao


Corresponding author

Correspondence to Kuei-Chung Chang.


About this article

Cite this article

Chang, KC., Liao, IM. & Liao, CH. Improving performance of multi-core NUCA coherent systems using NoC-assisted mechanisms. J Supercomput 62, 1318–1337 (2012). https://doi.org/10.1007/s11227-012-0793-7

Issue Date: December 2012