Improving performance of multi-core NUCA coherent systems using NoC-assisted mechanisms

The Journal of Supercomputing

Abstract

The significant speed gap between processor and memory makes last-level cache performance crucial for multi-core architectures (MCA). Non-uniform cache architecture (NUCA) has been proposed to overcome the performance limitations of MCA for many embedded applications. In NUCA, the cache is partitioned into sub-banks, each an independently accessible entity, connected by a fast on-chip network (NoC). This paper presents two NoC-assisted mechanisms to improve the performance and power consumption of NUCA coherence. The first mechanism provides priority-based communication, built on the wormhole routing architecture, to support NUCA coherence: high-priority coherence packets are transmitted first to save time. The second mechanism offers multicast communication, built on the proposed priority-based NoC, to provide efficient cache coherency for NUCA. Coherence packets are dispatched and collected at collecting nodes (CN) to further decrease the number of coherence messages flowing in the NoC. Experimental results show that priority-based transmission improves performance by approximately 10%. The proposed multicasting mechanism further improves performance and decreases the power consumption of the NoC in NUCA by approximately 15%. Together, the two mechanisms enhance performance by 25% on average.
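As a rough illustration of the first mechanism, the sketch below models a single router output port that always forwards the highest-priority waiting packet first and serves equal priorities in arrival order. The names `Packet`, `OutputPort`, `COHERENCE`, and `DATA`, and the two-level priority scheme, are assumptions made for this example and are not taken from the paper. A real wormhole router arbitrates flit by flit, which this packet-level model deliberately omits.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical priority levels (assumption): a smaller value is served first.
COHERENCE = 0  # high priority, e.g. invalidations and acknowledgements
DATA = 1       # normal priority


@dataclass(order=True)
class Packet:
    priority: int
    seq: int                          # arrival order, breaks ties first-in first-out
    name: str = field(compare=False)  # label only, ignored when ordering


class OutputPort:
    """Packet-level model of one output port with priority arbitration."""

    def __init__(self):
        self._queue = []
        self._arrivals = count()

    def enqueue(self, name, priority):
        # Packets are ordered by (priority, arrival order) in a min-heap.
        heapq.heappush(self._queue, Packet(priority, next(self._arrivals), name))

    def drain(self):
        """Return packet names in the order the port would send them."""
        order = []
        while self._queue:
            order.append(heapq.heappop(self._queue).name)
        return order


port = OutputPort()
port.enqueue("data-A", DATA)
port.enqueue("inv-1", COHERENCE)
port.enqueue("data-B", DATA)
port.enqueue("ack-1", COHERENCE)
send_order = port.drain()
print(send_order)  # ['inv-1', 'ack-1', 'data-A', 'data-B']
```

In this toy model both coherence packets leave before either data packet even though one data packet arrived first, which is the reordering effect the priority-based mechanism relies on.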

Author information

Corresponding author

Correspondence to Kuei-Chung Chang.

About this article

Cite this article

Chang, KC., Liao, IM. & Liao, CH. Improving performance of multi-core NUCA coherent systems using NoC-assisted mechanisms. J Supercomput 62, 1318–1337 (2012). https://doi.org/10.1007/s11227-012-0793-7

  • DOI: https://doi.org/10.1007/s11227-012-0793-7
