Design Automation for Embedded Systems, Volume 18, Issue 3–4, pp 121–139

Centaur: a hybrid network-on-chip architecture utilizing micro-network fusion

  • Junghee Lee
  • Chrysostomos Nicopoulos
  • Hyung Gyu Lee
  • Jongman Kim

Abstract

The escalating proliferation of multicore chips has accentuated the criticality of the on-chip network. Packet-based networks-on-chip (NoC) have emerged as the de facto interconnect of future chip multi-processors (CMP). On-chip traffic comprises a mixture of data and control messages from the cache coherence protocol. Given the latency-criticality of control messages, in this paper we aim to optimize their delivery times. Instead of treating the on-chip router as a monolithic component, we advocate the introduction of an ultra-low-latency ring-inspired (i.e., utilizing ring primitive building blocks) support micro-network that is optimized for control messages. This µNoC is fused with a throughput-driven conventional NoC router to form a hybrid architecture, called Centaur, which maintains separate data paths and control logic for the two fused networks. Full-system simulation results from a 64-core CMP indicate that the proposed fused Centaur router improves overall system performance by up to 26%, as compared to a state-of-the-art router implementation. Furthermore, hardware synthesis results using commercial 65 nm libraries indicate that Centaur’s area and power overheads are 9% and 3%, respectively, as compared to a baseline router design. More importantly, the new design does not affect the router’s critical path.
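The separation described in the abstract, where control messages use the micro-network and data messages use the conventional router over separate data paths, can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not the Centaur design itself: the names `MsgClass`, `Message`, `FusedRouterSketch`, `micro_net`, and `main_net` are hypothetical, and plain queues stand in for the two fused networks without modeling ring primitives, pipelines, or timing.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class MsgClass(Enum):
    CONTROL = "control"  # control messages from the cache coherence protocol
    DATA = "data"        # data messages


@dataclass
class Message:
    src: int
    dst: int
    msg_class: MsgClass
    payload_bytes: int = 0  # hypothetical field, for illustration only


class FusedRouterSketch:
    """Hypothetical model: two independent queues stand in for the
    separate data paths of the micro-network and the conventional router."""

    def __init__(self):
        self.micro_net = deque()  # path reserved for control messages
        self.main_net = deque()   # path used by data messages

    def inject(self, msg: Message) -> str:
        # Steering depends only on the message class; the paths share no queue.
        if msg.msg_class is MsgClass.CONTROL:
            self.micro_net.append(msg)
            return "micro_net"
        self.main_net.append(msg)
        return "main_net"


router = FusedRouterSketch()
path_a = router.inject(Message(src=0, dst=5, msg_class=MsgClass.CONTROL))
path_b = router.inject(Message(src=0, dst=5, msg_class=MsgClass.DATA, payload_bytes=64))
```

Because the two queues are never shared, a burst of data messages in this toy model cannot delay a control message waiting in `micro_net`, which is the intuition behind keeping the two networks' data paths and control logic separate.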

Keywords

  • Networks-on-chip
  • Interconnection networks
  • Segregated/separated networks

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Junghee Lee (1)
  • Chrysostomos Nicopoulos (2)
  • Hyung Gyu Lee (3)
  • Jongman Kim (1)

  1. Georgia Institute of Technology, Atlanta, USA
  2. University of Cyprus, Nicosia, Cyprus
  3. Daegu University, Gyeongsan, South Korea
