Reliable Networks-on-Chip Design for Sustainable Computing Systems

  • Paul Ampadu
  • Qiaoyan Yu
  • Bo Fu
Chapter

Abstract

To achieve sustainability, computing systems demand a high-performance, energy-efficient on-chip communication infrastructure. Because of their scalability, reusability, and high throughput, Networks-on-Chip (NoCs) have been increasingly adopted in sustainable computing systems. The growing number of transient and permanent errors induced by scaled technologies adds a new challenge, reliability, to sustainable computing system design. This chapter gives an overview of commonly used techniques for reliable NoC design and presents recent energy-efficient NoC link and router design approaches.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. University of Rochester, Rochester, USA
  2. University of New Hampshire, Durham, USA
  3. Marvell Technology Group Ltd., Santa Clara, USA
