Fault-Tolerant Mesh-Based NoC with Router-Level Redundancy

  • Yung-Chang Chang
  • Cihun-Siyong Alex Gong
  • Ching-Te Chiu

Abstract

Aggressive CMOS technology scaling increasingly threatens the dependability of network-on-chip (NoC) architectures. In a mesh-based NoC, a faulty router or broken link may isolate a fully functional processing element (PE). In addition, a set of faulty routers may form isolated regions, which can degrade the design. In this paper, we propose a router-level redundancy (RLR) fault-tolerant scheme that differs from the traditional microarchitecture-level redundancy (MLR) approach and relieves the problems of isolated PEs and isolated regions. Simply adding one spare router to each router set in a mesh creates RLR and diversifies the connection paths between adjacent routers. To exploit this extra resource, two reconfiguration algorithms are presented that detour around detected faulty routers and links. The proposed RLR fault-tolerant scheme can tolerate at most one faulty router within a router set. After reconfiguration, the original mesh topology is maintained, so the proposed architecture needs no support from network-layer routing algorithms. The scheme is evaluated using three fault-tolerance metrics: reliability, mean time to failure (MTTF), and yield. The experimental results show that the performance of RLR increases as the NoC grows, while the relative connection cost decreases at the same time. This characteristic makes the architecture suitable for large-scale NoC designs.
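The claim that a router set survives as long as at most one of its routers is faulty can be illustrated with a small reliability calculation. The following is only a minimal sketch and not the paper's evaluation model: it assumes that routers fail independently with identical reliability `r`, that the spare is counted as one of the routers in the set, and that links and reconfiguration logic never fail. The function names and the set size of four routers are hypothetical choices made for illustration.

```python
def set_reliability(r, routers_per_set):
    """Probability that a router set with one spare has at most one faulty router.

    Assumption: independent routers with identical reliability r; the
    spare itself may fail and counts toward the single tolerated fault.
    """
    n = routers_per_set + 1                       # regular routers plus one spare
    all_good = r ** n                             # no router is faulty
    one_faulty = n * (r ** (n - 1)) * (1.0 - r)   # exactly one router is faulty
    return all_good + one_faulty


def mesh_reliability(r, routers_per_set, num_sets, with_spare=True):
    """Reliability of a mesh built from num_sets independent router sets."""
    if with_spare:
        return set_reliability(r, routers_per_set) ** num_sets
    # Baseline without redundancy: every router must work.
    return r ** (routers_per_set * num_sets)


if __name__ == "__main__":
    # Hypothetical numbers: 4 routers per set, 16 sets, router reliability 0.9.
    print(mesh_reliability(0.9, 4, 16, with_spare=True))
    print(mesh_reliability(0.9, 4, 16, with_spare=False))
```

Under these assumptions, a set of four routers with reliability 0.9 each plus one spare has a set reliability of about 0.919, compared with about 0.656 for the same four routers without a spare.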

Keywords

Fault tolerance · Interconnections · Integrated circuit reliability · Network topology

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
  2. Department of Electrical Engineering, College of Engineering, Chang Gung University, Taoyuan, Taiwan
  3. Portable Energy System Group, Green Technology Research Center, College of Engineering, Chang Gung University, Taoyuan, Taiwan
  4. Department of Ophthalmology, Chang Gung Memorial Hospital, Taoyuan, Taiwan