Abstract
Aggressive CMOS technology scaling increasingly threatens the dependability of network-on-chip (NoC) architectures. In a mesh-based NoC, a faulty router or broken link may isolate a fully functional processing element (PE). In addition, a set of faulty routers may form isolated regions, which can degrade the design. In this paper, we propose a router-level redundancy (RLR) fault-tolerant scheme that differs from the traditional microarchitecture-level redundancy (MLR) approach and relieves the problems of isolated PEs and isolated regions. By simply adding one spare router to each router set in a mesh, RLR is created and the connection paths between adjacent routers are diversified. To exploit this extra resource, two reconfiguration algorithms are presented that detour around observed faulty routers and links. The proposed RLR fault-tolerant scheme can tolerate at most one faulty router within a router set. After reconfiguration, the original mesh topology is maintained. As a result, the proposed architecture does not need any support from network-layer routing algorithms. The scheme has been evaluated using three fault-tolerance metrics: reliability, mean time to failure (MTTF), and yield. The experimental results show that the performance of RLR increases as the size of the NoC grows, while the relative connection cost decreases. This characteristic makes our architecture suitable for large-scale NoC designs.
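The statement that a router set survives at most one faulty router can be illustrated with a simple reliability estimate. The sketch below is not the paper's evaluation model. It assumes, purely for illustration, that each router fails independently with the same reliability r, that a router set consists of a hypothetical n_primary routers plus one spare, and that links and reconfiguration logic never fail. The function names and the set size are likewise assumptions.

```python
def router_set_reliability(r: float, n_primary: int) -> float:
    """Probability that at most one of the n_primary + 1 routers in a set
    (primaries plus one spare) has failed, assuming independent failures."""
    total = n_primary + 1
    all_good = r ** total                              # no router failed
    one_failed = total * r ** (total - 1) * (1.0 - r)  # exactly one failed
    return all_good + one_failed


def rlr_mesh_reliability(r: float, n_primary: int, n_sets: int) -> float:
    """Mesh with one spare per set: every set must survive."""
    return router_set_reliability(r, n_primary) ** n_sets


def baseline_mesh_reliability(r: float, n_primary: int, n_sets: int) -> float:
    """Mesh without spares: every primary router must work."""
    return r ** (n_primary * n_sets)


if __name__ == "__main__":
    r, n_primary, n_sets = 0.9, 4, 4  # illustrative values only
    print("router set with spare:", router_set_reliability(r, n_primary))
    print("mesh with spares     :", rlr_mesh_reliability(r, n_primary, n_sets))
    print("mesh without spares  :", baseline_mesh_reliability(r, n_primary, n_sets))
```

Under these assumptions, a set of four primary routers with r = 0.9 has reliability 0.6561 without a spare and about 0.9185 with one, which shows why a single spare per set can pay off even though it adds one more component that may fail.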
Cite this article
Chang, YC., Gong, CS.A. & Chiu, CT. Fault-Tolerant Mesh-Based NoC with Router-Level Redundancy. J Sign Process Syst 92, 345–355 (2020). https://doi.org/10.1007/s11265-019-01476-3
DOI: https://doi.org/10.1007/s11265-019-01476-3