Abstract
Future many-core processors may contain more than 1000 cores on a single die. However, continued scaling of silicon fabrication technology makes chips with core counts of this magnitude increasingly vulnerable to errors. General-purpose many-core processors therefore need a fault-tolerance mechanism that is both low-overhead and adaptive. We propose high-level adaptive redundancy (HLAR), which has several unique properties. First, it applies redundancy selectively, based on application assistance and dynamic core scheduling. Second, it incurs minimal overhead when the mechanism is disabled. Third, it includes the local memory within the replication sphere, which raises the replication level and simplifies the redundancy mechanism. Finally, it reduces bandwidth through various compression methods, thereby balancing reliability, performance, and power. Experimental results show remarkably low overhead, covering 99.999% of errors with only 0.25% more network-on-chip traffic.
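The abstract does not say which compression methods HLAR uses, so the following is only a minimal sketch of the general idea under stated assumptions. It swaps in CRC-based fingerprinting, a known way to compress the comparison between redundant copies: instead of shipping every architectural state update across the network-on-chip, each copy sends one 32-bit checksum per interval. The `critical` flag stands in for application assistance (redundancy is simply off for non-critical work, so no checking traffic is generated), and every name here (`Update`, `fingerprint`, `run_interval`, the fault hook) is hypothetical rather than taken from the paper.

```python
import zlib
from typing import Callable, List, Optional, Tuple

# Hypothetical model: one retired instruction's state update as (dest_register, value).
Update = Tuple[int, int]


def fingerprint(updates: List[Update]) -> int:
    """Compress an interval of state updates into a single CRC-32 value."""
    crc = 0
    for reg, value in updates:
        # Serialize each update as 2 bytes of register id plus 8 bytes of 64-bit value.
        data = reg.to_bytes(2, "little") + (value & 0xFFFF_FFFF_FFFF_FFFF).to_bytes(8, "little")
        crc = zlib.crc32(data, crc)  # chaining equals a CRC over the whole stream
    return crc


def run_interval(
    task: Callable[[], List[Update]],
    critical: bool,
    fault: Optional[Callable[[List[Update]], List[Update]]] = None,
) -> Tuple[List[Update], bool, int]:
    """Return (updates, error_detected, bytes_sent_for_checking).

    Non-critical intervals run once with redundancy disabled. Critical intervals
    run on a leading and a trailing copy that exchange only fingerprints.
    """
    lead = task()
    if not critical:
        return lead, False, 0
    trail = task()
    if fault is not None:  # optional hook to emulate a transient fault in the trailing copy
        trail = fault(trail)
    mismatch = fingerprint(lead) != fingerprint(trail)
    return lead, mismatch, 4  # one 32-bit fingerprint instead of the full update stream
```

For example, a task producing 16 updates would need 160 bytes to compare uncompressed under this serialization, but only 4 bytes as a fingerprint; a single flipped bit in the trailing copy is still caught, because CRC-32 detects every single-bit error.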
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jia, W., Li, R., Zhang, C. (2013). An Adaptive Low-Overhead Mechanism for Dependable General-Purpose Many-Core Processors. In: Mustofa, K., Neuhold, E.J., Tjoa, A.M., Weippl, E., You, I. (eds) Information and Communication Technology. ICT-EurAsia 2013. Lecture Notes in Computer Science, vol 7804. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36818-9_37
DOI: https://doi.org/10.1007/978-3-642-36818-9_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36817-2
Online ISBN: 978-3-642-36818-9
eBook Packages: Computer Science, Computer Science (R0)