Abstract
In Chapter 4, we discussed how to diagnose permanent faults. Diagnosis by itself is of little use, however; it becomes useful only when combined with the ability of a processor to repair itself. In this chapter, we discuss some of the many ways in which a processor can perform self-repair. The unifying theme of all self-repair schemes is that they require physical redundancy: without it, no self-repair is possible.
© 2009 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Sorin, D. (2009). Self-Repair. In: Fault Tolerant Computer Architecture. Synthesis Lectures on Computer Architecture. Springer, Cham. https://doi.org/10.1007/978-3-031-01723-0_5
DOI: https://doi.org/10.1007/978-3-031-01723-0_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00595-4
Online ISBN: 978-3-031-01723-0
eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 2