
Part of the book series: Synthesis Lectures on Computer Architecture (SLCA)


Abstract

In Chapter 4, we discussed how to diagnose permanent faults. Diagnosis by itself is of limited use, however; it becomes valuable when it is combined with the processor's ability to repair itself. In this chapter, we discuss several of the many ways in which a processor can perform self-repair. The unifying theme of all self-repair schemes is that they require physical redundancy: without physical redundancy, no self-repair is possible.
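As a minimal illustration of the claim that self-repair depends on physical redundancy, the following Python sketch models an array with a fixed pool of spare rows, loosely in the spirit of row-redundancy schemes for memories. The class `SpareRowArray`, its methods, and the simple one-for-one remapping policy are hypothetical and are not taken from the chapter. Once diagnosis identifies a faulty row, the array maps it onto a spare; when no spares remain, the fault can no longer be repaired.

```python
from typing import Dict, List, Set


class SpareRowArray:
    """Hypothetical array with a fixed pool of spare physical rows.

    Logical rows 0..num_rows-1 initially map to the physical row with the
    same index; physical rows num_rows..num_rows+num_spares-1 are spares.
    """

    def __init__(self, num_rows: int, num_spares: int) -> None:
        self.num_rows = num_rows
        # Spare physical rows not yet used for repair.
        self.free_spares: List[int] = list(range(num_rows, num_rows + num_spares))
        # Logical row -> spare physical row that now backs it.
        self.remap: Dict[int, int] = {}
        # Physical rows that diagnosis has reported as faulty.
        self.faulty: Set[int] = set()

    def physical_row(self, logical: int) -> int:
        """Return the physical row currently backing a logical row."""
        return self.remap.get(logical, logical)

    def repair(self, logical: int) -> bool:
        """Map out the faulty physical row behind `logical`.

        Returns True if a spare row was available, or False if the
        redundancy is exhausted and the fault cannot be repaired.
        """
        self.faulty.add(self.physical_row(logical))
        if not self.free_spares:
            return False  # no physical redundancy left: self-repair impossible
        self.remap[logical] = self.free_spares.pop(0)
        return True


arr = SpareRowArray(num_rows=8, num_spares=1)
print(arr.repair(3), arr.physical_row(3))  # True 8
print(arr.repair(5), arr.physical_row(5))  # False 5
```

With a single spare, the first fault is repaired by redirecting logical row 3 to physical row 8, but the second fault finds no spare left, which is exactly the situation in which no self-repair is possible.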




Copyright information

© 2009 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Sorin, D. (2009). Self-Repair. In: Fault Tolerant Computer Architecture. Synthesis Lectures on Computer Architecture. Springer, Cham. https://doi.org/10.1007/978-3-031-01723-0_5
