Diagnosis

Sorin, Daniel J.

doi:10.1007/978-3-031-01723-0_4

Part of the book series: Synthesis Lectures on Computer Architecture ((SLCA))

Abstract

In the past two chapters, we have discussed how to detect errors and recover from them. For transient errors, detection and recovery are sufficient. After recovery, the transient error is no longer present and execution can resume without a problem. However, if an error is due to a permanent fault, detection and recovery may not be sufficient.

Author information

Authors and Affiliations

Duke University, USA
Daniel J. Sorin

About this chapter

Cite this chapter

Sorin, D. (2009). Diagnosis. In: Fault Tolerant Computer Architecture. Synthesis Lectures on Computer Architecture. Springer, Cham. https://doi.org/10.1007/978-3-031-01723-0_4

DOI: https://doi.org/10.1007/978-3-031-01723-0_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00595-4
Online ISBN: 978-3-031-01723-0
eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 2
