Two techniques for transient software error recovery

  • Yennun Huang
  • Pankaj Jalote
  • Chandra Kintala
Software Architectures for Fault Tolerance
Part of the Lecture Notes in Computer Science book series (LNCS, volume 774)


The traditional approaches to software fault tolerance, recovery blocks and N-version programming, are too expensive and consequently of limited practical use. Experience has shown that techniques such as rollback and retry, which do not employ multiple versions of the software, can mask a range of software faults that manifest as transient failures. These techniques are cost-effective because they do not rely on design diversity to support fault tolerance. In this report we discuss two such techniques that can be used to enhance the reliability of software systems.
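
As a rough illustration of the general rollback-and-retry idea mentioned above (not a reconstruction of either technique discussed in the chapter), the following Python sketch checkpoints a state dictionary, runs an operation, and on a failure restores the checkpoint and re-executes. The names `TransientError`, `run_with_rollback`, and `make_flaky_deposit` are hypothetical and exist only for this example; the failure is simulated.

```python
import copy


class TransientError(Exception):
    """Hypothetical stand-in for a failure that may not recur on re-execution."""


def run_with_rollback(state, operation, max_retries=3):
    """Run operation(state) in place; on TransientError, roll back and retry.

    Assumption for this sketch: `state` is a dict that can be deep-copied.
    """
    checkpoint = copy.deepcopy(state)  # recovery point taken before execution
    for attempt in range(1, max_retries + 1):
        try:
            return operation(state)
        except TransientError:
            # Roll back: discard partial updates and restore the checkpoint.
            state.clear()
            state.update(copy.deepcopy(checkpoint))
            if attempt == max_retries:
                raise  # the failure was not masked within the retry budget


def make_flaky_deposit(failures):
    """Build an operation that fails `failures` times, then succeeds."""
    remaining = {"n": failures}

    def deposit(account):
        account["balance"] += 100  # partial update made before the failure
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TransientError("simulated transient failure")
        account["committed"] = True
        return account["balance"]

    return deposit


if __name__ == "__main__":
    account = {"balance": 50, "committed": False}
    print(run_with_rollback(account, make_flaky_deposit(2)), account)
```

Without the rollback step, each failed attempt would leave its partial update in place and the balance would drift upward on every retry. A single version of the code is re-executed throughout, which is why this style of recovery avoids the cost of design diversity.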





Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • Yennun Huang (1)
  • Pankaj Jalote (2)
  • Chandra Kintala (1)

  1. AT&T Bell Laboratories, Murray Hill
  2. Indian Institute of Technology, Kanpur, India
