Application-Level Fault Tolerance as a Complement to System-Level Fault Tolerance

The Journal of Supercomputing

Abstract

As multiprocessor systems become more complex, their reliability will need to increase as well. In this paper we propose a novel technique that is applicable to a wide variety of distributed real-time systems, especially those exhibiting data parallelism. System-level fault tolerance involves reliability techniques incorporated within the system hardware and software, whereas application-level fault tolerance involves reliability techniques incorporated within the application software. We assert that, for high reliability, a combination of system-level and application-level fault tolerance works best. In many systems, application-level fault tolerance can bridge the gap when system-level fault tolerance alone does not provide the required reliability. We illustrate this with the RTHT target tracking benchmark and the ABF beamforming benchmark.


About this article

Cite this article

Haines, J., Lakamraju, V., Koren, I. et al. Application-Level Fault Tolerance as a Complement to System-Level Fault Tolerance. The Journal of Supercomputing 16, 53–68 (2000). https://doi.org/10.1023/A:1008181429693

  • DOI: https://doi.org/10.1023/A:1008181429693