Real-Time Systems, Volume 15, Issue 2, pp 149–181

Fault-Tolerant Rate-Monotonic Scheduling

  • Sunondo Ghosh
  • Rami Melhem
  • Daniel Mossé
  • Joydeep Sen Sarma

Abstract

Due to the critical nature of the tasks in hard real-time systems, it is essential that faults be tolerated. In this paper, we present a scheme which can be used to tolerate faults during the execution of preemptive real-time tasks. We describe a recovery scheme which can be used to re-execute tasks in the event of single and multiple transient faults and discuss conditions that must be met by any such recovery scheme. We then extend the original Rate Monotonic Scheduling (RMS) scheme and the exact characterization of RMS to provide tolerance for single and multiple transient faults. We derive schedulability bounds for sets of real-time tasks given the desired level of fault tolerance for each task or subset of tasks. Finally, we analyze and compare those bounds with existing bounds for non-fault-tolerant and other variations of RMS.
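As a point of reference for the schedulability bounds mentioned above, the following is a minimal sketch of the classic, non-fault-tolerant RMS utilization test of Liu and Layland, n(2^(1/n) - 1). The second check, `tolerates_one_reexecution_per_job`, is a deliberately pessimistic illustration and not the bound derived in this paper: it assumes every job may be re-executed once in full, so it simply doubles each execution time before applying the same test. All function names and the (execution time, period) task representation are assumptions made for this example.

```python
from typing import List, Tuple

# A task is (worst-case execution time, period); deadlines are assumed
# equal to periods, as in the classic RMS model.
Task = Tuple[float, float]


def liu_layland_bound(n: int) -> float:
    """Classic RMS utilization bound n * (2^(1/n) - 1) for n periodic tasks."""
    if n < 1:
        raise ValueError("need at least one task")
    return n * (2 ** (1.0 / n) - 1)


def utilization(tasks: List[Task]) -> float:
    """Total processor utilization: sum of C_i / T_i."""
    return sum(c / t for c, t in tasks)


def rms_sufficient(tasks: List[Task]) -> bool:
    """Sufficient (not necessary) RMS schedulability test without faults."""
    return utilization(tasks) <= liu_layland_bound(len(tasks))


def tolerates_one_reexecution_per_job(tasks: List[Task]) -> bool:
    """Pessimistic illustration only (NOT the paper's bound).

    Assumes every job might run twice, so each execution time is doubled
    and the plain Liu-Layland test is applied to the inflated task set.
    """
    inflated = [(2 * c, t) for c, t in tasks]
    return rms_sufficient(inflated)


if __name__ == "__main__":
    light = [(1, 8), (1, 10), (2, 20)]   # hypothetical task set
    heavy = [(1, 4), (1, 5), (2, 10)]    # hypothetical task set
    for name, ts in (("light", light), ("heavy", heavy)):
        print(name, rms_sufficient(ts), tolerates_one_reexecution_per_job(ts))
```

Doubling every execution time reserves far more time than a single transient fault requires, which is why tighter fault-tolerant bounds of the kind the abstract describes are of interest.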

Keywords

Operating System, Fault Tolerance, Computing Methodology, Recovery Scheme, Transient Fault
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Kluwer Academic Publishers 1998

Authors and Affiliations

  • Sunondo Ghosh (1)
  • Rami Melhem (1)
  • Daniel Mossé (1)
  • Joydeep Sen Sarma (1)

  1. Department of Computer Science, University of Pittsburgh, Pittsburgh