On Distributed Real-Time Scheduling in Networked Embedded Systems in the Presence of Crash Failures

  • Binoy Ravindran
  • Jonathan S. Anderson
  • E. Douglas Jensen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4761)

Abstract

We consider the problem of scheduling distributable real-time threads in networked embedded systems that operate under run-time uncertainties, including uncertainties in thread execution times, thread arrivals, and node failure occurrences. We present a distributed scheduling algorithm called CUA. We show that CUA satisfies thread time constraints in the presence of crash failures, is early-deciding, has an efficient message complexity of O(fn) (where f is the number of crashes that actually occur and n is the number of nodes), and is time-optimal with a time lower bound of O(D + fd + nk) (where D is the message delay upper bound, d is the failure detection bound, and k is the maximum number of threads). In crash-free runs, the algorithm constructs schedules within O(D + nk) and yields optimal total utility if nodes are also not overloaded. The algorithm is also “best-effort” in that a high-importance thread that may arrive at any time has a very high likelihood of feasible completion (in contrast to classical admission control algorithms, which favor feasible completion of admitted threads over admitting new ones, irrespective of thread importance).
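To make the stated time bound concrete, the following minimal Python sketch evaluates the expression inside O(D + fd + nk) for given parameters. It is not part of CUA and does not come from the paper: the function name `time_bound_term` and the `unit_cost` parameter (a hypothetical per-step cost that turns the unitless nk term into time) are assumptions made purely for illustration, and all hidden constant factors are ignored.

```python
def time_bound_term(D: float, d: float, f: int, n: int, k: int,
                    unit_cost: float = 1.0) -> float:
    """Evaluate D + f*d + n*k*unit_cost, the expression inside the O(.) bound.

    D: message delay upper bound
    d: failure detection bound
    f: number of crashes that actually occur
    n: number of nodes
    k: maximum number of threads
    unit_cost: hypothetical cost per (node, thread) step; an assumption,
               not a quantity defined in the abstract
    """
    if min(D, d, unit_cost) < 0 or min(f, n, k) < 0:
        raise ValueError("all parameters must be non-negative")
    return D + f * d + n * k * unit_cost


# With f = 0 the failure-detection term vanishes, leaving D + n*k,
# which matches the crash-free bound O(D + nk) quoted in the abstract.
crash_free = time_bound_term(D=10.0, d=5.0, f=0, n=4, k=3)
two_crashes = time_bound_term(D=10.0, d=5.0, f=2, n=4, k=3)
```

The sketch only illustrates the early-deciding property described above: the f·d term depends on the crashes that actually occur, so runs with fewer crashes have a smaller bound.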

Copyright information

© IFIP International Federation for Information Processing 2007

Authors and Affiliations

  • Binoy Ravindran (1)
  • Jonathan S. Anderson (1)
  • E. Douglas Jensen (2)
  1. ECE Dept., Virginia Tech, Blacksburg, VA 24061, USA
  2. The MITRE Corporation, Bedford, MA 01730, USA
