Towards Middleware for Fault-Tolerance in Distributed Real-Time and Embedded Systems

  • Jaiganesh Balasubramanian
  • Aniruddha Gokhale
  • Douglas C. Schmidt
  • Nanbor Wang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5053)


Distributed real-time and embedded (DRE) systems often require support for multiple simultaneous quality of service (QoS) properties, such as real-timeliness and fault tolerance, while operating in resource-constrained environments. These resource constraints motivate the need for a lightweight middleware infrastructure, while the need for simultaneous QoS properties requires the middleware to provide fault tolerance capabilities that respect the time-critical needs of DRE systems. Conventional middleware solutions, such as Fault-tolerant CORBA (FT-CORBA) and the Continuous Availability API for J2EE, have limited utility for DRE systems because they are heavyweight (e.g., the complexity of their feature-rich fault tolerance capabilities consumes excessive runtime resources), yet incomplete (e.g., they lack mechanisms that enable fault tolerance while maintaining real-time predictability).

This paper provides three contributions to the development and standardization of lightweight real-time and fault-tolerant middleware for DRE systems. First, we discuss the challenges in realizing real-time fault-tolerant solutions for DRE systems using contemporary middleware. Second, we describe recent progress towards standardizing a CORBA lightweight fault-tolerance specification for DRE systems. Third, we present the architecture of FLARe, a prototype based on the OMG real-time fault-tolerant CORBA middleware standardization efforts. FLARe is lightweight (e.g., it leverages only those server- and client-side mechanisms required for real-time systems) and predictable (e.g., it provides fault-tolerant mechanisms that respect the time-critical performance needs of DRE systems).



Copyright information

© IFIP International Federation for Information Processing 2008

Authors and Affiliations

  • Jaiganesh Balasubramanian (1)
  • Aniruddha Gokhale (1)
  • Douglas C. Schmidt (1)
  • Nanbor Wang (2)

  1. Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, USA
  2. Tech-X Corporation, Boulder, USA
