Abstract
Distributed real-time and embedded (DRE) systems often require support for multiple simultaneous quality of service (QoS) properties, such as real-timeliness and fault tolerance, while operating in resource-constrained environments. These resource constraints motivate the need for a lightweight middleware infrastructure. At the same time, the need for simultaneous QoS properties requires the middleware to provide fault-tolerance capabilities that respect the time-critical needs of DRE systems. Conventional middleware solutions, such as Fault-tolerant CORBA (FT-CORBA) and the Continuous Availability API for J2EE, have limited utility for DRE systems. They are heavyweight (e.g., the complexity of their feature-rich fault-tolerance capabilities consumes excessive runtime resources), yet incomplete (e.g., they lack mechanisms that provide fault tolerance while maintaining real-time predictability).
This paper makes three contributions to the development and standardization of lightweight real-time and fault-tolerant middleware for DRE systems. First, we discuss the challenges of realizing real-time fault-tolerant solutions for DRE systems with contemporary middleware. Second, we describe recent progress towards standardizing a lightweight CORBA fault-tolerance specification for DRE systems. Third, we present the architecture of FLARe (Fault-tolerant Lightweight Adaptive Real-time Middleware), a prototype based on the OMG real-time fault-tolerant CORBA standardization efforts. FLARe is lightweight (e.g., it uses only the server- and client-side mechanisms required for real-time systems) and predictable (e.g., it provides fault-tolerance mechanisms that respect the time-critical performance needs of DRE systems).
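The abstract does not describe FLARe's mechanisms in detail. To illustrate the kind of lightweight client-side fault-tolerance mechanism it refers to, the following Python sketch makes several assumptions. It uses passive (primary-backup) replication, a client that already holds a ranked list of replica references, and a per-request time budget. The names `Replica`, `invoke_with_failover`, and `DeadlineMissed` are hypothetical. They are not part of FLARe, FT-CORBA, or any CORBA API, and this is not the authors' implementation.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


class DeadlineMissed(Exception):
    """Raised when failover cannot complete within the caller's time budget."""


@dataclass
class Replica:
    # Hypothetical stand-in for a remote object reference held by the client.
    name: str
    handler: Callable[[str], str]
    alive: bool = True

    def invoke(self, request: str) -> str:
        if not self.alive:
            # Models the transport-level failure a client ORB would observe.
            raise ConnectionError(f"{self.name} unreachable")
        return self.handler(request)


def invoke_with_failover(replicas: Sequence[Replica], request: str,
                         deadline_s: float) -> Tuple[str, str]:
    """Try replicas in ranked order (primary first, then backups).

    Before each attempt, check that the time budget is not yet spent, so
    recovery never silently overruns the caller's deadline.
    Returns (replica_name, reply).
    """
    start = time.monotonic()
    for replica in replicas:
        if time.monotonic() - start > deadline_s:
            raise DeadlineMissed(f"budget of {deadline_s}s exhausted")
        try:
            return replica.name, replica.invoke(request)
        except ConnectionError:
            continue  # fail over to the next-ranked backup
    raise ConnectionError("no live replica")


if __name__ == "__main__":
    replicas = [
        Replica("primary", lambda r: r.upper(), alive=False),
        Replica("backup-1", lambda r: r.upper()),
    ]
    print(invoke_with_failover(replicas, "ping", deadline_s=0.01))
    # -> ('backup-1', 'PING')
```

The budget is checked only between attempts, so one blocking call can still overrun it. A real-time client would also need bounded per-call timeouts and an ordering of backups chosen ahead of time.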
Keywords
- Fault Tolerance
- Failure Recovery
- Stream Control Transmission Protocol
- Object Request Broker
- Replica Selection
These keywords were generated automatically and were not supplied by the authors.
Copyright information
© 2008 IFIP International Federation for Information Processing
About this paper
Cite this paper
Balasubramanian, J., Gokhale, A., Schmidt, D.C., Wang, N. (2008). Towards Middleware for Fault-Tolerance in Distributed Real-Time and Embedded Systems. In: Meier, R., Terzis, S. (eds) Distributed Applications and Interoperable Systems. DAIS 2008. Lecture Notes in Computer Science, vol 5053. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68642-2_6
DOI: https://doi.org/10.1007/978-3-540-68642-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68639-2
Online ISBN: 978-3-540-68642-2
eBook Packages: Computer Science, Computer Science (R0)