Managing Fault Tolerance Transparently Using CORBA Services
Fault tolerance problems arise in large-scale distributed systems because application components may eventually fail due to hardware problems, operator mistakes or design faults. Fault tolerance mechanisms must therefore be employed to reduce the susceptibility of a given system to failure. In this paper, we describe the design of an architecture that overcomes potential application component failures using CORBA, a distributed object middleware specified by the OMG. Central to this architecture is the OMG's CORBA Object Trading Service, which serves as the mechanism for advertising and managing service offers for fault-tolerant application components. This mechanism enables clients to transparently detect a failed connection to a service object, discover a similar backup service object and reconnect to it. This improves overall system stability and enables scalability.
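The detect, discover and reconnect cycle described above can be illustrated with a minimal sketch. It is not the architecture's actual implementation and does not use a real CORBA ORB: `Trader` is an in-memory stand-in for a trading service, and `EchoService`, `FailoverProxy`, `ServiceUnavailable` and the service type `"Echo"` are hypothetical names chosen for illustration. The operation names `export`, `withdraw` and `query` are only loosely modelled on trading-service terminology.

```python
class ServiceUnavailable(Exception):
    """Raised when a service object can no longer be reached (assumed failure signal)."""


class EchoService:
    """Toy service object; `alive` simulates whether it is reachable."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def echo(self, text):
        if not self.alive:
            raise ServiceUnavailable(self.name)
        return f"{self.name}: {text}"


class Trader:
    """In-memory stand-in for a trading service: maps service types to offers."""

    def __init__(self):
        self._offers = {}

    def export(self, service_type, service):
        # Advertise a service offer under the given type.
        self._offers.setdefault(service_type, []).append(service)

    def withdraw(self, service_type, service):
        # Remove an offer if it is still registered.
        offers = self._offers.get(service_type, [])
        if service in offers:
            offers.remove(service)

    def query(self, service_type):
        # Return a copy of all current offers of the given type.
        return list(self._offers.get(service_type, []))


class FailoverProxy:
    """Client-side proxy that hides failure detection and reconnection."""

    def __init__(self, trader, service_type):
        self._trader = trader
        self._type = service_type
        self._target = None

    def call(self, method, *args):
        while True:
            if self._target is None:
                # Discover: ask the trader for a matching offer.
                offers = self._trader.query(self._type)
                if not offers:
                    raise ServiceUnavailable(f"no offers for {self._type}")
                self._target = offers[0]
            try:
                return getattr(self._target, method)(*args)
            except ServiceUnavailable:
                # Detect: drop the stale offer, then loop to reconnect to a backup.
                self._trader.withdraw(self._type, self._target)
                self._target = None


trader = Trader()
primary, backup = EchoService("primary"), EchoService("backup")
trader.export("Echo", primary)
trader.export("Echo", backup)

client = FailoverProxy(trader, "Echo")
first = client.call("echo", "hello")    # served by the primary
primary.alive = False                   # simulate a component failure
second = client.call("echo", "hello")   # served by the backup, no client-side change
print(first)
print(second)
```

The client code only ever talks to the proxy, so the switch to the backup needs no change at the call site. The sketch assumes that a failure surfaces as an exception on invocation and that any offer of the same service type is an acceptable substitute; a real deployment would also have to deal with service state and with offers that are stale at query time.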