Abstract
High-Availability, or HA, often tops the feature “wish-list” for customers putting mission-critical applications on-line. The meaning of HA is often imprecise, however. There is a common perception that HA is actually second-best to “fault-tolerance”, which is identified with hardware redundancy and perceived to depend on complex, proprietary, costly technology. The perceived requirement for a “fault-tolerant” machine arises from an erroneous focus on single-machine availability and on hardware faults as the dominant issues for service availability. Empirical studies reveal, on the contrary, that software faults and planned administrative procedures are the dominant issues, and that a customer's real HA requirement comes down to a need for HA data access from a client network, i.e. service that remains available despite software faults, hardware faults, scheduled maintenance, software upgrades, etc. Recent UNIX-based implementations of an HA configuration, based on dual-hosted disks, have demonstrated that HA data service is achievable using commodity UNIX hardware and software components. We illustrate with a description of a Highly Available Data Facility prototype we implemented in 1992. We compare our approach to other contemporary approaches to HA client-server computing.
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Borr, A., Wilhelmy, C. (1994). Highly-available data services for UNIX client-server networks: Why fault-tolerant hardware isn't the answer. In: Banâtre, M., Lee, P.A. (eds) Hardware and Software Architectures for Fault Tolerance. Fault Tolerance 1993. Lecture Notes in Computer Science, vol 774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020041
DOI: https://doi.org/10.1007/BFb0020041
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57767-6
Online ISBN: 978-3-540-48330-4
eBook Packages: Springer Book Archive