Highly-available data services for UNIX client-server networks: Why fault-tolerant hardware isn't the answer

Borr, Andrea; Wilhelmy, Carol

doi:10.1007/BFb0020041

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 774)

Included in the following conference series:

Workshop on Fault Tolerance

Abstract

High availability, or HA, often tops the feature “wish-list” for customers putting mission-critical applications on-line. The meaning of HA is often imprecise, however. There is a common perception that HA is actually second-best to “fault-tolerance”, which is identified with hardware redundancy and perceived to depend on complex, proprietary, costly technology. The perceived requirement for a “fault-tolerant” machine arises from an erroneous focus on single-machine availability and on hardware faults as the dominant issues for service availability. In contrast, empirical studies reveal that software faults and planned administrative procedures are the dominant issues, and that a customer's real HA requirement comes down to a need for HA data access from a client network, i.e., service that is available despite software faults, hardware faults, scheduled maintenance, software upgrades, etc. Recent UNIX-based implementations of an HA configuration, based on dual-hosted disks, have demonstrated that HA data service is achievable using commodity UNIX hardware and software components. We illustrate with a description of a Highly Available Data Facility prototype we implemented in 1992. We compare our approach to other contemporary approaches to HA client-server computing.

Author information

Authors and Affiliations

Hewlett-Packard Company, USA
Andrea Borr
SunSoft, Inc., USA
Carol Wilhelmy

Editor information

Michel Banâtre, Peter A. Lee

About this paper

Cite this paper

Borr, A., Wilhelmy, C. (1994). Highly-available data services for UNIX client-server networks: Why fault-tolerant hardware isn't the answer. In: Banâtre, M., Lee, P.A. (eds) Hardware and Software Architectures for Fault Tolerance. Fault Tolerance 1993. Lecture Notes in Computer Science, vol 774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020041

DOI: https://doi.org/10.1007/BFb0020041
Published: 10 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57767-6
Online ISBN: 978-3-540-48330-4
eBook Packages: Springer Book Archive