Highly-available data services for UNIX client-server networks: Why fault-tolerant hardware isn't the answer

  • Conference paper
Hardware and Software Architectures for Fault Tolerance (Fault Tolerance 1993)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 774)

Abstract

High availability, or HA, often tops the feature “wish-list” for customers putting mission-critical applications on-line. The meaning of HA is often imprecise, however. There is a common perception that HA is actually second-best to “fault-tolerance”, which is identified with hardware redundancy and perceived to depend on complex, proprietary, costly technology. The perceived requirement for a “fault-tolerant” machine arises from an erroneous focus on single-machine availability and on hardware faults as the dominant issues for service availability. Empirical studies reveal, on the contrary, that software faults and planned administrative procedures are the dominant issues, and that a customer's real HA requirement comes down to a need for HA data access from a client network, i.e., service that is available despite software faults, hardware faults, scheduled maintenance, software upgrades, etc. Recent UNIX-based implementations of an HA configuration, based on dual-hosted disks, have demonstrated that HA data service is achievable using commodity UNIX hardware and software components. We illustrate with a description of a Highly Available Data Facility prototype that we implemented in 1992, and we compare our approach to other contemporary approaches to HA client-server computing.




Editor information

Michel Banâtre, Peter A. Lee

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Borr, A., Wilhelmy, C. (1994). Highly-available data services for UNIX client-server networks: Why fault-tolerant hardware isn't the answer. In: Banâtre, M., Lee, P.A. (eds) Hardware and Software Architectures for Fault Tolerance. Fault Tolerance 1993. Lecture Notes in Computer Science, vol 774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020041

  • DOI: https://doi.org/10.1007/BFb0020041

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-57767-6

  • Online ISBN: 978-3-540-48330-4

  • eBook Packages: Springer Book Archive
