On Designing Dependable Services with Diverse Off-the-Shelf SQL Servers

  • Ilir Gashi
  • Peter Popov
  • Vladimir Stankovic
  • Lorenzo Strigini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3069)

Abstract

The most important non-functional requirements for an SQL server are performance and dependability. Drawing on empirical results from our ongoing research with diverse SQL servers, this paper argues in favour of diverse redundancy as a way of improving both. We show evidence that current data replication solutions are insufficient to protect against the range of faults documented for database servers; outline possible fault-tolerant architectures using diverse servers; discuss the design problems involved; and offer evidence of the potential for performance improvement through diverse redundancy.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Ilir Gashi, Peter Popov, Vladimir Stankovic, Lorenzo Strigini
    • Centre for Software Reliability, City University, London, UK