Efficient and non-blocking agreement protocols

  • Suyash Gupta
  • Mohammad Sadoghi
Article
Part of the following topical collections:
  1. Special Issue on Extending Database Technology

Abstract

Large-scale distributed databases are designed to support commercial and cloud-based applications. The minimal expectation from such systems is that they ensure consistency and reliability in case of node failures. A distributed database guarantees reliability through the use of atomic commitment protocols, which ensure that either all the changes of a transaction are applied or none of them are. To ensure an efficient commitment process, the database community has mainly used the two-phase commit (2PC) protocol. However, the 2PC protocol is blocking under multiple failures. This necessitated the development of the non-blocking three-phase commit (3PC) protocol. Still, the database community is reluctant to use the 3PC protocol, as it acts as a scalability bottleneck in the design of efficient transaction processing systems. In this work, we present the EasyCommit (EC) protocol, which leverages the best of both worlds: it is non-blocking (like 3PC) and requires only two phases (like 2PC). EasyCommit achieves these goals by relying on two key observations: (i) first transmit and then commit, and (ii) message redundancy. We present the design of the EasyCommit protocol and prove that it guarantees both safety and liveness. We also present a detailed evaluation of the EC protocol and show that it is nearly as efficient as the 2PC protocol. To cater to the needs of geographically large-scale distributed systems, we also design a topology-aware agreement protocol (Geo-scale EasyCommit) that is non-blocking, safe, and live, and that outperforms the 3PC protocol.
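
The two observations named in the abstract can be illustrated with a toy, in-memory sketch. This is not the authors' implementation: the abstract does not spell out the message flow, so the sketch assumes that every participant forwards a decision it learns to all other participants before applying it locally. The names `Node`, `receive_decision`, and `run_round` are hypothetical, and direct method calls stand in for network messages; timeouts, logging to stable storage, and recovery are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical participant in a toy, in-memory commit round."""
    name: str
    peers: List["Node"] = field(default_factory=list)
    decision: Optional[str] = None   # global decision this node has learned
    state: str = "READY"             # READY -> COMMITTED or ABORTED
    log: List[str] = field(default_factory=list)

    def receive_decision(self, decision: str) -> None:
        # Redundancy means the same decision can arrive several times;
        # only the first copy triggers any work.
        if self.decision is not None:
            return
        self.decision = decision
        # Assumed reading of "first transmit": forward to every peer ...
        self.log.append(f"transmit {decision}")
        for peer in self.peers:
            peer.receive_decision(decision)
        # ... "and then commit": apply the decision locally afterwards.
        self.state = "COMMITTED" if decision == "COMMIT" else "ABORTED"
        self.log.append(f"apply {decision}")


def run_round(votes: Dict[str, bool], reachable: int) -> Dict[str, str]:
    """Run one round in which the coordinator reaches only the first
    `reachable` participants with its decision before it 'crashes'."""
    nodes = [Node(name) for name in votes]
    for node in nodes:
        node.peers = [p for p in nodes if p is not node]
    # Coordinator rule shared with 2PC: commit only on unanimous yes votes.
    decision = "COMMIT" if all(votes.values()) else "ABORT"
    for node in nodes[:reachable]:
        node.receive_decision(decision)
    return {node.name: node.state for node in nodes}


if __name__ == "__main__":
    # The coordinator reaches a single participant, yet all three finish.
    print(run_round({"p1": True, "p2": True, "p3": True}, reachable=1))
```

In this sketch, a decision delivered to a single participant still reaches every other participant, because the forwarding step runs before the local apply step. If the coordinator reaches no one (`reachable=0`), all participants stay in `READY`; how the protocol resolves that case is not described in the abstract and is not modeled here.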

Keywords

Agreement, Node failures, Geo-scale

Notes

Acknowledgements

We thank Thamir Qadah for the valuable discussions that helped us design the ExpoDB system. We also thank the anonymous reviewers for their useful input and comments.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. University of California, Davis, Davis, USA
