Distributed Computing, Volume 18, Issue 1, pp 73–84

Active Disk Paxos with infinitely many processes

  • Gregory Chockler
  • Dahlia Malkhi
Special Issue PODC

Abstract

We present an improvement to the Disk Paxos protocol of Gafni and Lamport that exploits the extended functionality and flexibility of Active Disks and supports unmediated concurrent data access by an unlimited number of processes. The solution enables coordination among infinitely many clients using finite shared memory. It is based on a collection of read-modify-write objects that may suffer faults and that together emulate a new, reliable shared memory abstraction called a ranked register. The required read-modify-write objects are readily available in Active Disks and in Object Storage Device controllers, making our solution suitable for state-of-the-art Storage Area Network (SAN) environments.
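
To make the idea of building a ranked register from a read-modify-write object more concrete, here is a minimal sketch in Python. The abstract does not spell out the register's semantics, so everything below is an assumption made for illustration: the names `RankedRegister`, `rr_read`, and `rr_write`, the integer ranks, and the rule that a write commits only if no read or write with a higher rank has already been applied. The sketch also uses a single fault-free object guarded by a lock, whereas the paper works with a collection of objects that may fail.

```python
import threading
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class _State:
    """State held by one read-modify-write object (hypothetical layout)."""
    read_rank: int = 0    # highest rank seen in any read so far
    write_rank: int = 0   # rank of the last committed write
    value: Any = None     # value of the last committed write


class RankedRegister:
    """Illustrative ranked register on top of one atomic RMW object.

    The lock stands in for the atomicity that an Active Disk or
    object storage controller would provide for a single RMW step.
    """

    def __init__(self) -> None:
        self._state = _State()
        self._lock = threading.Lock()

    def _rmw(self, step: Callable[[_State], Any]) -> Any:
        # Apply `step` to the state as one indivisible operation.
        with self._lock:
            return step(self._state)

    def rr_read(self, rank: int) -> Tuple[int, Any]:
        """Record the reader's rank and return the last committed write."""
        def step(s: _State) -> Tuple[int, Any]:
            s.read_rank = max(s.read_rank, rank)
            return (s.write_rank, s.value)
        return self._rmw(step)

    def rr_write(self, rank: int, value: Any) -> bool:
        """Commit (True) unless a higher-ranked read or write came first."""
        def step(s: _State) -> bool:
            if rank >= s.read_rank and rank >= s.write_rank:
                s.write_rank = rank
                s.value = value
                return True
            return False  # abort: a higher rank has already been observed
        return self._rmw(step)


reg = RankedRegister()
reg.rr_write(1, "a")   # commits: nothing with a higher rank was seen
reg.rr_read(5)         # returns (1, "a") and raises the read rank to 5
reg.rr_write(3, "b")   # aborts: rank 3 is below the recorded read rank 5
reg.rr_write(5, "c")   # commits: rank 5 matches the highest rank seen
```

Note that the object keeps three fields regardless of how many clients call it, which is the property that lets an unbounded set of processes coordinate through a fixed amount of shared state in this sketch.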

Keywords

Shared memory; Consensus; Paxos; Infinitely many processes; Non-responsive object faults

References

  1. Afek, Y., Greenberg, D.S., Merritt, M., Taubenfeld, G.: Computing with faulty shared objects. J. ACM 42(6), 1231-1274 (1995)
  2. Acharya, A., Uysal, M., Saltz, J.: Active Disks: programming model, algorithms and evaluation. In: Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII) (1998)
  3. Amiri, K., Gibson, G.A., Golding, R.: Highly concurrent shared storage. In: Proceedings of the International Conference on Distributed Computing Systems (ICDCS'2000) (2000)
  4. Anderson, T., Dahlin, M., Neefe, J., Patterson, D., Roselli, D., Wang, R.: Serverless network file systems. ACM Trans. Comput. Syst. 14(1), 41-79 (1996)
  5. Birman, K., Joseph, T.: Exploiting virtual synchrony in distributed systems. In: Proceedings of the 11th Annual Symposium on Operating Systems Principles, pp. 123-138 (1987)
  6. Boichat, R., Dutta, P., Frolund, S., Guerraoui, R.: Deconstructing Paxos. Technical Report DSC ID:200106, Communication Systems Department (DSC), École Polytechnique Fédérale de Lausanne (EPFL) (2001). Available at http://dscwww.epfl.ch/EN/publications/documents/tr01\006.pdf
  7. Boichat, R., Dutta, P., Frolund, S., Guerraoui, R.: Deconstructing Paxos. ACM SIGACT News Distrib. Comput. Column. 34(1), 47-67 (2003)
  8. Burns, R.: Data management in a distributed file system for Storage Area Networks. PhD Thesis, Department of Computer Science, University of California, Santa Cruz (2000)
  9. Burns, J., Lynch, N.: Bounds on shared memory for mutual exclusion. Inform. Comput. 107(2), 171-184 (1993)
  10. Chandra, T.D., Hadzilacos, V., Toueg, S.: The weakest failure detector for solving consensus. J. ACM 43(4), 685-722 (1996)
  11. Chandra, T.D., Toueg, S.: Unreliable failure detectors for reliable distributed systems. J. ACM 43(2), 225-267 (1996)
  12. Chockler, G.V., Keidar, I., Vitenberg, R.: Group communication specifications: a comprehensive study. ACM Comput. Surv. 33(4), 1-43 (2001)
  13. Chockler, G.V., Keidar, I., Malkhi, D.: Computing with Byzantine storage. In preparation.
  14. Chockler, G., Malkhi, D., Dolev, D.: State-machine replication with infinitely many processes: a position paper. In: Proceedings of the International Workshop on Future Directions in Distributed Computing (FuDiCo), Bertinoro, Italy (2002)
  15. Chockler, G., Malkhi, D., Reiter, M.K.: Backoff protocols for distributed mutual exclusion and ordering. In: Proceedings of the 21st International Conference on Distributed Computing Systems, pp. 11-20 (2001)
  16. Chor, B., Dwork, C.: Randomization in Byzantine agreement. In: Micali, S. (ed.) Advances in Computing Research, Randomness in Computation, vol. 5, pp. 443-497. JAI Press (1989)
  17. Cristian, F., Fetzer, C.: The timed asynchronous distributed system model. In: Proceedings of the 28th Annual International Symposium on Fault-Tolerant Computing (1998)
  18. DePrisco, R., Lampson, B., Lynch, N.: Fundamental study: revisiting the Paxos algorithm. Theoret. Comput. Sci. 243, 35-91 (2000)
  19. Dolev, D., Dwork, C., Stockmeyer, L.: On the minimal synchronism needed for distributed consensus. J. ACM 34(1), 77-97 (1987)
  20. Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial synchrony. J. ACM 35(2), 288-323 (1988)
  21. Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2), 374-382 (1985)
  22. Fekete, A., Lynch, N., Shvartsman, A.: Specifying and using a partitionable group communication service. ACM Trans. Comput. Syst. 19(2), 171-216 (2001)
  23. Gafni, E., Lamport, L.: Disk Paxos. Distribut. Comput. 16(1), 1-20 (2003)
  24. Gafni, E., Merritt, M., Taubenfeld, G.: The concurrency hierarchy, and algorithms for unbounded concurrency. In: Proceedings of the 20th ACM Symposium on Principles of Distributed Computing (PODC 2001) (2001)
  25. Gibson, G.A., Nagle, D.F., Amiri, K., Butler, J., Chang, F.W., Gobioff, H., Hardin, C., Riedel, E., Rochberg, D., Zelenka, J.: A cost-effective high-bandwidth storage architecture. In: Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (1998)
  26. Gibson, G.A., Nagle, D.F., Amiri, K., Chang, F.W., Gobioff, H., Riedel, E., Rochberg, D., Zelenka, J.: Filesystems for network-attached secure disks. Technical Report CMU-CS-97-118 (1997)
  27. Gobioff, H., Gibson, G.A., Tygar, D.: Security for network attached storage devices. Technical Report CMU-CS-97-185 (1997)
  28. Hotz, S., Van Meter, R., Finn, G.: Internet protocols for network-attached peripherals. In: Proceedings of the Sixth NASA Goddard Conference on Mass Storage Systems and Technologies in conjunction with 15th IEEE Symposium on Mass Storage Systems (1998)
  29. Hartman, J.H., Murdock, I., Spalink, T.: The Swarm scalable storage system. In: Proceedings of the 19th IEEE International Conference on Distributed Computing Systems (ICDCS'99) (1999)
  30. Herlihy, M.: Wait-free synchronization. ACM Trans. Program. Languag. Syst. 13(1), 124-149 (1991)
  31. Jayanti, P., Chandra, T., Toueg, S.: Fault-tolerant wait-free shared objects. J. ACM 45(3), 451-500 (1998)
  32. Keidar, I., Dolev, D.: Totally ordered broadcast in the face of network partitions: exploiting group communication for replication in partitionable networks. In: Avresky, D. (ed.) Dependable Network Computing, Chap. 3. Kluwer Academic Publications (2000)
  33. Lamport, L.: Time, clocks, and the ordering of events in distributed systems. Commun. ACM 21(7), 558-565 (1978)
  34. Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133-169 (1998)
  35. Lamport, L.: Paxos made simple. Distribut. Comput. Column. SIGACT News 32(4), 34-58 (2001)
  36. Lamport, L., Shostak, R., Pease, M.: The Byzantine generals problem. ACM Trans. Program. Languag. Syst. 4(3), 382-401 (1982)
  37. Lampson, B.W.: How to build a highly available system using consensus. In: Proceedings of the 10th International Workshop on Distributed Algorithms (WDAG), LNCS 1151. Springer-Verlag, Berlin (1996)
  38. Lee, E.K., Thekkath, C.: Petal: distributed virtual disks. In: Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII), pp. 84-92 (1996)
  39. Lo, W.K., Hadzilacos, V.: Using failure detectors to solve consensus in asynchronous shared-memory systems. In: Proceedings of the 8th International Workshop on Distributed Algorithms (WDAG), LNCS 857, pp. 280-295. Springer-Verlag, Berlin (1994)
  40. Loui, M.C., Abu-Amara, H.H.: Memory requirements for agreement among unreliable asynchronous processes. In: Franco, P.P. (ed.) Parallel and Distributed Computing: vol. 4 of Advances in Computing Research, pp. 163-183. JAI Press, Greenwich, Conn. (1987)
  41. Malkhi, D.: From Byzantine agreement to practical survivability. In: The International Workshop on Self-Repairing and Self-Configurable Distributed Systems (RCDS'2002), Osaka, Japan (2002)
  42. Malkhi, D., Reiter, M.K.: An architecture for survivable coordination in large-scale systems. IEEE Transact. Knowledge Data Eng. 12(2), 187-202 (2000)
  43. Merritt, M., Taubenfeld, G.: Computing with infinitely many processes. In: Proceedings of 14th International Symposium on Distributed Computing (DISC'2000), pp. 164-178 (2000)
  44. Mostéfaoui, A., Raynal, M.: Leader-based consensus. Parallel Process. Lett. 11(1), 95-107 (2001)
  45. National Storage Industry Consortium. http://www.nsic.org/nasd
  46. Powell, D. (ed.): Group communication. Commun. ACM 39(4), 50-97 (1996)
  47. Riedel, E., Faloutsos, C., Gibson, G.A., Nagle, D.: Active disks for large-scale data processing. IEEE Comput. 68-74 (2001)
  48. Skeen, M.D.: Nonblocking commit protocols. In: SIGMOD International Conference Management of Data (1981)
  49. Skeen, M.D.: Crash recovery in a distributed database system. PhD Thesis, UC Berkeley (1982)
  50. Schneider, F.B.: Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Comput. Surv. 22(4), 299-319 (1990)
  51. Thekkath, C., Mann, T., Lee, E.K.: Frangipani: a scalable distributed file system. In: Proceedings of the 16th ACM Symposium on Operating Systems Principles, pp. 224-237 (1997)

Copyright information

© Springer-Verlag 2005

Authors and Affiliations

  1. MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, USA