Separating Data and Control: Asynchronous BFT Storage with 2t + 1 Data Replicas

Cachin, Christian; Dobre, Dan; Vukolić, Marko

doi:10.1007/978-3-319-11764-5_1

Christian Cachin¹⁷,
Dan Dobre¹⁸ &
Marko Vukolić¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 8756))

Included in the following conference series:

Symposium on Self-Stabilizing Systems

564 Accesses
8 Citations

Abstract

The overhead of Byzantine fault tolerant (BFT) storage is a primary concern that prevents its adoption in practice. The cost stems from the need to maintain at least 3t + 1 copies of the data at different storage replicas in the asynchronous model, so that t Byzantine replica faults can be tolerated. This paper presents MDStore, the first fully asynchronous BFT storage protocol that reduces the number of replicas that store the payload data to as few as 2t + 1 and maintains metadata at 3t + 1 replicas on (possibly) different servers. At the heart of MDStore lies a metadata service built upon a new abstraction called “timestamped storage.” Timestamped storage allows for conditional writes (facilitating the implementation of the metadata service) and has consensus number one (making it implementable with wait-free semantics in an asynchronous system despite faults). In addition to its low replication overhead, MDStore offers strong guarantees by emulating a multi-writer multi-reader atomic register, providing wait-free termination, and tolerating any number of Byzantine readers and crash-faulty writers.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Abraham, I., Chockler, G., Keidar, I., Malkhi, D.: Byzantine disk Paxos: Optimal resilience with Byzantine shared memory. Distributed Computing 18(5), 387–408 (2006)
Article MATH Google Scholar
Adya, A., Bolosky, W.J., Castro, M., et al.: FARSITE: Federated, available, and reliable storage for an incompletely trusted environment. In: Proc. 5th Symp. Operating Systems Design and Implementation, OSDI (2002)
Google Scholar
Aiyer, A.S., Alvisi, L., Bazzi, R.A.: Bounded wait-free implementation of optimally resilient Byzantine storage without (unproven) cryptographic assumptions. In: Pelc, A. (ed.) DISC 2007. LNCS, vol. 4731, pp. 7–19. Springer, Heidelberg (2007)
Chapter Google Scholar
Androulaki, E., Cachin, C., Dobre, D., Vukolić, M.: Erasure-coded Byzantine storage with separate metadata. Report ArXiv:1402.4958, CoRR (2014)
Google Scholar
Attiya, H., Welch, J.: Distributed Computing: Fundamentals, Simulations and Advanced Topics. McGraw-Hill, London (1998)
Google Scholar
Cachin, C., Dobre, D., Vukolić, M.: BFT storage with 2t + 1 data replicas. Report ArXiv:1305.4868, CoRR (2013)
Google Scholar
Cachin, C., Guerraoui, R., Rodrigues, L.: Introduction to Reliable and Secure Distributed Programming, 2nd edn. Springer (2011)
Google Scholar
Cachin, C., Junker, B., Sorniotti, A.: On limitations of using cloud storage for data replication. In: Proc. 6th Workshop on Recent Advances in Intrusion Tolerance and reSilience, WRAITS 2012 (2012)
Google Scholar
Cachin, C., Tessaro, S.: Optimal resilience for erasure-coded Byzantine distributed storage. In: Proc. International Conference on Dependable Systems and Networks (DSN-DCCS), pp. 115–124 (2006)
Google Scholar
Cho, B., Aguilera, M.K.: Surviving congestion in geo-distributed storage systems. In: Proc. USENIX Annual Technical Conference, pp. 439–451 (2012)
Google Scholar
Chun, B.-G., Maniatis, P., Shenker, S., Kubiatowicz, J.: Attested append-only memory: Making adversaries stick to their word. In: Proc. 21st ACM Symposium on Operating Systems Principles (SOSP), pp. 189–204 (2007)
Google Scholar
Correia, M., Neves, N.F., Veríssimo, P.: How to tolerate half less one Byzantine nodes in practical distributed systems. In: Proc. 23rd Symposium on Reliable Distributed Systems (SRDS), pp. 174–183 (2004)
Google Scholar
Dobre, D., Karame, G., Li, W., Majuntke, M., Suri, N., Vukolić, M.: PoWerStore: Proofs of writing for efficient and robust storage. In: Proc. ACM Conference on Computer and Communications Security, CCS (2013)
Google Scholar
Dobre, D., Viotti, P., Vukolić, M.: Hybris: Consistency hardening in robust hybrid cloud storage. Research Report RR-13-291, Eurécom (2013)
Google Scholar
Dwork, C., Lynch, N., Stockmeyer, L.: Consensus in the presence of partial synchrony. Journal of the ACM 35(2), 288–323 (1988)
Article MathSciNet Google Scholar
Fan, R., Lynch, N.A.: Efficient replication of large data objects. In: Fich, F.E. (ed.) DISC 2003. LNCS, vol. 2848, pp. 75–91. Springer, Heidelberg (2003)
Chapter Google Scholar
Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. Journal of the ACM 32(2), 374–382 (1985)
Article MathSciNet MATH Google Scholar
Guerraoui, R., Vukolić, M.: How fast can a very robust read be? In: Proc. 25th ACM Symposium on Principles of Distributed Computing (PODC), pp. 248–257 (2006)
Google Scholar
Guerraoui, R., Vukolić, M.: Refined quorum systems. Distributed Computing 23(1), 1–42 (2010)
Article MATH Google Scholar
Herlihy, M.: Wait-free synchronization. ACM Transactions on Programming Languages and Systems 11(1), 124–149 (1991)
Article Google Scholar
Herlihy, M.P., Wing, J.M.: Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems 12(3), 463–492 (1990)
Article Google Scholar
Kapitza, R., Behl, J., Cachin, C., Distler, T., Kuhnle, S., Mohammadi, S.V., Schröder-Preikschat, W., Stengel, K.: CheapBFT: Resource-efficient Byzantine fault tolerance. In: Proc. 7th European Conference on Computer Systems (EuroSys), pp. 295–308 (April 2012)
Google Scholar
Katz, J., Lindell, Y.: Introduction to Modern Cryptography: Principles and Protocols. Chapman & Hall/CRC (2007)
Google Scholar
Lamport, L.: On interprocess communication. Distributed Computing 1(2), 77–85, 86–101 (1986)
Google Scholar
Lamport, L.: The part-time parliament. ACM Transactions on Computer Systems 16(2), 133–169 (1998)
Article Google Scholar
Lamport, L.: Paxos made simple. SIGACT News 32(4), 51–58 (2001)
Google Scholar
Lamport, L.: Lower bounds for asynchronous consensus. In: Schiper, A., Shvartsman, M.M.A.A., Weatherspoon, H., Zhao, B.Y. (eds.) Future Directions in Distributed Computing. LNCS, vol. 2584, pp. 22–23. Springer, Heidelberg (2003)
Chapter Google Scholar
Lynch, N.A.: Distributed Algorithms. Morgan Kaufmann, San Francisco (1996)
Google Scholar
Lynch, N.A., Tuttle, M.R.: An introduction to input/output automata. CWI Quaterly 2(3), 219–246 (1989)
MathSciNet MATH Google Scholar
Malkhi, D., Reiter, M.: Secure and scalable replication in Phalanx. In: Proc. 17th Symposium on Reliable Distributed Systems, SRDS (1998)
Google Scholar
Malkhi, D., Reiter, M.K.: Byzantine quorum systems. Distributed Computing 11(4), 203–213 (1998)
Article Google Scholar
Martin, J.-P., Alvisi, L., Dahlin, M.: Minimal Byzantine storage. In: Malkhi, D. (ed.) DISC 2002. LNCS, vol. 2508, pp. 311–325. Springer, Heidelberg (2002)
Chapter Google Scholar
Veronese, G.S., Correia, M., Bessani, A., Lung, L.C., Veríssimo, P.: Efficient Byzantine fault tolerance. IEEE Transactions on Computers 62(1), 16–30 (2011)
Article Google Scholar
Vukolić, M.: Quorum Systems: With Applications to Storage and Consensus. Synthesis Lectures on Distributed Computing Theory. Morgan & Claypool Publishers (2012)
Google Scholar
Wang, Y., Alvisi, L., Dahlin, M.: Gnothi: Separating data and metadata for efficient and available storage replication. In: Proc. USENIX Annual Technical Conference, pp. 413–424 (2012)
Google Scholar
Wilkes, J., Hoover, C., Keer, B., Mehra, P., Veitch, A.: Storage, Data, and Information Systems. HP Laboratories (2008)
Google Scholar
Yin, J., Martin, J.-P., Venkataramani, A., Alvisi, L., Dahlin, M.: Separating agreement from execution for Byzantine fault-tolerant services. In: Proc. 19th ACM Symposium on Operating Systems Principles (SOSP), pp. 253–268 (2003)
Google Scholar

Download references

Author information

Authors and Affiliations

IBM Research - Zurich, Switzerland
Christian Cachin
Work done at NEC Labs Europe, Germany
Dan Dobre
Eurécom, France
Marko Vukolić

Authors

Christian Cachin
View author publications
You can also search for this author in PubMed Google Scholar
Dan Dobre
View author publications
You can also search for this author in PubMed Google Scholar
Marko Vukolić
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Institut d’informatique, Université de Neuchâtel, Rue Emile-Argand 11, 2000, Neuchâtel, Switzerland
Pascal Felber
Electrical and Computer Engineering Department, University of Texas at Austin, 1 University Station, 78712-0240, Austin, TX, USA
Vijay Garg

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Cachin, C., Dobre, D., Vukolić, M. (2014). Separating Data and Control: Asynchronous BFT Storage with 2t + 1 Data Replicas. In: Felber, P., Garg, V. (eds) Stabilization, Safety, and Security of Distributed Systems. SSS 2014. Lecture Notes in Computer Science, vol 8756. Springer, Cham. https://doi.org/10.1007/978-3-319-11764-5_1

Download citation

DOI: https://doi.org/10.1007/978-3-319-11764-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11763-8
Online ISBN: 978-3-319-11764-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Separating Data and Control: Asynchronous BFT Storage with 2t + 1 Data Replicas