
Sparsity exploiting erasure coding for distributed storage of versioned data

Computing

Abstract

In this paper we study the problem of reliably storing an archive of versioned data. Specifically, we focus on systems that store the differences (deltas) between successive versions rather than the whole objects, a typical model for storing versioned data. For reliability, we propose erasure coding techniques that exploit the sparsity of information in the deltas while storing them in a distributed back-end storage system, which improves I/O read performance when retrieving the whole versioned archive. Along with the basic techniques, we propose a few optimization heuristics and evaluate the techniques' efficacy both analytically and with numerical simulations.
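The delta model mentioned in the abstract can be illustrated with a minimal sketch. It is not the coding scheme of the paper: it only shows how an archive of deltas might be formed and why the deltas tend to be sparse. The symbol-wise XOR difference, the fixed-length versions, and all function names (`delta`, `sparsity`, `to_archive`, `from_archive`) are assumptions made for illustration.

```python
from typing import List

Version = List[int]


def delta(prev: Version, curr: Version) -> Version:
    """Symbol-wise XOR difference between two equal-length versions (assumed model)."""
    if len(prev) != len(curr):
        raise ValueError("versions must have the same length")
    return [a ^ b for a, b in zip(prev, curr)]


def sparsity(d: Version) -> int:
    """Number of nonzero symbols in a delta."""
    return sum(1 for s in d if s != 0)


def to_archive(versions: List[Version]) -> List[Version]:
    """Keep the first version in full, then one delta per later version."""
    archive = [list(versions[0])]
    for prev, curr in zip(versions, versions[1:]):
        archive.append(delta(prev, curr))
    return archive


def from_archive(archive: List[Version]) -> List[Version]:
    """Rebuild every version by applying the deltas in order."""
    versions = [list(archive[0])]
    for d in archive[1:]:
        # XOR is its own inverse, so applying a delta reuses delta().
        versions.append(delta(versions[-1], d))
    return versions


# Hypothetical data: three versions of an 8-symbol object with small edits.
versions = [
    [7, 1, 4, 9, 0, 3, 5, 2],
    [7, 1, 4, 8, 0, 3, 5, 2],
    [7, 6, 4, 8, 0, 3, 5, 1],
]
archive = to_archive(versions)
sparsities = [sparsity(d) for d in archive[1:]]
print(sparsities)  # [1, 2]
```

In this toy archive each delta has only one or two nonzero symbols out of eight. This is the kind of sparsity that an erasure code could exploit to read fewer coded symbols when recovering a delta.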



Acknowledgments

This work is supported by the MoE Tier-2 grant MOE2013-T2-1-068 “eCode: erasure codes for data center environments”.

Author information

Corresponding author

Correspondence to J. Harshan.

About this article

Cite this article

Harshan, J., Oggier, F. & Datta, A. Sparsity exploiting erasure coding for distributed storage of versioned data. Computing 98, 1305–1329 (2016). https://doi.org/10.1007/s00607-016-0485-x

  • DOI: https://doi.org/10.1007/s00607-016-0485-x
