Advertisement

Computing

, Volume 97, Issue 2, pp 171–201 | Cite as

Self-repairing codes

Local repairability for cheap and fast maintenance of erasure coded data
  • Frédérique Oggier
  • Anwitaman DattaEmail author
Article

Abstract

Networked distributed data storage systems are essential to deal with the needs of storing massive volumes of data. Dependability of such a system relies on its fault tolerance (data should be available in case of node failures) as well as its maintainability (its ability to repair lost data to ensure redundancy replenishment over time). Erasure codes provide a storage efficient alternative to replication based redundancy in storage systems, ensuring the same fault tolerance at a lower storage overhead cost. Traditional erasure codes however have the drawback of entailing high communication overhead for maintenance, when encoded fragments are lost due to storage device failures, and need to be replenished in new nodes. We propose a new family of erasure codes called self-repairing codes (SRC) taking into account the peculiarities of distributed storage systems, specifically to improve its maintainability by ‘localizing’ the repairs. SRC have the property that encoded fragments can be repaired directly from other small subsets of (typically 2 or 3) encoded fragments. These code properties allow bandwidth efficient and fast recovery even in the presence of multiple failures, in turn translating into better system robustness. A concrete family of such locally repairable codes, namely, homomorphic SRC are proposed and various aspects and properties of the same are studied in detail and compared—quantitatively or qualitatively (as may be suitable) with respect to other codes including traditional erasure codes as well as some recent representative codes designed specifically for storage applications.

Keywords

Erasure codes Local repair Networked storage 

Mathematics Subject Classification

94Bxx 

Notes

Acknowledgments

This work has been supported by the MoE Tier-2 grant MOE2013-T2-1-068 “eCODE: Erasure Codes for Datacenter Environments”.

References

  1. 1.
    Buttyán L, Czap L, Vajda I (2011) Detection and recovery from pollution attacks in coding-based distributed storage schemes. In: IEEE transactions on dependable and secure computing, vol 8, issue no 6Google Scholar
  2. 2.
    Bhagwan R, Tati K, Cheng Y, Savage S, Voelker G (2004) Total recall: system support for automated availability management. In: Networked systems design and implementation (NSDI), 2004Google Scholar
  3. 3.
  4. 4.
    Dimakis AG, Brighten Godfrey P, Wu Y, Wainwright MO, Ramchandran K (2010) Network coding for distributed storage systems. IEEE Trans Inf Theory 56(9):4539–4551Google Scholar
  5. 5.
    Datta A, Aberer K (2006) Internet-scale storage systems under churn—a study of the steady-state using Markov models. In: Peer-to-peer computing (P2P)Google Scholar
  6. 6.
    Duminuco A, Biersack E (2008) Hierarchical codes: how to make erasure codes attractive for peer-to-peer storage systems. In: Peer-to-peer computing (P2P)Google Scholar
  7. 7.
    Duminuco A, Biersack EW (2009) A practical study of regenerating codes for peer-to-peer backup systems. In: International conference on distributed computing systems (ICDCS) Google Scholar
  8. 8.
    El Rouayheb S, Ramchandran K (2010) Fractional repetition codes for repair in distributed storage systems. In: Allerton conference on communication control and computingGoogle Scholar
  9. 9.
    Gaonkar S, Keeton K, Merchant A, Sanders WH (2010) Designing dependable storage solutions for shared application environments. IEEE Trans Dependable Secur Comput 7(4)Google Scholar
  10. 10.
    Kermarrec A-M, Le Scouarnec N, Straub G (2010) Beyong regenerating codes. In: Technical report, 10 Sept 2010Google Scholar
  11. 11.
    Liu X, Datta A (2009) Redundancy maintenance and garbage collection strategies in peer-to-peer storage systems. In: International symposium on stabilization, safety, and security of distributed systems (SSS)Google Scholar
  12. 12.
    Gopalan P, Huang C, Simitchi H, Yekhanin S (2012) On the locality of codeword symbols. IEEE Trans Inf Theory 58(11):6925–6934CrossRefGoogle Scholar
  13. 13.
    Grolimund D (2007) Wuala—a distributed file system. Google Tech Talk http://www.youtube.com/watch?v=3xKZ4KGkQY8. Accessed 11 Aug 2013
  14. 14.
    Haytaoglu E, Dalkilic ME (2013) Homomorphic minimum bandwidth repairing codes. In: Information sciences and systems 2013. Lecture notes in electrical engineering, vol 264, pp 339–348Google Scholar
  15. 15.
    Hollmann H (2013) Storage codes—coding rate and repair locality. In: ICNC 2013Google Scholar
  16. 16.
    Huang C, Chen M, Li J (2007) Pyramid codes: flexible schemes to trade space for access efficiency in reliable data storage systems. In: NCA 2007Google Scholar
  17. 17.
    Kamath GM, Prakash N, Lalitha V, Vijay Kumar P (2013) Codes with Local Regeneration. In: The proceedings of the workshop on information theory and applications (ITA), 2013. Available at arXiv:1211.1932. Accessed Nov 2012
  18. 18.
    Oggier F, Datta A (2011) Self-repairing homomorphic codes for distributed storage systems. In: INFOCOM 2011Google Scholar
  19. 19.
    Oggier F, Datta A (2011) Self-repairing codes for distributed storage—a projective geometric construction. In: ITW 2011Google Scholar
  20. 20.
    Papailiopoulos DS, Dimakis AG (2012) Locally repairable codes. In: ISIT 2012Google Scholar
  21. 21.
    Rashmi KV, Shah NB, Kumar PV, Ramchandran K (2009) Explicit construction of optimal exact regenerating codes for distributed storage. In: Allerton conference on control, computing and communication 2009Google Scholar
  22. 22.
    Rashmi KV, Shah NB, Kumar PV (2011) Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction. IEEE Trans Inf Theory 57(8):5227–5239Google Scholar
  23. 23.
    Rashmi KV, Shah NB, Kumar PV (2011) Enabling node repair in any erasure code for distributed storage. In: IEEE International Symposium on Information Theory, 2011. IEEE, New YorkGoogle Scholar
  24. 24.
    Reed IS, Solomon G (1960) Polynomial Codes Over Certain Finite Fields. J Soc Ind Appl Math 8(2):300–304Google Scholar
  25. 25.
    Rodrigues R, Liskov B (2005) High availability in DHTs: erasure coding vs. replication. In: Workshop on peer-to-peer systems (IPTPS)Google Scholar
  26. 26.
    Shum KW (2011) Cooperative regenerating codes for distributed storage systems. ICC 2011Google Scholar
  27. 27.
    Shum KW, Hu Y (2013) Cooperative regenerating codes. arXiv:1207.6762
  28. 28.
    Silberstein N, Rawat AS, Koyluoglu O, Vishwanath S (2012) Optimal locally repairable codes via rank-metric codes. arXiv:1301.6331
  29. 29.
    Tamo I, Wang Z, Bruck J (2011) MDS array codes with optimal rebuilding. In: IEEE ISIT 2011Google Scholar

Copyright information

© Springer-Verlag Wien 2014

Authors and Affiliations

  1. 1.School of Physical and Mathematical SciencesNanyang Technological UniversitySingaporeSingapore
  2. 2.School of Computer EngineeringNanyang Technological UniversitySingaporeSingapore

Personalised recommendations