
A study of the performance of novel storage-centric repairable codes

Published in Computing

Abstract

Erasure coding has become an integral part of the storage infrastructure in data centers and cloud backends, since it provides significantly higher fault tolerance at substantially lower storage overhead than a naive approach such as n-way replication. Fault tolerance here refers to the ability to maintain very high availability despite (temporary) failures; for long-term data durability, however, the redundancy provided by erasure coding must be replenished as storage nodes fail or are retired. Traditional erasure codes are not easily amenable to repairs: their repair process is usually both expensive and slow. Consequently, in recent years numerous novel codes tailor-made for distributed storage have been proposed to optimize the repair process. Broadly, most of these codes belong to one of two families: network-coding-inspired regenerating codes, which minimize the traffic per repair, and locally repairable codes (LRCs), which minimize the number of nodes contacted per repair (and thereby also reduce repair traffic and latency). Existing studies of these codes, however, restrict themselves to the repair of individual data objects in isolation, ignoring many practical issues that a real system storing multiple objects must take into account. Our goal is to explore a subset of such issues, particularly those arising when multiple objects are stored in the system. We use a simulation-based approach that models the network bottlenecks at the edges of a distributed storage system as well as the nodes' load and (un)availability. Specifically, we abstract the key features of both regenerating codes and LRCs, and examine the effect of data placement and the corresponding (de)correlation of failures, as well as the competition for limited network resources when multiple objects must be repaired simultaneously, by exploring the interplay of code parameters and the trade-offs between bandwidth usage and repair speed.
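To make the bandwidth-versus-locality trade-off concrete, the following is a minimal back-of-the-envelope sketch, not taken from the paper itself. It compares the per-repair network traffic of a naive MDS repair, a minimum-storage regenerating (MSR) repair using the standard cut-set formula from the regenerating-codes literature (Dimakis et al., IEEE Trans. Inf. Theory, 2010), and a local repair with an LRC. All parameter values are hypothetical.

    # Per-repair traffic for one lost fragment of an M-sized object,
    # under three repair strategies. Parameters are illustrative only.

    M = 256.0      # object size in MB (hypothetical)
    n, k = 14, 10  # (n, k) erasure code: n fragments, any k reconstruct
    d = 13         # helpers contacted per MSR repair (k <= d <= n - 1)
    r = 5          # locality of the LRC: fragments read per local repair

    # Naive MDS repair: download k fragments of size M/k each,
    # i.e. reconstruct the whole object just to rebuild one fragment.
    mds_traffic = k * (M / k)

    # MSR point of the regenerating-codes trade-off: each of the d
    # helpers sends beta = M / (k * (d - k + 1)) units of data.
    msr_traffic = d * M / (k * (d - k + 1))

    # LRC repair: read r fragments of size M/k from the local group.
    lrc_traffic = r * (M / k)

    print(f"MDS repair traffic: {mds_traffic:.1f} MB")  # 256.0 MB
    print(f"MSR repair traffic: {msr_traffic:.1f} MB")  # ~83.2 MB
    print(f"LRC repair traffic: {lrc_traffic:.1f} MB")  # 128.0 MB

With these hypothetical parameters, the naive repair moves the full 256 MB object, the MSR repair moves about 83 MB but must contact 13 helper nodes, and the LRC repair moves 128 MB while contacting only 5 nodes. This mirrors the interplay studied in the paper: regenerating codes minimize traffic per repair, while LRCs minimize the number of nodes involved, which matters when many objects compete for the same edge bandwidth.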




Acknowledgments

This work was funded by Singapore MoE Tier-2 Grant No. MOE2013-T2-1-068 for the project 'eCode: Erasure Codes for Data-center Environments'.

Author information


Corresponding author

Correspondence to Anwitaman Datta.


Cite this article

Datta, A., Pamies-Juarez, L. & Oggier, F. A study of the performance of novel storage-centric repairable codes. Computing 98, 319–341 (2016). https://doi.org/10.1007/s00607-015-0468-3

