
Robust Redundancy Scheme for the Repair Process: Hierarchical Codes in the Bandwidth-Limited Systems

Published in: Journal of Grid Computing

Abstract

High performance computing can be well supported by Grid or cloud computing systems. However, these systems must cope with failure risks: data is stored on “unreliable” storage nodes that can leave the system at any moment, and the nodes’ network bandwidth is limited. In this setting, the basic way to ensure data reliability is to add redundancy, using either replication or erasure codes. Compared to replication, erasure codes are more space efficient. Erasure codes break data into blocks, encode these blocks, and distribute them across different storage nodes. When storage nodes permanently or temporarily leave the system, new redundant blocks must be created to preserve data reliability, a process referred to as repair. Later, when churned nodes rejoin the system, the blocks they store can be reintegrated into the data group to enhance reliability. For “classical” erasure codes, generating a new block requires transmitting k blocks over the network, which causes heavy repair traffic, high computation complexity, and a high failure probability for the repair process. A near-optimal erasure code named Hierarchical Codes has been proposed that significantly reduces the repair traffic by reducing the number of nodes participating in the repair, referred to as the repair degree d. To overcome the complexity of reintegration and to provide adaptive reliability for Hierarchical Codes, we refine two concepts, location and relocation, and then propose an integrated maintenance scheme for the repair process. Our experiments show that Hierarchical Codes are the most robust redundancy scheme for the repair process compared with other well-known coding schemes.
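To make the repair-traffic argument concrete, the following minimal Python sketch (our own illustration, not the authors' implementation) compares the amount of data that must be transferred to rebuild one lost block under a classical (n, k) erasure code, where k blocks are fetched from surviving nodes, and under a hierarchical-style code with a smaller repair degree d. The block size and the values of n, k, and d are illustrative assumptions.

# Illustrative sketch: repair traffic of a classical (n, k) erasure code
# versus a code with a smaller repair degree d.
# All parameter values below are assumptions chosen only for this example.

BLOCK_SIZE_MB = 64   # assumed size of one stored block

def repair_traffic_mb(repair_degree, block_size_mb=BLOCK_SIZE_MB):
    # Rebuilding one lost block requires fetching `repair_degree` blocks
    # from surviving nodes, so the traffic is repair_degree * block size.
    return repair_degree * block_size_mb

k = 8    # data blocks of the original object
n = 12   # total blocks after encoding
d = 2    # assumed repair degree of a small local group

print("classical (n=%d, k=%d) repair: %d MB" % (n, k, repair_traffic_mb(k)))
print("hierarchical-style (d=%d) repair: %d MB" % (d, repair_traffic_mb(d)))

With these assumed parameters the classical repair moves k = 8 blocks (512 MB) while the small-repair-degree repair moves only 2 blocks (128 MB), which is the bandwidth saving the abstract attributes to a reduced repair degree d.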



Author information


Corresponding author

Correspondence to Zhen Huang.


Cite this article

Huang, Z., Lin, Y. & Peng, Y. Robust Redundancy Scheme for the Repair Process: Hierarchical Codes in the Bandwidth-Limited Systems. J Grid Computing 10, 579–597 (2012). https://doi.org/10.1007/s10723-012-9221-8
