
Robust Redundancy Scheme for the Repair Process: Hierarchical Codes in the Bandwidth-Limited Systems

Published in: Journal of Grid Computing

Abstract

High performance computing can be well supported by Grid or cloud computing systems. However, these systems must cope with failure risks: data is stored on “unreliable” storage nodes that can leave the system at any moment, and the nodes’ network bandwidth is limited. In this setting, the basic way to ensure data reliability is to add redundancy, using either replication or erasure codes. Compared to replication, erasure codes are more space efficient. Erasure codes break data into blocks, encode these blocks, and distribute them across different storage nodes. When storage nodes permanently or temporarily leave the system, new redundant blocks must be created to preserve data reliability, a process referred to as repair. Later, when churned nodes rejoin the system, the blocks they store can be reintegrated into the data group to enhance reliability. For “classical” erasure codes, generating a new block requires transmitting k blocks over the network, which causes heavy repair traffic, high computation complexity, and a high failure probability for the repair process. A near-optimal erasure code named Hierarchical Codes has been proposed that significantly reduces the repair traffic by reducing the number of nodes participating in the repair, referred to as the repair degree d. To overcome the complexity of reintegration and to provide adaptive reliability for Hierarchical Codes, we refine two concepts, location and relocation, and then propose an integrated maintenance scheme for the repair process. Our experiments show that Hierarchical Codes are the most robust redundancy scheme for the repair process compared with other well-known coding schemes.
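To make the repair-traffic argument concrete, the following minimal Python sketch (our own illustration, not the authors' implementation) compares the amount of data that must be transferred to rebuild one lost block under a classical (n, k) erasure code, where k blocks are fetched from surviving nodes, and under a hierarchical-style code with a smaller repair degree d. The block size and the values of n, k, and d are illustrative assumptions.

# Illustrative sketch: repair traffic of a classical (n, k) erasure code
# versus a code with a smaller repair degree d.
# All parameter values below are assumptions chosen only for this example.

BLOCK_SIZE_MB = 64   # assumed size of one stored block

def repair_traffic_mb(repair_degree, block_size_mb=BLOCK_SIZE_MB):
    # Rebuilding one lost block requires fetching `repair_degree` blocks
    # from surviving nodes, so the traffic is repair_degree * block size.
    return repair_degree * block_size_mb

k = 8    # data blocks of the original object
n = 12   # total blocks after encoding
d = 2    # assumed repair degree of a small local group

print("classical (n=%d, k=%d) repair: %d MB" % (n, k, repair_traffic_mb(k)))
print("hierarchical-style (d=%d) repair: %d MB" % (d, repair_traffic_mb(d)))

With these assumed parameters the classical repair moves k = 8 blocks (512 MB) while the small-repair-degree repair moves only 2 blocks (128 MB), which is the bandwidth saving the abstract attributes to a reduced repair degree d.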



Author information


Corresponding author

Correspondence to Zhen Huang.


Cite this article

Huang, Z., Lin, Y. & Peng, Y. Robust Redundancy Scheme for the Repair Process: Hierarchical Codes in the Bandwidth-Limited Systems. J Grid Computing 10, 579–597 (2012). https://doi.org/10.1007/s10723-012-9221-8
