EH-Code: An Extended MDS Code to Improve Single Write Performance of Disk Arrays for Correcting Triple Disk Failures

  • Yanbing Jiang
  • Chentao Wu
  • Jie Li
  • Minyi Guo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9528)

Abstract

As data volumes grow and storage systems scale up, concurrent failures of multiple disks are no longer rare. In large data centers, erasure codes are among the most cost-effective ways to protect user data. One class of erasure codes, Maximum Distance Separable (MDS) codes, offers data protection with minimal storage overhead. However, existing Triple Disk Failure Tolerant arrays (3DFTs) based on MDS codes suffer from low single write performance, because the underlying codes have high computational cost and low encoding performance. To address this problem, we propose a novel MDS coding scheme called EH-Code, which is an extension of H-Code. It uses three kinds of parity (horizontal, diagonal and anti-diagonal), which together tolerate the concurrent failure of any three disks. Our mathematical analysis shows that EH-Code offers optimal storage efficiency and encoding computational complexity. Specifically, compared to STAR code, Triple-Star code and Cauchy-RS codes, EH-Code improves single write performance by up to \(16.13\,\%\), \(14.53\,\%\) and \(26.27\,\%\), respectively.
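To make the single write cost concrete, the following is a minimal sketch of XOR parity maintenance over horizontal, diagonal and anti-diagonal chains. It is not the EH-Code layout, which the abstract does not specify: the grid shape, the chain indexing `(r + c) % cols` and `(c - r) % cols`, and the function names `encode` and `small_write` are assumptions chosen for illustration, and this toy layout is not claimed to be MDS. It only shows that when each data element belongs to one chain of each kind, a single write updates three parity elements by XORing in the change.

```python
def encode(data):
    """Compute toy horizontal, diagonal and anti-diagonal XOR parities.

    Assumed layout (illustrative only, not the EH-Code layout):
    element (r, c) belongs to horizontal chain r, diagonal chain
    (r + c) % cols and anti-diagonal chain (c - r) % cols.
    """
    rows, cols = len(data), len(data[0])
    horiz = [0] * rows
    diag = [0] * cols
    anti = [0] * cols
    for r in range(rows):
        for c in range(cols):
            value = data[r][c]
            horiz[r] ^= value
            diag[(r + c) % cols] ^= value
            anti[(c - r) % cols] ^= value
    return horiz, diag, anti


def small_write(data, parities, r, c, new_value):
    """Overwrite one data element and patch its three parity chains.

    Returns the number of parity elements modified.
    """
    horiz, diag, anti = parities
    cols = len(data[0])
    # XOR of old and new value is the change each parity must absorb.
    delta = data[r][c] ^ new_value
    data[r][c] = new_value
    horiz[r] ^= delta
    diag[(r + c) % cols] ^= delta
    anti[(c - r) % cols] ^= delta
    return 3


grid = [[1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15],
        [16, 17, 18, 19, 20]]
parities = encode(grid)
touched = small_write(grid, parities, 2, 3, 99)
# The incrementally patched parities match a full re-encode.
consistent = parities == encode(grid)
```

In this sketch a single write costs one data update plus three parity updates, without reading any other data element; codes whose data elements participate in more parity chains pay more per write.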

Keywords

  • RAID
  • Erasure code
  • Triple disk failures
  • MDS Code
  • Performance evaluation

Notes

Acknowledgments

We thank the anonymous reviewers for their insightful comments. This work is partially sponsored by the National 863 Program of China (No. 2015AA015302), the National 973 Program of China (No. 2015CB352403), the National Natural Science Foundation of China (NSFC) (No. 61332001, No. 61303012, No. 61261160502, No. 61272099, and No. 61572323), the Shanghai Natural Science Foundation (No. 13ZR1421900), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, the EU FP7 CLIMBER project (No. PIRSES-GA-2012-318939), and the CCF-Tencent Open Fund.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
  2. Department of Computer Science, University of Tsukuba, Tsukuba, Japan