Advertisement

P-Schedule: Erasure Coding Schedule Strategy in Big Data Storage System

  • Chao Yin
  • Haitao Lv
  • Tongfang Li
  • Yan Liu
  • Xiaoping Qu
  • Sihao Yuan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11336)

Abstract

Erasure coding technology is one of the key technologies in big data storage system. A well designed erasure coding can not only improve the reliability of the big data storage system, but also greatly improve the performance. Most of the existing big data storage systems use replica strategy, which can provide good availability and real-time, but it has caused a lot of data redundancy and waste of storage space. A large part of the data stored in the storage system exists in the form of cold data. In this paper, we aim at the cold data which doesn’t require highly on data availability and real-time in the big data storage system. We have proposed a scheme to support both replica strategy and coding strategy, and designed the node scheduling and data addressing scheme. We selected Liberation code which is excellent in writing operation, and developed P-Schedule scheme to optimize the decoding speed. Through a series of designs, we can effectively improve the disk utilization and write speed of the cold data in the big data system. The test results show that the sequential write performance of erasure coding is better than that of the replica strategy. The larger the data block is, the better the performance is.

Keywords

Big data Erasure coding Liberation P-Schedule 

Notes

Acknowledgements

This work was supported by National Natural Science Foundation of China (No. 61662038), Science and technology project of Jiangxi Provincial Department of Education (No. GJJ151081), the Visiting Scholar Funds by China Scholarship Council, the JiangXi Association for Science and Technology.

References

  1. 1.
    Morris, R.J.T., Truskowski, B.J.: The evolution of storage systems. IBM Syst. J. 42(2), 205–217 (2003)CrossRefGoogle Scholar
  2. 2.
    Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R., Muharemagic, E.: Deep learning applications and challenges in big data analytics. J. Big Data 2(1), 1–21 (2015)CrossRefGoogle Scholar
  3. 3.
    Schermann, M., Hemsen, H., Buchmüller, C., Bitter, T., Krcmar, H., Markl, V., Hoeren, T.: Big data. Bus. Inf. Syst. Eng. 6(5), 261–266 (2014)CrossRefGoogle Scholar
  4. 4.
    Chen, Y., Chen, H., Gorkhali, A., Lu, Y., Ma, Y., Li, L.: Big data analytics and big data science: a survey. J. Manag. Anal. 3(1), 1–42 (2016)Google Scholar
  5. 5.
    Li, S., Cao, Q., Wan, S., Qian, L., Xie, C.: HRSPC: a hybrid redundancy scheme via exploring computational locality to support fast recovery and high reliability in distributed storage systems. J. Netw. Comput. Appl. (2015)Google Scholar
  6. 6.
    Calder, B., Wang, J., Ogus, A., et al.: Windows Azure storage: a highly available cloud storage service with strong consistency. In: Proceeding of the Twenty-Third ACM Symposium on Operating Systems Principles, pp. 143–157 (2011)Google Scholar
  7. 7.
    Chun, B.G., Dabek, F., Haeberlen, A., et al.: Efficient replica maintenance for distributed storage systems. In: Proceedings of NSDI, pp. 225–264 (2006)Google Scholar
  8. 8.
    Chen, P.M., Lee, E.K., Gibson, G.A., et al.: RAID: high-performance, reliable secondary storage. ACM Comput. Surv.–CSUR 26(2), 145–185 (1994)CrossRefGoogle Scholar
  9. 9.
    Corbett, P., English, B., Goel, A., et al.: Row-diagonal parity for double disk failure correction. In: FAST 2004: Proceedings of the 3rd USENIX Conference on File and Storage Technologies, pp. 1–14 (2004)Google Scholar
  10. 10.
    Xiang, L., Xu, Y., Lui, J., et al.: Optimal recovery of single disk failure in RDP code storage systems. In: SIGMETRICS 2010 Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 119–130 (2010)Google Scholar
  11. 11.
    Blaum, M., Brady, J., Bruck, J., et al.: EVENODD: an efficient scheme for tolerating double disk failures in RAID architectures. IEEE Trans. Comput. 44(2), 192–202 (1995)CrossRefGoogle Scholar
  12. 12.
    Huang, C., Xu, L.: STAR: an efficient coding scheme for correcting triple storage node failures. IEEE Trans. Comput. 57(7), 889–901 (2008)MathSciNetCrossRefGoogle Scholar
  13. 13.
    Reed, I.S., Solomon, G.: Polynomial codes over certain finite fields. J. Soc. Ind. Appl. Math. 8(2), 300–304 (1996)MathSciNetCrossRefGoogle Scholar
  14. 14.
    Rodrigues, R., Liskov, B.: High availability in DHTs: erasure coding vs. replication. In: Castro, M., van Renesse, R. (eds.) IPTPS 2005. LNCS, vol. 3640, pp. 226–239. Springer, Heidelberg (2005).  https://doi.org/10.1007/11558989_21CrossRefGoogle Scholar
  15. 15.
    Luo, J., Bowers, K.D., Oprea, A., Xu, L.: Efficient software implementations of large finite fields GF(2n) for secure storage applications. ACM Trans. Storage 8(2) (2012)CrossRefGoogle Scholar

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Chao Yin
    • 1
  • Haitao Lv
    • 1
  • Tongfang Li
    • 1
  • Yan Liu
    • 1
  • Xiaoping Qu
    • 1
  • Sihao Yuan
    • 1
  1. 1.Jiujiang UniversityJiujiangChina

Personalised recommendations