
Optimizing the restoration performance of deduplication systems through an energy-saving data layout

  • Fang Yan
  • Xi Yang
  • Jiamou Liu
  • HengLiang Tang
  • Yu-An Tan
  • YuanZhang Li (Email author)

Abstract

Data deduplication is an important data compression technique that removes copies of repeated data to improve storage utilization, but it raises security and privacy risks because sensitive user data are exposed to both insider and outsider attacks. A distinct negative factor for the performance of the technique is data fragmentation, which not only slows down the restoration process but also leads to substantial power consumption. In this paper, we address this problem from the perspective of data layout. The core of our method is a novel RAID-5-based cross grouping data layout (CGDL). We introduce a selective deduplication algorithm (SDD) to perform data replication and restoration. We also propose a new CGDL-based disk scheduling algorithm (LDP) that predicts location dependence and saves energy by eliminating redundant disk read/write operations. We evaluate our method on the Linux MD (multiple device) driver modules. The experiments show that, under a storage configuration of 10 disks in 3 groups, our method improves restoration efficiency by 20% with only a 7.6% reduction in the deduplication ratio, while reducing power consumption by 23%.

Keywords

Data deduplication, Data layout, Data restoration, Energy saving

Notes

Funding

This work is supported by the National Key R&D Program of China (no. 2018YFB1004402), the Beijing Municipal Natural Science Foundation (no. 4172053), the National Natural Science Foundation of China (no. U1636213), and the China State Key Laboratory of Virtual Reality Technology and Systems (2016–2018).

Copyright information

© Institut Mines-Télécom and Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Fang Yan (1, 2)
  • Xi Yang (1)
  • Jiamou Liu (2)
  • HengLiang Tang (1)
  • Yu-An Tan (3)
  • YuanZhang Li (3, Email author)
  1. Information School, Beijing Wuzi University, Beijing, China
  2. Department of Computer Science, The University of Auckland, Auckland, New Zealand
  3. Department of Computer Science, Beijing Institute of Technology, Beijing, China
