Optimizing the restoration performance of deduplication systems through an energy-saving data layout

Abstract

Data deduplication is an important data compression technique that removes duplicate copies of repeated data to improve storage utilization, although it also introduces security and privacy risks, since sensitive user data are exposed to both insider and outsider attacks. A distinct factor that degrades deduplication performance is data fragmentation, which not only slows down restoration but also leads to high power consumption. In this paper, we address this problem from the perspective of data layout. The core of our method is a novel RAID-5-based cross grouping data layout (CGDL). We introduce a selective deduplication algorithm (SDD) to perform data replication and restoration. We also propose a new CGDL-based disk scheduling algorithm (LDP) that predicts location dependence and saves energy by eliminating redundant disk read/write operations. We evaluate our method on the Linux MD (multiple device) driver modules. The experiments show that, in a storage configuration of 10 disks in 3 groups, our method improves restoration efficiency by 20% at the cost of only a 7.6% reduction in the deduplication ratio, while reducing power consumption by 23%.
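The central trade-off in the abstract, giving up some deduplication ratio so that restoration touches fewer disks, can be illustrated with a toy model. The sketch below is hypothetical and is not the paper's SDD, CGDL, or LDP: it assumes fixed-size 4-byte chunks, SHA-1 fingerprints, and backups written to numbered disk groups, and the names `Store`, `write_backup`, `restore_groups`, `split_chunks`, and the `selective` flag are invented for illustration. With `selective=False`, every duplicate chunk is deduplicated wherever its copy lives, so restoring the second backup must read from two groups; with `selective=True`, duplicates held in other groups are rewritten locally, which lowers the deduplication ratio but lets the restore read from a single group.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # bytes per chunk; deliberately tiny for illustration (assumption)


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Fixed-size chunking; real systems often use content-defined chunking."""
    return [data[i:i + size] for i in range(0, len(data), size)]


@dataclass
class Store:
    # Hypothetical switch: if True, a duplicate is deduplicated only when a copy
    # already exists in the disk group receiving the current backup.
    selective: bool
    # fingerprint -> set of disk-group ids that hold a physical copy of the chunk
    index: dict = field(default_factory=dict)
    logical_bytes: int = 0
    physical_bytes: int = 0

    def write_backup(self, data: bytes, group: int) -> list:
        """Write one backup to `group`; return its recipe of (fingerprint, group) pairs."""
        recipe = []
        for chunk in split_chunks(data):
            fp = hashlib.sha1(chunk).hexdigest()
            self.logical_bytes += len(chunk)
            holders = self.index.setdefault(fp, set())
            if group in holders:
                target = group            # duplicate already in this group: reference it
            elif holders and not self.selective:
                target = min(holders)     # duplicate in another group: reference it anyway
            else:
                target = group            # new chunk, or duplicate rewritten locally
                holders.add(group)
                self.physical_bytes += len(chunk)
            recipe.append((fp, target))
        return recipe

    def dedup_ratio(self) -> float:
        """Logical bytes written divided by physical bytes stored."""
        return self.logical_bytes / self.physical_bytes if self.physical_bytes else 1.0


def restore_groups(recipe: list) -> set:
    """Disk groups that must be active to read back every chunk in a recipe."""
    return {group for _, group in recipe}


if __name__ == "__main__":
    for selective in (False, True):
        store = Store(selective=selective)
        store.write_backup(b"AAAABBBBCCCCDDDD", group=0)           # first full backup
        recipe = store.write_backup(b"AAAABBBBEEEEFFFF", group=1)  # second, half changed
        print(selective, round(store.dedup_ratio(), 2), sorted(restore_groups(recipe)))
    # prints:
    # False 1.33 [0, 1]
    # True 1.0 [1]
```

The toy numbers say nothing about the measured 20%, 7.6%, or 23% figures; they only show the direction of the trade-off. Fewer active groups during a restore is one plausible route to energy savings, since groups that are not read can remain idle.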

Funding

This work is supported by the National Key R&D Program of China (no. 2018YFB1004402), the Beijing Municipal Natural Science Foundation (no. 4172053), the National Natural Science Foundation of China (no. U1636213), and the China State Key Laboratory of Virtual Reality Technology and Systems (2016–2018).

Author information

Correspondence to YuanZhang Li.

About this article

Cite this article

Yan, F., Yang, X., Liu, J. et al. Optimizing the restoration performance of deduplication systems through an energy-saving data layout. Ann. Telecommun. 74, 461–471 (2019). https://doi.org/10.1007/s12243-019-00711-z

Keywords

  • Data deduplication
  • Data layout
  • Data restoration
  • Energy saving