Enhancing Restore Speed of In-line Deduplication Cloud-Based Backup Systems by Minimizing Fragmentation

  • Conference paper

Part of the book series: Smart Innovation, Systems and Technologies ((SIST,volume 159))

Abstract

This paper addresses the problems caused by physically scattered chunks of data. Fragmentation takes the form of sparse containers or out-of-order containers, and both compromise restore speed and garbage collection efficiency. Out-of-order containers reduce restore speed owing to the decrease in restore cache. The reduction of fragmentation is demonstrated with the History-Aware Rewriting (HAR) algorithm, which uses historical information from previous backups to identify and reduce sparse containers. Each chunk is identified by a hash code generated with Message Digest 5 (MD5). The logical block address is used to merge all the blocks and reconstruct the original single file. The Data Encryption Standard (DES) is used to generate a secret key file, which is given to the user when the data owner creates that user. Used together, these algorithms aim to minimize the fragmentation problem in in-line deduplication backup storage systems. The improvement in restore performance depends on the amount of duplicate data. Simulation results show that if the same data is uploaded twice, the write performance rises up to 80%, and it rises further, up to 90%, for a third upload of the same data. These values vary with the deduplication technique. When there is no duplicate data at all, the model does not affect the system.
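The pipeline summarized in the abstract (MD5 fingerprints per chunk, reassembly by logical block address, and detection of sparse containers) can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed-size chunking, the constants `CHUNK_SIZE` and `CONTAINER_CHUNKS`, the 0.5 utilization threshold, and the function names `backup`, `restore`, and `sparse_containers` are all assumptions made for illustration. The rewriting step of HAR and the DES key handling are omitted.

```python
import hashlib
from collections import Counter

CHUNK_SIZE = 4        # bytes per chunk; tiny illustrative value (assumption)
CONTAINER_CHUNKS = 4  # chunks per container; illustrative value (assumption)


def backup(data, store, locations):
    """Chunk `data`, store only unseen chunks, and return the backup recipe.

    store:     fingerprint -> chunk bytes (each unique chunk is kept once)
    locations: fingerprint -> id of the container the chunk was written to
    The recipe lists fingerprints in logical block address order.
    """
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.md5(chunk).hexdigest()
        if fingerprint not in store:
            # New chunk: append it to the container currently being filled.
            locations[fingerprint] = len(store) // CONTAINER_CHUNKS
            store[fingerprint] = chunk
        recipe.append(fingerprint)
    return recipe


def restore(recipe, store):
    """Merge chunks in logical block address order to rebuild the file."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


def sparse_containers(recipe, locations, threshold=0.5):
    """Return ids of containers whose utilization by this backup is low.

    Utilization is taken here as the fraction of a container's chunk slots
    referenced by the backup (an assumed, simplified definition).
    """
    used = Counter(locations[fp] for fp in set(recipe))
    return sorted(cid for cid, count in used.items()
                  if count / CONTAINER_CHUNKS < threshold)


if __name__ == "__main__":
    store, locations = {}, {}
    first = b"AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHH"
    second = b"AAAAEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLL"
    recipe_1 = backup(first, store, locations)
    recipe_2 = backup(second, store, locations)
    print(restore(recipe_2, store) == second)
    print(sparse_containers(recipe_2, locations))
```

In this toy run the second backup references only one chunk from the first container, so that container is reported as sparse for the second backup. A history-aware scheme would record such containers and rewrite their still-needed chunks during the next backup.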

Author information

Corresponding author

Correspondence to K. Gayathri Devi.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Gayathri Devi, K., Raksha, S., Sooda, K. (2020). Enhancing Restore Speed of In-line Deduplication Cloud-Based Backup Systems by Minimizing Fragmentation. In: Satapathy, S., Bhateja, V., Mohanty, J., Udgata, S. (eds) Smart Intelligent Computing and Applications. Smart Innovation, Systems and Technologies, vol 159. Springer, Singapore. https://doi.org/10.1007/978-981-13-9282-5_2
