Towards Optimizing Deduplication on Persistent Memory

Li, Yichen; He, Kewen; Wang, Gang; Liu, Xiaoguang

doi:10.1007/978-3-030-79478-1_39


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12639)

Included in the following conference series:

IFIP International Conference on Network and Parallel Computing


Abstract

Data deduplication is an effective method to reduce data storage requirements. During deduplication, fingerprint identification may trigger frequent on-disk fingerprint lookups, which seriously hurt performance. Several locality-aware approaches have been proposed to tackle this issue. Persistent Memory (PM), which offers low latency and high bandwidth, has recently become a hotspot in data storage. A deduplication system that stores its fingerprints on PM can look them up far faster than one that keeps them on disk, so traditional locality-aware approaches designed for slow devices are likely no longer suitable.
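To see why locality-aware designs tuned for slow devices may lose their advantage on PM, consider a deliberately simple cost model of a cache-first lookup path. This is an illustrative assumption, not the model developed in the paper. Each incoming fingerprint first probes the DRAM fingerprint cache at cost $t_c$ and hits with probability $h$; a miss probes the Bloom filter at cost $t_b$; a fraction $q$ of misses pass the filter and pay an index lookup $t_i$; and a fraction $d \le q$ of misses are duplicates found in the index, each triggering a locality prefetch of its container's fingerprints into the cache at cost $t_p$:

$$
E[T] = t_c + (1-h)\,\bigl(t_b + q\,t_i + d\,t_p\bigr)
$$

When $t_i$ is a disk access, the term $(1-h)\,q\,t_i$ dominates, so raising $h$ through prefetching justifies paying $t_c$ on every fingerprint and $t_p$ on every index hit. When the index lives on PM, $t_i$ shrinks by orders of magnitude, $t_c$ and $d\,t_p$ become a comparable share of $E[T]$, and the cache's bookkeeping can cost more than it saves.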

In this paper, we model the traditional locality-aware approaches and analyze their performance on PM. Guided by this analysis, we propose an optimized PM-based fingerprint identification scheme in which the fingerprint cache is replaced with a simple, low-cost read buffer and the order of the Bloom filter and the read buffer is swapped. Experimental results on real PM devices show that, compared with the traditional locality-aware approaches, the proposed scheme improves fingerprint identification throughput by a factor of 1.2–2.3.
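The abstract names the pieces of the proposed lookup path (a Bloom filter, a read buffer that replaces the fingerprint cache, and the fingerprint index on PM) but not their internals. The Python sketch below wires them together under stated assumptions: the read buffer simply holds the fingerprints of the most recently read container and is overwritten on every index read, with no LRU bookkeeping; a Python dict stands in for the PM-resident index; and because the abstract says only that the Bloom filter and the read buffer swap places, the probe order is a parameter. The names `BloomFilter`, `FingerprintIdentifier`, `store` and `is_duplicate` are hypothetical and are not the authors' code.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over byte-string fingerprints."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        assert 1 <= num_hashes <= 8, "positions are sliced from one 32-byte SHA-256 digest"
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        digest = hashlib.sha256(key).digest()
        for i in range(self.k):
            # Each hash function uses a distinct 4-byte slice of the digest.
            yield int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.m

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def may_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class FingerprintIdentifier:
    """Hypothetical lookup path: Bloom filter, one-container read buffer, full index."""

    def __init__(self, order=("bloom", "buffer")):
        self.order = order
        self.bloom = BloomFilter()
        self.index = {}       # fingerprint -> container id (stands in for the PM-resident index)
        self.containers = {}  # container id -> fingerprints stored in that container
        self.buffer = set()   # fingerprints of the most recently read container only
        self.index_lookups = 0

    def store(self, fps, container_id) -> None:
        """Write path: record a container of new fingerprints."""
        self.containers[container_id] = list(fps)
        for fp in fps:
            self.index[fp] = container_id
            self.bloom.add(fp)

    def is_duplicate(self, fp: bytes) -> bool:
        for stage in self.order:
            if stage == "bloom" and not self.bloom.may_contain(fp):
                return False  # definitely new: buffer and index are skipped
            if stage == "buffer" and fp in self.buffer:
                return True   # served by the read buffer, no index access
        self.index_lookups += 1
        cid = self.index.get(fp)
        if cid is None:
            return False      # the Bloom filter gave a false positive
        # Overwrite the buffer with the container just read; no LRU or eviction policy.
        self.buffer = set(self.containers[cid])
        return True


# Usage: store one container of four fingerprints, then look up six fingerprints
# (four duplicates, two new) under both probe orders.
fps = [hashlib.sha1(str(i).encode()).digest() for i in range(6)]
for order in (("bloom", "buffer"), ("buffer", "bloom")):
    ident = FingerprintIdentifier(order)
    ident.store(fps[:4], container_id=0)
    print(order, [ident.is_duplicate(fp) for fp in fps], ident.index_lookups)
```

With either order, the first four fingerprints are reported as duplicates and only the first of them reaches the index; the other three are served by the buffer. The orders differ in which probe is skipped: putting the Bloom filter first rejects new chunks before the buffer is consulted, while putting the buffer first lets buffered duplicates skip the filter probe.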

This work is partially supported by the National Natural Science Foundation of China (U1833114, 61872201, 61702521) and the Science and Technology Development Plan of Tianjin (18ZXZNGX00140, 18ZXZNGX00200).



Author information

Authors and Affiliations

College of Computer Science, Nankai University, Tianjin, China
Yichen Li, Kewen He, Gang Wang & Xiaoguang Liu


Corresponding authors

Correspondence to Gang Wang or Xiaoguang Liu.

Editor information

Editors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Xin He
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
En Shao
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Guangming Tan


About this paper

Cite this paper

Li, Y., He, K., Wang, G., Liu, X. (2021). Towards Optimizing Deduplication on Persistent Memory. In: He, X., Shao, E., Tan, G. (eds) Network and Parallel Computing. NPC 2020. Lecture Notes in Computer Science, vol 12639. Springer, Cham. https://doi.org/10.1007/978-3-030-79478-1_39


DOI: https://doi.org/10.1007/978-3-030-79478-1_39
Published: 23 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-79477-4
Online ISBN: 978-3-030-79478-1
eBook Packages: Computer Science, Computer Science (R0)
