CA-Dedupe: content-aware deduplication in SSDs

The Journal of Supercomputing

Abstract

Flash memories have been widely used for many years because of their high performance compared to HDDs. However, flash memories have a limited lifespan, and they wear out prematurely under write-intensive workloads. Solutions such as wear leveling, compression and deduplication have been proposed to address this issue. Deduplication is an effective way to extend the lifespan of flash memories, but the deduplication methods proposed in previous works usually impose a significant delay on write operations. This paper presents an intelligent method for data deduplication on flash memories that categorizes write requests based on their contents and types. In this scheme, the metadata calculated for write requests is placed in separate categories, and during the deduplication procedure the search is performed within a single category. As a result, the proposed method significantly improves both the search delay and the deduplication rate. Simulation results show that the proposed method reduces the delay of write operations by 32% compared to other methods and achieves a deduplication rate of 69.8%.
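The idea of searching a single category can be illustrated with a minimal sketch. This is not the authors' implementation. The class `CategorizedDedupIndex`, the `categorize` helper, the magic-number prefixes used to guess a content type, and the choice of SHA-1 as the fingerprint are all assumptions made for illustration. The sketch keeps one fingerprint table per category, so a duplicate lookup only touches the table of the category the incoming page falls into.

```python
import hashlib
from collections import defaultdict

# Hypothetical magic-number prefixes used to guess a content category.
# The real classifier and its categories are not given in the abstract.
MAGIC_PREFIXES = {
    b"\x89PNG": "png",
    b"%PDF": "pdf",
    b"PK\x03\x04": "zip",
}


def categorize(page: bytes) -> str:
    """Assign a page to a category based on its leading bytes."""
    for prefix, name in MAGIC_PREFIXES.items():
        if page.startswith(prefix):
            return name
    return "other"


class CategorizedDedupIndex:
    """Fingerprint index split into one table per content category."""

    def __init__(self) -> None:
        # category -> {fingerprint -> physical page number}
        self.tables = defaultdict(dict)
        self.next_ppn = 0
        self.writes_issued = 0  # pages actually written to flash
        self.writes_saved = 0   # duplicate pages that were not written

    def write(self, page: bytes) -> int:
        """Return the physical page number that holds this content."""
        category = categorize(page)
        fingerprint = hashlib.sha1(page).digest()  # assumed fingerprint
        table = self.tables[category]  # search only this category
        ppn = table.get(fingerprint)
        if ppn is not None:
            self.writes_saved += 1
            return ppn
        ppn = self.next_ppn
        self.next_ppn += 1
        table[fingerprint] = ppn
        self.writes_issued += 1
        return ppn

    def dedup_rate(self) -> float:
        """Fraction of write requests eliminated as duplicates."""
        total = self.writes_issued + self.writes_saved
        return self.writes_saved / total if total else 0.0


index = CategorizedDedupIndex()
first = index.write(b"%PDF-1.4 example page")
second = index.write(b"%PDF-1.4 example page")  # duplicate, same category
third = index.write(b"\x89PNG example page")    # different category
```

In this sketch a Python dict stands in for whatever search structure a controller would use. The point is only that each lookup is confined to one smaller table rather than a single global one.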

Corresponding author

Correspondence to Mohammadreza Binesh Marvasti.

Cite this article

Gholami Taghizadeh, R., Gholami Taghizadeh, R., Khakpash, F. et al. CA-Dedupe: content-aware deduplication in SSDs. J Supercomput 76, 8901–8921 (2020). https://doi.org/10.1007/s11227-020-03188-z

  • DOI: https://doi.org/10.1007/s11227-020-03188-z
