Abstract
Flash memories have been widely used for many years because they perform better than hard disk drives (HDDs). However, flash memories have a limited lifespan and wear out prematurely under write-intensive workloads. Solutions such as wear leveling, compression, and deduplication have been proposed to address this issue. Deduplication is an effective way to extend the lifespan of flash memories, but the deduplication methods proposed in previous works usually impose a significant delay on write operations. This paper presents an intelligent method for data deduplication on flash memories that categorizes write requests based on their contents and types. In this scheme, the metadata calculated for write requests is placed in separate categories, and during the deduplication procedure the search is performed within a single category. As a result, the proposed method significantly reduces the search delay and improves the deduplication rate. Simulation results show that the proposed method reduces the delay of write operations by 32% compared with other methods and achieves a deduplication rate of 69.8%.
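The category-restricted search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the magic-number table, the `categorize` rule, the SHA-1 fingerprint, and the `CategorizedDedupIndex` class are all assumptions made for illustration, and the per-category dictionaries stand in for whatever metadata structure the paper actually uses.

```python
import hashlib
from collections import defaultdict

# Hypothetical magic-number table; the abstract does not list the categories used.
MAGIC_PREFIXES = {
    b"\x89PNG": "png",
    b"%PDF": "pdf",
    b"PK\x03\x04": "zip",
}


def categorize(page: bytes) -> str:
    """Assign a write request to a category from its leading bytes (assumed rule)."""
    for prefix, name in MAGIC_PREFIXES.items():
        if page.startswith(prefix):
            return name
    return "other"


class CategorizedDedupIndex:
    """Keeps one fingerprint table per category, so a lookup touches one table."""

    def __init__(self) -> None:
        # category -> {fingerprint: physical page number}
        self.tables = defaultdict(dict)
        self.next_page = 0
        self.flash_writes = 0

    def write(self, page: bytes) -> int:
        """Return the physical page holding this content, writing only if new."""
        table = self.tables[categorize(page)]
        fingerprint = hashlib.sha1(page).digest()  # assumed fingerprint function
        if fingerprint in table:
            # Duplicate found within the category: no flash write is needed.
            return table[fingerprint]
        physical = self.next_page
        self.next_page += 1
        self.flash_writes += 1
        table[fingerprint] = physical
        return physical


if __name__ == "__main__":
    index = CategorizedDedupIndex()
    requests = [b"%PDF-1.4 report", b"%PDF-1.4 report", b"\x89PNG image"]
    pages = [index.write(r) for r in requests]
    print(pages, index.flash_writes)
```

In this sketch, a duplicate is only searched for among entries of the same category, which is the property the abstract credits for the shorter search delay. A real flash translation layer would also need reference counts and garbage-collection handling, which are omitted here.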
About this article
Cite this article
Gholami Taghizadeh, R., Gholami Taghizadeh, R., Khakpash, F. et al. CA-Dedupe: content-aware deduplication in SSDs. J Supercomput 76, 8901–8921 (2020). https://doi.org/10.1007/s11227-020-03188-z
DOI: https://doi.org/10.1007/s11227-020-03188-z