Brushing—An Algorithm for Data Deduplication
Abstract
Deduplication is a space-efficient technique used mainly to reduce storage consumption. This paper proposes a two-step algorithm called ‘brushing’ for deduplicating individual files. The main aim of the algorithm is to reduce the space a file occupies while also keeping time complexity in check, and it has extremely low RAM overhead. The first phase identifies similar entities and removes them, so that only unique entities are grouped together. In the second phase, while the unique file is hashed, the unique entities are represented as index values, which greatly reduces the size of the file. Test results show that if a file contains 40–50% duplicate data, this technique reduces the size up to 2/3 of the file. The algorithm also achieves high deduplication throughput on the file system.
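The two phases described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say what an "entity" is, so the sketch assumes fixed-size byte chunks, uses SHA-256 as the hash, and introduces the hypothetical names `brush`, `restore`, and `chunk_size` purely for illustration.

```python
import hashlib


def brush(data: bytes, chunk_size: int = 8):
    """Two-phase sketch: collect unique chunks, then encode the file as indices.

    Assumption: an "entity" is a fixed-size chunk of bytes.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # Phase 1: drop repeated chunks, keeping the first occurrence of each.
    index_of = {}   # chunk digest -> position in `unique`
    unique = []     # unique chunks, in order of first appearance
    for chunk in chunks:
        digest = hashlib.sha256(chunk).digest()
        if digest not in index_of:
            index_of[digest] = len(unique)
            unique.append(chunk)

    # Phase 2: represent every original chunk by the index of its unique copy.
    indices = [index_of[hashlib.sha256(chunk).digest()] for chunk in chunks]
    return unique, indices


def restore(unique, indices):
    """Rebuild the original bytes from the unique chunks and the index list."""
    return b"".join(unique[i] for i in indices)


if __name__ == "__main__":
    data = b"ABCDEFGH" * 3 + b"12345678"
    unique, indices = brush(data)
    print(len(unique), indices)
    assert restore(unique, indices) == data
```

In this sketch the space saving comes from storing each repeated chunk once plus a small integer per occurrence; how close that gets to the figures reported in the abstract depends on the entity size and index encoding, which the abstract does not specify.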
Keywords
Deduplication, Hashing, Bloom filter, File system, Storage space