Brushing—An Algorithm for Data Deduplication

  • Prasun Dutta
  • Pratik Pattnaik
  • Rajesh Kumar Sahu
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 433)

Abstract

Deduplication is a space-efficient technique used mainly to reduce storage requirements. This paper proposes a two-step algorithm called ‘brushing’ for deduplicating individual files. The main aim of the algorithm is to address the storage space problem while also keeping time complexity in check. The proposed algorithm has extremely low RAM overhead. The first phase identifies similar entities and removes the duplicates, so that only unique entities are grouped together. In the second phase, while the resulting unique file is hashed, the unique entities are represented as index values, which greatly reduces the size of the file. Test results show that if a file contains 40–50 % duplicate data, this technique reduces the size up to 2/3 of the file. The algorithm has a high deduplication throughput on the file system.
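
The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define what an "entity" is, so the sketch assumes entities are whitespace-separated tokens, and it uses a plain Python dictionary as the hash table. The names `brush` and `restore` are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Tuple


def brush(entities: List[str]) -> Tuple[List[str], List[int]]:
    """Illustrative two-phase deduplication (assumed design, not the paper's).

    Phase 1: keep only the first occurrence of each entity.
    Phase 2: represent every original entity by its index in the unique table.
    """
    index_of: Dict[str, int] = {}  # hash table: entity -> index in `unique`
    unique: List[str] = []         # unique entities in first-seen order
    encoded: List[int] = []        # one index per entity of the input

    for entity in entities:
        if entity not in index_of:
            # First time this entity is seen: add it to the unique table.
            index_of[entity] = len(unique)
            unique.append(entity)
        # Duplicates are stored only as an index into the unique table.
        encoded.append(index_of[entity])
    return unique, encoded


def restore(unique: List[str], encoded: List[int]) -> List[str]:
    """Rebuild the original entity sequence from the table and the indices."""
    return [unique[i] for i in encoded]


# Assumption for this example: entities are whitespace-separated tokens.
words = "a b a c b a".split()
unique, encoded = brush(words)
```

In this sketch the six input tokens are stored as three unique entities plus six small integers. Whether that saves space in practice depends on how large the entities are compared with the indices, which the abstract does not specify.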

Keywords

Deduplication, Hashing, Bloom filter, File system, Storage space

Copyright information

© Springer India 2016

Authors and Affiliations

  • Prasun Dutta (1)
  • Pratik Pattnaik (2)
  • Rajesh Kumar Sahu (1)

  1. Department of Computer Science & Engineering, National Institute of Science and Technology, Berhampur, India
  2. Department of Information Technology, National Institute of Science and Technology, Berhampur, India
