Design and Implementation of Clustering File Backup Server Using File Fingerprint

Part of the book series: Studies in Computational Intelligence (SCI, volume 149)

Summary

In this paper, we propose a clustering backup system that uses a file fingerprint mechanism for multi-level deduplication of redundant data. Our approach differs from a traditional file server system in two ways. First, we avoid data redundancy by using block-level fingerprints. This makes efficient use of storage capacity and network bandwidth, because duplicate data blocks are never transmitted. Second, we apply clustering technology, so that the data transfer and I/O time borne by each node is reduced to a fraction of the total. We conducted several experiments to verify the performance of the proposed method. The results show that storage space is used efficiently and that performance is noticeably improved.
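As a rough illustration of the idea in the summary, the following Python sketch deduplicates at two levels: a whole-file fingerprint first, then per-block fingerprints. It is not the chapter's implementation. The fixed 4096-byte block size, the use of SHA-1 as the fingerprint, the in-memory `BackupServer` class, and the `node_for` placement rule are all assumptions made for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; the chapter's value is not given here


def fingerprint(data: bytes) -> str:
    """Fingerprint of a byte string (SHA-1 is an assumption for this sketch)."""
    return hashlib.sha1(data).hexdigest()


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Yield consecutive fixed-size blocks; the last one may be shorter."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


def node_for(block_fp: str, node_count: int) -> int:
    """Hypothetical placement rule: map a block fingerprint to a cluster node."""
    return int(block_fp, 16) % node_count


class BackupServer:
    """In-memory stand-in for a backup server (hypothetical)."""

    def __init__(self):
        self.files = {}   # file fingerprint -> ordered list of block fingerprints
        self.blocks = {}  # block fingerprint -> block bytes

    def has_file(self, file_fp: str) -> bool:
        return file_fp in self.files

    def missing_blocks(self, block_fps):
        """Return the fingerprints the server does not hold yet."""
        return [fp for fp in block_fps if fp not in self.blocks]

    def store_block(self, block_fp: str, block: bytes) -> None:
        self.blocks[block_fp] = block

    def commit_file(self, file_fp: str, block_fps) -> None:
        self.files[file_fp] = list(block_fps)

    def restore(self, file_fp: str) -> bytes:
        return b"".join(self.blocks[fp] for fp in self.files[file_fp])


def backup(server: BackupServer, data: bytes) -> int:
    """Back up `data` and return the number of payload bytes actually sent."""
    file_fp = fingerprint(data)
    if server.has_file(file_fp):      # level 1: whole file already stored
        return 0
    blocks = list(split_blocks(data))
    block_fps = [fingerprint(b) for b in blocks]
    missing = set(server.missing_blocks(block_fps))
    sent = 0
    for fp, block in zip(block_fps, blocks):
        if fp in missing:             # level 2: send only unseen blocks
            server.store_block(fp, block)
            missing.discard(fp)       # a repeated block is sent only once
            sent += len(block)
    server.commit_file(file_fp, block_fps)
    return sent
```

In this sketch only fingerprints cross the network for data the server already holds, which is where the bandwidth saving would come from. A clustered deployment could use something like `node_for` to spread blocks over nodes, though the chapter's actual distribution scheme may differ.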

Editor information

Roger Lee

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Sung, H.M., Lee, W.y., Kim, J., Ko, Y.W. (2008). Design and Implementation of Clustering File Backup Server Using File Fingerprint. In: Lee, R. (eds) Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. Studies in Computational Intelligence, vol 149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70560-4_6

  • DOI: https://doi.org/10.1007/978-3-540-70560-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70559-8

  • Online ISBN: 978-3-540-70560-4

  • eBook Packages: Engineering, Engineering (R0)
