Design and Implementation of Clustering File Backup Server Using File Fingerprint

Sung, Ho Min; Lee, Wan yeon; Kim, Jin; Ko, Young Woong

doi:10.1007/978-3-540-70560-4_6


Part of the book series: Studies in Computational Intelligence (SCI, volume 149)

Summary

In this paper, we propose a clustering backup system that exploits a file fingerprint mechanism for multi-level deduplication of redundant data. Our approach differs from a traditional file server system in two ways. First, we avoid data redundancy by using block-level fingerprints. This enables efficient use of storage capacity and network bandwidth, because duplicate data blocks are not transmitted. Second, we apply clustering technology, so that each node handles only a fraction of the data transfer and I/O time. We conducted several experiments to verify the performance of the proposed method. The experimental results show that storage space is used efficiently and that performance is noticeably improved.
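The block-level idea in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. The fixed 4096-byte block size, the use of SHA-1 as the fingerprint function, the `BackupCluster` class, and the rule that places a block on the node given by its fingerprint modulo the node count are all assumptions made for illustration. The sketch covers only block-level deduplication, not the multi-level scheme the paper describes.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; not taken from the paper


def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Yield consecutive fixed-size blocks of data (the last may be shorter)."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def fingerprint(block: bytes) -> str:
    """Return a hex fingerprint of a block (SHA-1 is an assumed choice)."""
    return hashlib.sha1(block).hexdigest()


class BackupCluster:
    """Hypothetical cluster; each node is a dict mapping fingerprint -> block."""

    def __init__(self, node_count: int):
        self.nodes = [{} for _ in range(node_count)]
        self.blocks_sent = 0  # number of blocks actually transferred

    def node_for(self, fp: str) -> dict:
        # Assumed placement rule: fingerprint value modulo the node count.
        return self.nodes[int(fp, 16) % len(self.nodes)]

    def backup(self, data: bytes) -> list:
        """Store data and return its fingerprint list (the file's recipe)."""
        recipe = []
        for block in split_blocks(data):
            fp = fingerprint(block)
            node = self.node_for(fp)
            if fp not in node:  # only blocks not seen before are transferred
                node[fp] = block
                self.blocks_sent += 1
            recipe.append(fp)
        return recipe

    def restore(self, recipe: list) -> bytes:
        """Rebuild the original data from its fingerprint list."""
        return b"".join(self.node_for(fp)[fp] for fp in recipe)
```

In this sketch, backing up a file whose first and third blocks are identical transfers only two blocks, and backing up the same file a second time transfers none. Because placement depends only on the fingerprint, identical blocks always map to the same node, so the duplicate check needs no lookup on the other nodes.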



Author information

Authors and Affiliations

Dept. of Computer Engineering, Hallym University, Chuncheon, Gangwon-do, Korea
Ho Min Sung, Wan yeon Lee, Jin Kim & Young Woong Ko


Editor information

Roger Lee


About this chapter

Cite this chapter

Sung, H.M., Lee, W.y., Kim, J., Ko, Y.W. (2008). Design and Implementation of Clustering File Backup Server Using File Fingerprint. In: Lee, R. (eds) Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. Studies in Computational Intelligence, vol 149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70560-4_6


DOI: https://doi.org/10.1007/978-3-540-70560-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70559-8
Online ISBN: 978-3-540-70560-4
eBook Packages: Engineering, Engineering (R0)
