Summary
In this paper we propose a clustering backup system that uses a file fingerprint mechanism for multi-level deduplication of redundant data. Our approach differs from a traditional file server system in two ways. First, we avoid data redundancy by using block-level fingerprints. This enables efficient use of storage capacity and network bandwidth, because duplicate data blocks are never transmitted. Second, we apply clustering technology, because it reduces the data transfer and I/O time required of each node to a fraction of the total load. We conducted several experiments to verify the performance of the proposed method. The experimental results show that storage space is used efficiently and that performance is noticeably improved.
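To make the block-level fingerprint idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes fixed-size 4096-byte blocks and SHA-1 fingerprints, neither of which the summary specifies. The names `BlockStore`, `backup`, and `restore` are hypothetical. The client first sends only fingerprints, and the server answers with the ones it does not yet hold, so duplicate blocks are never transmitted.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the summary does not state one


def fingerprint(block: bytes) -> str:
    """Return the SHA-1 digest of a block (the hash choice is an assumption)."""
    return hashlib.sha1(block).hexdigest()


class BlockStore:
    """Hypothetical in-memory stand-in for the backup server's block storage."""

    def __init__(self):
        self.blocks = {}          # fingerprint -> block bytes
        self.bytes_received = 0   # counts only block payloads actually sent

    def missing(self, fingerprints):
        """Return the fingerprints the store does not hold yet."""
        return [fp for fp in fingerprints if fp not in self.blocks]

    def put(self, fp: str, block: bytes) -> None:
        self.blocks[fp] = block
        self.bytes_received += len(block)


def backup(data: bytes, store: BlockStore, block_size: int = BLOCK_SIZE):
    """Back up data and return its recipe (ordered list of fingerprints)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    recipe = [fingerprint(b) for b in blocks]
    unique = dict(zip(recipe, blocks))   # collapses duplicates within the file
    for fp in store.missing(unique):     # ask the store before sending anything
        store.put(fp, unique[fp])        # transmit only blocks it lacks
    return recipe


def restore(recipe, store: BlockStore) -> bytes:
    """Rebuild the original data from its recipe."""
    return b"".join(store.blocks[fp] for fp in recipe)


store = BlockStore()
first = b"A" * 4096 + b"B" * 4096 + b"A" * 4096
recipe_first = backup(first, store)      # three blocks, two of them identical
second = b"A" * 4096 + b"C" * 100
recipe_second = backup(second, store)    # the "A" block is already stored
```

In this sketch the store keeps one copy of each distinct block, and a second file that shares a block with the first only costs the bytes of its new blocks. A clustered variant could, for example, route each fingerprint to a node by its hash prefix, but that partitioning scheme is also an assumption rather than something the summary describes.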
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Sung, H.M., Lee, W.y., Kim, J., Ko, Y.W. (2008). Design and Implementation of Clustering File Backup Server Using File Fingerprint. In: Lee, R. (eds) Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing. Studies in Computational Intelligence, vol 149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70560-4_6
DOI: https://doi.org/10.1007/978-3-540-70560-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70559-8
Online ISBN: 978-3-540-70560-4
eBook Packages: Engineering, Engineering (R0)