Abstract
Data deduplication is widely used in storage systems to avoid storing duplicate data blocks. In this paper, we propose a dynamic chunking approach that combines fixed-length chunking with a file similarity technique. Fixed-length chunking suffers from the boundary shift problem and performs poorly when handling duplicated data files. The key idea of this work is to exploit the duplicate-data information contained in the file similarity information. We can easily find several duplicated points by comparing hash key values and file offsets within the file similarity information. We treat these duplicated points as hints for the starting position of chunking. With this approach, we can significantly improve the performance of a data deduplication system that uses fixed-length chunking. Experimental results show that the proposed dynamic chunking significantly improves deduplication capability while keeping processing time comparable to that of fixed-length chunking.
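The idea of using a duplicated point as a chunking start hint can be illustrated with a minimal sketch. This is not the authors' implementation: the chunk size, the SHA-1 hash, the layout of the similarity information (a small hash-to-offset map), and the helper names `fixed_chunks`, `find_hint`, and `count_duplicates` are all assumptions made for illustration, and the byte-by-byte scan in `find_hint` is chosen for clarity rather than speed.

```python
import hashlib

CHUNK = 8  # hypothetical chunk size in bytes, kept tiny for the demo


def digest(block: bytes) -> str:
    """Hash key of a block (SHA-1 is an assumption, not from the paper)."""
    return hashlib.sha1(block).hexdigest()


def fixed_chunks(data: bytes, start: int = 0):
    """Split data into an optional short head data[:start], followed by
    fixed-length chunks that begin at offset `start`."""
    chunks = [data[:start]] if start else []
    chunks += [data[i:i + CHUNK] for i in range(start, len(data), CHUNK)]
    return chunks


def find_hint(data: bytes, similarity: dict):
    """Return the first offset whose CHUNK-byte window hashes to a key in
    the similarity information, or None if no window matches."""
    for off in range(len(data) - CHUNK + 1):
        if digest(data[off:off + CHUNK]) in similarity:
            return off
    return None


def count_duplicates(chunks, stored_hashes) -> int:
    """Number of chunks whose hash key is already in the store."""
    return sum(1 for c in chunks if digest(c) in stored_hashes)


# A file already in the store, chunked at fixed boundaries.
stored = bytes(range(64))
stored_hashes = {digest(c) for c in fixed_chunks(stored)}

# Assumed similarity information: a few sampled (hash key -> offset) pairs.
similarity = {digest(stored[o:o + CHUNK]): o for o in (0, 32)}

# The same file with one byte inserted in front shifts every boundary.
incoming = b"\xff" + stored

plain = fixed_chunks(incoming)                  # plain fixed-length chunking
hint = find_hint(incoming, similarity)          # duplicated point
dynamic = fixed_chunks(incoming, start=hint % CHUNK)

print(count_duplicates(plain, stored_hashes))   # duplicates without the hint
print(count_duplicates(dynamic, stored_hashes)) # duplicates with the hint
```

In this toy case the single inserted byte prevents plain fixed-length chunking from finding any duplicate chunk, whereas starting the chunk boundaries at the hinted offset realigns them with the stored file.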
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moon, Y.C., Jung, H.M., Yoo, C., Ko, Y.W. (2012). Data Deduplication Using Dynamic Chunking Algorithm. In: Nguyen, N.T., Hoang, K., Jędrzejowicz, P. (eds) Computational Collective Intelligence. Technologies and Applications. ICCCI 2012. Lecture Notes in Computer Science, vol 7654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34707-8_7
DOI: https://doi.org/10.1007/978-3-642-34707-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34706-1
Online ISBN: 978-3-642-34707-8
eBook Packages: Computer Science, Computer Science (R0)