Abstract
Deduplication is a technique that avoids storing data blocks with identical content, and it has been shown to effectively reduce the disk space needed to store multi-gigabyte virtual machine (VM) images. However, it remains challenging to deploy deduplication in a real system, such as a cloud platform, where VM images are regularly inserted and retrieved. We propose LiveDFS, a live deduplication file system that enables deduplicated storage of VM images in an open-source cloud deployed on low-cost commodity hardware with a limited memory footprint. LiveDFS has several distinct features, including spatial locality, prefetching of metadata, and journaling. LiveDFS is POSIX-compliant and is implemented as a Linux kernel-space file system. We deploy our LiveDFS prototype as a storage layer in a cloud platform based on OpenStack and conduct extensive experiments. Compared to an ordinary file system without deduplication, we show that LiveDFS can save at least 40% of the space for storing VM images, while achieving reasonable performance in importing and retrieving VM images. Our work demonstrates the feasibility of deploying LiveDFS in an open-source cloud.
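To make the basic idea concrete, the following is a minimal sketch of block-level deduplication, in which each distinct block is stored once and files keep only a list of block fingerprints. It is not LiveDFS's implementation: the 4 KB block size, the SHA-256 fingerprint, the in-memory dictionaries, and all names (`DedupStore`, `write_file`, `demo`) are assumptions chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch, not taken from the paper


class DedupStore:
    """Toy in-memory store that keeps one copy of each distinct block."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes (stored once)
        self.files = {}   # file name -> ordered list of fingerprints

    def write_file(self, name, data):
        """Split data into fixed-size blocks and store only unseen blocks."""
        fingerprints = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()  # assumed fingerprint function
            if fp not in self.blocks:
                self.blocks[fp] = block  # first time this content is seen
            fingerprints.append(fp)
        self.files[name] = fingerprints

    def read_file(self, name):
        """Reassemble a file from its list of block fingerprints."""
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def logical_bytes(self):
        """Total size of all files as seen by users."""
        return sum(len(self.blocks[fp])
                   for fps in self.files.values() for fp in fps)

    def physical_bytes(self):
        """Total size of the distinct blocks actually stored."""
        return sum(len(block) for block in self.blocks.values())


def demo():
    """Store two hypothetical images that share two of their three blocks."""
    store = DedupStore()
    base = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE
    variant = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE + b"D" * BLOCK_SIZE
    store.write_file("vm1.img", base)
    store.write_file("vm2.img", variant)
    return store, base, variant
```

In this toy case the two images occupy six blocks logically but only four distinct blocks physically, so the saving is `1 - physical_bytes() / logical_bytes()`, or one third. A real file system must additionally keep the fingerprint index on disk and handle updates and deletions, which this sketch omits.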
Copyright information
© 2011 IFIP International Federation for Information Processing
About this paper
Cite this paper
Ng, CH., Ma, M., Wong, TY., Lee, P.P.C., Lui, J.C.S. (2011). Live Deduplication Storage of Virtual Machine Images in an Open-Source Cloud. In: Kon, F., Kermarrec, AM. (eds) Middleware 2011. Middleware 2011. Lecture Notes in Computer Science, vol 7049. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25821-3_5
DOI: https://doi.org/10.1007/978-3-642-25821-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25820-6
Online ISBN: 978-3-642-25821-3
eBook Packages: Computer Science (R0)