Live Deduplication Storage of Virtual Machine Images in an Open-Source Cloud

  • Chun-Ho Ng
  • Mingcao Ma
  • Tsz-Yeung Wong
  • Patrick P. C. Lee
  • John C. S. Lui
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7049)


Deduplication avoids storing multiple copies of data blocks with identical content, and has been shown to effectively reduce the disk space needed to store multi-gigabyte virtual machine (VM) images. However, it remains challenging to deploy deduplication in a real system, such as a cloud platform, where VM images are regularly inserted and retrieved. We propose LiveDFS, a live deduplication file system that enables deduplication storage of VM images in an open-source cloud deployed on low-cost commodity hardware with limited memory. LiveDFS has several distinct features, including spatial locality, prefetching of metadata, and journaling. LiveDFS is POSIX-compliant and is implemented as a Linux kernel-space file system. We deploy our LiveDFS prototype as a storage layer in a cloud platform based on OpenStack and conduct extensive experiments. Compared to an ordinary file system without deduplication, LiveDFS saves at least 40% of the space for storing VM images while achieving reasonable performance in importing and retrieving VM images. Our work demonstrates the feasibility of deploying LiveDFS in an open-source cloud.
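
To make the core idea concrete, the following is a minimal user-space sketch of block-level deduplication. It is not the LiveDFS implementation, which is a Linux kernel-space file system. The `DedupStore` class, the 4 KB block size, and the use of SHA-1 fingerprints are all assumptions chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


class DedupStore:
    """Toy in-memory store that keeps one copy of each distinct block.

    Hypothetical illustration; overwriting and deleting files are not handled.
    """

    def __init__(self):
        self.blocks = {}    # fingerprint -> block content
        self.refcount = {}  # fingerprint -> number of references to the block
        self.files = {}     # file name -> ordered list of fingerprints

    def write_file(self, name, data):
        fingerprints = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            # SHA-1 is an assumed fingerprint function for this sketch.
            fp = hashlib.sha1(block).hexdigest()
            if fp not in self.blocks:
                # First time this content is seen: store the block.
                self.blocks[fp] = block
            # Identical content only adds a reference, not a new copy.
            self.refcount[fp] = self.refcount.get(fp, 0) + 1
            fingerprints.append(fp)
        self.files[name] = fingerprints

    def read_file(self, name):
        # Rebuild the file from its block fingerprints.
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def stored_bytes(self):
        # Physical space used by the distinct blocks.
        return sum(len(block) for block in self.blocks.values())


store = DedupStore()
base = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store.write_file("vm1.img", base)
store.write_file("vm2.img", base + b"C" * BLOCK_SIZE)
print(store.stored_bytes(), "bytes stored for", 9 * BLOCK_SIZE, "logical bytes")
```

In this toy example the two images share most of their blocks, so only three distinct blocks are physically stored. A real system must also keep the fingerprint index efficient when memory is limited, which is where features such as metadata prefetching come in.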


Keywords: deduplication, virtual machine image storage, open-source cloud, file system, implementation, experimentation



Copyright information

© IFIP International Federation for Information Processing 2011

Authors and Affiliations

  • Chun-Ho Ng (1)
  • Mingcao Ma (1)
  • Tsz-Yeung Wong (1)
  • Patrick P. C. Lee (1)
  • John C. S. Lui (1)
  1. Dept. of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong
