DEDISbench: A Benchmark for Deduplicated Storage Systems

  • Conference paper
On the Move to Meaningful Internet Systems: OTM 2012 (OTM 2012)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7566))

Abstract

Deduplication is widely accepted as an effective technique for eliminating duplicated data in backup and archival systems. Nowadays, deduplication is also becoming appealing in cloud computing, where large-scale virtualized storage infrastructures hold huge data volumes with a significant share of duplicated content. There have thus been several proposals for embedding deduplication in storage appliances and file systems, providing different performance trade-offs while targeting both user and application data, as well as virtual machine images.

It is, however, hard to determine to what extent deduplication is useful in a particular setting and which technique will provide the best results. In fact, existing disk I/O micro-benchmarks are not designed for evaluating deduplication systems: they follow simplistic approaches for generating the data to be written, which lead to unrealistic amounts of duplicates.

We address this with DEDISbench, a novel micro-benchmark for evaluating the disk I/O performance of block-based deduplication systems. As the main contribution, we introduce the generation of a realistic duplicate distribution based on real datasets. Moreover, DEDISbench also allows simulating access hotspots and different load intensities for I/O operations. The usefulness of DEDISbench is shown by comparing it with the open-source disk I/O micro-benchmarks Bonnie++ and IOzone in assessing two open-source deduplication systems, Opendedup and Lessfs, using Ext4 as a baseline. As a secondary contribution, our results lead to novel insights into the performance of these file systems.
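As a rough illustration of what writing data with a controlled duplicate distribution can look like, the following Python sketch builds a shuffled write plan from a histogram of copies per block. It is not the DEDISbench implementation: the histogram format, the 4096-byte block size, and the names `build_write_plan` and `block_payload` are assumptions made for this example, and access hotspots and load intensity are not modelled.

```python
import hashlib
import random
from collections import Counter

BLOCK_SIZE = 4096  # bytes; an assumed block size, not taken from the paper


def build_write_plan(dup_histogram, seed=0):
    """Return a shuffled list of content IDs, one entry per block to write.

    dup_histogram maps "copies per block" to "number of distinct blocks
    that appear with that many copies" (a hypothetical input format).
    """
    plan = []
    next_id = 0
    for copies, distinct_blocks in sorted(dup_histogram.items()):
        for _ in range(distinct_blocks):
            plan.extend([next_id] * copies)  # same ID means same content
            next_id += 1
    random.Random(seed).shuffle(plan)  # spread duplicates over the run
    return plan


def block_payload(content_id):
    """Expand a content ID into BLOCK_SIZE bytes, deterministically.

    Equal IDs give identical blocks, so a block-level deduplication
    system would see them as duplicates.
    """
    digest = hashlib.sha256(str(content_id).encode()).digest()  # 32 bytes
    return digest * (BLOCK_SIZE // len(digest))


# Toy histogram: 6 unique blocks, 2 blocks written twice, 1 block written 5 times.
histogram = {1: 6, 2: 2, 5: 1}
plan = build_write_plan(histogram, seed=42)
observed = Counter(Counter(plan).values())  # copies -> number of distinct blocks
```

In a real benchmark the histogram would be derived from a measured dataset rather than written by hand, and each payload would be issued as a block write against the system under test. Repeating a 32-byte digest keeps the sketch short but makes each block highly compressible, which would matter for systems that also compress data.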

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Paulo, J., Reis, P., Pereira, J., Sousa, A. (2012). DEDISbench: A Benchmark for Deduplicated Storage Systems. In: Meersman, R., et al. On the Move to Meaningful Internet Systems: OTM 2012. OTM 2012. Lecture Notes in Computer Science, vol 7566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33615-7_9

  • DOI: https://doi.org/10.1007/978-3-642-33615-7_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33614-0

  • Online ISBN: 978-3-642-33615-7

  • eBook Packages: Computer Science, Computer Science (R0)
