An Efficient and Performance-Aware Big Data Storage System

  • Conference paper
Cloud Computing and Services Science (CLOSER 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 367)

Abstract

Recent escalations in Internet development and volume of data have created a growing demand for large-capacity storage solutions. Although Cloud storage has yielded new ways of storing, accessing and managing data, there is still a need for an inexpensive, effective and efficient storage solution especially suited to big data management and analysis. In this paper, we take our previous work one step further and present an in-depth analysis of the key features of future big data storage services for both unstructured and semi-structured data, and discuss how such services should be constructed and deployed. We also explain how different technologies can be combined to provide a single, highly scalable, efficient and performance-aware big data storage system. We especially focus on the issues of data de-duplication for enterprises and private organisations. This research is particularly valuable for inexperienced solution providers like universities and research organisations, and will allow them to swiftly set up their own big data storage services.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Li, Y., Guo, L., Guo, Y. (2013). An Efficient and Performance-Aware Big Data Storage System. In: Ivanov, I.I., van Sinderen, M., Leymann, F., Shan, T. (eds) Cloud Computing and Services Science. CLOSER 2012. Communications in Computer and Information Science, vol 367. Springer, Cham. https://doi.org/10.1007/978-3-319-04519-1_7

  • DOI: https://doi.org/10.1007/978-3-319-04519-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04518-4

  • Online ISBN: 978-3-319-04519-1

  • eBook Packages: Computer Science, Computer Science (R0)
