An Efficient and Performance-Aware Big Data Storage System

Li, Yang; Guo, Li; Guo, Yike

doi:10.1007/978-3-319-04519-1_7


Part of the book series: Communications in Computer and Information Science (CCIS, volume 367)

Included in the following conference series:

International Conference on Cloud Computing and Services Science

Abstract

The rapid growth of the Internet and of data volumes has created a growing demand for large-capacity storage solutions. Although Cloud storage has yielded new ways of storing, accessing and managing data, there is still a need for an inexpensive, effective and efficient storage solution especially suited to big data management and analysis. In this paper, we take our previous work one step further and present an in-depth analysis of the key features of future big data storage services for both unstructured and semi-structured data, and discuss how such services should be constructed and deployed. We also explain how different technologies can be combined to provide a single, highly scalable, efficient and performance-aware big data storage system. We focus in particular on data de-duplication for enterprises and private organisations. This research is particularly valuable for inexperienced solution providers such as universities and research organisations, and will allow them to swiftly set up their own big data storage services.

Author information

Authors and Affiliations

Department of Computing, Imperial College London, U.K.
Yang Li, Li Guo & Yike Guo

Editor information

Editors and Affiliations

Empire State College, Long Island Center, State University of New York, 11788, NY, U.S.A.
Ivan I. Ivanov
University of Twente, Enschede, The Netherlands
Marten van Sinderen
Institute of Architecture of Application Systems, University of Stuttgart, Universitätsstraße 38, 70569, Stuttgart, Germany
Frank Leymann
CTS, Charlotte, NC, USA
Tony Shan

About this paper

Cite this paper

Li, Y., Guo, L., Guo, Y. (2013). An Efficient and Performance-Aware Big Data Storage System. In: Ivanov, I.I., van Sinderen, M., Leymann, F., Shan, T. (eds) Cloud Computing and Services Science. CLOSER 2012. Communications in Computer and Information Science, vol 367. Springer, Cham. https://doi.org/10.1007/978-3-319-04519-1_7

DOI: https://doi.org/10.1007/978-3-319-04519-1_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04518-4
Online ISBN: 978-3-319-04519-1
eBook Packages: Computer Science, Computer Science (R0)
