Cloud-Hosted Data Storage Systems

  • Liang Zhao
  • Sherif Sakr
  • Anna Liu
  • Athman Bouguettaya
Chapter

Abstract

Over the past decade, rapidly growing Internet-based services such as e-mail, blogging, social networking, search and e-commerce have substantially redefined the way consumers communicate, access content, share information and purchase products. For decades, relational database management systems (RDBMS) have been considered the one-size-fits-all solution for data persistence and retrieval. However, the ever-increasing need for scalability and new application requirements have created new challenges for traditional RDBMS. Recently, a new generation of low-cost, high-performance database software, aptly named NoSQL (Not Only SQL), has emerged to challenge the dominance of RDBMS. The main features of these systems include the ability to scale horizontally, support for weaker consistency models, flexible schemas and data models, and simple low-level query interfaces. In this chapter, we explore recent advancements and the state of the art in Web-scale data management. We discuss the advantages and disadvantages of several recently introduced approaches and their suitability for supporting certain classes of applications and end users.
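Two of the NoSQL traits listed above, horizontal scaling and a simple low-level query interface, are often obtained together by hash-partitioning a key space across commodity servers and exposing little more than `get` and `put`. One widely used partitioning technique is consistent hashing (Karger et al.), which Dynamo-style stores such as Cassandra, Riak and Voldemort build on. The sketch below is a hypothetical, in-memory illustration of that technique, not a method taken from this chapter: the class name `HashRingStore`, the MD5 ring hash and the 64 virtual nodes per server are all assumptions chosen for clarity.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a point on a 2**32 ring (MD5 is stable across runs)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % (2**32)


class HashRingStore:
    """Hypothetical toy key-value store partitioned with consistent hashing.

    Each 'node' is just an in-memory dict standing in for a commodity server.
    """

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes   # assumed number of virtual nodes per server
        self._ring = []        # sorted list of (ring point, node name)
        self._data = {}        # node name -> {key: value}
        for node in nodes:
            self.add_node(node)

    def node_for(self, key):
        """Return the owner: first virtual node clockwise from the key's point."""
        if not self._ring:
            raise LookupError("ring has no nodes")
        point = ring_hash(key)
        # bisect_left finds the first ring entry with point >= key's point;
        # the modulo wraps past the end of the ring back to the start.
        idx = bisect.bisect_left(self._ring, (point, ""))
        return self._ring[idx % len(self._ring)][1]

    def add_node(self, node):
        """Scale out: add a server and migrate only the keys it now owns."""
        if node in self._data:
            return
        self._data[node] = {}
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_hash(f"{node}#{i}"), node))
        # Adding ring points can only make the new node a key's successor,
        # so keys either stay put or move to the new node.
        for owner, items in self._data.items():
            if owner == node:
                continue
            moved = [k for k in items if self.node_for(k) == node]
            for k in moved:
                self._data[node][k] = items.pop(k)

    # The entire query interface: single-key put and get.
    def put(self, key, value):
        self._data[self.node_for(key)][key] = value

    def get(self, key, default=None):
        return self._data[self.node_for(key)].get(key, default)


if __name__ == "__main__":
    store = HashRingStore(["node-a", "node-b", "node-c"])
    for i in range(1000):
        store.put(f"user:{i}", {"id": i})
    before = {f"user:{i}": store.node_for(f"user:{i}") for i in range(1000)}
    store.add_node("node-d")
    moved = [k for k, n in before.items() if store.node_for(k) != n]
    print(all(store.node_for(k) == "node-d" for k in moved))  # True
    print(store.get("user:42"))  # {'id': 42}
```

The design choice the sketch highlights is that adding a fourth server relocates only the keys that fall into its new ring segments (roughly a quarter here), rather than rehashing everything as a plain `hash(key) % n` scheme would. Replication, failure handling and the weaker consistency models the abstract mentions are deliberately left out.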

Keywords

Relational Database Management System; Buffer Pool; NoSQL Database; Column Family; Commodity Server
These keywords were added by machine and not by the authors.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Liang Zhao (1)
  • Sherif Sakr (2, 3)
  • Anna Liu (4)
  • Athman Bouguettaya (5)

  1. NICTA, Kensington, Australia
  2. Software Systems Research Group, NICTA, Eveleigh, Australia
  3. Faculty of Computers and Information, Cairo University, Giza, Egypt
  4. NICTA, Eveleigh, Australia
  5. School of Computer Science and Information Technology, RMIT University, Melbourne, Australia