NoSQL database systems: a survey and decision guidance

  • Felix Gessert
  • Wolfram Wingerath
  • Steffen Friedrich
  • Norbert Ritter
Special Issue Paper

Abstract

Today, data is generated and consumed at unprecedented scale. This has led to novel approaches for scalable data management, subsumed under the term “NoSQL” database systems, to handle the ever-increasing data volume and request loads. However, the heterogeneity and diversity of the numerous existing systems impede the well-informed selection of a data store appropriate for a given application context. Therefore, this article gives a top-down overview of the field: instead of contrasting the implementation specifics of individual representatives, we propose a comparative classification model that relates functional and non-functional requirements to techniques and algorithms employed in NoSQL databases. This NoSQL Toolbox allows us to derive a simple decision tree to help practitioners and researchers filter potential system candidates based on central application requirements.

Keywords

NoSQL, Data management, Scalability, Data models, Sharding, Replication

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Felix Gessert (1)
  • Wolfram Wingerath (1)
  • Steffen Friedrich (1)
  • Norbert Ritter (1)

  1. Universität Hamburg, Hamburg, Germany
