Big Data Storage and Data Models

  • Dongyao Wu
  • Sherif Sakr
  • Liming Zhu


Data and storage models form the basis of big data ecosystem stacks. A storage model captures the physical aspects and features of data storage, whereas a data model captures the logical representation and structures used for data processing and management. Understanding storage and data models together is essential for understanding the big data ecosystems built on them. In this chapter, we investigate and compare the key storage and data models across the spectrum of big data frameworks.


Keywords: Hadoop Distributed File System, Distributed File System, Network File System, Document Store, Column Family



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Data61, CSIRO, Sydney, Australia
  2. School of Computer Science and Engineering, University of New South Wales, Sydney, Australia
  3. National Guard, King Saud Bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
