Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Big Data Platforms for Data Analytics

  • Volker Markl
  • Vinayak Borkar
  • Matei Zaharia
  • Till Westmann
  • Alexander Alexandrov
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80645

Synonyms

Big data management systems; Data intensive computing software; Predictive analytics platforms

Definition

Due to the volume, velocity, and variety of data now coming from the Web, social media, and personal devices, the analysis of “Big Data” has become a priority. A number of software platforms have been developed to support the analysis of massive data sets using clusters of computers working in parallel. These platforms fall into two categories: those based on the relational data model and its SQL query language, and those with more flexible data models and query languages tailored to less rigidly structured data. The latter category is referred to here as Big Data Platforms. (SQL analytics on Big Data are covered separately.)

Historical Background

Today’s platforms for Big Data Analytics are the result of technical work carried out in two computer systems software fields: database systems and distributed systems.

Parallel Databases

In the field of database systems, the...



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Volker Markl (1)
  • Vinayak Borkar (2)
  • Matei Zaharia (3)
  • Till Westmann (4)
  • Alexander Alexandrov (5)

  1. IBM Almaden Research Center, San Jose, USA
  2. CTO and VP of Engineering, X15 Software, San Francisco, USA
  3. Douglas T. Ross Career Development Professor of Software Technology, MIT CSAIL, Cambridge, USA
  4. Oracle Labs, Redwood City, USA
  5. Database and Information Management (DIMA), Institute of Software Engineering and Theoretical Computer Science, Berlin, Germany