A Survey on Parallel Database Systems from a Storage Perspective: Rows Versus Columns

  • Carlos Ordonez
  • Ladjel Bellatreche
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 903)

Abstract

Big data requirements have revolutionized database technology, bringing many innovative and revamped DBMSs to process transactional (OLTP) or demanding query workloads (cubes, exploration, pre-processing). Parallel and main memory processing have become important features to exploit new hardware and cope with data volume. With this landscape in mind, we present a survey comparing modern row and columnar DBMSs, contrasting their ability to write data (storage mechanisms, transaction processing, batch loading, enforcing ACID) and their ability to read data (query processing, physical operators, sequential vs. parallel). We provide a unifying view of alternative storage mechanisms, database algorithms and query optimizations used across diverse DBMSs. We contrast the architecture and processing of a parallel DBMS with an HPC system. We cover the full spectrum of subsystems, from storage to query processing. We consider parallel processing and the impact of much larger RAM, which brings back main-memory databases. We then discuss important parallel aspects including speedup, sequential bottlenecks, data redistribution, high-speed networks, main memory processing with larger RAM and fault tolerance at query processing time. We outline an agenda for future research.
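
The write-versus-read contrast between row and columnar storage can be illustrated with a minimal sketch. It is a toy in-memory model, not the storage layout of any DBMS covered by the survey; the table contents, the attribute names (`id`, `name`, `amount`) and all function names are assumptions made for illustration.

```python
# Hypothetical toy table; attribute names and values are illustrative only.
ATTRIBUTES = ("id", "name", "amount")

# Row layout: each record is kept together as one tuple.
rows = [
    (1, "alice", 30.0),
    (2, "bob", 45.5),
    (3, "carol", 12.25),
]

# Column layout: each attribute is kept together as one list.
columns = {
    "id": [1, 2, 3],
    "name": ["alice", "bob", "carol"],
    "amount": [30.0, 45.5, 12.25],
}


def append_row_store(table, record):
    """Write path, row layout: one append stores the whole record."""
    table.append(record)


def append_column_store(cols, record):
    """Write path, column layout: one append per attribute."""
    for name, value in zip(ATTRIBUTES, record):
        cols[name].append(value)


def sum_amount_row_store(table):
    """Read path, row layout: every record is visited to reach one attribute."""
    return sum(record[2] for record in table)


def sum_amount_column_store(cols):
    """Read path, column layout: only the 'amount' list is visited."""
    return sum(cols["amount"])
```

In this sketch an insert touches one structure in the row layout and three in the column layout, while the aggregate over `amount` reads every record in the row layout and a single list in the column layout.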

Notes

Acknowledgments

The first author thanks Michael Stonebraker for his guidance in understanding query processing based on columnar storage, arrays of unlimited size to support mathematical analytics, and lock-free transaction processing in main memory.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. University of Houston, Houston, USA
  2. LIAS/ISAE-ENSMA, Poitiers, France