Modern Column Stores for Big Data Processing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10721)

Abstract

The advent of MapReduce/Hadoop and NoSQL databases undermined the primacy of SQL relational databases for data processing. Pioneering research on MonetDB and C-Store opened up the world of column stores, which retain the SQL model but use a different storage layer and execution engine for performance gains. The emergence of pay-by-use clouds, and of MPP versions of column stores running on them, eliminated the scale-out issues of row stores. Data mining researchers have also shown that SQL on a parallel, columnar database could be a candidate for Big Data analytics. In this survey, written for a tutorial, we trace the technology evolution and the history of the fall of row stores and the rise of column stores, delving into architectural details of column DBs from academia and industry.

Author information

Correspondence to K. T. Sridhar.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Sridhar, K.T. (2017). Modern Column Stores for Big Data Processing. In: Reddy, P., Sureka, A., Chakravarthy, S., Bhalla, S. (eds) Big Data Analytics. BDA 2017. Lecture Notes in Computer Science, vol 10721. Springer, Cham. https://doi.org/10.1007/978-3-319-72413-3_8

  • DOI: https://doi.org/10.1007/978-3-319-72413-3_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-72412-6

  • Online ISBN: 978-3-319-72413-3

  • eBook Packages: Computer Science, Computer Science (R0)
