Modern Column Stores for Big Data Processing

Sridhar, K. T.

doi:10.1007/978-3-319-72413-3_8

Conference paper
First Online: 25 November 2017

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10721)

Abstract

The advent of MapReduce/Hadoop and NoSQL databases undermined the primacy of SQL relational databases for data processing. Pioneering research on MonetDB and C-Store opened up the world of column stores, which retain the SQL model but use a different store and engine for performance gains. The emergence of pay-by-use clouds, and of MPP versions of column stores running on them, eliminated the scale-out issues of row stores. Data mining researchers have also shown that SQL on a parallel, columnar database could be a candidate for Big Data analytics. In this survey, written for a tutorial, we trace the technology evolution and the history of the fall of row stores and the rise of column stores, delving into architectural details of column DBs from academia and industry.

Author information

Authors and Affiliations

XtremeData Technologies, Bangalore, India
K. T. Sridhar
XtremeData, Inc., Schaumburg, USA
K. T. Sridhar

Corresponding author

Correspondence to K. T. Sridhar.

Editor information

Editors and Affiliations

International Institute of Information Technology, Hyderabad, India
P. Krishna Reddy
Rajiv Gandhi Education City, Sonepat, India
Ashish Sureka
University of Texas at Arlington, Arlington, Texas, USA
Sharma Chakravarthy
University of Aizu, Aizu-Wakamatsu, Japan
Subhash Bhalla

Cite this paper

Sridhar, K.T. (2017). Modern Column Stores for Big Data Processing. In: Reddy, P., Sureka, A., Chakravarthy, S., Bhalla, S. (eds) Big Data Analytics. BDA 2017. Lecture Notes in Computer Science, vol 10721. Springer, Cham. https://doi.org/10.1007/978-3-319-72413-3_8

DOI: https://doi.org/10.1007/978-3-319-72413-3_8
Published: 25 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72412-6
Online ISBN: 978-3-319-72413-3
eBook Packages: Computer Science, Computer Science (R0)
