Encyclopedia of Big Data Technologies

2019 Edition
Editors: Sherif Sakr, Albert Y. Zomaya

Columnar Storage Formats

  • Avrilia Floratou
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-77525-8_248

Definitions

Row Storage:

A data layout that stores the values of all the columns that make up a row contiguously, so that one entire row is laid out before the next one begins.

Columnar Storage:

A data layout that contiguously stores values belonging to the same column for multiple rows.
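To make the two definitions concrete, the following minimal Python sketch linearizes a small table into a single sequence under each layout. The table contents and the helper names `row_layout` and `columnar_layout` are hypothetical and chosen only for illustration; real storage formats work on bytes, encodings, and metadata that the sketch leaves out.

```python
from typing import Any, List, Sequence, Tuple

# Hypothetical three-column table (id, name, price); values are illustrative only.
ROWS: List[Tuple[Any, ...]] = [
    (1, "apple", 0.5),
    (2, "banana", 0.25),
    (3, "cherry", 3.0),
]


def row_layout(rows: Sequence[Tuple[Any, ...]]) -> List[Any]:
    """Row storage: all column values of one row are placed next to each other,
    followed by all column values of the next row."""
    return [value for row in rows for value in row]


def columnar_layout(rows: Sequence[Tuple[Any, ...]]) -> List[Any]:
    """Columnar storage: the values of one column across all rows are placed
    next to each other, followed by the values of the next column."""
    num_columns = len(rows[0]) if rows else 0
    return [row[c] for c in range(num_columns) for row in rows]


if __name__ == "__main__":
    print(row_layout(ROWS))       # [1, 'apple', 0.5, 2, 'banana', 0.25, 3, 'cherry', 3.0]
    print(columnar_layout(ROWS))  # [1, 2, 3, 'apple', 'banana', 'cherry', 0.5, 0.25, 3.0]
```

Both sequences hold exactly the same values; only their order differs, and that order determines which values end up physically adjacent on storage.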

Overview

Fast analytics over Hadoop data has gained significant traction over the last few years, as many enterprises now use Hadoop to store data from various sources, including operational systems, sensors and mobile devices, and web applications. Various Big Data frameworks have been developed to support fast analytics on this data and to provide insights in near real time.

A crucial factor in delivering high performance in such large-scale environments is the underlying data layout. Most Big Data frameworks are designed to operate on data stored in various formats, and they are extensible enough to incorporate new ones. Over the years, a plethora of open-source data formats have been designed to support the...
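One reason the layout matters for analytics is that a query often reads only a few columns of a table. The minimal sketch below assumes a hypothetical table of three fixed-width 64-bit integer columns (`id`, `price_cents`, `quantity`) and uses hypothetical helper names. It sums a single column from each layout and reports the size of the smallest contiguous byte range containing that column's values. It deliberately ignores the encodings, compression, and metadata that real columnar formats add, so it illustrates the access pattern only, not how any particular format stores data.

```python
import struct
from typing import Sequence, Tuple

# Hypothetical fixed-width table: each row holds three 64-bit integers
# (id, price_cents, quantity). Values are illustrative only.
ROWS = [(1, 250, 3), (2, 100, 1), (3, 999, 2), (4, 50, 10)]
NUM_COLS = 3
FIELD = struct.Struct("<q")  # one little-endian signed 64-bit integer (8 bytes)


def encode_rows(rows: Sequence[Tuple[int, ...]]) -> bytes:
    """Row storage: all fields of row 0, then all fields of row 1, and so on."""
    return b"".join(FIELD.pack(v) for row in rows for v in row)


def encode_columns(rows: Sequence[Tuple[int, ...]]) -> bytes:
    """Columnar storage: column 0 for every row, then column 1, and so on."""
    return b"".join(FIELD.pack(row[c]) for c in range(NUM_COLS) for row in rows)


def sum_column_row_layout(buf: bytes, col: int, num_rows: int) -> Tuple[int, int]:
    """Sum one column of a row-major buffer.

    Returns (sum, span), where span is the size in bytes of the smallest
    contiguous range that contains every value of the column.
    """
    row_width = NUM_COLS * FIELD.size
    # The wanted values are interleaved with the other columns, one per row.
    offsets = [r * row_width + col * FIELD.size for r in range(num_rows)]
    total = sum(FIELD.unpack_from(buf, off)[0] for off in offsets)
    span = offsets[-1] + FIELD.size - offsets[0] if offsets else 0
    return total, span


def sum_column_columnar(buf: bytes, col: int, num_rows: int) -> Tuple[int, int]:
    """Sum one column of a column-major buffer; the column is one contiguous slice."""
    start = col * num_rows * FIELD.size
    end = start + num_rows * FIELD.size
    total = sum(value for (value,) in FIELD.iter_unpack(buf[start:end]))
    return total, end - start


if __name__ == "__main__":
    row_buf, col_buf = encode_rows(ROWS), encode_columns(ROWS)
    print(len(row_buf), len(col_buf))                    # 96 96
    print(sum_column_row_layout(row_buf, 1, len(ROWS)))  # (1399, 80)
    print(sum_column_columnar(col_buf, 1, len(ROWS)))    # (1399, 32)
```

Both buffers are the same size and yield the same sum, but in the row layout the covered byte range grows with the full row width, whereas in the columnar layout it is a single slice whose size depends only on the number of rows and the width of the one column being read.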



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Microsoft, Sunnyvale, USA