Abstract
Table layout determines how relational row-column data values are organized and stored. In recent years, a considerable number of candidate layouts have been developed for MapReduce-based query systems; they differ in storage space utilization, data loading time, query performance, and so on. Users are often confronted with the problem of choosing the best overall table layout for a given workload and table schema. The straightforward approach, running queries on generated data and comparing the results, is time consuming and prone to inaccuracy because of MapReduce's nondeterministic execution time. In this paper, we propose a lightweight framework that evaluates table layouts without running the queries. The framework adopts a black-box method to test critical metrics and a query-aware strategy that extracts table-layout-related operations from queries. Based on these metrics and operations, the framework makes suggestions to users. We conduct extensive experiments to empirically study popular table layouts. The results show that column projection and compression are the two most prominent factors in general cases. Moreover, we discuss optimization opportunities for the intermediate tables produced in high-level language systems.
© 2015 Springer International Publishing Switzerland
Cite this paper
Zhu, F., Liu, J., Xu, L., Ye, D., Wei, J., Huang, T. (2015). A Lightweight Evaluation Framework for Table Layouts in MapReduce Based Query Systems. In: Cheng, R., Cui, B., Zhang, Z., Cai, R., Xu, J. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science(), vol 9313. Springer, Cham. https://doi.org/10.1007/978-3-319-25255-1_39
DOI: https://doi.org/10.1007/978-3-319-25255-1_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25254-4
Online ISBN: 978-3-319-25255-1
eBook Packages: Computer Science, Computer Science (R0)