Abstract
Distributed processing commonly requires data to be spread across machines using an a priori static or hash-based data allocation. In this paper, we explore an alternative approach that starts from a master node in control of the complete database and a variable number of worker nodes for delegated query processing. Data is shipped just in time to the worker nodes using a need-to-know policy, and is reused, if possible, in subsequent queries. A bidding mechanism among the workers yields a schedule with the most efficient reuse of previously shipped data, minimizing the data transfer costs.
Just-in-time data shipment allows our system to benefit from locally available idle resources to boost overall performance. The system is maintenance-free, and data allocation is fully transparent to users. Our experiments show that the proposed adaptive distributed architecture is a viable and flexible alternative for small-scale MapReduce-type settings.
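To make the bidding idea concrete, here is a minimal sketch, not the authors' implementation. It assumes that each worker caches the data items (e.g. columns) shipped to it earlier and that a worker's bid is simply the number of bytes the master would still have to ship for a query. The names `Worker`, `bid`, `schedule`, and the item sizes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker node that keeps data items shipped to it earlier."""
    name: str
    cache: set[str] = field(default_factory=set)

    def bid(self, needed: set[str], sizes: dict[str, int]) -> int:
        # Assumed bid: bytes still to be shipped, i.e. the total size of
        # the needed items this worker does not already hold.
        return sum(sizes[item] for item in needed - self.cache)


def schedule(
    query_items: list[str], workers: list[Worker], sizes: dict[str, int]
) -> tuple[str, list[str], int]:
    """Give the query to the lowest bidder and ship only what it lacks."""
    needed = set(query_items)
    # Lowest bid wins; ties are broken by worker name for determinism.
    winner = min(workers, key=lambda w: (w.bid(needed, sizes), w.name))
    missing = needed - winner.cache
    shipped = sum(sizes[item] for item in missing)
    # Need-to-know shipment: send just the missing items, then keep them
    # at the worker so later queries can reuse them.
    winner.cache |= missing
    return winner.name, sorted(missing), shipped


if __name__ == "__main__":
    sizes = {"t.a": 400, "t.b": 200, "t.c": 100}  # hypothetical item sizes
    workers = [Worker("w1"), Worker("w2")]
    print(schedule(["t.a", "t.b"], workers, sizes))  # ('w1', ['t.a', 't.b'], 600)
    print(schedule(["t.a", "t.c"], workers, sizes))  # ('w1', ['t.c'], 100)
```

In this sketch the second query costs only 100 bytes because `w1` already holds `t.a`. A fuller scheduler would likely also weigh worker load, since pure reuse-based bidding keeps sending queries to the same node.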
Keywords
- Query Processing
- Master Node
- Query Optimization
- Query Plan
- Work Node
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ivanova, M., Kersten, M., Groffen, F. (2012). Just-In-Time Data Distribution for Analytical Query Processing. In: Morzy, T., Härder, T., Wrembel, R. (eds) Advances in Databases and Information Systems. ADBIS 2012. Lecture Notes in Computer Science, vol 7503. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33074-2_16
DOI: https://doi.org/10.1007/978-3-642-33074-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33073-5
Online ISBN: 978-3-642-33074-2
eBook Packages: Computer Science, Computer Science (R0)
