Abstract
Traditionally, query optimization and compiler optimization have been developed independently. Whereas query optimization aims to minimize the number of disk operations needed to evaluate a database query, compiler optimization aims to maximize the performance of general-purpose executable code. Query optimizers were originally designed for systems that had to process large volumes of data with little main memory, but main memory sizes have since increased significantly. As a result, the database community is now considering techniques that were developed in the field of compiler optimization. In this paper, we demonstrate that the converse is much more profitable: extending compiler transformations so that they also target query optimization. Advanced compiler optimizations then become the driving force in query optimization, and database systems can keep pace with future complex computer architectures.
Notes
1. Index sets should not be confused with database indexes; they serve a different purpose. An index set is used to specify iteration in the forelem framework. Many index sets are only used in the intermediate representation and are not materialized.
2. A description of how to translate SQL queries to forelem will be part of a forthcoming publication.
3. Note that the update operation \(\mathbb {R} = \mathbb {R} \cup \ldots \) could be interpreted as an output dependency. However, because \(\mathbb {R}\) is a multiset, no strict order is imposed on the execution, and we therefore do not consider this a dependency.
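The ideas in Notes 1 and 3 can be illustrated with a minimal Python sketch. It is not the forelem implementation: the table `orders`, the helper `index_set`, and the function `select_amounts` are hypothetical names introduced here, and the multiset union is assumed to be additive (each added tuple increases its count by one). The sketch shows an index set that only specifies which rows a loop visits, and a multiset result that is the same whatever order the rows are visited in.

```python
from collections import Counter

# Hypothetical table: a list of (customer_id, amount) tuples.
orders = [(1, 10), (2, 25), (1, 5), (3, 40), (2, 15)]


def index_set(table, field, value):
    """Yield the subscripts of rows whose `field` equals `value`.

    Stand-in for an index set: it only specifies iteration. It is a
    generator, so it is not materialized unless the caller asks for it.
    """
    return (i for i, row in enumerate(table) if row[field] == value)


def select_amounts(table, customer_id, reverse=False):
    """Collect the amounts of one customer into a multiset result."""
    # Materialized here only so that the visiting order can be reversed.
    subscripts = list(index_set(table, 0, customer_id))
    if reverse:
        subscripts.reverse()
    result = Counter()  # multiset: maps each value to its multiplicity
    for i in subscripts:
        # Assumed additive multiset union: R = R ∪ {amount}
        result[table[i][1]] += 1
    return result


forward = select_amounts(orders, 1)
backward = select_amounts(orders, 1, reverse=True)
print(forward == backward)
```

Because the result is a multiset, both visiting orders produce equal results, which is the reason the update is not treated as an ordering constraint between iterations.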
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Rietveld, K.F.D., Wijshoff, H.A.G. (2015). Re-Engineering Compiler Transformations to Outperform Database Query Optimizers. In: Brodman, J., Tu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2014. Lecture Notes in Computer Science, vol 8967. Springer, Cham. https://doi.org/10.1007/978-3-319-17473-0_20
DOI: https://doi.org/10.1007/978-3-319-17473-0_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17472-3
Online ISBN: 978-3-319-17473-0
eBook Packages: Computer Science (R0)