Abstract
Relational query languages rely heavily on costly join operations to combine tuples from multiple tables into a single resulting tuple. In many cases, the cost of query evaluation can be reduced by manually optimizing (parts of) queries to use cheaper semi-joins instead of joins. Unfortunately, existing database products can only apply such optimizations automatically in rather limited cases.
To improve on this situation, we propose a framework for automatic query optimization via weak-equivalent rewrite rules for a multiset relational algebra (that serves as a faithful formalization of core SQL). The weak-equivalent rewrite rules we propose aim at replacing joins by semi-joins. To further maximize their usability, these rewrite rules do so by only providing “weak guarantees” on the evaluation results of rewritten queries. We show that, in the context of certain operators, these weak-equivalent rewrite rules still provide strong guarantees on the final evaluation results of the rewritten queries.
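The gap between a join and a semi-join under multiset semantics can be illustrated with a small sketch. It is not the framework proposed here. It assumes, for illustration only, that duplicate elimination is one of the operators under which a weak guarantee becomes a strong one. The relations `R(a, b)` and `S(b, c)`, their contents, and all function names are hypothetical. Multiset relations are modelled as `collections.Counter` objects that map tuples to occurrence counts.

```python
from collections import Counter

# Hypothetical multiset relations R(a, b) and S(b, c): tuple -> occurrence count.
R = Counter({(1, 10): 2, (2, 20): 1, (3, 30): 1})
S = Counter({(10, "x"): 3, (10, "y"): 1, (20, "z"): 1})


def join_project_a(r, s):
    """Project attribute a from the join of r and s on b (multiset semantics)."""
    out = Counter()
    for (a, b), n in r.items():
        for (b2, _c), m in s.items():
            if b == b2:
                out[(a,)] += n * m  # every pair of matching occurrences counts
    return out


def semijoin_project_a(r, s):
    """Project attribute a from the semi-join of r by s on b.

    Only the existence of a match in s matters; counts come from r alone.
    """
    s_keys = {b for (b, _c) in s}
    out = Counter()
    for (a, b), n in r.items():
        if b in s_keys:
            out[(a,)] += n
    return out


def dedup(rel):
    """Duplicate elimination: every tuple that occurs is kept exactly once."""
    return Counter({t: 1 for t, n in rel.items() if n > 0})


joined = join_project_a(R, S)
semi = semijoin_project_a(R, S)

print(joined)  # multiplicities inflated by the number of matches in S
print(semi)    # multiplicities taken from R only
print(dedup(joined) == dedup(semi))
```

In this sketch the two results contain the same distinct tuples but differ in their counts, so the semi-join version is not equivalent to the join version as a multiset query. Once both are passed through `dedup`, the results coincide, while the semi-join never materializes the larger intermediate result.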
Notes
- 1.
Every occurrence of the operators \(\cup \), \(\cap \), and \(-\) in this paper is to be interpreted using standard set semantics.
- 2.
We only consider conjunctions of conditions in the selection operator because more general boolean combinations of conditions do not provide additional opportunities for rewriting in our framework.
- 3.
The set operators we define have the same semantics as the UNION ALL, INTERSECT ALL, and EXCEPT ALL operators of standard SQL [8]. The semantics of UNION, INTERSECT, and EXCEPT can be obtained using deduplication.
- 4.
To simplify presentation, the \(\theta \)-join and the \(\theta \)-semi-join also perform an equi-join on all attributes common to the multiset relations involved.
- 5.
The max-union operator is inspired by the \(\max \)-based multiset relation union [2].
- 6.
- 7.
We could have defined a multiset semi-join operator that does take into account the number of occurrences of tuples in \(\llbracket \textit{e}_2 \rrbracket _{\mathfrak {I}}\). With such a semi-join operator, however, we would no longer be able to sharply reduce the size of intermediate query results and would lose some potential to optimize query evaluation.
- 8.
Notice that \(\langle \mathscr {R}_1; \tau _1 \rangle \mathrel {\tilde{\subseteq }}\langle \mathscr {R}_2; \tau _2 \rangle \) does not imply that \(\langle \mathscr {R}_1; \tau _1 \rangle \) is fully included in \(\langle \mathscr {R}_2; \tau _2 \rangle \): there can be tuples \(\textsf {t}\in \mathscr {R}_1\) for which \(\tau _1(\textsf {t}) > \tau _2(\textsf {t})\).
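The multiset operators mentioned in notes 3, 5, and 8 can be sketched with Python's `collections.Counter`, whose `+`, `&`, `-`, and `|` operators add counts, take minimum counts, subtract counts (dropping non-positive results), and take maximum counts, respectively. The relations `r1`, `r2`, `r3` and the helper names are hypothetical. The helper `weakly_included` encodes an assumption, namely that the weak inclusion of note 8 compares only the sets of distinct tuples; the formal definition is not part of these notes.

```python
from collections import Counter

# Hypothetical multiset relations: tuple -> occurrence count.
r1 = Counter({"a": 3, "b": 1})
r2 = Counter({"a": 1, "c": 2})

union_all = r1 + r2      # UNION ALL: counts are added
intersect_all = r1 & r2  # INTERSECT ALL: minimum of the counts
except_all = r1 - r2     # EXCEPT ALL: counts subtracted, never below zero
max_union = r1 | r2      # max-union: maximum of the counts

# UNION (without ALL) is UNION ALL followed by deduplication.
union_distinct = Counter({t: 1 for t in union_all})


def weakly_included(x, y):
    """ASSUMPTION: weak inclusion looks only at which tuples occur, not how often."""
    return set(x) <= set(y)


def fully_included(x, y):
    """Multiset inclusion: every tuple occurs in y at least as often as in x."""
    return all(n <= y[t] for t, n in x.items())


r3 = Counter({"a": 3})
print(union_all, intersect_all, except_all, max_union)
print(weakly_included(r3, r2), fully_included(r3, r2))
```

Under the stated assumption, `r3` is weakly included in `r2` without being fully included, because `"a"` occurs three times in `r3` and only once in `r2`. This is the situation note 8 describes.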
References
Abiteboul, S., Hull, R., Vianu, V. (eds.): Foundations of Databases, 1st edn. Addison-Wesley Publishing Company, Boston (1995)
Albert, J.: Algebraic properties of bag data types. In: Proceedings of the 17th International Conference on Very Large Data Bases, pp. 211–219. VLDB 1991, Morgan Kaufmann Publishers Inc. (1991)
Bernstein, P.A., Chiu, D.M.W.: Using semi-joins to solve relational queries. J. ACM 28(1), 25–40 (1981). https://doi.org/10.1145/322234.322238
Dayal, U., Goodman, N., Katz, R.H.: An extended relational algebra with control over duplicate elimination. In: Proceedings of the 1st ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, pp. 117–123. PODS 1982, ACM (1982). https://doi.org/10.1145/588111.588132
Grefen, P.W.P.J., de By, R.A.: A multi-set extended relational algebra: a formal approach to a practical issue. In: Proceedings of 1994 IEEE 10th International Conference on Data Engineering, pp. 80–88. IEEE (1994). https://doi.org/10.1109/ICDE.1994.283002
Hellings, J., Pilachowski, C.L., Van Gucht, D., Gyssens, M., Wu, Y.: From relation algebra to semi-join algebra: an approach for graph query optimization. In: Proceedings of the 16th International Symposium on Database Programming Languages. ACM (2017). https://doi.org/10.1145/3122831.3122833
Hellings, J., Pilachowski, C.L., Van Gucht, D., Gyssens, M., Wu, Y.: From relation algebra to semi-join algebra: an approach to graph query optimization. Comput. J. 64(5), 789–811 (2020). https://doi.org/10.1093/comjnl/bxaa031
International Organization for Standardization: ISO/IEC 9075–1: Information technology - database languages - SQL (2011)
Klausner, A., Goodman, N.: Multirelations: semantics and languages. In: Proceedings of the 11th International Conference on Very Large Data Bases, pp. 251–258. VLDB 1985, VLDB Endowment (1985)
Lamperti, G., Melchiori, M., Zanella, M.: On multisets in database systems. In: Calude, C.S., Paun, G., Rozenberg, G., Salomaa, A. (eds.) WMC 2000. LNCS, vol. 2235, pp. 147–215. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45523-X_9
Paulley, G.N.: Exploiting Functional Dependence in Query Optimization. Ph.D. thesis, University of Waterloo (2000)
Ullman, J.D.: Principles of Database and Knowledge-Base Systems: Volume II: The New Technologies. W.H. Freeman & Co, San Francisco (1990)
Yannakakis, M.: Algorithms for acyclic database schemes. In: Proceedings of the Seventh International Conference on Very Large Data Bases, vol. 7, pp. 82–94. VLDB 1981, VLDB Endowment (1981)
Acknowledgement
This material is based upon work supported by the National Science Foundation under Grant No. #1606557.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hellings, J., Wu, Y., Van Gucht, D., Gyssens, M. (2022). Optimizing Multiset Relational Algebra Queries Using Weak-Equivalent Rewrite Rules. In: Varzinczak, I. (eds) Foundations of Information and Knowledge Systems. FoIKS 2022. Lecture Notes in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-031-11321-5_11
DOI: https://doi.org/10.1007/978-3-031-11321-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11320-8
Online ISBN: 978-3-031-11321-5
eBook Packages: Computer Science (R0)