Optimizing Multiset Relational Algebra Queries Using Weak-Equivalent Rewrite Rules

  • Conference paper
Foundations of Information and Knowledge Systems (FoIKS 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS))

Abstract

Relational query languages rely heavily on costly join operations to combine tuples from multiple tables into a single resulting tuple. In many cases, the cost of query evaluation can be reduced by manually optimizing (parts of) queries to use cheaper semi-joins instead of joins. Unfortunately, existing database products can only apply such optimizations automatically in rather limited cases.

To improve on this situation, we propose a framework for automatic query optimization via weak-equivalent rewrite rules for a multiset relational algebra (that serves as a faithful formalization of core SQL). The weak-equivalent rewrite rules we propose aim at replacing joins by semi-joins. To further maximize their usability, these rewrite rules do so by only providing “weak guarantees” on the evaluation results of rewritten queries. We show that, in the context of certain operators, these weak-equivalent rewrite rules still provide strong guarantees on the final evaluation results of the rewritten queries.
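The difference between a "weak" and a "strong" guarantee can be illustrated with a small sketch. It is not taken from the paper: the relations `customers` and `orders`, the helper functions, and the choice of deduplication as the surrounding operator are all assumptions made for illustration. Multiset relations are modelled as `collections.Counter` objects that map each tuple to its number of occurrences.

```python
from collections import Counter

# Hypothetical multiset relations: tuple -> number of occurrences.
customers = Counter({("ann", 1): 1, ("bob", 2): 1, ("cat", 3): 1})  # (name, cid)
orders = Counter({(1, "book"): 2, (1, "pen"): 1, (2, "ink"): 1})    # (cid, item)

def join(r, s):
    """Multiset equi-join on cid: multiplicities of matching tuples multiply."""
    out = Counter()
    for (name, cid), m in r.items():
        for (cid2, item), n in s.items():
            if cid == cid2:
                out[(name, cid, item)] += m * n
    return out

def semi_join(r, s):
    """Keep each tuple of r, with its own multiplicity, if s has a match.
    The number of matching occurrences in s is ignored."""
    keys = {cid for (cid, _item) in s}
    return Counter({t: m for t, m in r.items() if t[1] in keys})

def project_name(r):
    """Multiset projection on the first attribute (keeps duplicates)."""
    out = Counter()
    for t, m in r.items():
        out[t[0]] += m
    return out

def dedup(r):
    """Deduplication: every tuple that occurs is kept exactly once."""
    return Counter(dict.fromkeys(r, 1))

via_join = project_name(join(customers, orders))       # ann: 3, bob: 1
via_semi = project_name(semi_join(customers, orders))  # ann: 1, bob: 1

print(via_join == via_semi)                # False: multiplicities differ
print(dedup(via_join) == dedup(via_semi))  # True: same tuples after dedup
```

In this sketch the semi-join version agrees with the join version only up to the number of occurrences of each tuple, yet the two become identical once deduplication is applied on top, which is the kind of situation the abstract describes.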

Notes

  1. Every occurrence of the operators \(\cup\), \(\cap\), and \(-\) in this paper is to be interpreted using standard set semantics.

  2. We only consider conjunctions of conditions in the selection operator because more general Boolean combinations of conditions do not provide additional opportunities for rewriting in our framework.

  3. The set operators we define have the same semantics as the UNION ALL, INTERSECT ALL, and EXCEPT ALL operators of standard SQL [8]. The semantics of UNION, INTERSECT, and EXCEPT can be obtained using deduplication.

  4. To simplify presentation, the \(\theta\)-join and the \(\theta\)-semi-join also perform an equi-join on all attributes common to the multiset relations involved.

  5. The max-union operator is inspired by the \(\max\)-based multiset relation union [2].

  6. Attribute introduction is a restricted form of the operator commonly known as generalized projection or extended projection [1, 5].

  7. We could have defined a multiset semi-join operator that does take into account the number of occurrences of tuples in \(\llbracket \textit{e}_2 \rrbracket_{\mathfrak{I}}\). With such a semi-join operator, however, we would no longer be able to sharply reduce the size of intermediate query results, and we would lose some potential to optimize query evaluation.

  8. Notice that \(\langle \mathscr{R}_1; \tau_1 \rangle \mathrel{\tilde{\subseteq}} \langle \mathscr{R}_2; \tau_2 \rangle\) does not imply that \(\langle \mathscr{R}_1; \tau_1 \rangle\) is fully included in \(\langle \mathscr{R}_2; \tau_2 \rangle\): there can be tuples \(\textsf{t} \in \mathscr{R}_1\) for which \(\tau_1(\textsf{t}) > \tau_2(\textsf{t})\).
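The multiset operators mentioned in Notes 3, 5, and 8 can be tried out with Python's `collections.Counter`, whose `+`, `&`, `-`, and `|` operators happen to match additive union, minimum-based intersection, truncated difference, and maximum-based union. This is a minimal sketch under assumptions, not the paper's formalization: the relations `r1`, `r2`, and `x` and the helpers `dedup`, `weakly_included`, and `fully_included` are hypothetical names, and weak inclusion is read here as "every tuple of the first relation occurs at least once in the second".

```python
from collections import Counter

# Hypothetical multiset relations: tuple -> number of occurrences.
r1 = Counter({"a": 2, "b": 1})
r2 = Counter({"a": 1, "c": 3})

union_all = r1 + r2      # counts add up, as in UNION ALL:      a:3, b:1, c:3
intersect_all = r1 & r2  # minimum count, as in INTERSECT ALL:  a:1
except_all = r1 - r2     # truncated difference, as in EXCEPT ALL: a:1, b:1
max_union = r1 | r2      # maximum count (max-union):           a:2, b:1, c:3

def dedup(r):
    """Deduplication: every tuple that occurs is kept exactly once."""
    return Counter(dict.fromkeys(r, 1))

# UNION (without ALL) obtained from UNION ALL by deduplication.
union = dedup(union_all)  # a:1, b:1, c:1

def weakly_included(s1, s2):
    """Assumed reading of weak inclusion: only the sets of tuples are compared."""
    return set(s1) <= set(s2)

def fully_included(s1, s2):
    """Full inclusion: every tuple occurs at most as often in s1 as in s2."""
    return all(n <= s2[t] for t, n in s1.items())

x = Counter({"a": 2})
print(weakly_included(x, r2))  # True: "a" occurs in r2
print(fully_included(x, r2))   # False: "a" occurs twice in x, once in r2
```

The last two lines reproduce the situation of Note 8: a relation can be weakly included in another even though some tuple occurs more often in the first relation than in the second.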

References

  1. Abiteboul, S., Hull, R., Vianu, V. (eds.): Foundations of Databases, 1st edn. Addison-Wesley Publishing Company, Boston (1995)

  2. Albert, J.: Algebraic properties of bag data types. In: Proceedings of the 17th International Conference on Very Large Data Bases, pp. 211–219. VLDB 1991, Morgan Kaufmann Publishers Inc. (1991)

  3. Bernstein, P.A., Chiu, D.M.W.: Using semi-joins to solve relational queries. J. ACM 28(1), 25–40 (1981). https://doi.org/10.1145/322234.322238

  4. Dayal, U., Goodman, N., Katz, R.H.: An extended relational algebra with control over duplicate elimination. In: Proceedings of the 1st ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, pp. 117–123. PODS 1982, ACM (1982). https://doi.org/10.1145/588111.588132

  5. Grefen, P.W.P.J., de By, R.A.: A multi-set extended relational algebra: a formal approach to a practical issue. In: Proceedings of 1994 IEEE 10th International Conference on Data Engineering, pp. 80–88. IEEE (1994). https://doi.org/10.1109/ICDE.1994.283002

  6. Hellings, J., Pilachowski, C.L., Van Gucht, D., Gyssens, M., Wu, Y.: From relation algebra to semi-join algebra: an approach for graph query optimization. In: Proceedings of The 16th International Symposium on Database Programming Languages. ACM (2017). https://doi.org/10.1145/3122831.3122833

  7. Hellings, J., Pilachowski, C.L., Van Gucht, D., Gyssens, M., Wu, Y.: From relation algebra to semi-join algebra: an approach to graph query optimization. Comput. J. 64(5), 789–811 (2020). https://doi.org/10.1093/comjnl/bxaa031

  8. International Organization for Standardization: ISO/IEC 9075–1: Information technology - database languages - SQL (2011)

  9. Klausner, A., Goodman, N.: Multirelations: semantics and languages. In: Proceedings of the 11th International Conference on Very Large Data Bases, pp. 251–258. VLDB 1985, VLDB Endowment (1985)

  10. Lamperti, G., Melchiori, M., Zanella, M.: On multisets in database systems. In: Calude, C.S., Paun, G., Rozenberg, G., Salomaa, A. (eds.) WMC 2000. LNCS, vol. 2235, pp. 147–215. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45523-X_9

  11. Paulley, G.N.: Exploiting Functional Dependence in Query Optimization. Ph.D. thesis, University of Waterloo (2000)

  12. Ullman, J.D.: Principles of Database and Knowledge-Base Systems: Volume II: The New Technologies. W.H. Freeman & Co, San Francisco (1990)

  13. Yannakakis, M.: Algorithms for acyclic database schemes. In: Proceedings of the Seventh International Conference on Very Large Data Bases, vol. 7, pp. 82–94. VLDB 1981, VLDB Endowment (1981)

Acknowledgement

This material is based upon work supported by the National Science Foundation under Grant No. 1606557.

Author information

Corresponding author

Correspondence to Jelle Hellings.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Hellings, J., Wu, Y., Van Gucht, D., Gyssens, M. (2022). Optimizing Multiset Relational Algebra Queries Using Weak-Equivalent Rewrite Rules. In: Varzinczak, I. (eds) Foundations of Information and Knowledge Systems. FoIKS 2022. Lecture Notes in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-031-11321-5_11

  • DOI: https://doi.org/10.1007/978-3-031-11321-5_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-11320-8

  • Online ISBN: 978-3-031-11321-5

  • eBook Packages: Computer Science, Computer Science (R0)
