Lero: applying learning-to-rank in query optimizer

Chen, Xingguang; Zhu, Rong; Ding, Bolin; Wang, Sibo; Zhou, Jingren

doi:10.1007/s00778-024-00850-3

Lero: applying learning-to-rank in query optimizer

Regular Paper
Published: 25 April 2024

(2024)
Cite this article

The VLDB Journal Aims and scope Submit manuscript

Xingguang Chen¹,
Rong Zhu ORCID: orcid.org/0000-0001-9976-9490²,
Bolin Ding²,
Sibo Wang¹ &
…
Jingren Zhou²

59 Accesses
1 Altmetric
Explore all metrics

Abstract

In recent studies, machine learning techniques have been employed to support or enhance cost-based query optimizers in DBMS. Although these approaches have shown superiority in certain benchmarks, they also suffer from certain drawbacks. These include unstable performance, high training costs, and slow model updating, which can be attributed to the inherent challenges of predicting the cost or latency of execution plans using machine learning models. In this paper, we introduce a learning-to-rank query optimizer, called Lero, which builds on top of the native query optimizer and continuously learns to improve query optimization. The key observation is that the relative order or rank of plans, rather than the exact cost or latency, is sufficient for query optimization. Lero employs a pairwise approach to train a classifier to compare any two plans and tell which one is better. Such a binary classification task is much easier than the regression task to predict the cost or latency, in terms of model efficiency and effectiveness. Rather than building a learned optimizer from scratch, Lero is designed to leverage decades of wisdom of databases and improve the native optimizer. With its non-intrusive design, Lero can be implemented on top of any existing DBMS with minimum integration efforts. We implement Lero and demonstrate its outstanding performance using PostgreSQL and Spark SQL. In our experiments, Lero achieves near-optimal performance on several benchmarks. It reduces the execution time of the native PostgreSQL optimizer by up to \(70\%\) and other learned query optimizers by up to \(37\%\) on single-machine environments. On distributed environments, our Lero improves the running time of the native Spark SQL optimizer by up to \(27\%\). Meanwhile, Lero continuously learns and automatically adapts to query workloads and changes in data.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 8

Fig. 9

Fig. 15

Fig. 16

DB-GPT: Large Language Model Meets Database

Article Open access 19 January 2024

Optimal classification trees

Article 03 April 2017

Optimal Index Selection Using Optimized Deep Q-Learning Algorithm for NoSQL Database

Article 27 April 2024

References

Agrawal, S., Chaudhuri, S., Kollár, L., Marathe, A.P., Narasayya, V.R., Syamala, M.: Database tuning advisor for microsoft SQL server 2005. In: PVLDB, pp. 1110–1121 (2004)
Aken, D.V., Pavlo, A., Gordon, G.J., Zhang, B.: Automatic database management system tuning through large-scale machine learning. In: SIGMOD, pp. 1009–1024 (2017)
Armbrust, M., Xin, R.S., Lian, C., Huai, Y., Liu, D., Bradley, J.K., Meng, X., Kaftan, T., Franklin, M.J., Ghodsi, A., Zaharia, M.: Spark SQL: relational data processing in spark. In: SIGMOD, pp. 1383–1394 (2015)
Balsa implementation. https://github.com/balsa-project/balsa (2022)
Chaiken, R., Jenkins, B., Larson, P.Å., Ramsey, B., Shakib, D., Weaver, S., Zhou, J.: Scope: easy and efficient parallel processing of massive data sets. PVLDB 1(2), 1265–1276 (2008)
Google Scholar
Chaudhuri, S., Narasayya, V.R.: An efficient cost-driven index selection tool for microsoft SQL server. In: PVLDB, pp. 146–155 (1997)
Chen, J., Pan, X., Monga, R., Bengio, S., Jozefowicz, R.: Revisiting distributed synchronous sgd. arXiv:1604.00981 (2016)
Council(TPC), T.P.P.: TPC-DS Vesion 2 and Version 3. http://www.tpc.org/tpcds/ (2021)
Council(TPC), T.P.P.: Tpc-h vesion 2 and version 3. http://www.tpc.org/tpch/ (2021)
Dey, A., Bhaumik, S., Doraiswamy, H., Haritsa, J.R.: Efficiently approximating query optimizer plan diagrams. PVLDB 1(2), 1325–1336 (2008)
Google Scholar
Ding, B., Das, S., Marcus, R., Wu, W., Chaudhuri, S., Narasayya, V.R.: Ai meets ai: leveraging query executions to improve index recommendations. In: SIGMOD, pp. 1241–1258 (2019)
Doraiswamy, H., Darera, P.N., Haritsa, J.R.: Identifying robust plans through plan diagram reduction. PVLDB 1(1), 1124–1140 (2008)
Google Scholar
Dutt, A., Wang, C., Nazi, A., Kandula, S., Narasayya, V., Chaudhuri, S.: Selectivity estimation for range predicates using lightweight models. PVLDB 12(9), 1044–1057 (2019)
Google Scholar
Freund, Y., Iyer, R., Schapire, R.E., Singer, Y.: An efficient boosting algorithm for combining preferences. J. Mach. Learn. Res. 4, 933–969 (2003)
MathSciNet Google Scholar
Fuhr, N.: Optimum polynomial retrieval functions based on the probability ranking principle. ACM Trans. Inf. Syst. 7(3), 183–204 (1989)
Article Google Scholar
Han, Y.: Github repository: stats end-to-end cardest benchmark. https://github.com/Nathaniel-Han/End-to-End-CardEst-Benchmark (2021)
Han, Y., Wu, Z., Wu, P., Zhu, R., Yang, J., Liang, T.W., Zeng, K., Cong, G., Qin, Y., Pfadler, A., Qian, Z., Zhou, J., Li, J., Cui, B.: Cardinality estimation in dbms: a comprehensive benchmark evaluation. PVLDB 15(4), 752–765 (2021)
Google Scholar
Hester, T., Vecerik, M., Pietquin, O., Lanctot, M., Schaul, T., Piot, B., Horgan, D., Quan, J., Sendonaris, A., Osband, I., et al.: Deep q-learning from demonstrations. In: AAAI, pp. 3223–3230 (2018)
Hilprecht, B., Schmidt, A., Kulessa, M., Molina, A., Kersting, K., Binnig, C.: Deepdb: learn from data, not from queries!. pp. 992–1005 (2020)
Johannes, F., Eyke, H.: Preference Learning. Preference Learning (2011)
Kanet, J.J., Sridharan, V.: Scheduling with inserted idle time: problem taxonomy and literature review. Oper. Res. 48(1), 99–110 (2000)
Article Google Scholar
Karatzoglou, A., Baltrunas, L., Shi, Y.: Learning to rank for recommender systems. In: RecSys, pp. 493–494. ACM (2013)
Kipf, A., Kipf, T., Radke, B., Leis, V., Boncz, P., Kemper, A.: Learned cardinalities: Estimating correlated joins with deep learning. In: CIDR (2019)
Krishnan, S., Yang, Z., Goldberg, K., Hellerstein, J., Stoica, I.: Learning to optimize join queries with deep reinforcement learning. arXiv:1808.03196 (2018)
Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., Neumann, T.: How good are query optimizers, really? PVLDB 9(3), 204–215 (2015)
Google Scholar
Lero on postgresql. https://github.com/Blondig/Lero-on-PostgreSQL (2022)
Lero on spark sql. https://github.com/Blondig/Lero-on-Spark (2023)
Li, J., König, A.C., Narasayya, V., Chaudhuri, S.: Robust estimation of resource consumption for sql queries using statistical techniques. arXiv:1208.0278 (2012)
Li, P., Wu, Q., Burges, C.: Mcrank: Learning to rank using multiple classification and gradient boosting. Advances in neural information processing systems 20 (2007)
Liu, J., Dong, W., Zhou, Q., Li, D.: Fauce: fast and accurate deep ensembles with uncertainty for cardinality estimation. PVLDB 14(11), 1950–1963 (2021)
Google Scholar
Liu, T.Y.: Learning to rank for information retrieval. Found. Trends Inf. Retr. 3(3), 225–331 (2009)
Article Google Scholar
Liu, W., Liu, Q., Tang, R., Chen, J., He, X., Heng, P.A.: Personalized re-ranking with item relationships for e-commerce. In: CIKM, pp. 925–934 (2020)
Marcus, R.: Bao appendix. https://rmarcus.info/appendix.html (2020)
Marcus, R.: Github repository: Bao for postgresql. https://github.com/learnedsystems/BaoForPostgreSQL (2020)
Marcus, R., Negi, P., Mao, H., Tatbul, N., Alizadeh, M., Kraska, T.: Bao: Making learned query optimization practical. In: SIGMOD, pp. 1275–1288 (2021)
Marcus, R., Negi, P., Mao, H., Zhang, C., Alizadeh, M., Kraska, T., Papaemmanouil, O., Tatbul, N.: Neo: A learned query optimizer. PVLDB 12(11), 1705–1718 (2019)
Google Scholar
Marcus, R., Papaemmanouil, O.: Plan-structured deep neural network models for query performance prediction. PVLDB 12(11), 1733–1746 (2019)
Google Scholar
Moerkotte, G., Neumann, T., Steidl, G.: Preventing bad plans by bounding the impact of cardinality estimation errors. PVLDB 2(1), 982–993 (2009)
Google Scholar
Mou, L., Li, G., Zhang, L., Wang, T., Jin, Z.: Convolutional neural networks over tree structures for programming language processing. In: AAAI, pp. 1287–1293 (2016)
Negi, P., Interlandi, M., Marcus, R., Alizadeh, M., Kraska, T., Friedman, M., Jindal, A.: Steering query optimizers: a practical take on big data workloads. In: SIGMOD, pp. 2557–2569 (2021)
Pang, L., Xu, J., Ai, Q., Lan, Y., Cheng, X., Wen, J.: Setrank: Learning a permutation-invariant ranking model for information retrieval. In: SIGIR, pp. 499–508 (2020)
Pilotscope. https://github.com/alibaba/pilotscope (2024)
Query Planning. https://www.postgresql.org/docs/current/runtime-config-query.html (2023)
Reddy, N., Haritsa, J.R.: Analyzing plan diagrams of database query optimizers. In: PVLDB, pp. 1228–1240 (2005)
Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: Readings in Artificial Intelligence and Databases, pp. 511–522. Elsevier (1989)
Sun, J., Li, G.: An end-to-end learning-based cost estimator. PVLDB 13(3), 307–319 (2019)
MathSciNet Google Scholar
Swezey, R., Grover, A., Charron, B., Ermon, S.: Pirank: Scalable learning to rank via differentiable sorting. Adv. Neural Inf. Process. Syst. 34, 21644–21654 (2021)
Google Scholar
Thompson, W.R.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3–4), 285–294 (1933)
Article Google Scholar
Trummer, I., Wang, J., Maram, D., Moseley, S., Jo, S., Antonakakis, J.: Skinnerdb: Regret-bounded query evaluation via reinforcement learning. In: SIGMOD, pp. 1153–1170 (2019)
Tzoumas, K., Deshpande, A., Jensen, C.S.: Lightweight graphical models for selectivity estimation without independence assumptions. PVLDB 4(11), 852–863 (2011)
Google Scholar
Valentin, G., Zuliani, M., Zilio, D.C., Lohman, G.M., Skelley, A.: DB2 advisor: An optimizer smart enough to recommend its own indexes. In: ICDE, pp. 101–110 (2000)
Vargas, S., Castells, P.: Rank and relevance in novelty and diversity metrics for recommender systems. In: Proceedings of the fifth ACM conference on Recommender systems, pp. 109–116 (2011)
Wang, J., Trummer, I., Basu, D.: UDO: universal database optimization using reinforcement learning. PVLDB 14(13), 3402–3414 (2021)
Google Scholar
Wang, X., Qu, C., Wu, W., Wang, J., Zhou, Q.: Are we ready for learned cardinality estimation? PVLDB 14(9), 1640–1654 (2021)
Google Scholar
Wu, W., Chi, Y., Zhu, S., Tatemura, J., Hacigümüs, H., Naughton, J.F.: Predicting query execution time: Are optimizer cost models really unusable? In: ICDE, pp. 1081–1092 (2013)
Wu, Z., Shaikhha, A.: Bayescard: A unified bayesian framework for cardinality estimation. arXiv:2012.14743 (2020)
Yang, Z., Chiang, W., Luan, S., Mittal, G., Luo, M., Stoica, I.: Balsa: Learning a query optimizer without expert demonstrations. In: SIGMOD, pp. 931–944 (2022)
Yang, Z., Kamsetty, A., Luan, S., Liang, E., Duan, Y., Chen, X., Stoica, I.: Neurocard: One cardinality estimator for all tables. PVLDB 14(1), 61–73 (2021)
Google Scholar
Yang, Z., Liang, E., Kamsetty, A., Wu, C., Duan, Y., Chen, X., Abbeel, P., Hellerstein, J.M., Krishnan, S., Stoica, I.: Deep unsupervised cardinality estimation. PVLDB 13(3), 279–292 (2019)
Google Scholar
Yu, X., Li, G., Chai, C., Tang, N.: Reinforcement learning with tree-lstm for join order selection. In: ICDE, pp. 1297–1308 (2020)
Zhang, X., Chang, Z., Wu, H., Li, Y., Chen, J., Tan, J., Li, F., Cui, B.: A unified and efficient coordinating framework for autonomous DBMS tuning. Proc. ACM Manag. Data 1(2), 1–26 (2023)
Google Scholar
Zhi Kang, J.K., Tan, S.Y., Cheng, F., Sun, S., He, B.: Efficient deep learning pipelines for accurate cost estimations over large scale query workload. In: SIGMOD, pp. 1014–1022 (2021)
Zhou, L.: A survey on contextual multi-armed bandits. arXiv:1508.03326 (2015)
Zhou, X., Sun, J., Li, G., Feng, J.: Query performance prediction for concurrent queries using graph embedding. PVLDB 13(9), 1416–1428 (2020)
Zhu, R., Chen, W., Ding, B., Chen, X., Pfadler, A., Wu, Z., Zhou, J.: Lero: A learning-to-rank query optimizer. PVLDB 16(6), 1466–1479 (2023)
Google Scholar
Zhu, R., Weng, L., Wei, W., Wu, D., Peng, J., Wang, Y., Ding, B., Lian, D., Zheng, B., Zhou, J.: Pilotscope: Steering databases with machine learning drivers. PVLDB 17(5), 980–993 (2024)
Google Scholar
Zhu, R., Wu, Z., Chai, C., Pfadler, A., Ding, B., Li, G., Zhou, J.: Learned query optimizer: At the forefront of ai-driven databases. In: EDBT, pp. 1–4 (2022)
Zhu, R., Wu, Z., Han, Y., Zeng, K., Pfadler, A., Qian, Z., Zhou, J., Cui, B.: Flat: Fast, lightweight and accurate method for cardinality estimation. PVLDB 14(9), 1489–1502 (2021)
Google Scholar

Download references

Acknowledgements

This work was supported by National Natural Science Foundation of China (U1936205), Hong Kong Innovation and Technology Commission (MRP/071/20X), and Hong Kong Research Grants Council (14217322, 24203419, C4158-20 G).

Author information

Authors and Affiliations

The Chinese University of Hong Kong, Shatin, Hong Kong
Xingguang Chen & Sibo Wang
Alibaba Group, Hangzhou, China
Rong Zhu, Bolin Ding & Jingren Zhou

Authors

Xingguang Chen
View author publications
You can also search for this author in PubMed Google Scholar
Rong Zhu
View author publications
You can also search for this author in PubMed Google Scholar
Bolin Ding
View author publications
You can also search for this author in PubMed Google Scholar
Sibo Wang
View author publications
You can also search for this author in PubMed Google Scholar
Jingren Zhou
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Rong Zhu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Chen, X., Zhu, R., Ding, B. et al. Lero: applying learning-to-rank in query optimizer. The VLDB Journal (2024). https://doi.org/10.1007/s00778-024-00850-3

Download citation

Received: 28 October 2023
Revised: 08 February 2024
Accepted: 15 March 2024
Published: 25 April 2024
DOI: https://doi.org/10.1007/s00778-024-00850-3

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Lero: applying learning-to-rank in query optimizer

Abstract

Access this article

Similar content being viewed by others

DB-GPT: Large Language Model Meets Database

Optimal classification trees

Optimal Index Selection Using Optimized Deep Q-Learning Algorithm for NoSQL Database

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Lero: applying learning-to-rank in query optimizer

Abstract

Access this article

Similar content being viewed by others

DB-GPT: Large Language Model Meets Database

Optimal classification trees

Optimal Index Selection Using Optimized Deep Q-Learning Algorithm for NoSQL Database

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation