Cardinality estimation with smoothing autoregressive models

Lin, Yuming; Xu, Zejun; Zhang, Yinghao; Li, You; Zhang, Jingwei

doi:10.1007/s11280-023-01195-7

Published: 28 July 2023

World Wide Web, volume 26, pages 3441–3461 (2023)

Abstract

Cardinality estimation, which aims to accurately estimate the result size of queries, is a fundamental task in database query processing and optimization. One of the most recent and effective solutions to this problem is to use deep autoregressive models to learn joint probability distributions through unsupervised learning. However, because of data sparsity, it is difficult for the estimator to capture the actual distribution, which reduces the accuracy of cardinality estimation. In addition, the progressive sampling used by autoregressive estimators is prone to error propagation, which is more evident in high-dimensional data. To reduce the error of autoregressive cardinality estimation and to obtain a better trade-off between estimation accuracy and latency, we propose a random smoothing autoregressive cardinality estimation model (SAM-CE), which combines a random smoothing technique with a deep autoregressive model to simplify the learning of joint probability distributions. A smooth progressive sampling method suitable for range queries is designed to improve estimator accuracy by improving sample quality. We conduct extensive experiments to demonstrate the effectiveness and performance of the proposed SAM-CE. The results show that SAM-CE achieves state-of-the-art effectiveness in cardinality estimation.
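
As a rough illustration of the progressive sampling that the abstract refers to, the following Python sketch estimates the cardinality of a conjunctive range query one column at a time. It is not the SAM-CE method: the smoothing step and the smooth progressive sampling variant are not detailed in the abstract, so the sketch shows only plain progressive sampling. The toy table, the function names `conditional` and `progressive_sample_estimate`, and the use of empirical conditional frequencies in place of a trained deep autoregressive model are all assumptions made for the example.

```python
import random
from collections import Counter

def conditional(rows, prefix):
    """Empirical P(next column | prefix values).

    Hypothetical stand-in for the per-column output of a learned
    autoregressive model; here it is computed directly from the table.
    """
    k = len(prefix)
    matches = [r for r in rows if r[:k] == prefix]
    counts = Counter(r[k] for r in matches)
    return {v: c / len(matches) for v, c in counts.items()}

def progressive_sample_estimate(rows, ranges, num_samples=200, seed=0):
    """Estimate how many rows satisfy lo <= column_i <= hi for every column."""
    rng = random.Random(seed)
    total_weight = 0.0
    for _ in range(num_samples):
        prefix, weight = (), 1.0
        for lo, hi in ranges:
            dist = conditional(rows, prefix)
            # Keep only the values that satisfy this column's range predicate.
            in_range = {v: p for v, p in dist.items() if lo <= v <= hi}
            mass = sum(in_range.values())
            weight *= mass  # probability mass of the range given the prefix
            if mass == 0.0:
                break       # no row can match; this sample contributes 0
            # Sample the next value from the distribution restricted to the range.
            values = list(in_range)
            chosen = rng.choices(values, weights=[in_range[v] for v in values], k=1)[0]
            prefix += (chosen,)
        total_weight += weight
    selectivity = total_weight / num_samples
    return selectivity * len(rows)

# Toy two-column table (hypothetical data).
table = [(1, 10), (1, 20), (2, 10), (2, 20), (3, 10), (3, 30), (4, 30), (4, 30)]
# Query: 1 <= a <= 2 AND 10 <= b <= 20; the true count is 4.
print(progressive_sample_estimate(table, [(1, 2), (10, 20)]))
```

Each sample's weight is a product of per-column probability masses, so an inaccurate early conditional is carried into every later column. This is the error propagation the abstract describes as more evident in high-dimensional data.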

Availability of data and materials

The datasets generated during the current study are available from the corresponding author on reasonable request.

Notes

catalog.data.gov/dataset/vehicle-snowmobile-and-boat-registrations.
https://archive.ics.uci.edu/ml/index.php.

Funding

This work was supported by the National Natural Science Foundation of China (Nos. 62062027 and U22A2099), the Innovation Project of GUET Graduate Education (No. 2022YCXS079), and the project of Guangxi Key Laboratory of Trusted Software.

Author information

Authors and Affiliations

Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Jinji Road, Guilin, 541004, Guangxi, China
Yuming Lin, Zejun Xu, Yinghao Zhang, You Li & Jingwei Zhang

Contributions

Yuming Lin: Methodology, Conceptualization, Investigation, Validation, Writing - original draft, Writing - review & editing, Funding acquisition. Zejun Xu: Methodology, Software, Writing - original draft, Writing - review & editing, Validation. Yinghao Zhang: Data preparation and maintenance, Validation. You Li: Methodology, Writing - review & editing, Funding acquisition. Jingwei Zhang: Resources, Supervision. All authors reviewed the manuscript.

Corresponding author

Correspondence to You Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lin, Y., Xu, Z., Zhang, Y. et al. Cardinality estimation with smoothing autoregressive models. World Wide Web 26, 3441–3461 (2023). https://doi.org/10.1007/s11280-023-01195-7

Received: 22 March 2023
Revised: 15 June 2023
Accepted: 05 July 2023
Published: 28 July 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s11280-023-01195-7
