Selectivity Estimation with Attribute Value Dependencies Using Linked Bayesian Networks

Part of the Lecture Notes in Computer Science book series (TLDKS, volume 12410)

Abstract

Relational query optimisers rely on cost models to choose between candidate query execution plans, and selectivity estimates are a crucial input to those cost models. In practice, standard selectivity estimation procedures are prone to large errors, mostly because they rely on the so-called attribute value independence and join uniformity assumptions. Multidimensional methods have therefore been proposed to capture dependencies between two or more attributes, both within and across relations. However, these methods incur a computational cost large enough to make them unusable in practice. We propose a method based on Bayesian networks that captures cross-relation attribute value dependencies with little overhead. Our proposal rests on the assumption that dependencies between attributes are preserved when joins are involved. Furthermore, we introduce a parameter for trading off estimation accuracy against computational cost. We validate our work by comparing it with other relevant methods on a large workload derived from the JOB and TPC-DS benchmarks. Our results show that our method is an order of magnitude more efficient than existing methods, whilst maintaining a high level of accuracy.
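
To make the attribute value independence assumption concrete, the following minimal sketch compares three selectivity estimates for a conjunctive equality predicate on a single toy relation: the true selectivity, the estimate obtained by multiplying per-attribute selectivities, and an estimate from a tree-shaped factorisation of the kind a Bayesian network encodes. The table, the column names, the helper functions and the hand-picked tree (city as the parent of both country and language) are all assumptions made for illustration. This is not the chapter's method, which additionally handles dependencies across relations.

```python
from fractions import Fraction

# Hypothetical toy relation; columns and values are illustrative only.
COLUMNS = ("country", "city", "language")
ROWS = [
    ("FR", "Paris", "fr"),
    ("FR", "Paris", "fr"),
    ("FR", "Lyon", "fr"),
    ("US", "NYC", "en"),
    ("US", "NYC", "en"),
    ("US", "NYC", "es"),
    ("US", "LA", "en"),
    ("US", "LA", "es"),
]


def matches(row, predicates):
    """True if the row satisfies every column == value predicate."""
    return all(row[COLUMNS.index(col)] == val for col, val in predicates.items())


def prob(predicates, given=None):
    """Empirical P(predicates | given), computed by counting rows."""
    pool = [r for r in ROWS if matches(r, given or {})]
    hits = [r for r in pool if matches(r, predicates)]
    return Fraction(len(hits), len(pool))


query = {"country": "US", "city": "NYC", "language": "en"}

# Ground truth: fraction of rows satisfying the whole conjunction.
true_sel = prob(query)

# Attribute value independence: multiply the per-attribute selectivities.
avi_sel = Fraction(1)
for col, val in query.items():
    avi_sel *= prob({col: val})

# Tree factorisation (assumed structure): P(city) * P(country|city) * P(language|city).
tree_sel = (
    prob({"city": "NYC"})
    * prob({"country": "US"}, given={"city": "NYC"})
    * prob({"language": "en"}, given={"city": "NYC"})
)

print("true:", true_sel, "independence:", avi_sel, "tree:", tree_sel)
```

On this toy data the independence estimate is 45/512 (about 0.09) against a true selectivity of 1/4, whereas the tree factorisation recovers 1/4 exactly because country is fully determined by city. Real data will rarely factorise this cleanly, so the tree estimate is generally an approximation as well.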


Notes

  1. JOB dataset and queries: https://github.com/gregrahn/join-order-benchmark/.

  2. Docker image: https://github.com/MaxHalford/postgres-job-docker.

  3. Method source code: https://github.com/MaxHalford/tldks-2020.

Author information

Corresponding author

Correspondence to Max Halford.

Copyright information

© 2020 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Halford, M., Saint-Pierre, P., Morvan, F. (2020). Selectivity Estimation with Attribute Value Dependencies Using Linked Bayesian Networks. In: Hameurlain, A., Tjoa, A.M. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVI. Lecture Notes in Computer Science, vol 12410. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-62386-2_6

  • DOI: https://doi.org/10.1007/978-3-662-62386-2_6

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-62385-5

  • Online ISBN: 978-3-662-62386-2

  • eBook Packages: Computer Science, Computer Science (R0)