Skip to main content

An Approach Based on Bayesian Networks for Query Selectivity Estimation

  • Conference paper
  • First Online:
Database Systems for Advanced Applications (DASFAA 2019)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11447))

Included in the following conference series:

Abstract

The efficiency of a query execution plan depends on the accuracy of the selectivity estimates given to the query optimiser by the cost model. The cost model makes simplifying assumptions in order to produce said estimates in a timely manner. These assumptions lead to selectivity estimation errors that have dramatic effects on the quality of the resulting query execution plans. A convenient assumption that is ubiquitous among current cost models is to assume that attributes are independent with each other. However, it ignores potential correlations which can have a huge negative impact on the accuracy of the cost model. In this paper we attempt to relax the attribute value independence assumption without unreasonably deteriorating the accuracy of the cost model. We propose a novel approach based on a particular type of Bayesian networks called Chow-Liu trees to approximate the distribution of attribute values inside each relation of a database. Our results on the TPC-DS benchmark show that our method is an order of magnitude more precise than other approaches whilst remaining reasonably efficient in terms of time and space.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    Specifically we used the following queries: 7, 13, 18, 26, 27, 53, 54, 91.

References

  1. Acharya, S., Gibbons, P.B., Poosala, V., Ramaswamy, S.: Join synopses for approximate query answering. ACM SIGMOD Rec. 28, 275–286 (1999)

    Article  Google Scholar 

  2. Armbrust, M., et al.: Spark SQL: relational data processing in spark. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1383–1394. ACM (2015)

    Google Scholar 

  3. Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: a multidimensional workload-aware histogram. ACM SIGMOD Rec. 30, 211–222 (2001)

    Article  Google Scholar 

  4. Chaudhuri, S., Motwani, R., Narasayya, V.: On random sampling over joins. ACM SIGMOD Rec. 28, 263–274 (1999)

    Article  Google Scholar 

  5. Chen, C.M., Roussopoulos, N.: Adaptive selectivity estimation using query feedback, vol. 23. ACM (1994)

    Google Scholar 

  6. Chen, Y., Yi, K.: Two-level sampling for join size estimation. In: Proceedings of the 2017 ACM International Conference on Management of Data, pp. 759–774. ACM (2017)

    Google Scholar 

  7. Chow, C., Liu, C.: Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory 14(3), 462–467 (1968)

    Article  MathSciNet  Google Scholar 

  8. Cooper, G.F.: The computational complexity of probabilistic inference using Bayesian belief networks. Artif. Intell. 42(2–3), 393–405 (1990)

    Article  MathSciNet  Google Scholar 

  9. Cowell, R.G., Dawid, P., Lauritzen, S.L., Spiegelhalter, D.J.: Probabilistic Networks and Expert Systems: Exact Computational Methods for Bayesian Networks. Springer Science & Business Media (2006)

    Google Scholar 

  10. Getoor, L., Taskar, B., Koller, D.: Selectivity estimation using probabilistic models. ACM SIGMOD Rec. 30, 461–472 (2001)

    Article  Google Scholar 

  11. Heckerman, D., Geiger, D., Chickering, D.M.: Learning Bayesian networks: the combination of knowledge and statistical data. Mach. Learn. 20(3), 197–243 (1995)

    MATH  Google Scholar 

  12. Heimel, M., Kiefer, M., Markl, V.: Self-tuning, GPU-accelerated kernel density models for multidimensional selectivity estimation. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1477–1492. ACM (2015)

    Google Scholar 

  13. Hellerstein, J.M.: Looking back at postgres. arXiv preprint arXiv:1901.01973 (2019)

  14. Hwang, F.K., Richards, D.S., Winter, P.: The Steiner Tree Problem, vol. 53. Elsevier, Amsterdam (1992)

    MATH  Google Scholar 

  15. Ioannidis, Y.E., Christodoulakis, S.: On the propagation of errors in the size of join results, vol. 20. ACM (1991)

    Google Scholar 

  16. Ioannidis, Y.E., Christodoulakis, S.: Optimal histograms for limiting worst-case error propagation in the size of join results. ACM Trans. Database Syst. (TODS) 18(4), 709–748 (1993)

    Article  Google Scholar 

  17. Jaakkola, T., Sontag, D., Globerson, A., Meila, M.: Learning Bayesian network structure using LP relaxations. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 358–365 (2010)

    Google Scholar 

  18. Jensen, F.V.: An Introduction to Bayesian Networks, vol. 210. UCL Press, London (1996)

    Google Scholar 

  19. Kipf, A., Kipf, T., Radke, B., Leis, V., Boncz, P., Kemper, A.: Learned cardinalities: estimating correlated joins with deep learning. arXiv preprint arXiv:1809.00677 (2018)

  20. Kooi, R.P.: The optimization of queries in relational databases (1980)

    Google Scholar 

  21. Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., Neumann, T.: How good are query optimizers, really? Proc. VLDB Endowment 9(3), 204–215 (2015)

    Article  Google Scholar 

  22. Leis, V., et al.: Query optimization through the looking glass, and what we found running the join order benchmark. VLDB J. 27, 1–26 (2018)

    Article  Google Scholar 

  23. Leis, V., Radke, B., Gubichev, A., Kemper, A., Neumann, T.: Cardinality estimation done right: index-based join sampling. In: CIDR (2017)

    Google Scholar 

  24. Li, F., Wu, B., Yi, K., Zhao, Z.: Wander join: online aggregation via random walks. In: Proceedings of the 2016 International Conference on Management of Data, pp. 615–629. ACM (2016)

    Google Scholar 

  25. Lipton, R.J., Naughton, J.F.: Query size estimation by adaptive sampling. In: Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 40–46. ACM (1990)

    Google Scholar 

  26. Moerkotte, G., Neumann, T., Steidl, G.: Preventing bad plans by bounding the impact of cardinality estimation errors. Proc. VLDB Endowment 2(1), 982–993 (2009)

    Article  Google Scholar 

  27. Muralikrishna, M., DeWitt, D.J.: Equi-depth multidimensional histograms. SIGMOD Rec. 17, 28–36 (1988)

    Article  Google Scholar 

  28. Muthukrishnan, S., Poosala, V., Suel, T.: On rectangular partitionings in two dimensions: algorithms, complexity and applications. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 236–256. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49257-7_16

    Chapter  Google Scholar 

  29. Olken, F.: Random sampling from databases. Ph.D. thesis, University of California, Berkeley (1993)

    Google Scholar 

  30. Piatetsky-Shapiro, G., Connell, C.: Accurate estimation of the number of tuples satisfying a condition. ACM SIGMOD Rec. 14(2), 256–276 (1984)

    Article  Google Scholar 

  31. Poess, M., Nambiar, R.O., Walrath, D.: Why you should run TPC-DS: a workload analysis. In: Proceedings of the 33rd International Conference on Very Large Databases, pp. 1138–1149. VLDB Endowment (2007)

    Google Scholar 

  32. Poosala, V., Haas, P.J., Ioannidis, Y.E., Shekita, E.J.: Improved histograms for selectivity estimation of range predicates. ACM Sigmod Rec. 25, 294–305 (1996)

    Article  Google Scholar 

  33. Robertson, N., Seymour, P.D.: Graph minors. II: algorithmic aspects of tree-width. J. Algorithms 7(3), 309–322 (1986)

    Article  MathSciNet  Google Scholar 

  34. Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data, pp. 23–34. ACM (1979)

    Google Scholar 

  35. Stillger, M., Lohman, G.M., Markl, V., Kandil, M.: Leo-db2’s learning optimizer. VLDB 1, 19–28 (2001)

    Google Scholar 

  36. Traverso, M.: Presto: interacting with petabytes of data at Facebook. Retrieved February 4, 2014 (2013)

    Google Scholar 

  37. Tzoumas, K., Deshpande, A., Jensen, C.S.: Lightweight graphical models for selectivity estimation without independence assumptions. Proc. VLDB Endowment 4(11), 852–863 (2011)

    Google Scholar 

  38. Vengerov, D., Menck, A.C., Zait, M., Chakkappen, S.P.: Join size estimation subject to filter conditions. Proc. VLDB Endowment 8(12), 1530–1541 (2015)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Max Halford .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Halford, M., Saint-Pierre, P., Morvan, F. (2019). An Approach Based on Bayesian Networks for Query Selectivity Estimation. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science(), vol 11447. Springer, Cham. https://doi.org/10.1007/978-3-030-18579-4_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-18579-4_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18578-7

  • Online ISBN: 978-3-030-18579-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics