Query optimization through the looking glass, and what we found running the Join Order Benchmark

  • Viktor Leis
  • Bernhard Radke
  • Andrey Gubichev
  • Atanas Mirchev
  • Peter Boncz
  • Alfons Kemper
  • Thomas Neumann
Special Issue Paper
  • 69 Downloads

Abstract

Finding a good join order is crucial for query performance. In this paper, we introduce the Join Order Benchmark that works on real-life data riddled with correlations and introduces 113 complex join queries. We experimentally revisit the main components in the classic query optimizer architecture using a complex, real-world data set and realistic multi-join queries. For this purpose, we describe cardinality-estimate injection and extraction techniques that allow us to compare the cardinality estimators of multiple industrial SQL implementations on equal footing, and to characterize the value of having perfect cardinality estimates. Our investigation shows that all industrial-strength cardinality estimators routinely produce large errors: though cardinality estimation using table samples solves the problem for single-table queries, there are still no techniques in industrial systems that can deal accurately with join-crossing correlated query predicates. We further show that while estimates are essential for finding a good join order, query performance is unsatisfactory if the query engine relies too heavily on these estimates. Using another set of experiments that measure the impact of the cost model, we find that it has much less influence on query performance than the cardinality estimates. We investigate plan enumeration techniques comparing exhaustive dynamic programming with heuristic algorithms and find that exhaustive enumeration improves performance despite the suboptimal cardinality estimates. Finally, we extend our investigation from main-memory only, to also include disk-based query processing. Here, we find that though accurate cardinality estimation should be the first priority, other aspects such as modeling random versus sequential I/O are also important to predict query runtime.

Keywords

Query optimization Join ordering Cardinality estimation Cost models 

Notes

Acknowledgements

We would like to thank Guy Lohman and the anonymous reviewers for their valuable feedback. We also thank Moritz Wilfer for his input in the early stages of this project.

References

  1. 1.
    Ahmed, R., Sen, R., Poess, M., Chakkappen, S.: Of snowstorms and bushy trees. PVLDB 7(13), 1452–1461 (2014)Google Scholar
  2. 2.
    Babcock, B., Chaudhuri, S.: Towards a robust query optimizer: a principled and practical approach. In: SIGMOD, pp. 119–130 (2005)Google Scholar
  3. 3.
    Bellamkonda, S., Li, H.G., Jagtap, U., Zhu, Y., Liang, V., Cruanes, T.: Adaptive and big data scale parallel execution in Oracle. PVLDB 6(11), 1102–1113 (2013)Google Scholar
  4. 4.
    Boncz, P.A., Neumann, T., Erling, O.: TPC-H analyzed: hidden messages and lessons learned from an influential benchmark. In: TPCTC, pp. 61–76 (2013)Google Scholar
  5. 5.
    Borovica-Gajic, R., Idreos, S., Ailamaki, A., Zukowski, M., Fraser, C.: Smooth scan: statistics-oblivious access paths. In: ICDE, pp. 315–326 (2015)Google Scholar
  6. 6.
    Bruno, N., Galindo-Legaria, C.A., Joshi, M.: Polynomial heuristics for query optimization. In: ICDE, pp. 589–600 (2010)Google Scholar
  7. 7.
    Chaudhuri, S.: Query optimizers: time to rethink the contract? In: SIGMOD, pp. 961–968 (2009)Google Scholar
  8. 8.
    Chaudhuri, S., Narasayya, V.R., Ramamurthy, R.: Exact cardinality query optimization for optimizer testing. PVLDB 2(1), 994–1005 (2009)Google Scholar
  9. 9.
    Colgan, M.: Oracle adaptive joins. https://blogs.oracle.com/optimizer/entry/what_s_new_in_12c (2013)
  10. 10.
    Dutt, A., Haritsa, J.R.: Plan bouquets: query processing without selectivity estimation. In: SIGMOD, pp. 1039–1050 (2014)Google Scholar
  11. 11.
    Estan, C., Naughton, J.F.: End-biased samples for join cardinality estimation. In: ICDE, p. 20 (2006)Google Scholar
  12. 12.
    Fegaras, L.: A new heuristic for optimizing large queries. In: DEXA, pp. 726–735 (1998)Google Scholar
  13. 13.
    Fender, P., Moerkotte, G.: Counter strike: generic top-down join enumeration for hypergraphs. PVLDB 6(14), 1822–1833 (2013)Google Scholar
  14. 14.
    Fender, P., Moerkotte, G., Neumann, T., Leis, V.: Effective and robust pruning for top-down join enumeration algorithms. In: ICDE, pp. 414–425 (2012)Google Scholar
  15. 15.
    Fraser, C., Giakoumakis, L., Hamine, V., Moore-Smith, K.F.: Testing cardinality estimation models in SQL Server. In: DBtest (2012)Google Scholar
  16. 16.
    Graefe, G.: A generalized join algorithm. In: BTW, pp. 267–286 (2011)Google Scholar
  17. 17.
    Gu, Z., Soliman, M.A., Waas, F.M.: Testing the accuracy of query optimizers. In: DBTest (2012)Google Scholar
  18. 18.
    Haas, P.J., Naughton, J.F., Seshadri, S., Swami, A.N.: Selectivity and cost estimation for joins based on random sampling. J. Comput. Syst. Sci. 52(3), 550–569 (1996)MathSciNetCrossRefMATHGoogle Scholar
  19. 19.
    Haritsa, J.R.: The Picasso database query optimizer visualizer. PVLDB 3(2), 1517–1520 (2010)Google Scholar
  20. 20.
    Ilyas, I.F., Markl, V., Haas, P.J., Brown, P., Aboulnaga, A.: CORDS: automatic discovery of correlations and soft functional dependencies. In: SIGMOD, pp. 647–658 (2004)Google Scholar
  21. 21.
    Ioannidis, Y.E.: The history of histograms (abridged). In: VLDB, pp. 19–30 (2003)Google Scholar
  22. 22.
    Ioannidis, Y.E., Christodoulakis, S.: On the propagation of errors in the size of join results. In: SIGMOD (1991)Google Scholar
  23. 23.
    Kader, R.A., Boncz, P.A., Manegold, S., van Keulen, M.: ROX: run-time optimization of XQueries. In: SIGMOD, pp. 615–626 (2009)Google Scholar
  24. 24.
    Kaushik, R., Ré, C., Suciu, D.: General database statistics using entropy maximization. In: DBPL, pp. 84–99 (2009)Google Scholar
  25. 25.
    Kester, M.S., Athanassoulis, M., Idreos, S.: Access path selection in main-memory optimized data systems: Should I scan or should I probe? In: SIGMOD (2017)Google Scholar
  26. 26.
    Lang, H., Mühlbauer, T., Funke, F., Boncz, P.A., Neumann, T., Kemper, A.: Data blocks: hybrid OLTP and OLAP on compressed storage using both vectorization and compilation. In: SIGMOD, pp. 311–326 (2016)Google Scholar
  27. 27.
    Leis, V., Boncz, P., Kemper, A., Neumann, T.: Morsel-driven parallelism: a NUMA-aware query evaluation framework for the many-core age. In: SIGMOD (2014)Google Scholar
  28. 28.
    Leis, V., Gubichev, A., Mirchev, A., Boncz, P.A., Kemper, A., Neumann, T.: How good are query optimizers, really? PVLDB 9(3), 204–215 (2015)Google Scholar
  29. 29.
    Leis, V., Kundhikanjana, K., Kemper, A., Neumann, T.: Efficient processing of window functions in analytical SQL queries. PVLDB 8(10), 1058 (2015)Google Scholar
  30. 30.
    Leis, V., Radke, B., Gubichev, A., Kemper, A., Neumann, T.: Cardinality estimation done right: index-based join sampling. In: CIDR (2017)Google Scholar
  31. 31.
    Li, Q., Shao, M., Markl, V., Beyer, K.S., Colby, L.S., Lohman, G.M.: Adaptively reordering joins during query execution. In: ICDE, pp. 26–35 (2007)Google Scholar
  32. 32.
    Liu, F., Blanas, S.: Forecasting the cost of processing multi-join queries via hashing for main-memory databases. In: SoCC, pp. 153–166 (2015)Google Scholar
  33. 33.
    Lohman, G.: Is query optimization a solved problem? http://wp.sigmod.org/?p=1075 (2014)
  34. 34.
    Mackert, L.F., Lohman, G.M.: R* optimizer validation and performance evaluation for local queries. In: SIGMOD, pp. 84–95 (1986)Google Scholar
  35. 35.
    Markl, V., Megiddo, N., Kutsch, M., Tran, T.M., Haas, P.J., Srivastava, U.: Consistently estimating the selectivity of conjuncts of predicates. In: VLDB, pp. 373–384 (2005)Google Scholar
  36. 36.
    Moerkotte, G., Neumann, T.: Dynamic programming strikes back. In: SIGMOD, pp. 539–552 (2008)Google Scholar
  37. 37.
    Moerkotte, G., Neumann, T., Steidl, G.: Preventing bad plans by bounding the impact of cardinality estimation errors. PVLDB 2(1), 982–993 (2009)Google Scholar
  38. 38.
    Müller, I., Sanders, P., Lacurie, A., Lehner, W., Färber, F.: Cache-efficient aggregation: hashing is sorting. In: SIGMOD, pp. 1123–1136 (2015)Google Scholar
  39. 39.
    Neumann, T.: Query simplification: graceful degradation for join-order optimization. In: SIGMOD, pp. 403–414 (2009)Google Scholar
  40. 40.
    Neumann, T., Galindo-Legaria, C.A.: Taking the edge off cardinality estimation errors using incremental execution. In: BTW, pp. 73–92 (2013)Google Scholar
  41. 41.
    O’Neil, P.E., O’Neil, E.J., Chen, X., Revilak, S.: The star schema benchmark and augmented fact table indexing. In: TPCTC, pp. 237–252 (2009)Google Scholar
  42. 42.
    Poosala, V., Ioannidis, Y.E.: Selectivity estimation without the attribute value independence assumption. In: VLDB, pp. 486–495 (1997)Google Scholar
  43. 43.
    Pöss, M., Nambiar, R.O., Walrath, D.: Why you should run TPC-DS: a workload analysis. In: PVLDB, pp. 1138–1149 (2007)Google Scholar
  44. 44.
    Rusu, F., Dobra, A.: Sketches for size of join estimation. TODS 33(3), 15 (2008)CrossRefGoogle Scholar
  45. 45.
    Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: SIGMOD, pp. 23–34 (1979)Google Scholar
  46. 46.
    Steinbrunn, M., Moerkotte, G., Kemper, A.: Heuristic and randomized optimization for the join ordering problem. VLDB J. 6(3), 191–208 (1997)CrossRefGoogle Scholar
  47. 47.
    Stillger, M., Lohman, G.M., Markl, V., Kandil, M.: LEO—DB2’s learning optimizer. In: VLDB, pp. 19–28 (2001)Google Scholar
  48. 48.
    Tzoumas, K., Deshpande, A., Jensen, C.S.: Lightweight graphical models for selectivity estimation without independence assumptions. PVLDB 4(11), 852–863 (2011)Google Scholar
  49. 49.
    Waas, F., Pellenkoft, A.: Join order selection-good enough is easy. In: BNCOD, pp. 51–67 (2000)Google Scholar
  50. 50.
    Waas, F.M., Giakoumakis, L., Zhang, S.: Plan space analysis: an early warning system to detect plan regressions in cost-based optimizers. In: DBTest (2011)Google Scholar
  51. 51.
    Wu, W., Chi, Y., Zhu, S., Tatemura, J., Hacigümüs, H., Naughton, J.F.: Predicting query execution time: are optimizer cost models really unusable? In: ICDE, pp. 1081–1092 (2013)Google Scholar
  52. 52.
    Wu, W., Naughton, J.F., Singh, H.: Sampling-based query re-optimization. In: SIGMOD (2016)Google Scholar
  53. 53.
    Yu, F., Hou, W., Luo, C., Che, D., Zhu, M.: CS2: a new database synopsis for query estimation. In: SIGMOD, pp. 469–480 (2013)Google Scholar

Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  • Viktor Leis
    • 1
  • Bernhard Radke
    • 1
  • Andrey Gubichev
    • 1
  • Atanas Mirchev
    • 1
  • Peter Boncz
    • 2
  • Alfons Kemper
    • 1
  • Thomas Neumann
    • 1
  1. 1.Technische Universität MünchenGarchingGermany
  2. 2.CWIAmsterdamThe Netherlands

Personalised recommendations