Advertisement

The VLDB Journal

, Volume 27, Issue 4, pp 521–545 | Cite as

Smooth Scan: robust access path selection without cardinality estimation

  • Renata Borovica-Gajic
  • Stratos Idreos
  • Anastasia Ailamaki
  • Marcin Zukowski
  • Campbell Fraser
Regular Paper

Abstract

Query optimizers depend heavily on statistics representing column distributions to create good query plans. In many cases, though, statistics are outdated or nonexistent, and the process of refreshing statistics is very expensive, especially for ad hoc workloads on ever bigger data. This results in suboptimal plans that severely hurt performance. The core of the problem is the fixed decision on the type of physical operators that comprise a query plan. This paper makes a case for continuous adaptation and morphing of physical operators throughout their lifetime, by adjusting their behavior in accordance with the observed statistical properties of the data at run time. We demonstrate the benefits of the new paradigm by designing and implementing an adaptive access path operator called Smooth Scan, which morphs continuously within the space of index access and full table scan. Smooth Scan behaves similarly to an index scan for low selectivity; if selectivity increases, however, Smooth Scan progressively morphs its behavior toward a sequential scan. As a result, a system with Smooth Scan requires no optimization decisions on the access paths up front. Additionally, by depending only on the result distribution and eschewing statistics and cardinality estimates altogether, Smooth Scan ensures repeatable execution across multiple query invocations. Smooth Scan implemented in PostgreSQL demonstrates robust, near-optimal performance on micro-benchmarks and real-life workloads, while being statistics oblivious at the same time.

Keywords

Access path selection Cardinality estimation Robust query execution Adaptive query processing DBMS 

Notes

Acknowledgements

We would like to thank the organizers of the Dagstuhl seminar 12321 on “Robust query processing” for the inspirational sessions and the introduction of robustness issues in query processing. In particular, we thank Goetz Graefe for his support throughout this work.

Supplementary material

778_2018_507_MOESM1_ESM.xlsx (6.3 mb)
Supplementary material 1 (xlsx 6482 KB)
778_2018_507_MOESM2_ESM.pptx (7.1 mb)
Supplementary material 2 (pptx 7283 KB)

References

  1. 1.
    Abdel Kader, R., Boncz, P., Manegold, S., van Keulen, M.: ROX: run-time optimization of XQueries. In: SIGMOD (2009)Google Scholar
  2. 2.
    Aboulnaga, A., Chaudhuri, S.: Self-tuning histograms: building histograms without looking at data. In: SIGMOD (1999)Google Scholar
  3. 3.
    Amsaleg, L., Franklin, M.J., Tomasic, A., Urhan, T.: Scrambling query plans to cope with unexpected delays. In: DIS (1996)Google Scholar
  4. 4.
    Antoshenkov, G.: Dynamic query optimization in Rdb/VMS. In: ICDE (1993)Google Scholar
  5. 5.
    Antoshenkov, G., Ziauddin, M.: Query processing and optimization in Oracle Rdb. PVLDB 5(4), 229–237 (1996)Google Scholar
  6. 6.
    Avnur, R., Hellerstein, J.M.: Eddies: continuously adaptive query processing. In: SIGMOD (2000)Google Scholar
  7. 7.
    Babcock, B., Chaudhuri, S.: Towards a robust query optimizer: a principled and practical approach. In: SIGMOD (2005)Google Scholar
  8. 8.
    Babu, S., Bizarro, P.: Adaptive query processing in the looking glass. In: CIDR (2005)Google Scholar
  9. 9.
    Babu, S., Bizarro, P., DeWitt, D.: Proactive re-optimization. In: SIGMOD (2005)Google Scholar
  10. 10.
    Barber, R., Bendel, P., Czech, M., Draese, O., Ho, F., Hrle, N., Idreos, S., Kim, M., Koeth, O., Lee, J., Li, T.T., Lohman, G.M., Morfonios, K., Müller, R., Murthy, K., Pandis, I., Qiao, L., Raman, V., Szabo, S., Sidle, R., Stolze, K.: Blink: not your father’s database! In: BIRTE (2011)Google Scholar
  11. 11.
    Bellamkonda, S., Li, H.G., Jagtap, U., Zhu, Y., Liang, V., Cruanes, T.: Adaptive and big data scale parallel execution in oracle. PVLDB 6(11), 1102–1113 (2013)Google Scholar
  12. 12.
    Bizarro, P., Babu, S., DeWitt, D., Widom, J.: Content-based routing: different plans for different data. In: PVLDB, pp. 757–768 (2005)Google Scholar
  13. 13.
    Boncz, P.A., Neumann, T., Erling, O.: TPC-H analyzed: hidden messages and lessons learned from an influential benchmark. In: TPCTC (2013)Google Scholar
  14. 14.
    Borovica, R., Alagiannis, I., Ailamaki, A.: Automated physical designers: what you see is (not) what you get. In: DBTest (2012)Google Scholar
  15. 15.
    Borovica-Gajic, R.: Toward timely, predictable and cost-effective data analytics. Ph.D. thesis (2016)Google Scholar
  16. 16.
    Borovica-Gajic, R., Appuswamy, R., Ailamaki, A.: Cheap data analytics using cold storage devices. PVLDB 9(12), 1029–1040 (2016)Google Scholar
  17. 17.
    Borovica-Gajic, R., Graefe, G., Lee, A.: Robust performance in database query processing (dagstuhl seminar 17222). Dagstuhl Rep. 7(5), 169–180 (2017)Google Scholar
  18. 18.
    Borovica-Gajic, R., Idreos, S., Ailamaki, A., Zukowski, M., Fraser, C.: Smooth scan: statistics-oblivious access paths. In: ICDE (2015)Google Scholar
  19. 19.
    Bouganim, L., Fabret, F., Mohan, C., Valduriez, P.: Dynamic query scheduling in data integration systems. In: ICDE (2000)Google Scholar
  20. 20.
    Bruno, N., Chaudhuri, S.: Efficient creation of statistics over query expressions. In: ICDE (2003)Google Scholar
  21. 21.
    Bruno, N., Chaudhuri, S., Gravano, L.: STHoles: A multidimensional workload-aware histogram. In: SIGMOD (2001)Google Scholar
  22. 22.
    Cao, L., Rundensteiner, E.A.: High performance stream query processing with correlation-aware partitioning. PVLDB 7(4), 265–276 (2013)Google Scholar
  23. 23.
    Cao, L.B.P.: Web caching and Zipf-like distributions: evidence and implications. In: INFOCOM (1999)Google Scholar
  24. 24.
    Chaudhuri, S.: Query optimizers: time to rethink the contract? In: SIGMOD (2009)Google Scholar
  25. 25.
    Chaudhuri, S., Narasayya, V., Ramamurthy, R.: A pay-as-you-go framework for query execution feedback. PVLDB 1(1), 1141–1152 (2008)Google Scholar
  26. 26.
    Chaudhuri, S., Narasayya, V.R.: Automating statistics management for query optimizers. In: ICDE (2000)Google Scholar
  27. 27.
    Chen, C.M., Roussopoulos, N.: Adaptive selectivity estimation using query feedback. In: SIGMOD (1994)Google Scholar
  28. 28.
    Chen, S., Ailamaki, A., Gibbons, P.B., Mowry, T.C.: Inspector joins. In: VLDB (2005)Google Scholar
  29. 29.
    Christodoulakis, S.: Implications of certain assumptions in database performance evaluation. TODS 9(2), 163–186 (1984)MathSciNetCrossRefzbMATHGoogle Scholar
  30. 30.
    Chu, F.: Least expected cost query optimization: What can we expect. In: Proceedings of the ACM Symposim on Principles of Database Systems, pp. 293–302 (2002)Google Scholar
  31. 31.
    Cole, R.L., Graefe, G.: Optimization of dynamic query evaluation plans. In: SIGMOD (1994)Google Scholar
  32. 32.
    Curino, C., Jones, E., Popa, R.A., Malviya, N., Wu, E., Madden, S., Balakrishnan, H., Zeldovich, N.: Relational cloud: a database service for the cloud. In: CIDR (2011)Google Scholar
  33. 33.
    D., H., Darera, P.N., Haritsa, J.R.: On the production of anorexic plan diagrams. In: VLDB (2007)Google Scholar
  34. 34.
    Harish, D., Darera, P.N., Haritsa, J.R.: Identifying robust plans through plan diagram reduction. PVLDB 1(1), 1124–1140 (2008)Google Scholar
  35. 35.
    Deshpande, A., Ives, Z., Raman, V.: Adaptive query processing. Found. Trends Databases 1(1), 1–140 (2007).  https://doi.org/10.1561/1900000001 CrossRefGoogle Scholar
  36. 36.
    DeWitt, D.J., Naughton, J.F., Burger, J.: Nested loops revisited. In: PDIS (1993)Google Scholar
  37. 37.
    Dutt, A., Haritsa, J.: Plan bouquets: query processing without selectivity estimation. In: SIGMOD (2014)Google Scholar
  38. 38.
    Dutt, A., Narasayya, V.R., Chaudhuri, S.: Leveraging re-costing for online optimization of parameterized queries with guarantees. In: SIGMOD (2017)Google Scholar
  39. 39.
    Elhemali, M., Galindo-Legaria, C.A., Grabs, T., Joshi, M.M.: Execution strategies for SQL subqueries. In: SIGMOD (2007)Google Scholar
  40. 40.
    Eurviriyanukul, K., Paton, N.W., Fernandes, A.A.A., Lynden, S.J.: Adaptive join processing in pipelined plans. In: EDBT (2010)Google Scholar
  41. 41.
    Graefe, G.: Modern B-tree techniques. Found. Trends Databases 3(4), 203–402 (2011)CrossRefGoogle Scholar
  42. 42.
    Graefe, G.: New algorithms for join and grouping operations. Comput. Sci. 27(1), 3–27 (2012)Google Scholar
  43. 43.
    Graefe, G., Guy, W., Kuno, H.A., Paulley, G.N.: Robust query processing (dagstuhl seminar 12321). Dagstuhl Rep. 2(8), 1–15 (2012)Google Scholar
  44. 44.
    Graefe, G., König, A.C., Kuno, H.A., Markl, V., Sattler, K.U.: Robust query processing (dagstuhl seminar 10381). In: Robust Query Processing (2011)Google Scholar
  45. 45.
    Graefe, G., Kuno, H.A., Wiener, J.L.: Visualizing the robustness of query execution. In: CIDR (2009)Google Scholar
  46. 46.
    Graefe, G., Ward, K.: Dynamic query evaluation plans. In: SIGMOD (1989)Google Scholar
  47. 47.
    Gray, J.: Tape is dead, disk is tape, flash is disk. RAM locality is king, Presented at CIDR (2007)Google Scholar
  48. 48.
    Haas, P.J., Hellerstein, J.M.: Ripple joins for online aggregation. In: SIGMOD (1999)Google Scholar
  49. 49.
    Harris, L.: Stock price clustering and discreteness. Rev. Financ. Stud. 4(3), 389–415 (1991)CrossRefGoogle Scholar
  50. 50.
    Hellerstein, J.M., Franklin, M.J., Chandrasekaran, S., Deshpande, A., Hildrum, K., Madden, S., Raman, V., Shah, M.A.: Adaptive query processing: technology in evolution. IEEE Data Eng. Bull. 23, 2000 (2000)Google Scholar
  51. 51.
    Herodotou, H., Babu, S.: Xplus: a SQL-tuning-aware query optimizer. PVLDB 3(1–2), 1149–1160 (2010)Google Scholar
  52. 52.
    IBM: Managing big data for smart grids and smart meters. White paper. http://goo.gl/n1Ijtd (2012)
  53. 53.
    Ilyas, I.F., Markl, V., Haas, P., Brown, P., Aboulnaga, A.: CORDS: automatic discovery of correlations and soft functional dependencies. In: SIGMOD (2004)Google Scholar
  54. 54.
    Ioannidis, Y.E.: Query optimization. ACM Comput. Surv. 28(1), 121–123 (1996)CrossRefGoogle Scholar
  55. 55.
    Ioannidis, Y.E., Ng, R.T., Shim, K., Sellis, T.K.: Parametric query optimization. PVLDB 6(2), 132–151 (1997)Google Scholar
  56. 56.
    Ives, Z.G.: Efficient Query Processing for Data Integration. University of Washington, Seattle (2002)Google Scholar
  57. 57.
    Ives, Z.G., Florescu, D., Friedman, M., Levy, A., Weld, D.S.: An adaptive query execution system for data integration. In: SIGMOD (1999)Google Scholar
  58. 58.
    Ives, Z.G., Halevy, A.Y., Weld, D.S.: Adapting to source properties in processing data integration queries. In: SIGMOD (2004)Google Scholar
  59. 59.
    Kabra, N., DeWitt, D.J.: Efficient mid-query re-optimization of sub-optimal query execution plans. In: SIGMOD (1998)Google Scholar
  60. 60.
    Kester, M.S., Athanassoulis, M., Idreos, S.: Access path selection in main-memory optimized data systems: should I scan or should I probe? In: SIGMOD (2017)Google Scholar
  61. 61.
    Leis, V., Gubichev, A., Mirchev, A., Boncz, P.A., Kemper, A., Neumann, T.: How good are query optimizers, really? PVLDB 9(3), 204–215 (2015)Google Scholar
  62. 62.
    Li, Q., Shao, M., Markl, V., Beyer, K.S., Colby, L.S., Lohman, G.M.: Adaptively reordering joins during query execution. In: ICDE (2007)Google Scholar
  63. 63.
    Lohman, G.: Is query optimization a “solved” problem? In: ACM SIGMOD Blog (2014)Google Scholar
  64. 64.
    Mackert, L., Lohman, G.: R* optimizer validation and performance evaluation for local queries. In: SIGMOD (1986)Google Scholar
  65. 65.
    Markl, V., Raman, V., Simmen, D., Lohman, G., Pirahesh, H., Cilimdzic, M.: Robust query processing through progressive optimization. In: SIGMOD (2004)Google Scholar
  66. 66.
    Müller, I., Sanders, P., Lacurie, A., Lehner, W., Färber, F.: Cache-efficient aggregation: hashing is sorting. In: SIGMOD (2015)Google Scholar
  67. 67.
    Nehme, R.V., Rundensteiner, E.A., Bertino, E.: Self-tuning query mesh for adaptive multi-route query processing. In: EDBT, pp. 803–814 (2009)Google Scholar
  68. 68.
    Neumann, T., Galindo-Legaria, C.A.: Taking the edge off cardinality estimation errors using incremental execution. In: DBIS (2013)Google Scholar
  69. 69.
    Ramakrishnan, R., Gehrke, J.: Database Management Systems, 3rd edn. McGraw-Hill, New York (2003)zbMATHGoogle Scholar
  70. 70.
    Raman, V., Deshpande, A., Hellerstein, J.M.: Using state modules for adaptive query processing. In: ICDE (2003)Google Scholar
  71. 71.
    Schindler, J.: I/O characteristics of NoSQL databases. PVLDB 5(12), 2020–2021 (2012)Google Scholar
  72. 72.
    Schindler, J.: Profiling and analyzing the I/O performance of NoSQL DBs. In: SIGMETRICS (2013)Google Scholar
  73. 73.
    Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: SIGMOD (1979)Google Scholar
  74. 74.
    Srivastava, U., Haas, P.J., Markl, V., Kutsch, M., Tran, T.M.: ISOMER: consistent histogram construction using query feedback. In: ICDE (2006)Google Scholar
  75. 75.
    Stillger, M., Lohman, G.M., Markl, V., Kandil, M.: LEO–DB2’s learning optimizer. In: VLDB (2001)Google Scholar
  76. 76.
    TPC: TPC-H benchmark. http://www.tpc.org/tpch/
  77. 77.
    Urhan, T., Franklin, M.J., Amsaleg, L.: Cost-based query scrambling for initial delays. In: SIGMOD (1998)Google Scholar
  78. 78.
    Viglas, S., Naughton, J.F., Burger, J.: Maximizing the output rate of multi-way join queries over streaming information sources. In: VLDB (2003)Google Scholar
  79. 79.
    Wilschut, A., Apers, P.: Dataflow query execution in a parallel main-memory environment. In: PDIS (1991)Google Scholar
  80. 80.
    Wong, E., Youssefi, K.: Decomposition—a strategy for query processing. ACM Trans. Database Syst. 1(3), 223–241 (1976)CrossRefGoogle Scholar
  81. 81.
    Yin, S., Hameurlain, A., Morvan, F.: Robust query optimization methods with respect to estimation errors: a survey. SIGMOD Rec. 44(3), 25–36 (2015)CrossRefGoogle Scholar
  82. 82.
    Zhu, Y., Rundensteiner, E.A., Heineman, G.T.: Dynamic plan migration for continuous queries over data streams. In: SIGMOD (2004)Google Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. 1.School of Computing and Information SystemsThe University of MelbourneMelbourneAustralia
  2. 2.School of Engineering and Applied SciencesHarvard UniversityCambridgeUSA
  3. 3.School of Computer and Communication SciencesEcole Polytechnique Federale de Lausanne and RAW LabsLausanneSwitzerland
  4. 4.Snowflake ComputingSan MateoUSA
  5. 5.Google Inc.SeattleUSA

Personalised recommendations