Projection Pushing Revisited

  • Benjamin J. McMahan
  • Guoqiang Pan
  • Patrick Porter
  • Moshe Y. Vardi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2992)

Abstract

The join operation, which combines tuples from multiple relations, is the most fundamental and, typically, the most expensive operation in database queries. The standard approach to join-query optimization is cost based, which requires developing a cost model, assigning an estimated cost to each query-processing plan, and searching in the space of all plans for a plan of minimal cost. Two other approaches can be found in the database-theory literature. The first approach, initially proposed by Chandra and Merlin, focused on minimizing the number of joins rather than on selecting an optimal join order. Unfortunately, this approach requires a homomorphism test, which itself is NP-complete, and has not been pursued in practical query processing. The second, more recent, approach focuses on structural properties of the query in order to find a project-join order that will minimize the size of intermediate results during query evaluation. For example, it is known that for Boolean project-join queries a project-join order can be found such that the arity of intermediate results is the treewidth of the join graph plus one.
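To make the last claim concrete, the following minimal sketch compares two project-join orders for a Boolean path query over relations R(a,b), S(b,c), T(c,d). The relation names, the toy data, and the helper functions `join` and `project` are assumptions introduced for illustration only; they are not taken from the paper. The join graph of this query is a path, which has treewidth 1, so an order with intermediate arity 2 should exist.

```python
def join(r, s):
    """Natural join of two relations, each given as (attrs, rows)."""
    (ra, rrows), (sa, srows) = r, s
    shared = [a for a in ra if a in sa]
    extra = [a for a in sa if a not in ra]
    out = set()
    for x in rrows:
        for y in srows:
            if all(x[ra.index(a)] == y[sa.index(a)] for a in shared):
                out.add(x + tuple(y[sa.index(a)] for a in extra))
    return (ra + tuple(extra), out)


def project(r, keep):
    """Project a relation onto the attributes listed in keep."""
    attrs, rows = r
    idx = [attrs.index(a) for a in keep]
    return (tuple(keep), {tuple(row[i] for i in idx) for row in rows})


# Hypothetical toy data: the same small edge set is used for all three relations.
edge = {(1, 2), (2, 3), (3, 1)}
R = (("a", "b"), edge)
S = (("b", "c"), edge)
T = (("c", "d"), edge)

# Late projection: join everything first, then project onto no attributes.
rs = join(R, S)                      # attributes (a, b, c)
rst = join(rs, T)                    # attributes (a, b, c, d)
late_answer = bool(project(rst, ())[1])
late_width = max(len(rs[0]), len(rst[0]))

# Early projection: drop each attribute as soon as no later join needs it.
r1 = project(R, ("b",))              # a occurs only in R
rs1 = join(r1, S)                    # attributes (b, c)
s1 = project(rs1, ("c",))            # b is no longer needed
st1 = join(s1, T)                    # attributes (c, d)
early_answer = bool(project(st1, ())[1])
early_width = max(len(rs1[0]), len(st1[0]))

print(late_answer, late_width)       # True 4
print(early_answer, early_width)     # True 2
```

Both orders return the same Boolean answer, but the early-projection order never builds an intermediate relation with more than two attributes, which matches the treewidth-plus-one bound for a path.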

In this paper we pursue the structural-optimization approach, motivated by its success in the context of constraint satisfaction. We chose a setup in which the cost-based approach is rather ineffective; we generate project-join queries with a large number of relations over databases with small relations. We show that a standard SQL planner (we use PostgreSQL) spends an exponential amount of time on generating plans for such queries, with rather dismal results in terms of performance. We then show how structural techniques, including projection pushing and join reordering, can yield exponential improvements in query execution time. Finally, we combine early projection and join reordering in an implementation of the bucket-elimination method from constraint satisfaction to obtain another exponential improvement.
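As a rough illustration of how bucket elimination combines early projection with join reordering, here is a minimal sketch for Boolean project-join queries. It is not the paper's implementation: the function names (`join`, `project_out`, `bucket_eliminate`), the in-memory representation of relations, and the graph-coloring example are all assumptions chosen for illustration, and the variable order is supplied by the caller rather than computed by a heuristic.

```python
from itertools import product


def join(r, s):
    """Natural join of two relations, each given as (attrs, rows)."""
    (ra, rrows), (sa, srows) = r, s
    shared = [a for a in ra if a in sa]
    extra = [a for a in sa if a not in ra]
    out = set()
    for x in rrows:
        for y in srows:
            if all(x[ra.index(a)] == y[sa.index(a)] for a in shared):
                out.add(x + tuple(y[sa.index(a)] for a in extra))
    return (ra + tuple(extra), out)


def project_out(r, var):
    """Project away a single attribute."""
    attrs, rows = r
    idx = [i for i, a in enumerate(attrs) if a != var]
    return (tuple(attrs[i] for i in idx),
            {tuple(row[i] for i in idx) for row in rows})


def bucket_eliminate(relations, order):
    """Evaluate a Boolean project-join query by eliminating variables in order."""
    pool = list(relations)
    for var in order:
        # The bucket of var holds every relation that still mentions var.
        bucket = [r for r in pool if var in r[0]]
        if not bucket:
            continue
        pool = [r for r in pool if var not in r[0]]
        joined = bucket[0]
        for r in bucket[1:]:
            joined = join(joined, r)
        reduced = project_out(joined, var)   # var is never needed again
        if not reduced[1]:
            return False                     # empty intermediate result
        pool.append(reduced)
    return all(rows for _, rows in pool)


# Hypothetical example: 3-colorability expressed as a Boolean project-join
# query, with one "different colors" relation per graph edge.
colors = range(3)
neq = {(x, y) for x, y in product(colors, colors) if x != y}


def coloring_query(edges):
    return [((u, v), neq) for u, v in edges]


triangle = [("v1", "v2"), ("v2", "v3"), ("v1", "v3")]
k4 = triangle + [("v1", "v4"), ("v2", "v4"), ("v3", "v4")]

triangle_ok = bucket_eliminate(coloring_query(triangle), ["v1", "v2", "v3"])
k4_ok = bucket_eliminate(coloring_query(k4), ["v1", "v2", "v3", "v4"])
print(triangle_ok, k4_ok)   # True False
```

The arity of the intermediate relations depends on the chosen variable order, so in practice the order would be picked with a heuristic that aims to keep buckets small; this sketch leaves that choice to the caller.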

References

  1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of databases. Addison-Wesley, Reading (1995)
  2. Aho, A., Sagiv, Y., Ullman, J.D.: Efficient optimization of a class of relational expressions. ACM Trans. on Database Systems 4, 435–454 (1979)
  3. Aho, A., Sagiv, Y., Ullman, J.D.: Equivalence of relational expressions. SIAM Journal on Computing 8, 218–246 (1979)
  4. Apers, P., Hevner, A., Yao, S.: Optimization algorithms for distributed queries. IEEE Trans. Software Engineering 9(1), 57–68 (1983)
  5. Arnborg, S., Corneil, D.G., Proskurowski, A.: Complexity of finding embeddings in a k-tree. SIAM Journal on Algebraic and Discrete Methods 8(2), 277–284 (1987)
  6. Bodlaender, H.L.: A tourist guide through treewidth. Acta Cybernetica 11, 1–21 (1993)
  7. Bouquet, F.: Gestion de la dynamicité et énumération d’implicants premiers: une approche fondée sur les Diagrammes de Décision Binaire. PhD thesis, Université de Provence, France (1999)
  8. Chandra, A.K., Merlin, P.M.: Optimal implementation of conjunctive queries in relational databases. In: Proc. 9th ACM Symp. on Theory of Computing, pp. 77–90 (1977)
  9. Chauhan, P., Clarke, E.M., Jha, S., Kukula, J.H., Veith, H., Wang, D.: Using combinatorial optimization methods for quantification scheduling. In: Proc. 11th Conf. on Correct Hardware Design and Verification Methods, pp. 293–309 (2001)
  10. Chekuri, C., Rajaraman, A.: Conjunctive query containment revisited. Technical report, Stanford University (November 1998)
  11. Dalmau, V., Kolaitis, P.G., Vardi, M.Y.: Constraint satisfaction, bounded treewidth, and finite-variable logics. In: Van Hentenryck, P. (ed.) CP 2002. LNCS, vol. 2470, pp. 311–326. Springer, Heidelberg (2002)
  12. Dechter, R.: Mini-buckets: A general scheme for generating approximations in automated reasoning. In: International Joint Conference on Artificial Intelligence, pp. 1297–1303 (1997)
  13. Dechter, R.: Bucket elimination: a unifying framework for reasoning. Artificial Intelligence 113(1-2), 41–85 (1999)
  14. Dechter, R.: Constraint Processing. Morgan Kaufmann, San Francisco (2003)
  15. Dechter, R., Pearl, J.: Network-based heuristics for constraint-satisfaction problems. Artificial Intelligence 34, 1–38 (1987)
  16. Downey, R.G., Fellows, M.R.: Parameterized Complexity. Springer, Heidelberg (1999)
  17. Freuder, E.C.: Complexity of k-tree structured constraint satisfaction problems. In: Proc. AAAI 1990, pp. 4–9 (1990)
  18. Freytag, J.C.: A rule-based view of query optimization. In: Proceedings of the 1987 ACM SIGMOD International Conference on Management of Data, pp. 173–180 (1987)
  19. Garcia-Molina, H., Ullman, J.D., Widom, J.: Database System Implementation. Prentice-Hall, Englewood Cliffs (2000)
  20. Garey, M.R., Johnson, D.S.: Computers and Intractability, A Guide to the Theory of NP-Completeness. W. H. Freeman, New York (1979)
  21. Gottlob, G., Leone, N., Scarcello, F.: Hypertree decompositions and tractable queries. In: Proc. 18th ACM Symp. on Principles of Database Systems, pp. 21–32 (1999)
  22. Griffiths, P.P., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: ACM SIGMOD International Conference on Management of Data, pp. 23–34 (1979)
  23. Halevy, A.: Answering queries using views: A survey. VLDB Journal, 270–294 (2001)
  24. Hojati, R., Krishnan, S.C., Brayton, R.K.: Early quantification and partitioned transition relations. In: Proc. 1996 Int’l Conf. on Computer Design, pp. 12–19 (1996)
  25. Ioannidis, Y., Wong, E.: Query optimization by simulated annealing. In: ACM SIGMOD International Conference on Management of Data, pp. 9–22 (1987)
  26. Kolaitis, P.G., Vardi, M.Y.: Conjunctive-query containment and constraint satisfaction. Journal of Computer and System Sciences, 302–332 (2000); Earlier version in: Proc. 17th ACM Symp. on Principles of Database Systems (PODS 1998) (1998)
  27. Kunen, I.K., Suciu, D.: A scalable algorithm for query minimization. Technical report, University of Washington (2002)
  28. Ramakrishnan, R., Beeri, C., Krishnamurthi, R.: Optimizing existential datalog queries. In: Proceedings of the ACM Symposium on Principles of Database Systems, pp. 89–102 (1988)
  29. Rish, I., Dechter, R.: Resolution versus search: Two strategies for SAT. Journal of Automated Reasoning 24(1/2), 225–275 (2000)
  30. San Miguel Aguirre, A., Vardi, M.Y.: Random 3-SAT and BDDs – the plot thickens further. In: Walsh, T. (ed.) CP 2001. LNCS, vol. 2239, pp. 121–136. Springer, Heidelberg (2001)
  31. Tarjan, R.E., Yannakakis, M.: Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM J. on Computing 13(3), 566–579 (1984)
  32. Ullman, J.D.: Database and Knowledge-Base Systems, vol. I and II. Computer Science Press, Rockville (1989)
  33. Vardi, M.Y.: On the complexity of bounded-variable queries. In: Proc. 14th ACM Symp. on Principles of Database Systems, pp. 266–276 (1995)
  34. Wong, E., Youssefi, K.: Decomposition - a strategy for query processing. ACM Trans. on Database Systems 1(3), 223–241 (1976)
  35. Yannakakis, M.: Algorithms for acyclic database schemes. In: Proc. 7th Int’l Conf. on Very Large Data Bases, pp. 82–94 (1981)
  36. Yerneni, R., Li, C., Ullman, J.D., Garcia-Molina, H.: Optimizing large join queries in mediation systems. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 348–364. Springer, Heidelberg (1998)

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Benjamin J. McMahan (1)
  • Guoqiang Pan (1)
  • Patrick Porter (2)
  • Moshe Y. Vardi (1)

  1. Department of Computer Science, Rice University, Houston, U.S.A.
  2. Scalable Software
