On the estimation of join result sizes

  • Arun Swami
  • K. Bernhard Schiefer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 779)

Abstract

Good estimates of join result sizes are critical for query optimization in relational database management systems. We address the problem of incrementally obtaining accurate and consistent estimates of join result sizes. We have invented a new rule for choosing join selectivities for estimating join result sizes. The rule is part of a new unified algorithm called Algorithm ELS (Equivalence and Largest Selectivity). Prior to computing any result sizes, equivalence classes are determined for the join columns. The algorithm also takes into account the effect of local predicates on table and column cardinalities. These computations allow the correct selectivity values for each eligible join predicate to be computed. We show that the algorithm is correct and gives better estimates than current estimation algorithms.
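The abstract names two steps: grouping join columns into equivalence classes, then choosing one selectivity per class when a table is joined to an intermediate result. A minimal Python sketch of those two steps follows, under stated assumptions. It is not the authors' Algorithm ELS. The function names are hypothetical, the per-predicate selectivity uses the common textbook estimate 1 / max(distinct values), and the effect of local predicates on cardinalities is not modeled.

```python
from collections import defaultdict


def equivalence_classes(join_predicates):
    """Group join columns into equivalence classes using union-find.

    join_predicates is a list of (column, column) equality pairs.
    """
    parent = {}

    def find(col):
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]  # path halving
            col = parent[col]
        return col

    for left, right in join_predicates:
        parent[find(left)] = find(right)

    classes = defaultdict(set)
    for col in parent:
        classes[find(col)].add(col)
    return list(classes.values())


def join_selectivity(distinct_left, distinct_right):
    # Assumption: textbook estimate, not taken from the paper.
    return 1.0 / max(distinct_left, distinct_right)


def estimate_join_size(result_rows, result_cols, table_rows, table_cols,
                       distinct, classes):
    """Estimate the size of joining one more table to an intermediate result.

    For each equivalence class with columns on both sides, only the
    largest selectivity among the eligible predicates is applied.
    """
    size = result_rows * table_rows
    for cls in classes:
        selectivities = [
            join_selectivity(distinct[a], distinct[b])
            for a in cls & result_cols
            for b in cls & table_cols
        ]
        if selectivities:
            size *= max(selectivities)
    return size


# Hypothetical statistics: R has 100 rows, S and T have 1000 rows each.
distinct = {"R.x": 10, "S.y": 100, "T.z": 50}
classes = equivalence_classes([("R.x", "S.y"), ("S.y", "T.z")])

rs = estimate_join_size(100, {"R.x"}, 1000, {"S.y"}, distinct, classes)
rst = estimate_join_size(rs, {"R.x", "S.y"}, 1000, {"T.z"}, distinct, classes)
print(rs, rst)
```

In this toy setup, joining T makes both S.y = T.z and the implied R.x = T.z eligible. Multiplying their selectivities would count the same equivalence class twice, so the sketch applies only the largest one.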

Copyright information

© Springer-Verlag 1994

Authors and Affiliations

  • Arun Swami (1)
  • K. Bernhard Schiefer (2)
  1. IBM Almaden Research Center, San Jose
  2. IBM Toronto Lab, North York, Canada
