Join Order Selection (Good Enough Is Easy)
Uniform sampling of join orders is known to be a competitive alternative to transformation-based optimization techniques. However, uniformity of the sampling process is difficult to establish, and techniques are known only for a restricted class of join queries.
In this paper, we investigate non-uniform sampling and devise a simple yet powerful algorithm that is generally applicable. The key element of the algorithm is a mapping of randomly generated sequences of join predicates to query plans. We take advantage of the bottom-up construction of query plans by computing costs as the plans are built and discarding partial plans as soon as they exceed the best cost found so far, which yields a highly effective cost-bound pruning component.
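The idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`plan_from_sequence`, `sample_join_orders`), the data layout, and the cost model (sum of intermediate result cardinalities) are assumptions chosen for illustration, and the query graph is assumed to be connected. A random permutation of the join predicates is mapped to a bushy plan bottom-up, and construction stops as soon as the accumulated cost exceeds the best cost found so far.

```python
import random

def plan_from_sequence(cards, sels, sequence, bound=float("inf")):
    """Map a sequence of join predicates to a plan, bottom-up.

    cards: {relation: cardinality}; sels: {(rel_a, rel_b): selectivity}.
    Returns (cost, plan), or None once the partial cost exceeds `bound`.
    Assumed cost model: sum of intermediate result cardinalities.
    """
    # Every relation starts as its own partial plan: (relations, tree, cardinality).
    part = {r: (frozenset([r]), r, float(c)) for r, c in cards.items()}
    cost = 0.0
    for a, b in sequence:
        left, right = part[a], part[b]
        if b in left[0]:
            continue  # both sides already joined; predicate was applied earlier
        card = left[2] * right[2]
        # Apply every predicate that spans the two partial plans.
        for (x, y), s in sels.items():
            if (x in left[0] and y in right[0]) or (x in right[0] and y in left[0]):
                card *= s
        cost += card
        if cost > bound:
            return None  # cost-bound pruning: discard the partial plan
        joined = (left[0] | right[0], (left[1], right[1]), card)
        for r in joined[0]:
            part[r] = joined
    return cost, part[next(iter(cards))][1]

def sample_join_orders(cards, sels, n_samples, seed=0):
    """Keep the cheapest plan among n_samples random predicate sequences."""
    rng = random.Random(seed)
    preds = list(sels)
    best_cost, best_plan = float("inf"), None
    for _ in range(n_samples):
        rng.shuffle(preds)
        result = plan_from_sequence(cards, sels, preds, best_cost)
        if result is not None:
            best_cost, best_plan = result
    return best_cost, best_plan

# Hypothetical chain query A - B - C - D with made-up statistics.
cards = {"A": 16, "B": 1024, "C": 128, "D": 8}
sels = {("A", "B"): 0.125, ("B", "C"): 0.0078125, ("C", "D"): 0.5}
print(sample_join_orders(cards, sels, n_samples=50))
```

Because the bound tightens whenever a cheaper plan is found, later samples tend to be abandoned after only a few joins, which is where the pruning saves time in this sketch.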
Sampling does not produce the optimal plan but a near-optimal solution, which is fully sufficient because the cost function grows increasingly inaccurate with increasing query size. In return, our algorithm establishes a well-balanced trade-off between result quality and time invested in the optimization process.
Keywords: Cost Model, Uniform Sampling, Large Data Base, Cost Distribution, Query Plan