Self-tuning UDF Cost Modeling Using the Memory-Limited Quadtree

  • Zhen He
  • Byung S. Lee
  • Robert R. Snapp
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2992)

Abstract

Query optimizers in object-relational database management systems require users to provide execution cost models for user-defined functions (UDFs). Despite this need, little work has been done on providing such models. Furthermore, none of the existing approaches is self-tuning, so they cannot adapt to changing UDF execution patterns. This paper addresses this problem by introducing a self-tuning cost modeling approach based on the quadtree. The quadtree inherently supports (1) fast retrievals, (2) fast incremental updates without storing individual data points, and (3) storing information at different resolutions. We exploit these properties and, to make the quadtree useful for UDF cost modeling, add the abilities to (1) adapt to changing UDF execution patterns and (2) operate within limited memory. To this end, we have developed a novel technique we call the memory-limited quadtree (MLQ). In MLQ, each instance of UDF execution is mapped to a query point in a multi-dimensional space. A prediction is then made at the query point, and the actual value at the point is inserted as a new data point. The quadtree stores summary information about the data points at different resolutions, based on the distribution of the data points. This information is used to make predictions, to guide the insertion of new data points, and to guide the compression of the quadtree when the memory limit is reached. We have conducted extensive performance evaluations comparing MLQ with the existing (static) approach.
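The abstract describes MLQ's predict-then-insert loop only at a high level. The Python sketch below is a toy illustration under stated assumptions, not the authors' algorithm. Assumptions: each node's summary is just a count and a sum of observed costs; `predict` returns the mean of the deepest node on the query point's path that holds at least `min_count` observations; `insert` updates the summaries along the path and grows the tree by at most one node per call (a hypothetical lazy policy); and the memory limit is modeled as a cap on the number of nodes, enforced by deleting the leaf whose count-weighted squared deviation from its parent's mean is smallest. All names (`MemoryLimitedQuadtree`, `max_nodes`, `min_count`, `run_demo`) and the two-dimensional cost function in the demo are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    lo: Tuple[float, ...]      # lower corner of this node's region
    hi: Tuple[float, ...]      # upper corner of this node's region
    depth: int
    count: int = 0             # summary: observations routed through this node
    total: float = 0.0         # summary: sum of their observed costs
    children: Dict[int, "Node"] = field(default_factory=dict)

    def mean(self) -> float:
        return self.total / self.count


class MemoryLimitedQuadtree:
    """Toy MLQ-style cost model; the memory limit is modeled as a cap on node count."""

    def __init__(self, lo, hi, max_nodes=64, max_depth=6, min_count=2):
        assert max_nodes >= 1
        self.root = Node(tuple(lo), tuple(hi), 0)
        self.max_nodes, self.max_depth, self.min_count = max_nodes, max_depth, min_count
        self.num_nodes = 1

    @staticmethod
    def _child_index(node, point):  # one bit per axis, set for the upper half
        return sum(1 << d for d, x in enumerate(point) if x >= (node.lo[d] + node.hi[d]) / 2)

    @staticmethod
    def _make_child(node, idx):
        mids = [(a + b) / 2 for a, b in zip(node.lo, node.hi)]
        lo = tuple(m if (idx >> d) & 1 else a for d, (a, m) in enumerate(zip(node.lo, mids)))
        hi = tuple(b if (idx >> d) & 1 else m for d, (b, m) in enumerate(zip(node.hi, mids)))
        return Node(lo, hi, node.depth + 1)

    def predict(self, point) -> Optional[float]:
        """Mean cost of the deepest node on the point's path with >= min_count observations."""
        if self.root.count == 0:
            return None
        best = node = self.root
        while True:
            child = node.children.get(self._child_index(node, point))
            if child is None or child.count < self.min_count:
                return best.mean()
            best = node = child

    def insert(self, point, cost) -> None:
        node = self.root
        while True:
            node.count += 1
            node.total += cost
            if node.depth == self.max_depth:
                break
            idx = self._child_index(node, point)
            child = node.children.get(idx)
            if child is None:
                # Assumed lazy policy: grow the tree by at most one node per insertion.
                child = node.children[idx] = self._make_child(node, idx)
                child.count, child.total = 1, cost
                self.num_nodes += 1
                break
            node = child
        while self.num_nodes > self.max_nodes:
            self._compress()

    def _compress(self) -> None:
        """Drop the leaf whose mean deviates least (count-weighted) from its parent's."""
        best = None  # (error, parent, child index)
        stack = [self.root]
        while stack:
            node = stack.pop()
            for idx, child in node.children.items():
                if child.children:
                    stack.append(child)
                else:
                    err = child.count * (child.mean() - node.mean()) ** 2
                    if best is None or err < best[0]:
                        best = (err, node, idx)
        _, parent, idx = best
        del parent.children[idx]  # its observations remain summarized in the parent
        self.num_nodes -= 1


def run_demo(seed=0, steps=2000):
    """Predict-then-insert loop over a hypothetical UDF whose cost jumps at x = 50."""
    rng = random.Random(seed)
    mlq = MemoryLimitedQuadtree((0.0, 0.0), (100.0, 100.0), max_nodes=40)
    errors = []
    for _ in range(steps):
        x, y = rng.uniform(0, 100), rng.uniform(0, 100)  # query point of one execution
        guess = mlq.predict((x, y))                      # 1) predict before executing
        actual = 5.0 if x < 50 else 50.0 + 0.1 * y       # 2) observed (hypothetical) cost
        if guess is not None:
            errors.append(abs(guess - actual))
        mlq.insert((x, y), actual)                       # 3) feed the observation back
    return mlq, sum(errors[-500:]) / len(errors[-500:])
```

Calling `run_demo()` returns the model and its mean absolute prediction error over the last 500 executions, with the tree never exceeding 40 nodes. Deleting a leaf loses resolution but not data: the leaf's observations are already counted in every ancestor, which is what lets a summary-only quadtree shrink under a memory limit without storing individual data points.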

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Zhen He (1)
  • Byung S. Lee (1)
  • Robert R. Snapp (1)
  1. Department of Computer Science, University of Vermont, Burlington
