A Framework for the Physical Design Problem for Data Synopses

  • Arnd Christian König
  • Gerhard Weikum
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2287)

Abstract

Maintaining statistics on multidimensional data distributions is crucial for predicting the run-time and result size of queries and data analysis tasks with acceptable accuracy. To this end, a plethora of techniques have been proposed for maintaining a compact data “synopsis” on a single table, ranging from variants of histograms to methods based on wavelets and other transforms. However, the fundamental question of how to reconcile the synopses for large information sources with many tables has remained largely unexplored. This paper develops a general framework for reconciling the synopses on many tables, which may come from different information sources. It shows how to compute the optimal combination of synopses for a given workload and a limited amount of available memory. The practicality of the approach and the accuracy of the proposed heuristics are demonstrated by experiments.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Arnd Christian König
    • 1
  • Gerhard Weikum
    • 1
  1. Department of Computer Science, University of the Saarland, Saarbrücken, Germany
