Estimating Range Queries Using Aggregate Data with Integrity Constraints: A Probabilistic Approach
In fast OLAP applications it is often advantageous to provide approximate answers to range queries in order to achieve very high performance. A possible solution is to query summary data rather than the original data and to perform suitable interpolations. Approximate answers become mandatory in situations where only aggregate data are available. This paper studies the problem of estimating range queries (namely, sum and count) over aggregate data, using a probabilistic approach to compute the expected value and variance of the answers. The novelty of this approach is the exploitation of possible integrity constraints on the presence of elements in the range that are known to be null or non-null. Closed-form formulas for all results are provided, and some interesting applications to query estimation on histograms are discussed.
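As a rough illustration of the interpolation idea mentioned above, the following sketch estimates a range sum from a single aggregate (a bucket sum) by assuming the total is spread evenly over the bucket's cells. This is the generic uniform-spread baseline, not the paper's closed formulas, and it ignores integrity constraints entirely; the function name `estimate_range_sum` and its parameters are hypothetical.

```python
def estimate_range_sum(bucket_lo, bucket_hi, bucket_sum, q_lo, q_hi):
    """Estimate the sum over the query range [q_lo, q_hi] using only the
    total sum of the bucket [bucket_lo, bucket_hi] (inclusive integer indices).

    Assumption (not taken from the paper): the bucket total is spread
    evenly over all cells, so the estimate is proportional to the overlap.
    """
    if bucket_hi < bucket_lo:
        raise ValueError("bucket_hi must be >= bucket_lo")

    # Clip the query range to the bucket boundaries.
    lo = max(bucket_lo, q_lo)
    hi = min(bucket_hi, q_hi)
    if hi < lo:
        return 0.0  # the query range does not intersect the bucket

    bucket_size = bucket_hi - bucket_lo + 1
    overlap = hi - lo + 1
    return bucket_sum * overlap / bucket_size


# A bucket of 10 cells with total 100; the query covers 3 of those cells.
print(estimate_range_sum(0, 9, 100, 2, 4))
```

Knowledge that some cells are null or non-null would change how the total can be distributed over the bucket, which is the kind of information the paper's probabilistic estimates are designed to exploit.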