Estimating Range Queries Using Aggregate Data with Integrity Constraints: A Probabilistic Approach

  • Francesco Buccafurri
  • Filippo Furfaro
  • Domenico Saccà
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1973)

Abstract

In fast OLAP applications it is often advantageous to provide approximate answers to range queries in order to achieve very high performance. A possible solution is to query summary data rather than the original data and to perform suitable interpolations. Approximate answers become mandatory in situations where only aggregate data are available. This paper studies the problem of estimating range queries (namely, sum and count) over aggregate data using a probabilistic approach for computing the expected value and variance of the answers. The novelty of this approach is the exploitation of possible integrity constraints about the presence of elements in the range that are known to be null or non-null. Closed formulas for all results are provided, and some interesting applications for query estimations on histograms are discussed.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Francesco Buccafurri (1)
  • Filippo Furfaro (2)
  • Domenico Saccà (3)
  1. DIMET, University of Reggio Calabria, Reggio Calabria, Italy
  2. DEIS, University of Calabria, Rende, Italy
  3. ISI-CNR & DEIS, Rende, Italy
