On Linear-Spline Based Histograms

  • Conference paper
Advances in Web-Age Information Management (WAIM 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2419))

Abstract

Approximation is a very effective paradigm for speeding up query processing in large databases. One popular approximation mechanism is data size reduction, for which there are three reduction techniques: sampling, histograms, and wavelets. Histogram techniques are supported by many commercial database systems and have been shown to be very effective for approximate processing of aggregation queries. In this paper, we investigate optimal models for building histograms based on linear spline techniques. First, we propose several novel models. Second, we present efficient algorithms to achieve the proposed optimal models. Our experimental results show that the new techniques can greatly improve approximation accuracy compared to existing techniques.
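
To make the idea of a linear-spline based histogram concrete, the following is a minimal sketch and not the models or algorithms proposed in the paper. It assumes equal-width buckets (the paper studies optimal bucketing, which is not reproduced here), fits an independent least-squares line to the frequencies in each bucket, and answers a range-sum query from the stored line coefficients. The function names `fit_line`, `build_histogram`, and `estimate_range_sum` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# A bucket stores (lo, hi, a, b): the frequencies of values x in [lo, hi)
# are approximated by the line a * x + b.
Bucket = Tuple[int, int, float, float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # A single point: fall back to a constant (slope 0).
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def build_histogram(freqs: Sequence[float], num_buckets: int) -> List[Bucket]:
    """Split values 0..len(freqs)-1 into equal-width buckets (an assumption
    of this sketch) and fit one line per bucket."""
    n = len(freqs)
    buckets: List[Bucket] = []
    for k in range(num_buckets):
        lo = k * n // num_buckets
        hi = (k + 1) * n // num_buckets  # exclusive upper bound
        if lo == hi:
            continue  # more buckets than values: skip empty buckets
        a, b = fit_line(list(range(lo, hi)), freqs[lo:hi])
        buckets.append((lo, hi, a, b))
    return buckets


def estimate_range_sum(buckets: Sequence[Bucket], lo: int, hi: int) -> float:
    """Estimate the sum of frequencies for values in [lo, hi)."""
    total = 0.0
    for b_lo, b_hi, a, b in buckets:
        # Sum the fitted line over the overlap of the query and the bucket.
        for x in range(max(lo, b_lo), min(hi, b_hi)):
            total += a * x + b
    return total


if __name__ == "__main__":
    freqs = [2 * x + 1 for x in range(8)]  # frequencies that grow linearly
    hist = build_histogram(freqs, num_buckets=2)
    print(estimate_range_sum(hist, 2, 6), sum(freqs[2:6]))
```

A conventional histogram would store only the average frequency per bucket; storing a slope as well lets a bucket follow a trend in the data, at the cost of one extra number per bucket.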

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, Q., Lin, X. (2002). On Linear-Spline Based Histograms. In: Meng, X., Su, J., Wang, Y. (eds) Advances in Web-Age Information Management. WAIM 2002. Lecture Notes in Computer Science, vol 2419. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45703-8_33

  • DOI: https://doi.org/10.1007/3-540-45703-8_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44045-1

  • Online ISBN: 978-3-540-45703-9

  • eBook Packages: Springer Book Archive
