Abstract
Approximation is a very effective paradigm for speeding up query processing in large databases. One popular approximation mechanism is data size reduction, for which there are three techniques: sampling, histograms, and wavelets. Histogram techniques are supported by many commercial database systems and have been shown to be very effective for approximate processing of aggregation queries. In this paper, we investigate optimal models for building histograms based on linear spline techniques. We first propose several novel models and then present efficient algorithms to achieve them. Our experimental results show that the new techniques can greatly improve approximation accuracy compared with existing techniques.
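To make the idea of a linear-spline histogram concrete, the following minimal sketch approximates the frequencies inside each bucket by a straight line and answers a range-sum aggregation query from the stored line parameters alone. It is an illustration only, not the optimal models or algorithms proposed in the paper: the equi-width bucketing, the least-squares fit, and the names `fit_line`, `build_histogram`, and `estimate_range_sum` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# One bucket: (start index, end index inclusive, slope, intercept).
Bucket = Tuple[int, int, float, float]


def fit_line(freqs: Sequence[float], start: int, end: int) -> Tuple[float, float]:
    """Least-squares line slope*x + intercept over indices start..end."""
    n = end - start + 1
    mean_x = (start + end) / 2
    mean_y = sum(freqs[start:end + 1]) / n
    sxx = sum((x - mean_x) ** 2 for x in range(start, end + 1))
    if sxx == 0:
        # Single-value bucket: fall back to a flat line at that frequency.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (freqs[x] - mean_y) for x in range(start, end + 1))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def build_histogram(freqs: Sequence[float], num_buckets: int) -> List[Bucket]:
    """Split the domain into equi-width buckets (an assumption of this
    sketch) and keep one fitted line per bucket."""
    n = len(freqs)
    buckets: List[Bucket] = []
    for b in range(num_buckets):
        start = b * n // num_buckets
        end = (b + 1) * n // num_buckets - 1
        if start <= end:
            slope, intercept = fit_line(freqs, start, end)
            buckets.append((start, end, slope, intercept))
    return buckets


def estimate_range_sum(buckets: Sequence[Bucket], lo: int, hi: int) -> float:
    """Estimate the sum of frequencies for indices lo..hi (inclusive)."""
    total = 0.0
    for start, end, slope, intercept in buckets:
        a, b = max(lo, start), min(hi, end)
        if a <= b:
            count = b - a + 1
            # Closed form of sum(slope * x + intercept for x in a..b).
            total += slope * (a + b) * count / 2 + intercept * count
    return total


if __name__ == "__main__":
    freqs = [1, 2, 3, 4, 10, 8, 6, 4]
    hist = build_histogram(freqs, num_buckets=2)
    print(estimate_range_sum(hist, 2, 5), sum(freqs[2:6]))
```

In this sketch each bucket stores two numbers (slope and intercept) rather than the single average of a conventional histogram. Because the residuals of a least-squares line with an intercept sum to zero, a query covering whole buckets is reproduced exactly; the approximation error arises only for ranges that cut through a bucket.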
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Q., Lin, X. (2002). On Linear-Spline Based Histograms. In: Meng, X., Su, J., Wang, Y. (eds) Advances in Web-Age Information Management. WAIM 2002. Lecture Notes in Computer Science, vol 2419. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45703-8_33
DOI: https://doi.org/10.1007/3-540-45703-8_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44045-1
Online ISBN: 978-3-540-45703-9
eBook Packages: Springer Book Archive