Abstract
A new technique for constructing multi-dimensional histograms is proposed. This technique first invokes a density-based clustering algorithm to locate dense and sparse regions of the input data. Then the data distribution inside each of these regions is summarized by partitioning it into non-overlapping blocks laid onto a grid. The granularity of this grid is chosen depending on the underlying data distribution: the more homogeneous the data, the coarser the grid. Our approach is compared with state-of-the-art histograms on both synthetic and real-life data and is shown to be more effective.
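The two-step idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the naive DBSCAN, the use of each cluster's bounding box as its region, the 4x4 probe grid, the coefficient-of-variation homogeneity measure, its thresholds, and all function names are assumptions made only for illustration. Points labelled as noise are simply grouped into one sparse region.

```python
from collections import Counter
from math import dist
from statistics import mean, pstdev


def dbscan(points, eps, min_pts):
    """Naive O(n^2) DBSCAN; returns one label per point (-1 = noise)."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nb)
    return labels


def cell_counts(points, box, k):
    """Count the points falling in each cell of a k x k grid laid over box."""
    (x0, y0), (x1, y1) = box
    w = (x1 - x0) / k or 1.0  # fall back to 1.0 for a degenerate (flat) box
    h = (y1 - y0) / k or 1.0
    counts = Counter()
    for x, y in points:
        cx = min(int((x - x0) / w), k - 1)
        cy = min(int((y - y0) / h), k - 1)
        counts[(cx, cy)] += 1
    return counts


def choose_granularity(points, box, probe=4, levels=((0.5, 1), (1.0, 2))):
    """Hypothetical rule: the lower the variation of counts on a probe grid
    (i.e. the more homogeneous the region), the coarser the final grid."""
    counts = cell_counts(points, box, probe)
    per_cell = [counts.get((i, j), 0) for i in range(probe) for j in range(probe)]
    cv = pstdev(per_cell) / mean(per_cell)  # coefficient of variation
    for threshold, k in levels:
        if cv < threshold:
            return k
    return probe


def build_histogram(points, eps, min_pts):
    """Cluster the points, then summarise each region with its own grid."""
    labels = dbscan(points, eps, min_pts)
    regions = {}
    for p, lab in zip(points, labels):
        regions.setdefault(lab, []).append(p)
    histogram = []
    for lab, pts in regions.items():
        xs, ys = zip(*pts)
        box = ((min(xs), min(ys)), (max(xs), max(ys)))
        k = choose_granularity(pts, box)
        histogram.append(
            {"label": lab, "box": box, "k": k, "counts": cell_counts(pts, box, k)}
        )
    return histogram
```

Calling `build_histogram(points, eps, min_pts)` on a list of 2-D tuples returns one entry per region, each with its bounding box, the chosen number of cells per dimension `k`, and the per-cell counts. Under these assumed thresholds, an evenly spread region collapses to a single bucket, while a strongly skewed one keeps the finest grid.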
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Furfaro, F., Mazzeo, G.M., Sirangelo, C. (2005). Clustering-Based Histograms for Multi-dimensional Data. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2005. Lecture Notes in Computer Science, vol 3589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546849_47
DOI: https://doi.org/10.1007/11546849_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28558-8
Online ISBN: 978-3-540-31732-6
eBook Packages: Computer Science, Computer Science (R0)