Clustering-Based Histograms for Multi-dimensional Data

Furfaro, Filippo; Mazzeo, Giuseppe M.; Sirangelo, Cristina

doi:10.1007/11546849_47

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3589)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

A new technique for constructing multi-dimensional histograms is proposed. This technique first invokes a density-based clustering algorithm to locate dense and sparse regions of the input data. Then the data distribution inside each of these regions is summarized by partitioning it into non-overlapping blocks laid onto a grid. The granularity of this grid is chosen depending on the underlying data distribution: the more homogeneous the data, the coarser the grid. Our approach is compared with state-of-the-art histograms on both synthetic and real-life data and is shown to be more effective.
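
The abstract does not say how homogeneity is measured or how the grid granularity is chosen. The following Python sketch illustrates the idea for a single 2-D region under our own assumptions: the region is given as the non-empty list of points that the clustering step assigned to it; its grid is k × k over the region's bounding box, with k a power of two; and a grid is accepted when splitting every block into 4 equal sub-blocks would reveal little extra structure. The tolerance `tol`, the cap `k_max`, and all names (`RegionGrid`, `build_region_grid`, `estimate_range_count`) are hypothetical and are not taken from the paper. Range counts are estimated by assuming that points are spread uniformly inside each block.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class RegionGrid:
    """k x k grid of point counts over the bounding box [lo, hi] of one region."""
    lo: Point
    hi: Point
    k: int
    counts: List[List[int]]


def _counts(points: List[Point], lo: Point, hi: Point, k: int) -> List[List[int]]:
    """Count how many points fall into each of the k x k equal-size blocks."""
    counts = [[0] * k for _ in range(k)]
    for p in points:
        # Block index per dimension; points on the upper edge go to the last block.
        i, j = (min(int((p[d] - lo[d]) / (hi[d] - lo[d]) * k), k - 1) for d in range(2))
        counts[i][j] += 1
    return counts


def build_region_grid(points: List[Point], tol: float = 0.1, k_max: int = 16) -> RegionGrid:
    """Return the coarsest power-of-two grid (up to k_max) that looks homogeneous.

    Hypothetical criterion: the k-grid is accepted when spreading each block count
    evenly over its 4 sub-blocks of the 2k-grid misplaces at most a fraction `tol`
    of the points (total variation distance between the two count vectors).
    """
    lo = (min(x for x, _ in points), min(y for _, y in points))
    hi = (max(x for x, _ in points), max(y for _, y in points))
    # Give a zero-width dimension (all points share one coordinate) a unit width.
    hi = tuple(h if h > a else a + 1.0 for a, h in zip(lo, hi))
    n = len(points)
    k = 1
    while k < k_max:
        coarse = _counts(points, lo, hi, k)
        fine = _counts(points, lo, hi, 2 * k)
        misplaced = sum(
            abs(fine[i][j] - coarse[i // 2][j // 2] / 4)
            for i in range(2 * k)
            for j in range(2 * k)
        ) / (2 * n)
        if misplaced <= tol:
            break  # homogeneous enough: keep the coarser grid
        k *= 2
    return RegionGrid(lo, hi, k, _counts(points, lo, hi, k))


def estimate_range_count(g: RegionGrid, q_lo: Point, q_hi: Point) -> float:
    """Estimate the number of points in [q_lo, q_hi], assuming uniformity within each block."""
    w = [(g.hi[d] - g.lo[d]) / g.k for d in range(2)]
    total = 0.0
    for i in range(g.k):
        for j in range(g.k):
            frac = 1.0  # fraction of block (i, j) covered by the query
            for d, b in ((0, i), (1, j)):
                b_lo = g.lo[d] + b * w[d]
                overlap = min(b_lo + w[d], q_hi[d]) - max(b_lo, q_lo[d])
                frac *= max(overlap, 0.0) / w[d]
            total += g.counts[i][j] * frac
    return total


if __name__ == "__main__":
    # A uniform 10 x 10 lattice: the 1 x 1 grid is already homogeneous enough.
    uniform = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
    g = build_region_grid(uniform)
    print(g.k, estimate_range_count(g, (0.5, 0.5), (5.0, 9.5)))  # prints: 1 50.0
    # Adding a dense cluster near one corner makes the data non-homogeneous,
    # so the grid is refined.
    skewed = uniform + [(0.5 + 0.01 * i, 0.5 + 0.01 * i) for i in range(100)]
    print(build_region_grid(skewed).k)
```

On a perfectly uniform lattice the 1 × 1 grid is accepted immediately, while adding a dense cluster in one corner forces refinement. This mirrors the rule stated in the abstract: the more homogeneous the data, the coarser the grid.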

Author information

Authors and Affiliations

DEIS, University of Calabria, 87030, Rende, Italy
Filippo Furfaro, Giuseppe M. Mazzeo & Cristina Sirangelo

Editor information

Editors and Affiliations

Institute of Software Technology and Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, A-1040, Wien, Austria
A Min Tjoa
Department of Software and Computing Systems, University of Alicante, Spain
Juan Trujillo

About this paper

Cite this paper

Furfaro, F., Mazzeo, G.M., Sirangelo, C. (2005). Clustering-Based Histograms for Multi-dimensional Data. In: Tjoa, A.M., Trujillo, J. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2005. Lecture Notes in Computer Science, vol 3589. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11546849_47

DOI: https://doi.org/10.1007/11546849_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28558-8
Online ISBN: 978-3-540-31732-6
eBook Packages: Computer Science, Computer Science (R0)
