Abstract
The unsupervised nature of cluster analysis means that objects can be clustered in many ways, so different clustering algorithms can generate vastly different results. To address this, clustering comparison methods have traditionally been used to quantify the degree of similarity between alternative clusterings. However, existing techniques use only point memberships to calculate similarity, which can lead to unintuitive results. They also cannot be applied to clusterings that only partially share points, as can be the case in stream clustering. In this paper, we introduce a new measure named ADCO, which takes into account density profiles for each attribute and aims to address these problems. We provide experiments to demonstrate that this new measure can often provide a more reasonable similarity comparison between different clusterings than existing methods.
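To make the idea of comparing clusterings through per-attribute density profiles more concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not necessarily the definition of ADCO used in the paper: the function names, the equal-width binning, the matching of clusters by trying every permutation, and the normalisation by the larger self-similarity are all choices made here for the example. It also assumes both clusterings have the same number of clusters.

```python
from itertools import permutations


def density_profile(points, labels, n_clusters, n_bins, lo, hi):
    """Count, per cluster and per attribute, how many points fall in each bin.

    Assumption: equal-width bins over [lo[a], hi[a]] for every attribute a.
    Returns profile[cluster][attribute][bin] as nested lists of counts.
    """
    n_attrs = len(points[0])
    profile = [[[0] * n_bins for _ in range(n_attrs)] for _ in range(n_clusters)]
    for point, cluster in zip(points, labels):
        for a, x in enumerate(point):
            width = (hi[a] - lo[a]) / n_bins
            # Clamp so that x == hi[a] lands in the last bin.
            b = min(int((x - lo[a]) / width), n_bins - 1)
            profile[cluster][a][b] += 1
    return profile


def _dot(u, v):
    """Dot product of two per-cluster profiles (attribute x bin counts)."""
    return sum(x * y for row_u, row_v in zip(u, v) for x, y in zip(row_u, row_v))


def profile_similarity(p1, p2):
    """Best total dot product over all one-to-one matchings of clusters.

    Assumption: both profiles describe the same number of clusters.
    Brute force over permutations, so only suitable for small cluster counts.
    """
    k = len(p1)
    return max(
        sum(_dot(p1[i], p2[perm[i]]) for i in range(k))
        for perm in permutations(range(k))
    )


def normalized_similarity(p1, p2):
    """Scale the similarity by the larger of the two self-similarities."""
    denom = max(profile_similarity(p1, p1), profile_similarity(p2, p2))
    return profile_similarity(p1, p2) / denom if denom else 0.0
```

Because the profiles are built from bin counts rather than point identities, `normalized_similarity` can also be called on profiles computed from two different point sets, provided the same attributes, ranges, and number of bins are used for both.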
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bae, E., Bailey, J., Dong, G. (2006). Clustering Similarity Comparison Using Density Profiles. In: Sattar, A., Kang, Bh. (eds) AI 2006: Advances in Artificial Intelligence. AI 2006. Lecture Notes in Computer Science, vol 4304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11941439_38
DOI: https://doi.org/10.1007/11941439_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49787-5
Online ISBN: 978-3-540-49788-2
eBook Packages: Computer Science, Computer Science (R0)