Clustering Similarity Comparison Using Density Profiles

  • Eric Bae
  • James Bailey
  • Guozhu Dong
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4304)

Abstract

The unsupervised nature of cluster analysis means that objects can be clustered in many ways, so different clustering algorithms can generate vastly different results. To address this, clustering comparison methods have traditionally been used to quantify the degree of similarity between alternative clusterings. However, existing techniques use only point memberships to calculate similarity, which can lead to unintuitive results. They also cannot be applied to clusterings that only partially share points, as can be the case in stream clustering. In this paper we introduce a new measure named ADCO, which takes into account density profiles for each attribute and aims to address these problems. We provide experiments demonstrating that this new measure can often provide a more reasonable similarity comparison between different clusterings than existing methods.
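
The contrast drawn in the abstract can be illustrated with a minimal Python sketch. The first part computes the pair-counting Rand and Jaccard indices, which depend only on point memberships and so need both clusterings to label the same points. The second part is a loose illustration of the density-profile idea: each cluster is summarised by per-attribute bin counts, and two clusterings are compared through the best matching of those count vectors. It should not be read as the paper's definition of ADCO. The function names, the equal-width binning, the `bounds` and `n_bins` parameters, and the normalisation are all assumptions made for this example, and the sketch assumes both clusterings have the same number of clusters.

```python
from itertools import combinations, permutations


def pair_counts(labels_a, labels_b):
    """Classify every pair of points by whether each clustering groups it."""
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which the two clusterings agree."""
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    return (both + neither) / (both + only_a + only_b + neither)


def jaccard_index(labels_a, labels_b):
    """Like the Rand index, but ignoring pairs separated in both clusterings."""
    both, only_a, only_b, _ = pair_counts(labels_a, labels_b)
    denom = both + only_a + only_b
    return both / denom if denom else 1.0


def density_profile(points, labels, n_clusters, bounds, n_bins):
    """Per-cluster counts of points in each equal-width bin of each attribute.

    `bounds` holds one (low, high) range per attribute; this binning scheme
    is a hypothetical choice for the sketch.
    """
    profile = [[0] * (len(bounds) * n_bins) for _ in range(n_clusters)]
    for point, label in zip(points, labels):
        for d, (lo, hi) in enumerate(bounds):
            b = int((point[d] - lo) / (hi - lo) * n_bins)
            b = max(0, min(b, n_bins - 1))  # clamp values on or past the edges
            profile[label][d * n_bins + b] += 1
    return profile


def profile_similarity(p, q):
    """Best cluster-to-cluster matching of dot products, scaled to [0, 1]."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def best(a, b):
        # Brute force over cluster matchings; only viable for few clusters.
        return max(
            sum(dot(a[i], b[j]) for i, j in enumerate(perm))
            for perm in permutations(range(len(b)))
        )

    return best(p, q) / max(best(p, p), best(q, q))


# Membership-based indices: both label lists must cover the same points.
a = [0, 0, 1, 1]
c = [0, 1, 0, 1]
print(rand_index(a, c), jaccard_index(a, c))

# Density profiles: the two clusterings cover different (1-D) point sets.
xs = [(0.1,), (0.2,), (0.8,), (0.9,)]
ys = [(0.15,), (0.25,), (0.85,)]
px = density_profile(xs, [0, 0, 1, 1], 2, [(0.0, 1.0)], 2)
py = density_profile(ys, [1, 1, 0], 2, [(0.0, 1.0)], 2)
print(profile_similarity(px, py))
```

Because the second comparison works on bin counts rather than on shared point identifiers, it can still be evaluated when the two clusterings cover different points, which is the situation the abstract associates with stream clustering.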

Keywords

Jaccard Index, Rand Index, Subspace Clusterings, Alternative Clusterings, Stream Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Eric Bae (1)
  • James Bailey (1)
  • Guozhu Dong (2)

  1. NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, University of Melbourne, Australia
  2. Department of Computer Science and Engineering, Wright State University, USA
