Density-Based Clustering Based on Hierarchical Density Estimates

  • Ricardo J. G. B. Campello
  • Davoud Moulavi
  • Joerg Sander
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7819)

Abstract

We propose a theoretically and practically improved density-based hierarchical clustering method that provides a clustering hierarchy from which a simplified tree of significant clusters can be constructed. To obtain a “flat” partition consisting of only the most significant clusters (possibly corresponding to different density thresholds), we propose a novel cluster stability measure, formalize the problem of maximizing the overall stability of the selected clusters, and formulate an algorithm that computes an optimal solution to this problem. We demonstrate that our approach outperforms current state-of-the-art density-based clustering methods on a wide variety of real-world data.
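
The abstract does not spell out the selection algorithm, so the following is only a minimal sketch of the kind of optimization problem it describes, under stated assumptions: each node of a simplified cluster tree carries a precomputed, non-negative stability value, and a flat partition may not contain both a cluster and one of its descendants. The `Node` class, the `select_clusters` function, and the example stability values are hypothetical names and numbers introduced for illustration; they are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate cluster in a hypothetical simplified cluster tree."""
    name: str
    stability: float  # assumed precomputed and non-negative (not defined in the abstract)
    children: list[Node] = field(default_factory=list)


def select_clusters(node: Node) -> tuple[float, list[str]]:
    """Return (best total stability, selected cluster names) for the subtree at node.

    Assumed constraint: no selected cluster is an ancestor of another one.
    """
    if not node.children:
        # A leaf has no descendants to compete with, so it is kept.
        return node.stability, [node.name]

    # Solve each child subtree independently and combine the results.
    child_total = 0.0
    child_names: list[str] = []
    for child in node.children:
        total, names = select_clusters(child)
        child_total += total
        child_names.extend(names)

    # Keep this node only if it is at least as stable as the best
    # selection among its descendants; otherwise keep the descendants.
    if node.stability >= child_total:
        return node.stability, [node.name]
    return child_total, child_names


# Illustrative tree with made-up stability values.
tree = Node("root", 1.0, [
    Node("A", 3.0, [Node("A1", 1.0), Node("A2", 1.5)]),
    Node("B", 2.0, [Node("B1", 1.5), Node("B2", 1.0)]),
])

best_total, selected = select_clusters(tree)
print(best_total, selected)
```

In this sketch, cluster A is kept as a whole because its stability exceeds the combined stability of its children, whereas B is replaced by its two children. The selected clusters can therefore sit at different levels of the tree, which is consistent with the abstract's remark that significant clusters may correspond to different density thresholds.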

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ricardo J. G. B. Campello (1)
  • Davoud Moulavi (1)
  • Joerg Sander (1)
  1. Dept. of Computing Science, University of Alberta, Edmonton, Canada
