Machine Learning, Volume 107, Issue 8–10, pp 1621–1645

Local contrast as an effective means to robust clustering against varying densities

  • Bo Chen
  • Kai Ming Ting
  • Takashi Washio
  • Ye Zhu
Article
Part of the following topical collections:
  1. Special Issue of the ECML PKDD 2018 Journal Track

Abstract

Most density-based clustering methods have difficulty detecting clusters of hugely different densities in a dataset. A recent density-based clustering method, CFSFDP, appears to have mitigated the issue. However, by formalising the condition under which it fails, we reveal that CFSFDP still has the same issue. To address it, we propose a new measure called Local Contrast, as an alternative to density, to find cluster centers and detect clusters. We then apply Local Contrast to CFSFDP to create a new clustering method, LC-CFSFDP, which is robust in the presence of varying densities. Our empirical evaluation shows that LC-CFSFDP outperforms CFSFDP and three other state-of-the-art variants of CFSFDP.
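
The abstract does not define Local Contrast, so the following is only a minimal sketch of the underlying idea, not the authors' implementation. It assumes that the Local Contrast of a point is the number of its k nearest neighbours with strictly lower density, and it uses a Gaussian kernel as the density estimate. The function names, the bandwidth, the value of k, and the toy data are all hypothetical choices made for illustration.

```python
import math

def gaussian_density(points, bandwidth):
    """Kernel density score: rho_i = sum_j exp(-(d_ij / bandwidth)^2)."""
    return [
        sum(math.exp(-(math.dist(p, q) / bandwidth) ** 2) for q in points)
        for p in points
    ]

def local_contrast(points, rho, k):
    """ASSUMED definition (not stated in the abstract): the number of a
    point's k nearest neighbours whose density is strictly lower."""
    scores = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(p, points[j]))[:k]
        scores.append(sum(1 for j in nearest if rho[i] > rho[j]))
    return scores

# Hypothetical toy data: a dense 5x5 grid and a sparse 3x3 grid, far apart.
dense = [(0.1 * a, 0.1 * b) for a in range(5) for b in range(5)]
sparse = [(10.0 + a, 10.0 + b) for a in range(3) for b in range(3)]
points = dense + sparse
dense_center, sparse_center = 12, 25 + 4  # index of each grid's middle point

rho = gaussian_density(points, bandwidth=1.0)
lc = local_contrast(points, rho, k=8)

# How many points have a higher density than the sparse cluster's center?
denser = sum(1 for r in rho if r > rho[sparse_center])
print("points denser than the sparse center:", denser)
print("local contrast at the two centers:", lc[dense_center], lc[sparse_center])
```

On this toy data, every point of the dense grid has a higher density than the center of the sparse grid, so a ranking by density alone places the sparse center below all of them. Under the assumed contrast score, both centers reach the maximum value of k, because each is denser than all of its own nearest neighbours. Integer scores of this kind produce many ties, and how a full method breaks them is not covered here.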

Keywords

Local contrast, Density-based clustering, Varying densities

Notes

Acknowledgements

Bo Chen is supported by Monash Data61 Postgraduate Research Scholarship and Faculty of IT Tuition Fee Scholarship, Monash University.

Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Faculty of Information Technology, Monash University, Clayton, Australia
  2. School of Engineering and Information Technology, Federation University Australia, Churchill, Australia
  3. The Institute of Scientific and Industrial Research, Osaka University, Ibarakishi, Japan
  4. School of Information Technology, Deakin University, Burwood, Australia