Local contrast as an effective means to robust clustering against varying densities
Most density-based clustering methods have difficulty detecting clusters of widely differing densities in a dataset. A recent density-based clustering method, CFSFDP, appears to have mitigated the issue. However, by formalising the condition under which it fails, we show that CFSFDP still suffers from the same issue. To address it, we propose a new measure called Local Contrast, as an alternative to density, to find cluster centers and detect clusters. We then apply Local Contrast to CFSFDP to create a new clustering method, LC-CFSFDP, which is robust in the presence of varying densities. Our empirical evaluation shows that LC-CFSFDP outperforms CFSFDP and three other state-of-the-art variants of CFSFDP.
Keywords: Local contrast, Density-based clustering, Varying densities
Bo Chen is supported by a Monash Data61 Postgraduate Research Scholarship and a Faculty of IT Tuition Fee Scholarship, Monash University.
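The abstract's central idea, ranking points by a neighbourhood-relative measure instead of raw density, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the text above: the Gaussian kernel density, the bandwidth `dc`, the neighbourhood size `k`, and in particular the working definition of Local Contrast as the number of a point's `k` nearest neighbours that have lower density than the point itself. The `delta_to_denser` helper follows the usual density-peak recipe (distance to the nearest denser point).

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X (shape n x d)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def gaussian_density(D, dc):
    """Kernel density per point; `dc` is a hypothetical bandwidth parameter."""
    K = np.exp(-(D / dc) ** 2)
    np.fill_diagonal(K, 0.0)  # a point does not contribute to its own density
    return K.sum(axis=1)


def delta_to_denser(D, rho):
    """Distance from each point to its nearest point of strictly higher density.

    The globally densest point has no denser point, so it receives its largest
    distance to any other point instead.
    """
    n = len(rho)
    delta = np.empty(n)
    for i in range(n):
        denser = rho > rho[i]
        delta[i] = D[i, denser].min() if denser.any() else D[i].max()
    return delta


def local_contrast(D, rho, k):
    """ASSUMED definition: count of the k nearest neighbours with lower density."""
    D_no_self = D.copy()
    np.fill_diagonal(D_no_self, np.inf)  # keep a point out of its own k-NN list
    knn = np.argsort(D_no_self, axis=1)[:, :k]
    return (rho[knn] < rho[:, None]).sum(axis=1)


# Toy data: one dense and one sparse Gaussian blob, far apart.
rng = np.random.default_rng(0)
dense = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
sparse = rng.normal(loc=[1000.0, 0.0], scale=5.0, size=(50, 2))
X = np.vstack([dense, sparse])

D = pairwise_distances(X)
rho = gaussian_density(D, dc=2.0)
delta = delta_to_denser(D, rho)
lc = local_contrast(D, rho, k=10)
```

Under these assumptions, the densest point of the sparse blob has a much lower `rho` than the densest point of the dense blob, yet both reach the maximum contrast value `k`, because each is denser than all of its own neighbours. This is only meant to show why a local measure is less sensitive to differences in density between clusters; it is not a reimplementation of LC-CFSFDP.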