Density-based clustering (DBC) methods can identify arbitrarily shaped clusters in the presence of noise. They rely on estimating the density of each point's local neighborhood. A major drawback of DBC methods is their poor performance in high dimensions. This work presents a novel DBC method that performs well in high dimensions. The method has two novel components: (i) a hybrid first- and second-order optimization algorithm for identifying high-density data points, and (ii) an adaptive scan radius for identifying reachable points. Theoretical results on the validity of the proposed method are presented. Its effectiveness and efficiency are illustrated through experimental evaluations in which the method is compared with well-known DBC methods on synthetic and real data from the literature. Both internal and external cluster validation measures are used to evaluate its performance.
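To make the notion of local neighborhood density concrete, the following minimal sketch counts, for each point, how many points fall within a fixed scan radius. It illustrates the generic fixed-radius idea used by classical DBC methods such as DBSCAN, not the method proposed in this work, which uses an adaptive radius and an optimization step. The function names and the parameters `eps` and `min_pts` are assumptions chosen for illustration.

```python
import math


def neighborhood_density(points, eps):
    """For each point, count the points within distance eps (the point itself included)."""
    return [sum(1 for q in points if math.dist(p, q) <= eps) for p in points]


def core_points(points, eps, min_pts):
    """Return indices of points whose eps-neighborhood holds at least min_pts points."""
    densities = neighborhood_density(points, eps)
    return [i for i, d in enumerate(densities) if d >= min_pts]


# Four tightly packed points and one isolated point (toy data, assumed for illustration).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
densities = neighborhood_density(points, eps=0.2)
cores = core_points(points, eps=0.2, min_pts=3)
print(densities)  # [4, 4, 4, 4, 1]
print(cores)      # [0, 1, 2, 3]
```

With a fixed `eps`, the isolated point is flagged as low density. In high dimensions pairwise distances tend to concentrate, so a single fixed radius separates dense from sparse regions less reliably, which is the difficulty the abstract refers to.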
The author would like to acknowledge the research support provided by King Fahd University of Petroleum & Minerals (KFUPM).
Cite this article
Syed, M.N. Neighborhood density information in clustering. Ann Math Artif Intell (2021). https://doi.org/10.1007/s10472-021-09744-4
- Data clustering
- Nonlinear optimization
- Density estimation