Abstract
The curse of dimensionality is one of the major challenges in clustering high-dimensional data. Recently, a considerable body of literature on subspace clustering has been published to address this challenge. The main objective of subspace clustering is to discover clusters embedded in any possible combination of attributes. Most previous studies generate redundant subspace clusters, which reduces clustering accuracy and increases running time. In this paper, a bottom-up, density-based approach is proposed for clustering high-dimensional data. We use the cluster structure as a similarity measure to generate optimal subspaces, which increases the accuracy of subspace clustering. Based on this idea, we propose an iterative algorithm that discovers similar subspaces from the similarity of their features. In each iteration, the algorithm first identifies similar subspaces, then combines them into higher-dimensional subspaces, and finally re-clusters the resulting subspaces. These steps are repeated until the algorithm converges to the final clusters. Experiments on various synthetic and real datasets show that the proposed approach performs significantly better in both quality and runtime than state-of-the-art methods for clustering high-dimensional data. Its accuracy is around 34% higher than that of the CLIQUE algorithm and around 6% higher than that of DiSH.
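The abstract describes the iteration only at a high level, so the following Python is a minimal sketch under stated assumptions, not the ASCRClu algorithm itself. It assumes DBSCAN with Chebyshev distance as the density-based clusterer for each subspace, the Jaccard overlap of co-clustered point pairs as the "cluster structure" similarity, a fixed threshold for deciding which subspaces to combine, and a reduction step that drops a subspace once it has been absorbed into a combined superset. The functions `dbscan`, `structure_similarity`, and `combine_subspaces` and the parameters `eps`, `min_pts`, and `threshold` are hypothetical names chosen for illustration.

```python
from itertools import combinations


def dbscan(points, dims, eps, min_pts):
    """Minimal DBSCAN restricted to the attributes listed in `dims`.

    Distance is Chebyshev (max absolute difference) over those attributes.
    Returns one label per point; -1 marks noise.
    """
    n = len(points)

    def neighbors(i):
        return [j for j in range(n)
                if max(abs(points[i][d] - points[j][d]) for d in dims) <= eps]

    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels


def co_clustered_pairs(labels):
    """Set of point pairs (i, j) that share a non-noise cluster label."""
    return {(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] != -1 and labels[i] == labels[j]}


def structure_similarity(labels_a, labels_b):
    """Assumed similarity: Jaccard overlap of co-clustered pairs (1.0 = same structure)."""
    pairs_a, pairs_b = co_clustered_pairs(labels_a), co_clustered_pairs(labels_b)
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 0.0


def combine_subspaces(points, eps=0.5, min_pts=3, threshold=0.5):
    """Bottom-up loop: find similar subspaces, combine them, re-cluster, repeat."""
    n_dims = len(points[0])
    # Start from every 1-D subspace, each clustered independently.
    current = {frozenset([d]): dbscan(points, [d], eps, min_pts) for d in range(n_dims)}
    seen = set(current)  # never re-cluster a subspace, so the loop must terminate
    while True:
        merged = {}
        for s, t in combinations(current, 2):
            if structure_similarity(current[s], current[t]) >= threshold:
                combined = s | t
                if combined not in seen and combined not in merged:
                    # Re-cluster the combined, higher-dimensional subspace.
                    merged[combined] = dbscan(points, sorted(combined), eps, min_pts)
        if not merged:
            return current  # converged: no new similar pairs to combine
        # Reduction step (assumed): drop subspaces absorbed into a combined superset.
        current = {s: lab for s, lab in current.items()
                   if not any(s < u for u in merged)}
        current.update(merged)
        seen.update(merged)


if __name__ == "__main__":
    # Attributes 0 and 1 share the same two groups; attribute 2 is spread out (noise).
    data = [(0.0, 0.0, 0.0), (0.1, 0.1, 5.0), (0.2, 0.2, 10.0), (0.3, 0.3, 15.0),
            (10.0, 10.0, 20.0), (10.1, 10.1, 25.0), (10.2, 10.2, 30.0), (10.3, 10.3, 35.0)]
    for subspace, labels in combine_subspaces(data).items():
        print(sorted(subspace), labels)
```

On the toy data in the `__main__` block, attributes 0 and 1 share the same two groups and are combined into the subspace {0, 1}, while attribute 2 has no dense region and stays on its own with every point labelled as noise. Recording every subspace already clustered in `seen` guarantees that the loop stops, because only a finite number of attribute combinations exist.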
References
Achtert E, Böhm C, Kriegel H-P, Kröger P, Müller-Gorman I, Zimek A (2007) Detection and visualization of subspace cluster hierarchies. In: DASFAA, vol 4443, pp 152–163. Springer
Aggarwal CC, Wolf JL, Yu PS, Procopiuc C, Park JS (1999) Fast algorithms for projected clustering. In: ACM SIGMOD Record, vol 28, pp 61–72. ACM
Agrawal R, Gehrke JE, Gunopulos D, Raghavan P (1999) Automatic subspace clustering of high dimensional data for data mining applications, Dec 14. US Patent 6,003,029
Aksehirli E, Goethals B, Müller E (2015) Efficient cluster detection by ordered neighborhoods. In: International conference on big data analytics and knowledge discovery, pp 15–27. Springer
Assent I, Krieger R, Müller E, Seidl T (2007) DUSC: dimensionality unbiased subspace clustering. In: Seventh IEEE international conference on data mining, 2007. ICDM 2007, pp 409–414. IEEE
Böhm C, Railing K, Kriegel H-P, Kröger P (2004) Density connected clustering with local subspace preferences. In: Fourth IEEE international conference on data mining, 2004. ICDM’04, pp 27–34. IEEE
Bouveyron C, Brunet-Saumard C (2014) Model-based clustering of high-dimensional data: a review. Comput Stat Data Anal 71:52–78
Chen X, Ye Y, Xu X, Huang JZ (2012) A feature group weighting method for subspace clustering of high-dimensional data. Pattern Recognit 45(1):434–446
Chu Y-H, Huang J-W, Chuang K-T, Yang D-N, Chen M-S (2010) Density conscious subspace clustering for high-dimensional data. IEEE Trans Knowl Data Eng 22(1):16–30
De Raedt L, Jaeger M, Lee SD, Mannila H (2010) A theory of inductive query answering. In: Inductive databases and constraint-based data mining, pp 79–103. Springer
Deng Z, Choi K-S, Jiang Y, Wang J, Wang S (2016) A survey on soft subspace clustering. Inf Sci 348:84–106
Dongen S (2000) Performance criteria for graph clustering and Markov cluster experiments. CWI (Centre for Mathematics and Computer Science)
Esmin AA, Coelho RA, Matwin S (2015) A review on particle swarm optimization algorithm and its variants to clustering high-dimensional data. Artif Intell Rev 44(1):23–45
Gan G, Ng MK-P (2015) Subspace clustering using affinity propagation. Pattern Recognit 48(4):1455–1464
Gan G, Ng MK-P (2015) Subspace clustering with automatic feature grouping. Pattern Recognit 48(11):3703–3713
Goil S, Nagesh H, Choudhary A (1999) MAFIA: efficient and scalable subspace clustering for very large data sets. In: Proceedings of the 5th ACM SIGKDD international conference on knowledge discovery and data mining, pp 443–452. ACM
Hu J, Pei J (2017) Subspace multi-clustering: a review. Knowl Inf Syst 56:1–28
Kailing K, Kriegel H-P, Kröger P (2004) Density-connected subspace clustering for high-dimensional data. In: Proceedings of the 2004 SIAM international conference on data mining, SIAM, pp 246–256
Kriegel H-P, Kröger P, Renz M, Wurst S (2005) A generic framework for efficient subspace clustering of high-dimensional data. In: Fifth IEEE international conference on data mining, 8 pp. IEEE
Kuncheva LI, Hadjitodorov ST (2004) Using diversity in cluster ensembles. In: 2004 IEEE international conference on systems, man and cybernetics, vol 2, pp 1214–1219. IEEE
Mai ST, Assent I, Storgaard M (2016) AnyDBC: an efficient anytime density-based clustering algorithm for very large complex datasets. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1025–1034. ACM
McWilliams B, Montana G (2014) Subspace clustering of high-dimensional data: a predictive approach. Data Min Knowl Discov 28(3):736–772
Nie F, Wang X, Huang H (2014) Clustering and projected clustering with adaptive neighbors. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, pp 977–986. ACM
Nie F, Wang X, Jordan MI, Huang H (2016) The constrained Laplacian rank algorithm for graph-based clustering. In: AAAI, pp 1969–1976
Nie F, Yuan J, Huang H (2014) Optimal mean robust principal component analysis. In: International conference on machine learning, pp 1062–1070
Ntoutsi I, Zimek A, Palpanas T, Kröger P, Kriegel H-P (2012) Density-based projected clustering over high dimensional data streams. In: Proceedings of the 2012 SIAM international conference on data mining, pp 987–998. SIAM
Procopiuc CM, Jones M, Agarwal PK, Murali T (2002) A Monte Carlo algorithm for fast projective clustering. In: Proceedings of the 2002 ACM SIGMOD international conference on management of data, pp 418–427. ACM
Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496
Sim K, Gopalkrishnan V, Zimek A, Cong G (2013) A survey on enhanced subspace clustering. Data Min Knowl Discov 26(2):332–397
Strehl A, Ghosh J (2002) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617
Wu B, Wilamowski BM (2017) A fast density and grid based clustering method for data with arbitrary shapes and noise. IEEE Trans Ind Inf 13:1620–1628
Yu Z, Luo P, You J, Wong H-S, Leung H, Wu S, Zhang J, Han G (2016) Incremental semi-supervised clustering ensemble for high dimensional data clustering. IEEE Trans Knowl Data Eng 28(3):701–714
Zhu B, Mozo A, Ordozgoiti B (2016) PSCEG: an unbiased parallel subspace clustering algorithm using exact grids. In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)
Zhu L, Cao L, Yang J, Lei J (2014) Evolving soft subspace clustering. Appl Soft Comput 14:210–228
Cite this article
Fatehi, K., Rezvani, M. & Fateh, M. ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data. Pattern Anal Applic 23, 1651–1663 (2020). https://doi.org/10.1007/s10044-020-00884-7