ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data

Fatehi, Kavan; Rezvani, Mohsen; Fateh, Mansoor

doi:10.1007/s10044-020-00884-7

Theoretical advances
Published: 22 April 2020

Volume 23, pages 1651–1663, (2020)

Pattern Analysis and Applications

Kavan Fatehi¹,
Mohsen Rezvani² &
Mansoor Fateh²

Abstract

The curse of dimensionality is one of the major challenges in clustering high-dimensional data. A considerable body of recent literature addresses this challenge through subspace clustering, whose main objective is to discover clusters embedded in any possible combination of the attributes. Previous approaches have mostly generated redundant subspace clusters, which reduces clustering accuracy and increases running time. In this paper, we propose a bottom-up, density-based approach for clustering high-dimensional data. We employ the cluster structure as a similarity measure to generate optimal subspaces, which raises the accuracy of subspace clustering. Building on this idea, we propose an iterative algorithm that discovers similar subspaces based on the similarity of their features. At each iteration, the algorithm first determines similar subspaces, then combines them to generate higher-dimensional subspaces, and finally re-clusters the resulting subspaces. It repeats these steps until it converges to the final clusters. Experiments on various synthetic and real datasets show that the proposed approach is significantly better in both quality and runtime than the state of the art in clustering high-dimensional data. Its accuracy is around 34% higher than that of the CLIQUE algorithm and around 6% higher than that of DiSH.
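The iteration described in the abstract (find similar subspaces, combine them, re-cluster, repeat) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract specifies neither the density-based clusterer nor the similarity measure, so the sketch assumes a simplified DBSCAN-style routine and a Jaccard score over co-clustered point pairs. All function names, `eps`, `min_pts`, and `threshold` are hypothetical choices made for illustration.

```python
from itertools import combinations


def density_cluster(points, dims, eps=0.5, min_pts=3):
    """Simplified DBSCAN-style clustering on the attributes in `dims`.

    Returns one label per point; -1 marks noise. (Assumed stand-in for the
    paper's density-based clusterer.)
    """
    def near(i, j):
        return sum((points[i][d] - points[j][d]) ** 2 for d in dims) <= eps ** 2

    n = len(points)
    neighbours = [[j for j in range(n) if near(i, j)] for i in range(n)]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbours[i]) < min_pts:
            continue  # already assigned, or not a core point
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if len(neighbours[p]) < min_pts:
                continue  # border point: keeps its label but does not expand
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


def co_clustered_pairs(labels):
    """Set of point pairs that share a (non-noise) cluster."""
    return {(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] != -1 and labels[i] == labels[j]}


def structure_similarity(labels_a, labels_b):
    """Jaccard similarity of two cluster structures (assumed measure)."""
    a, b = co_clustered_pairs(labels_a), co_clustered_pairs(labels_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combine_and_recluster(points, threshold=0.8, eps=0.5, min_pts=3):
    """Bottom-up loop: start from 1-D subspaces, merge the most similar pair,
    re-cluster the merged subspace, and stop when no pair is similar enough."""
    n_dims = len(points[0])
    subspaces = {(d,): density_cluster(points, (d,), eps, min_pts)
                 for d in range(n_dims)}
    while True:
        best = None
        for s, t in combinations(sorted(subspaces), 2):
            sim = structure_similarity(subspaces[s], subspaces[t])
            if sim >= threshold and (best is None or sim > best[0]):
                best = (sim, s, t)
        if best is None:
            return subspaces  # converged: no similar subspaces remain
        _, s, t = best
        merged = tuple(sorted(set(s) | set(t)))
        del subspaces[s], subspaces[t]
        subspaces[merged] = density_cluster(points, merged, eps, min_pts)
```

Calling `combine_and_recluster(points)` on a list of equal-length numeric tuples returns a dictionary that maps each surviving subspace (a tuple of attribute indices) to its cluster labels. Because every merge reduces the number of subspaces by one, the loop always terminates.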

Author information

Authors and Affiliations

Yazd University, Yazd, Iran
Kavan Fatehi
Shahrood University of Technology, Shahrood, Iran
Mohsen Rezvani & Mansoor Fateh

Corresponding author

Correspondence to Mohsen Rezvani.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Fatehi, K., Rezvani, M. & Fateh, M. ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data. Pattern Anal Applic 23, 1651–1663 (2020). https://doi.org/10.1007/s10044-020-00884-7

Received: 01 January 2018
Accepted: 30 March 2020
Published: 22 April 2020
Issue Date: November 2020
DOI: https://doi.org/10.1007/s10044-020-00884-7
