ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data

  • Theoretical advances
  • Pattern Analysis and Applications

Abstract

The curse of dimensionality is one of the major challenges in clustering high-dimensional data. A considerable body of recent literature addresses this challenge through subspace clustering, whose main objective is to discover clusters embedded in any possible combination of the attributes. Most previous methods generate redundant subspace clusters, which reduces clustering accuracy and increases running time. In this paper, we propose a bottom-up, density-based approach for clustering high-dimensional data. We use the cluster structure as a similarity measure to generate the optimal subspaces, which raises the accuracy of subspace clustering. Building on this idea, we propose an iterative algorithm that discovers similar subspaces using the similarity in the features of the subspaces. In each iteration, the algorithm first determines similar subspaces, then combines them to generate higher-dimensional subspaces, and finally re-clusters the resulting subspaces. It repeats these steps until it converges to the final clusters. Experiments on various synthetic and real datasets show that the proposed approach is significantly better in both quality and runtime than the state of the art in clustering high-dimensional data. The accuracy of the proposed method is around 34% higher than that of the CLIQUE algorithm and around 6% higher than that of DiSH.
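The determine, combine and re-cluster loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' ASCRClu implementation: the abstract does not specify the density-based clusterer, the similarity measure or the stopping rule, so everything below is an assumption made for illustration. The sketch assumes a plain DBSCAN with Chebyshev distance as the clusterer, a Jaccard index over co-clustered point pairs as the cluster-structure similarity, and a greedy rule that merges the most similar pair of subspaces until no pair reaches `threshold`. The names `dbscan`, `cluster_sets`, `structure_similarity` and `combine_subspaces`, and the parameters `eps`, `min_pts` and `threshold`, are hypothetical.

```python
from itertools import combinations


def chebyshev(p, q):
    # Max-norm distance, assumed here so one eps stays comparable as dimensions are added.
    return max(abs(a - b) for a, b in zip(p, q))


def dbscan(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns one label per point, -1 meaning noise."""
    n = len(points)
    neighbours = [[j for j in range(n) if chebyshev(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1                    # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbours[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster           # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                frontier.extend(neighbours[j])  # expand only from core points
    return labels


def cluster_sets(data, dims, eps, min_pts):
    """Cluster the projection of data onto dims; return clusters as sets of row indices."""
    projected = [tuple(row[d] for d in dims) for row in data]
    groups = {}
    for idx, label in enumerate(dbscan(projected, eps, min_pts)):
        if label != -1:
            groups.setdefault(label, set()).add(idx)
    return frozenset(frozenset(g) for g in groups.values())


def structure_similarity(a, b):
    """Jaccard index over point pairs that share a cluster (an assumed measure)."""
    pairs_a = {p for c in a for p in combinations(sorted(c), 2)}
    pairs_b = {p for c in b for p in combinations(sorted(c), 2)}
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 0.0


def combine_subspaces(data, eps=0.5, min_pts=3, threshold=0.8):
    """Bottom-up loop: find similar subspaces, combine them, re-cluster, repeat."""
    subspaces = {(d,): cluster_sets(data, (d,), eps, min_pts)
                 for d in range(len(data[0]))}
    while True:
        best = None
        for s, t in combinations(subspaces, 2):
            sim = structure_similarity(subspaces[s], subspaces[t])
            if sim >= threshold and (best is None or sim > best[0]):
                best = (sim, s, t)
        if best is None:                      # no similar pair left: converged
            return subspaces
        _, s, t = best
        merged = tuple(sorted(set(s) | set(t)))
        del subspaces[s], subspaces[t]
        subspaces[merged] = cluster_sets(data, merged, eps, min_pts)


# Toy data: dimensions 0 and 1 share the same two groups; dimension 2 groups rows by parity.
data = [(10 * (i // 5) + 0.1 * (i % 5),
         10 * (i // 5) + 0.1 * (i % 5),
         10 * (i % 2) + 0.1 * i) for i in range(10)]
result = combine_subspaces(data)
for dims, clusters in result.items():
    print(dims, sorted(sorted(c) for c in clusters))
```

On this toy input, dimensions 0 and 1 have identical cluster structure and are combined into one two-dimensional subspace, while dimension 2 stays separate because its parity-based clusters overlap little with the others. Merging only the single best pair per iteration keeps the sketch short; a real implementation would need a more scalable neighbour search and a principled choice of `threshold`.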

Author information

Corresponding author

Correspondence to Mohsen Rezvani.

Cite this article

Fatehi, K., Rezvani, M. & Fateh, M. ASCRClu: an adaptive subspace combination and reduction algorithm for clustering of high-dimensional data. Pattern Anal Applic 23, 1651–1663 (2020). https://doi.org/10.1007/s10044-020-00884-7

  • DOI: https://doi.org/10.1007/s10044-020-00884-7
