A General Framework for Dimensionality Reduction of K-Means Clustering

Wu, Tong; Xiao, Yanni; Guo, Muhan; Nie, Feiping

doi:10.1007/s00357-019-09342-4

Published: 23 August 2019

Volume 37, pages 616–631, (2020)

Journal of Classification

Tong Wu ORCID: orcid.org/0000-0001-8599-6078¹,
Yanni Xiao¹,
Muhan Guo² &
…
Feiping Nie²

Abstract

Dimensionality reduction plays an important role in many machine learning and pattern recognition applications. Linear discriminant analysis (LDA) is the most popular supervised dimensionality reduction technique; it searches for a projection matrix that places data points of different classes far apart while keeping data points of the same class close together. In this paper, trace ratio LDA is combined with K-means clustering in a unified framework, in which K-means clustering generates class labels for unlabeled data and LDA learns a low-dimensional representation of the data. By combining subspace clustering with dimensionality reduction, the optimal subspace can be obtained. Unlike existing dimensionality reduction methods, the framework applies to supervised, semi-supervised, and unsupervised dimensionality reduction. Experimental results on benchmark datasets validate the effectiveness and superiority of the algorithm compared with other relevant techniques.
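The abstract does not give the algorithm itself, so the following is only a minimal sketch of one plausible alternating scheme for the unsupervised case, not the authors' method. It assumes that K-means labels are used to build the LDA scatter matrices, that the trace ratio problem max tr(WᵀS_b W) / tr(WᵀS_w W) over orthonormal W is solved by the standard iterative eigen-decomposition of S_b − λS_w, and that K-means is then rerun in the projected subspace. All function names (`scatter_matrices`, `trace_ratio`, `kmeans_lda`, and so on) and the farthest-point initialization are hypothetical choices made for illustration.

```python
import numpy as np

def scatter_matrices(X, labels, k):
    """Within-class (Sw) and between-class (Sb) scatter of the rows of X."""
    d = X.shape[1]
    mean = X.mean(axis=0)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in range(k):
        Xc = X[labels == c]
        if len(Xc) == 0:
            continue  # an empty cluster contributes nothing
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    return Sw, Sb

def trace_ratio(Sw, Sb, m, n_iter=50, tol=1e-10):
    """Iteratively maximize tr(W^T Sb W) / tr(W^T Sw W) with W^T W = I."""
    W = np.eye(Sw.shape[0])[:, :m]
    lam = -np.inf
    for _ in range(n_iter):
        lam_new = np.trace(W.T @ Sb @ W) / max(np.trace(W.T @ Sw @ W), 1e-12)
        # eigh returns eigenvalues in ascending order; keep the top m vectors
        _, vecs = np.linalg.eigh(Sb - lam_new * Sw)
        W = vecs[:, -m:]
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return W

def nearest_center(Z, centers):
    """Index of the closest center for every row of Z."""
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def farthest_point_labels(X, k, rng):
    """Assumed initialization: pick k spread-out points, assign to nearest."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    return nearest_center(X, np.stack(centers))

def kmeans_from_labels(Z, labels, k, rng, n_iter=100):
    """Lloyd iterations in the subspace, started from the given labels."""
    for _ in range(n_iter):
        centers = np.stack([
            Z[labels == c].mean(axis=0) if np.any(labels == c)
            else Z[rng.integers(len(Z))]  # reseed an empty cluster
            for c in range(k)
        ])
        new_labels = nearest_center(Z, centers)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def kmeans_lda(X, k, m, n_outer=20, seed=0):
    """Alternate trace ratio LDA (given labels) and K-means (given W)."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)
    labels = farthest_point_labels(X, k, rng)
    W = np.eye(X.shape[1])[:, :m]
    for _ in range(n_outer):
        Sw, Sb = scatter_matrices(X, labels, k)
        W = trace_ratio(Sw, Sb, m)
        new_labels = kmeans_from_labels(X @ W, labels, k, rng)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return W, labels
```

Calling `kmeans_lda(X, k, m)` on an `n × d` array returns a `d × m` projection with orthonormal columns and one cluster label per row. In a supervised or semi-supervised variant, the known labels would presumably be held fixed rather than re-estimated; that extension is not shown here.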

Notes

http://archive.ics.uci.edu/ml/.

Acknowledgements

This work was supported by CSC funding under grant 201806280140 and by the National Natural Science Foundation of China under grant 11631012.

Author information

Authors and Affiliations

School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an, 710049, Shaanxi, People’s Republic of China
Tong Wu & Yanni Xiao
School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi’an, 710072, Shaanxi, People’s Republic of China
Muhan Guo & Feiping Nie

Corresponding author

Correspondence to Yanni Xiao.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, T., Xiao, Y., Guo, M. et al. A General Framework for Dimensionality Reduction of K-Means Clustering. J Classif 37, 616–631 (2020). https://doi.org/10.1007/s00357-019-09342-4

Published: 23 August 2019
Issue Date: October 2020
DOI: https://doi.org/10.1007/s00357-019-09342-4
