Abstract
Dimensionality reduction plays an important role in many machine learning and pattern recognition applications. Linear discriminant analysis (LDA) is the most popular supervised dimensionality reduction technique; it searches for the projection matrix that places data points of different classes far from each other while keeping data points of the same class close together. In this paper, trace ratio LDA is combined with K-means clustering in a unified framework, in which K-means clustering generates class labels for unlabeled data and LDA learns a low-dimensional representation of the data. By performing subspace clustering and dimensionality reduction jointly, the optimal subspace can be obtained. Unlike other existing dimensionality reduction methods, our framework is suitable for supervised, semi-supervised, and unsupervised dimensionality reduction. Experimental results on benchmark datasets validate the effectiveness and superiority of our algorithm compared with other relevant techniques.
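The abstract gives only the high-level idea, so the following is a minimal sketch of one way such an alternation could look in the unsupervised case, not the authors' algorithm. It assumes NumPy, a plain Lloyd-style K-means, and the standard iterative trace ratio update (top eigenvectors of S_b − λ S_w); all function names and parameters (`joint_kmeans_lda`, `k`, `m`, `n_outer`, and so on) are hypothetical.

```python
import numpy as np

def scatter_matrices(X, labels, k):
    """Between-class (Sb) and within-class (Sw) scatter of the rows of X."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sb, Sw = np.zeros((d, d)), np.zeros((d, d))
    for c in range(k):
        Xc = X[labels == c]
        if len(Xc) == 0:
            continue  # an empty cluster contributes nothing
        diff = (Xc.mean(axis=0) - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
        centered = Xc - Xc.mean(axis=0)
        Sw += centered.T @ centered
    return Sb, Sw

def trace_ratio_projection(Sb, Sw, m, n_iter=50, tol=1e-10):
    """Iterative trace ratio update: W = top-m eigenvectors of Sb - lam * Sw."""
    W = np.eye(Sb.shape[0])[:, :m]  # arbitrary orthonormal start
    lam = 0.0
    for _ in range(n_iter):
        lam_new = np.trace(W.T @ Sb @ W) / max(np.trace(W.T @ Sw @ W), 1e-12)
        _, vecs = np.linalg.eigh(Sb - lam_new * Sw)
        W = vecs[:, -m:]  # eigh sorts eigenvalues in ascending order
        if abs(lam_new - lam) < tol:
            break
        lam = lam_new
    return W

def kmeans(Z, k, labels, n_iter=20):
    """Lloyd iterations on Z, warm-started from the given labels."""
    centers = np.zeros((k, Z.shape[1]))
    for _ in range(n_iter):
        for c in range(k):
            if np.any(labels == c):  # an empty cluster keeps its old center
                centers[c] = Z[labels == c].mean(axis=0)
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def joint_kmeans_lda(X, k, m, n_outer=10, seed=0):
    """Alternate between K-means pseudo-labels and a trace ratio LDA projection."""
    rng = np.random.default_rng(seed)
    labels = kmeans(X, k, rng.integers(0, k, size=X.shape[0]))
    for _ in range(n_outer):
        Sb, Sw = scatter_matrices(X, labels, k)
        W = trace_ratio_projection(Sb, Sw, m)   # LDA step on current labels
        new_labels = kmeans(X @ W, k, labels)   # clustering step in the subspace
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return W, labels
```

Calling `W, labels = joint_kmeans_lda(X, k=2, m=1)` on an `(n, d)` array returns an orthonormal `(d, m)` projection and one pseudo-label per row. In a semi-supervised variant of this sketch, the known labels would presumably be held fixed during the clustering step, but that detail is not specified in the abstract.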
Acknowledgements
This work was supported by CSC funding under grant 201806280140 and by the National Natural Science Foundation of China under grant 11631012.
About this article
Cite this article
Wu, T., Xiao, Y., Guo, M. et al. A General Framework for Dimensionality Reduction of K-Means Clustering. J Classif 37, 616–631 (2020). https://doi.org/10.1007/s00357-019-09342-4
DOI: https://doi.org/10.1007/s00357-019-09342-4