Abstract
Feature selection has attracted a great deal of interest over the past decades. Selecting meaningful feature subsets can effectively improve the performance of learning algorithms. Because label information is expensive to obtain, unsupervised feature selection methods are more widely used than supervised ones. The key to unsupervised feature selection is to find features that effectively reflect the underlying data distribution. However, due to the inevitable redundancy and noise in a dataset, the intrinsic data distribution is not best revealed when all features are used. To address this issue, we propose a novel unsupervised feature selection algorithm via joint local learning and group sparse regression (JLLGSR). JLLGSR incorporates local-learning-based clustering and group-sparsity-regularized regression in a single formulation, and seeks features that respect both the manifold structure and the group sparse structure in the data space. An iterative optimization method is developed in which the weights finally converge on the important features, and the selected features are able to improve the clustering results. Experiments on multiple real-world datasets (images, voices, and web pages) demonstrate the effectiveness of JLLGSR.
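To illustrate the group sparse regression idea mentioned in the abstract, the following is a minimal sketch and not the JLLGSR algorithm itself. It assumes a target matrix Y is already available (in JLLGSR this role would be played by cluster information learned jointly, which is omitted here), and it solves a generic l2,1-norm regularized least-squares problem by iterative reweighting. The function names `l21_regression` and `rank_features` and the parameters `lam`, `n_iter`, and `eps` are hypothetical choices made for this example.

```python
import numpy as np


def l21_regression(X, Y, lam=1.0, n_iter=50, eps=1e-8):
    """Approximately minimize ||X W - Y||_F^2 + lam * ||W||_{2,1}.

    X has shape (n_samples, n_features); Y has shape (n_samples, n_targets).
    Y is assumed to be given; this sketch does not learn it.
    Returns W with shape (n_features, n_targets).
    """
    n_features = X.shape[1]
    XtX = X.T @ X
    XtY = X.T @ Y
    d = np.ones(n_features)  # per-feature (row) reweighting terms
    W = np.zeros((n_features, Y.shape[1]))
    for _ in range(n_iter):
        # With the row weights fixed, W has a closed-form solution.
        W = np.linalg.solve(XtX + lam * np.diag(d), XtY)
        # Rows of W with a small norm get a large weight, which pushes
        # the whole row (one feature) further towards zero.
        row_norms = np.sqrt((W ** 2).sum(axis=1))
        d = 1.0 / (2.0 * np.maximum(row_norms, eps))
    return W


def rank_features(W):
    """Return feature indices sorted by decreasing row norm of W."""
    return np.argsort(-np.linalg.norm(W, axis=1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 10))
    # Synthetic targets that depend only on features 0 and 1.
    Y = X[:, :2] @ np.array([[2.0, -1.0], [1.0, 3.0]])
    W = l21_regression(X, Y, lam=1.0)
    print(rank_features(W)[:2])
```

Because the penalty acts on whole rows of W, a feature is kept or suppressed across all target columns at once, which is what makes the row norms usable as feature scores.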
Acknowledgements
The experiments were supported by Cheng-wei YAO of the Experiment Center of the College of Computer Science and Technology, Zhejiang University.
Additional information
Project supported by the Alibaba-Zhejiang University Joint Institute of Frontier Technologies and the Zhejiang Provincial Key Research and Development Plan (No. 2017C01012)
About this article
Cite this article
Wu, Y., Wang, C., Zhang, Yq. et al. Unsupervised feature selection via joint local learning and group sparse regression. Frontiers Inf Technol Electronic Eng 20, 538–553 (2019). https://doi.org/10.1631/FITEE.1700804
DOI: https://doi.org/10.1631/FITEE.1700804