
Feature selection with MCP\(^2\) regularization

Original Article · Neural Computing and Applications

Abstract

Feature selection, as a fundamental component of building robust models, plays an important role in many machine learning and data mining tasks. Recently, with the development of sparsity research, both theoretical and empirical studies have suggested that sparsity is an intrinsic property of real-world data, and sparsity regularization has been applied successfully to feature selection models. In view of the remarkable performance of non-convex regularization, in this paper we propose a novel non-convex yet Lipschitz-continuous sparsity regularization term, named MCP\(^2\), and apply it to feature selection. To solve the resulting non-convex model, we also develop a new algorithm within the framework of the ConCave–Convex Procedure (CCCP). Experimental results on benchmark datasets demonstrate the effectiveness of the proposed method.
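The full model and the exact definition of the MCP\(^2\) term are in the paywalled full text; what follows is a minimal sketch of the ingredients the abstract names, not the paper's actual formulation. The starting point is the classical minimax concave penalty (MCP) of Zhang (2010), which for a scalar \(t\) and parameters \(\lambda > 0\), \(\gamma > 1\) reads

\[
\rho_{\lambda,\gamma}(t) =
\begin{cases}
\lambda |t| - \dfrac{t^{2}}{2\gamma}, & |t| \le \gamma\lambda,\\[4pt]
\dfrac{\gamma\lambda^{2}}{2}, & |t| > \gamma\lambda.
\end{cases}
\]

This penalty is non-convex but Lipschitz continuous, and it decomposes as a difference of convex functions, \(\rho_{\lambda,\gamma}(t) = \lambda|t| - h_{\lambda,\gamma}(t)\), where \(h_{\lambda,\gamma}(t) = \lambda|t| - \rho_{\lambda,\gamma}(t)\) is convex and continuously differentiable. That is exactly the structure CCCP exploits: at each outer iteration the concave part is linearized at the current iterate, and the remaining convex \(\ell_1\)-regularized subproblem is solved. The sketch below applies this recipe to MCP-penalized least squares with a simple proximal-gradient inner solver; the function names and the least-squares loss are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def grad_h(w, lam, gamma):
    # Gradient of the convex part h in the DC split rho = lam*|t| - h(t);
    # h'(t) = t/gamma for |t| <= gamma*lam, and lam*sign(t) otherwise.
    return np.where(np.abs(w) <= gamma * lam, w / gamma, lam * np.sign(w))

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cccp_mcp_ls(X, y, lam=0.1, gamma=3.0, outer=20, inner=200):
    # Hypothetical CCCP sketch for
    #   min_w 0.5*||Xw - y||^2 + sum_j rho_{lam,gamma}(w_j):
    # freeze the concave part at w^(k), then solve the convex
    # l1-regularized subproblem by proximal gradient (ISTA).
    w = np.zeros(X.shape[1])
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the smooth part
    for _ in range(outer):
        c = grad_h(w, lam, gamma)  # linearization of the concave part at w^(k)
        for _ in range(inner):
            grad = X.T @ (X @ w - y) - c
            w = soft_threshold(w - grad / L, lam / L)
    return w
```

With exact inner solves, each outer step never increases the objective (the standard CCCP monotonicity argument), so the iterates approach a stationary point of the non-convex problem.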


Notes

  1. http://featureselection.asu.edu/datasets.php.

  2. http://www.kasrl.org/jaffe_info.html.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants 91546201, 71331005, 11671379 and 11331012.

Author information

Correspondence to Lingfeng Niu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


Cite this article

Shi, Y., Miao, J. & Niu, L. Feature selection with MCP\(^2\) regularization. Neural Comput & Applic 31, 6699–6709 (2019). https://doi.org/10.1007/s00521-018-3500-7

