Structured sparse multi-view feature selection based on weighted hinge loss

  • Nan Wang
  • Yiming Xue
  • Qiang Lin
  • Ping Zhong


In many applications, describing objects with features obtained from multiple views has become popular because multiple views carry much more information than a single view. Since such data sets are high-dimensional, which can lead to high computational cost and memory usage, identifying the representative views and features becomes a crucial problem. Multi-view feature selection, which integrates multiple views to select important and relevant features and thereby improve performance, has attracted increasing attention in recent years. Previous supervised multi-view feature selection methods usually build their models by concatenating the multiple views into long vectors. However, this concatenation is not physically meaningful and implicitly assumes that different views play similar roles in a specific task. In this paper, we propose a novel supervised multi-view feature selection method based on the weighted hinge loss (WHMVFS) that learns a weight for each view and enforces sparsity at both the group (view) level and the individual feature level under the structured sparsity framework. The newly proposed multi-view weighted hinge loss penalty not only selects more discriminative features for classification, but also allows the involved optimization problem to be decomposed into several small-scale subproblems that can be easily solved by an iterative algorithm, whose convergence is also proved. Experimental results on real-world data sets show the effectiveness of the proposed method.
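To illustrate the general idea of view-level (group) sparsity combined with a hinge-style classification loss, the following is a minimal sketch, not the paper's actual WHMVFS algorithm: it minimizes an unweighted hinge loss plus a per-view group (ℓ2-norm) penalty by subgradient descent, then scores each view by the norm of its weight block. The function name, step sizes, and the simple optimizer are illustrative assumptions.

```python
import numpy as np

def multiview_group_sparse_select(views, y, lam=0.1, lr=0.01, n_iter=200):
    """Sketch of group-sparse multi-view feature selection.

    views : list of (n, d_k) arrays, one per view
    y     : labels in {-1, +1}
    Uses a plain hinge loss plus a per-view l2 (group) penalty, which
    drives whole uninformative views toward zero.  This is a simplified
    stand-in for the weighted hinge loss formulation, not the paper's method.
    """
    X = np.hstack(views)                       # concatenate only for computation
    n, d = X.shape
    w = np.zeros(d)
    # index boundaries of each view's block inside the concatenated matrix
    bounds = np.cumsum([0] + [v.shape[1] for v in views])
    for _ in range(n_iter):
        margins = y * (X @ w)
        active = margins < 1                   # samples violating the margin
        grad = -(X[active].T @ y[active]) / n  # subgradient of the hinge loss
        # add the subgradient of the per-view group penalty lam * ||w_k||_2
        for k in range(len(views)):
            s, e = bounds[k], bounds[k + 1]
            norm = np.linalg.norm(w[s:e])
            if norm > 1e-12:
                grad[s:e] += lam * w[s:e] / norm
        w -= lr * grad
    # score each view by the norm of its weight block; near-zero => discarded
    view_scores = [np.linalg.norm(w[bounds[k]:bounds[k + 1]])
                   for k in range(len(views))]
    return w, view_scores
```

On toy data where one view is informative and another is pure noise, the informative view receives a much larger score, mimicking the view-weighting behavior described above.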


Keywords: Multi-view feature selection · Weighted hinge loss · Structured sparsity · Classification



This work is supported by the National Natural Science Foundation of China (Grant Nos. 61872368, U1536121, and 11171346). We thank Jing Zhong for her assistance in collecting data during the preparation of the revision. We also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. College of Science, China Agricultural University, Beijing, China
  2. College of Information and Electrical Engineering, China Agricultural University, Beijing, China
