
Distribution-dependent feature selection for deep neural networks

Applied Intelligence

Abstract

While deep neural networks (DNNs) have achieved impressive performance on a wide variety of tasks, their black-box nature hinders their applicability to high-risk decision-making fields. In such fields, besides accurate prediction, it is also desirable to provide interpretable insights into DNNs, e.g., by screening important features according to their contributions to predictive accuracy. To improve the interpretability of DNNs, this paper proposes a new feature selection algorithm for DNNs that integrates the knockoff technique with the distribution information of irrelevant features. With the help of knockoff features and the central limit theorem, we show that an irrelevant feature’s statistic follows a known Gaussian distribution under mild conditions. This information is then used in hypothesis testing to discover the key features associated with the DNN. Empirical evaluations on simulated data demonstrate that the proposed method selects more truly informative features and achieves higher F1 scores. The Friedman test and the post-hoc Nemenyi test are employed to validate the superiority of the proposed method. Finally, we apply our method to Coronal Mass Ejection (CME) data and uncover the key features that contribute to DNN-based CME arrival time prediction.
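The overall flow described in the abstract can be pictured with a small, self-contained sketch: draw model-X knockoff copies of the inputs, train a network on the augmented feature matrix, form a per-feature knockoff contrast statistic, and test each statistic against a Gaussian null. The snippet below is only an illustration under simplifying assumptions, not the paper's implementation: it uses equicorrelated Gaussian knockoffs, the L2 norm of first-layer MLP weights as a stand-in for the paper's feature statistic, a null scale estimated by the median absolute deviation, and Benjamini-Hochberg adjustment of the resulting p-values. The simulated data and all variable names are hypothetical.

import numpy as np
from scipy import stats
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n, p, k = 500, 30, 5                                # samples, features, true signals

# Simulated data: only the first k features influence the response.
idx = np.arange(p)
Sigma = 0.3 ** np.abs(np.subtract.outer(idx, idx))  # AR(1)-type covariance
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
beta = np.concatenate([rng.choice([-2.0, 2.0], size=k), np.zeros(p - k)])
y = X @ beta + rng.normal(size=n)

# Equicorrelated Gaussian model-X knockoffs for X ~ N(0, Sigma).
s = 0.99 * min(1.0, 2.0 * np.linalg.eigvalsh(Sigma).min())
D = s * np.eye(p)
Sigma_inv = np.linalg.inv(Sigma)
cond_mean = X @ Sigma_inv @ (Sigma - D)             # E[X_knockoff | X]
cond_cov = 2.0 * D - D @ Sigma_inv @ D              # Cov[X_knockoff | X]
X_ko = cond_mean + rng.multivariate_normal(np.zeros(p), cond_cov, size=n)

# Small MLP on [X, X_knockoff]; importance = L2 norm of first-layer weights.
net = MLPRegressor(hidden_layer_sizes=(32,), activation="relu",
                   max_iter=3000, random_state=0)
net.fit(np.hstack([X, X_ko]), y)
w_in = net.coefs_[0]                                # shape (2p, n_hidden)
z = np.linalg.norm(w_in[:p], axis=1)                # original-feature importance
z_ko = np.linalg.norm(w_in[p:], axis=1)             # knockoff-feature importance
W = z - z_ko                                        # knockoff contrast statistic

# Gaussian null for irrelevant features: W_j ~ N(0, sigma0^2) approximately.
sigma0 = 1.4826 * np.median(np.abs(W))              # robust (MAD-based) scale
pvals = 2.0 * stats.norm.sf(np.abs(W) / sigma0)

# Benjamini-Hochberg step-up rule at target FDR level q.
q = 0.1
order = np.argsort(pvals)
passed = pvals[order] <= q * np.arange(1, p + 1) / p
selected = np.sort(order[:passed.nonzero()[0].max() + 1]) if passed.any() else []
print("selected features:", selected)

In the paper, the feature statistic is derived from the trained DNN rather than from a single hidden layer, and the selection threshold follows the distribution-dependent testing procedure developed there; the sketch above only mirrors the general knockoff-plus-Gaussian-null workflow.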

Notes

  1. https://github.com/younglululu/DeepPINK

References

  1. Agarap AF (2018) Deep learning using rectified linear units (ReLU). arXiv:1803.08375

  2. Barber RF, Candès EJ (2015) Controlling the false discovery rate via knockoffs. Ann Stat 43(5):2055–2085

  3. Barber RF, Candès EJ (2019) A knockoff filter for high-dimensional selective inference. Ann Stat 47(5):2504–2537

  4. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Royal Stat Soc Series B (Methodological) 57(1):289–300

  5. Candès E, Fan Y, Janson L, Lv J (2018) Panning for gold: model-free knockoffs for high-dimensional controlled variable selection. J Royal Stat Soc Series B (Statistical Methodology) 80:551–577

  6. Cao B, Shen D, Sun JT, Yang Q, Chen Z (2007) Feature selection in a kernel space. In: Proceedings of the 24th International Conference on Machine Learning, pp 121–128

  7. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(27):1–27

  8. Chen H, Guo C, Xiong H, Wang Y (2021) Sparse additive machine with ramp loss. Anal Appl 19(03):509–528

  9. Chen H, Wang Y (2018) Kernel-based sparse regression with the correntropy-induced loss. Appl Comput Harmon Anal 44(1):144–164

  10. Chen H, Wang Y, Zheng F, Deng C, Huang H (2021) Sparse modal additive model. IEEE Trans Neural Netw Learn Syst 32(6):2373–2387

  11. Chen J, Stern M, Wainwright MJ, Jordan MI (2017) Kernel feature selection via conditional covariance minimization. In: Advances in Neural Information Processing Systems 30, pp 6946–6955

  12. Collins M, Schapire R, Singer Y (2002) Logistic regression, AdaBoost and Bregman distances. Mach Learn 48(1):253–285

  13. Cox DR (1958) The regression analysis of binary sequences. J Royal Stat Soc Series B (Methodological) 20(2):215–242

  14. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7(1):1–30

  15. Fan Y, Demirkaya E, Li G, Lv J (2020) RANK: large-scale inference with graphical nonlinear knockoffs. J Am Stat Assoc 115(529):362–379

  16. Fan Y, Lv J, Sharifvaghefi M, Uematsu Y (2020) IPAD: stable interpretable forecasting with knockoffs inference. J Am Stat Assoc 115(532):1822–1834

  17. Friedl M, Brodley C (1997) Decision tree classification of land cover from remotely sensed data. Remote Sens Environ 61(3):399–409

  18. Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38(4):367–378

  19. Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701

  20. González J, Ortega J, Damas M, Martín-Smith P, Gan JQ (2019) A new multi-objective wrapper method for feature selection - accuracy and stability analysis for BCI. Neurocomputing 333:407–418

  21. Hocking RR (1976) A Biometrics invited paper. The analysis and selection of variables in linear regression. Biometrics 32(1):1–49

  22. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1):273–324

  23. Liu H, Liu C, Wang JTL, Wang H (2020) Predicting coronal mass ejections using SDO/HMI vector magnetic data products and recurrent neural networks. The Astrophysical Journal 890(1):12

  24. Liu J, Ye Y, Shen C, Wang Y, Erdélyi R (2018) A new tool for CME arrival time prediction using machine learning algorithms: CAT-PUMA. The Astrophysical Journal 855(2):109

  25. Liu W, Ke Y, Liu J, Li R (2020) Model-free feature screening and FDR control with knockoff features. J Am Stat Assoc 0(0):1–16

  26. Lu Y, Fan Y, Lv J, Stafford Noble W (2018) DeepPINK: reproducible feature selection in deep neural networks. In: Advances in Neural Information Processing Systems 31, Curran Associates Inc, pp 8676–8686

  27. Nemenyi P (1963) Distribution-free multiple comparisons. Princeton University

  28. Nicholson WB, Wilms I, Bien J, Matteson DS (2020) High dimensional forecasting via interpretable vector autoregression. J Mach Learn Res 21(166):1–52

  29. Rijsbergen CJV (1979) Information retrieval, 2nd edn. Butterworth-Heinemann, USA

  30. Romano Y, Sesia M, Candès E (2019) Deep knockoffs. J Am Stat Assoc 0(0):1–12

  31. Sesia M, Katsevich E, Bates S, Candès E, Sabatti C (2020) Multi-resolution localization of causal variants across the genome. Nat Commun 11(1093)

  32. Shrikumar A, Greenside P, Kundaje A (2017) Learning important features through propagating activation differences. In: Proceedings of the 34th International Conference on Machine Learning, vol 9, pp 3145–3153

  33. Shrikumar A, Greenside P, Shcherbina A, Kundaje A (2016) Not just a black box: learning important features through propagating activation differences. arXiv:1605.01713

  34. Simonyan K, Vedaldi A, Zisserman A (2014) Deep inside convolutional networks: visualising image classification models and saliency maps. In: 2nd International Conference on Learning Representations, ICLR 2014, pp 1–8

  35. Sundararajan M, Taly A, Yan Q (2017) Axiomatic attribution for deep networks. In: Proceedings of the 34th International Conference on Machine Learning, vol 10, pp 3319–3328

  36. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Royal Stat Soc Series B 58(1):267–288

  37. Wang Y, Liu J, Jiang Y, Erdélyi R (2019) CME arrival time prediction using convolutional neural network. The Astrophysical Journal 881(1):15

  38. Zahavy T, Kang B, Sivak A, Feng J, Xu H, Mannor S (2018) Ensemble robustness and generalization of stochastic deep learning algorithms. In: 6th International Conference on Learning Representations, ICLR 2018

  39. Zhang XL, Zhang Q, Chen M, Sun Y, Qin X, Li H (2018) A two-stage feature selection and intelligent fault diagnosis method for rotating machinery using hybrid filter and wrapper method. Neurocomputing 275:2426–2439

  40. Zheng W, Zhu X, Wen G, Zhu Y, Yu H, Gan J (2020) Unsupervised feature selection by self-paced learning regularization. Pattern Recogn Lett 132:4–11

  41. Zhu G, Zhao T (2021) Deep-gKnock: nonlinear group-feature selection with deep neural networks. Neural Netw 135:139–147

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 12071166 and 11771130, and in part by the Fundamental Research Funds for the Central Universities of China under Grants 2662020LXQD002 and 2662019FW003. The corresponding author is Hong Chen.

Author information

Corresponding author

Correspondence to Hong Chen.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhao, X., Li, W., Chen, H. et al. Distribution-dependent feature selection for deep neural networks. Appl Intell 52, 4432–4442 (2022). https://doi.org/10.1007/s10489-021-02663-1
