
Distribution-dependent feature selection for deep neural networks

Applied Intelligence

Abstract

While deep neural networks (DNNs) have achieved impressive performance on a wide variety of tasks, their black-box nature hinders their applicability to high-risk decision-making fields. In such fields, besides accurate prediction, it is also desirable to provide interpretable insights into DNNs, e.g., by screening important features according to their contributions to predictive accuracy. To improve the interpretability of DNNs, this paper proposes a new feature selection algorithm for DNNs that integrates the knockoff technique with the distribution information of irrelevant features. With the help of knockoff features and the central limit theorem, we show that an irrelevant feature’s statistic follows a known Gaussian distribution under mild conditions. This information is then used in hypothesis testing to discover the key features associated with the DNN. Empirical evaluations on simulated data demonstrate that the proposed method selects more truly informative features and achieves higher F1 scores. The Friedman test and the post-hoc Nemenyi test are employed to validate the superiority of the proposed method. Finally, we apply our method to Coronal Mass Ejection (CME) data and uncover the key features that contribute to DNN-based CME arrival time prediction.
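The overall flow described in the abstract can be pictured with a small, self-contained sketch: draw model-X knockoff copies of the inputs, train a network on the augmented feature matrix, form a per-feature knockoff contrast statistic, and test each statistic against a Gaussian null. The snippet below is only an illustration under simplifying assumptions, not the paper's implementation: it uses equicorrelated Gaussian knockoffs, the L2 norm of first-layer MLP weights as a stand-in for the paper's feature statistic, a null scale estimated by the median absolute deviation, and Benjamini-Hochberg adjustment of the resulting p-values. The simulated data and all variable names are hypothetical.

import numpy as np
from scipy import stats
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n, p, k = 500, 30, 5                                # samples, features, true signals

# Simulated data: only the first k features influence the response.
idx = np.arange(p)
Sigma = 0.3 ** np.abs(np.subtract.outer(idx, idx))  # AR(1)-type covariance
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
beta = np.concatenate([rng.choice([-2.0, 2.0], size=k), np.zeros(p - k)])
y = X @ beta + rng.normal(size=n)

# Equicorrelated Gaussian model-X knockoffs for X ~ N(0, Sigma).
s = 0.99 * min(1.0, 2.0 * np.linalg.eigvalsh(Sigma).min())
D = s * np.eye(p)
Sigma_inv = np.linalg.inv(Sigma)
cond_mean = X @ Sigma_inv @ (Sigma - D)             # E[X_knockoff | X]
cond_cov = 2.0 * D - D @ Sigma_inv @ D              # Cov[X_knockoff | X]
X_ko = cond_mean + rng.multivariate_normal(np.zeros(p), cond_cov, size=n)

# Small MLP on [X, X_knockoff]; importance = L2 norm of first-layer weights.
net = MLPRegressor(hidden_layer_sizes=(32,), activation="relu",
                   max_iter=3000, random_state=0)
net.fit(np.hstack([X, X_ko]), y)
w_in = net.coefs_[0]                                # shape (2p, n_hidden)
z = np.linalg.norm(w_in[:p], axis=1)                # original-feature importance
z_ko = np.linalg.norm(w_in[p:], axis=1)             # knockoff-feature importance
W = z - z_ko                                        # knockoff contrast statistic

# Gaussian null for irrelevant features: W_j ~ N(0, sigma0^2) approximately.
sigma0 = 1.4826 * np.median(np.abs(W))              # robust (MAD-based) scale
pvals = 2.0 * stats.norm.sf(np.abs(W) / sigma0)

# Benjamini-Hochberg step-up rule at target FDR level q.
q = 0.1
order = np.argsort(pvals)
passed = pvals[order] <= q * np.arange(1, p + 1) / p
selected = np.sort(order[:passed.nonzero()[0].max() + 1]) if passed.any() else []
print("selected features:", selected)

In the paper, the feature statistic is derived from the trained DNN rather than from a single hidden layer, and the selection threshold follows the distribution-dependent testing procedure developed there; the sketch above only mirrors the general knockoff-plus-Gaussian-null workflow.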

Notes

  1. https://github.com/younglululu/DeepPINK

References

  1. Agarap AF (2018) Deep learning using rectified linear units (ReLU). arXiv:1803.08375

  2. Barber RF, Candès EJ (2015) Controlling the false discovery rate via knockoffs. Ann Stat 43(5):2055–2085

  3. Barber RF, Candès EJ (2019) A knockoff filter for high-dimensional selective inference. Ann Stat 47(5):2504–2537

  4. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Royal Stat Soc Series B (Methodological) 57(1):289–300

  5. Candès E, Fan Y, Janson L, Lv J (2018) Panning for gold: model-free knockoffs for high-dimensional controlled variable selection. J Royal Stat Soc Series B (Statistical Methodology) 80:551–577

  6. Cao B, Shen D, Sun JT, Yang Q, Chen Z (2007) Feature selection in a kernel space. In: Proceedings of the 24th International Conference on Machine Learning, pp 121–128

  7. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(27):1–27

  8. Chen H, Guo C, Xiong H, Wang Y (2021) Sparse additive machine with ramp loss. Anal Appl 19(03):509–528

  9. Chen H, Wang Y (2018) Kernel-based sparse regression with the correntropy-induced loss. Appl Comput Harmon Anal 44(1):144–164

  10. Chen H, Wang Y, Zheng F, Deng C, Huang H (2021) Sparse modal additive model. IEEE Trans Neural Netw Learn Syst 32(6):2373–2387

  11. Chen J, Stern M, Wainwright MJ, Jordan MI (2017) Kernel feature selection via conditional covariance minimization. In: Advances in Neural Information Processing Systems 30, pp 6946–6955

  12. Collins M, Schapire R, Singer Y (2002) Logistic regression, AdaBoost and Bregman distances. Mach Learn 48(1):253–285

  13. Cox DR (1958) The regression analysis of binary sequences. J Royal Stat Soc Series B (Methodological) 20(2):215–242

  14. Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7(1):1–30

  15. Fan Y, Demirkaya E, Li G, Lv J (2020) RANK: large-scale inference with graphical nonlinear knockoffs. J Am Stat Assoc 115(529):362–379

  16. Fan Y, Lv J, Sharifvaghefi M, Uematsu Y (2020) IPAD: stable interpretable forecasting with knockoffs inference. J Am Stat Assoc 115(532):1822–1834

  17. Friedl M, Brodley C (1997) Decision tree classification of land cover from remotely sensed data. Remote Sens Environ 61(3):399–409

  18. Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38(4):367–378

  19. Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701

  20. González J, Ortega J, Damas M, Martín-Smith P, Gan JQ (2019) A new multi-objective wrapper method for feature selection - accuracy and stability analysis for BCI. Neurocomputing 333:407–418

  21. Hocking RR (1976) A Biometrics invited paper. The analysis and selection of variables in linear regression. Biometrics 32(1):1–49

  22. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1):273–324

  23. Liu H, Liu C, Wang JTL, Wang H (2020) Predicting coronal mass ejections using SDO/HMI vector magnetic data products and recurrent neural networks. The Astrophysical Journal 890(1):12

  24. Liu J, Ye Y, Shen C, Wang Y, Erdélyi R (2018) A new tool for CME arrival time prediction using machine learning algorithms: CAT-PUMA. The Astrophysical Journal 855(2):109

  25. Liu W, Ke Y, Liu J, Li R (2020) Model-free feature screening and FDR control with knockoff features. J Am Stat Assoc 0(0):1–16

  26. Lu Y, Fan Y, Lv J, Stafford Noble W (2018) DeepPINK: reproducible feature selection in deep neural networks. In: Advances in Neural Information Processing Systems 31, Curran Associates Inc, pp 8676–8686

  27. Nemenyi P (1963) Distribution-free multiple comparisons. Princeton University

  28. Nicholson WB, Wilms I, Bien J, Matteson DS (2020) High dimensional forecasting via interpretable vector autoregression. J Mach Learn Res 21(166):1–52

  29. Rijsbergen CJV (1979) Information retrieval, 2nd edn. Butterworth-Heinemann, USA

  30. Romano Y, Sesia M, Candès E (2019) Deep knockoffs. J Am Stat Assoc 0(0):1–12

  31. Sesia M, Katsevich E, Bates S, Candès E, Sabatti C (2020) Multi-resolution localization of causal variants across the genome. Nat Commun 11(1093)

  32. Shrikumar A, Greenside P, Kundaje A (2017) Learning important features through propagating activation differences. In: Proceedings of the 34th International Conference on Machine Learning, vol 9, pp 3145–3153

  33. Shrikumar A, Greenside P, Shcherbina A, Kundaje A (2016) Not just a black box: learning important features through propagating activation differences. arXiv:1605.01713

  34. Simonyan K, Vedaldi A, Zisserman A (2014) Deep inside convolutional networks: visualising image classification models and saliency maps. In: 2nd International Conference on Learning Representations, ICLR 2014, pp 1–8

  35. Sundararajan M, Taly A, Yan Q (2017) Axiomatic attribution for deep networks. In: Proceedings of the 34th International Conference on Machine Learning, vol 10, pp 3319–3328

  36. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Royal Stat Soc Series B 58(1):267–288

  37. Wang Y, Liu J, Jiang Y, Erdélyi R (2019) CME arrival time prediction using convolutional neural network. The Astrophysical Journal 881(1):15

  38. Zahavy T, Kang B, Sivak A, Feng J, Xu H, Mannor S (2018) Ensemble robustness and generalization of stochastic deep learning algorithms. In: 6th International Conference on Learning Representations, ICLR 2018

  39. Zhang XL, Zhang Q, Chen M, Sun Y, Qin X, Li H (2018) A two-stage feature selection and intelligent fault diagnosis method for rotating machinery using hybrid filter and wrapper method. Neurocomputing 275:2426–2439

  40. Zheng W, Zhu X, Wen G, Zhu Y, Yu H, Gan J (2020) Unsupervised feature selection by self-paced learning regularization. Pattern Recogn Lett 132:4–11

  41. Zhu G, Zhao T (2021) Deep-gKnock: nonlinear group-feature selection with deep neural networks. Neural Netw 135:139–147

Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 12071166 and 11771130, and in part by the Fundamental Research Funds for the Central Universities of China under Grants 2662020LXQD002 and 2662019FW003. The corresponding author is Hong Chen.

Author information

Corresponding author

Correspondence to Hong Chen.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhao, X., Li, W., Chen, H. et al. Distribution-dependent feature selection for deep neural networks. Appl Intell 52, 4432–4442 (2022). https://doi.org/10.1007/s10489-021-02663-1
