Deep Neural Networks Regularization Using a Combination of Sparsity Inducing Feature Selection Methods

Neural Processing Letters


Deep learning is an important subcategory of machine learning that aims to replace hand-crafted features with automatically extracted ones. However, deep learning generally involves a very high-dimensional feature space, which can lead to overfitting, and regularization techniques are applied to prevent it. In this framework, sparse-representation-based feature selection and regularization methods are very attractive, because sparse methods represent data with as few non-zero coefficients as possible. In this paper, we use a variety of sparse-representation-based methods to regularize deep neural networks. First, the effects of three basic sparsity-inducing methods are studied: Least Square Regression, Sparse Group Lasso (SGL), and Correntropy inducing Robust Feature Selection (CRFS). Then, to improve the regularization process, three combinations of the basic methods are proposed. The study considers a simple fully connected deep neural network and a VGG-like network. Our experimental results show that, overall, the combined methods outperform the basic ones. Considering two important factors, the amount of induced sparsity and the classification accuracy, the combination of the CRFS and SGL methods leads to very successful results in deep neural networks.
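As a rough illustration of what a sparsity-inducing regularizer of the SGL kind computes, the sketch below evaluates a sparse group lasso penalty on a single weight matrix with NumPy. It is not taken from the paper: treating each row of the matrix as one group, the square-root group-size weighting, and the names `sparse_group_lasso_penalty`, `fraction_zero_groups`, `lam_group`, and `lam_l1` are all assumptions made for this example.

```python
import numpy as np


def sparse_group_lasso_penalty(W, lam_group=1.0, lam_l1=1.0):
    """Sparse group lasso penalty on a 2-D weight matrix.

    Assumption for this sketch: each row of W is one group (for example,
    all weights leaving one input unit). The group term sums the l2 norms
    of the rows, scaled by sqrt(group size); the l1 term sums the absolute
    values of every entry.
    """
    W = np.asarray(W, dtype=float)
    group_size = W.shape[1]
    group_norms = np.sqrt((W ** 2).sum(axis=1))   # one l2 norm per row
    group_term = np.sqrt(group_size) * group_norms.sum()
    l1_term = np.abs(W).sum()
    return lam_group * group_term + lam_l1 * l1_term


def fraction_zero_groups(W, tol=1e-8):
    """Share of rows whose l2 norm is at most tol, a simple sparsity measure."""
    W = np.asarray(W, dtype=float)
    return np.mean(np.linalg.norm(W, axis=1) <= tol)


if __name__ == "__main__":
    W = np.array([[3.0, 4.0], [0.0, 0.0]])
    print(sparse_group_lasso_penalty(W))
    print(fraction_zero_groups(W))
```

In training, a penalty of this kind would be added to the classification loss, so the group term can push whole rows toward zero while the l1 term thins out the entries that remain.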




Author information

Corresponding author

Correspondence to Mohammad Taghi Sadeghi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Farokhmanesh, F., Sadeghi, M.T. Deep Neural Networks Regularization Using a Combination of Sparsity Inducing Feature Selection Methods. Neural Process Lett 53, 701–720 (2021).
