
Sparse-penalized deep neural networks estimator under weak dependence

Published in: Metrika (2024)

Abstract

We consider nonparametric regression and classification problems for \(\psi \)-weakly dependent processes. This weak dependence structure is more general than conditions such as mixing or association. A penalized estimation method for sparse deep neural networks is developed. For both the nonparametric regression and the binary classification problem, we establish oracle inequalities for the excess risk of the sparse-penalized deep neural network estimators, and we derive convergence rates of this excess risk. The simulation results show that the proposed estimators perform better than the non-penalized estimators and that there is a gain in using them.
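To make the procedure concrete, the sketch below fits a ReLU feed-forward network to a simulated nonlinear autoregressive series, once with a penalty and once without. This is a minimal illustration under stated assumptions, not the authors' implementation: the plain \(\ell _1\) penalty, the network width and depth, the tuning parameter lam and the AR(1)-type data-generating process are all illustrative stand-ins for the sparse penalty and the \(\psi \)-weakly dependent framework studied in the paper.

# Minimal sketch (not the authors' code): penalized vs. non-penalized fit of a
# ReLU network on a simulated nonlinear autoregressive series. The L1 penalty,
# the architecture, lam and the data-generating process are illustrative choices.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Nonlinear AR(1): X_t = 0.8 X_{t-1} / (1 + X_{t-1}^2) + eps_t
# (a simple, illustrative example of a weakly dependent series).
n = 1000
x = torch.zeros(n)
eps = 0.1 * torch.randn(n)
for t in range(1, n):
    x[t] = 0.8 * x[t - 1] / (1.0 + x[t - 1] ** 2) + eps[t]
X, Y = x[:-1].unsqueeze(1), x[1:].unsqueeze(1)  # regress X_t on X_{t-1}

def fit(lam):
    """Minimize empirical squared loss + lam * ||weights||_1 over a small ReLU network."""
    net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                        nn.Linear(64, 64), nn.ReLU(),
                        nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(2000):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(X), Y)
        penalty = sum(p.abs().sum() for p in net.parameters())
        (loss + lam * penalty).backward()
        opt.step()
    return net

penalized = fit(lam=1e-4)   # sparse-penalized estimator (L1 stand-in)
unpenalized = fit(lam=0.0)  # non-penalized counterpart for comparison

Comparing the out-of-sample mean squared errors of the two fits mirrors the kind of penalized versus non-penalized comparison reported in the simulations.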



Acknowledgements

The authors are grateful to the Editors and the Referee for many relevant suggestions and comments that helped to improve the content of this paper.

Author information

Corresponding author

Correspondence to William Kengne.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

M. Wade: Supported by the MME-DII center of excellence (ANR-11-LABEX-0023-01).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kengne, W., Wade, M. Sparse-penalized deep neural networks estimator under weak dependence. Metrika (2024). https://doi.org/10.1007/s00184-024-00965-1


  • DOI: https://doi.org/10.1007/s00184-024-00965-1
