Abstract
We consider nonparametric regression and classification problems for \(\psi \)-weakly dependent processes. This weak dependence structure is more general than conditions such as mixing or association. A penalized estimation method for sparse deep neural networks is developed. For both nonparametric regression and binary classification, we establish oracle inequalities for the excess risk of the sparse-penalized deep neural network estimators, and convergence rates for this excess risk are also derived. The simulation results show that the proposed estimators can outperform the non-penalized estimators, indicating a gain from using the sparse penalty.
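The abstract does not spell out the estimation procedure, so the following is a minimal sketch of what a sparse-penalized deep neural network estimator could look like in the regression case: the empirical risk is augmented with a sparsity penalty on the network weights. The L1-type penalty, the network architecture, the simulated weakly dependent covariate process, and the tuning parameter lambda_n below are illustrative assumptions, not the paper's exact choices.

```python
# Minimal sketch of a sparse-penalized DNN estimator for nonparametric
# regression (assumed L1-type penalty; not the paper's exact penalty).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Simulated weakly dependent covariates: an AR(1)-type process, for illustration only.
n, d = 500, 3
X = torch.zeros(n, d)
for t in range(1, n):
    X[t] = 0.5 * X[t - 1] + 0.5 * torch.randn(d)
y = torch.sin(X[:, 0]) + 0.5 * X[:, 1] * X[:, 2] + 0.1 * torch.randn(n)

# A ReLU feed-forward network; depth and width are illustrative.
net = nn.Sequential(
    nn.Linear(d, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

lambda_n = 1e-3  # sparsity tuning parameter (assumed value)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
mse = nn.MSELoss()

for epoch in range(200):
    opt.zero_grad()
    pred = net(X).squeeze(-1)
    # Penalized empirical risk: average squared loss + L1 penalty on all weights.
    penalty = sum(p.abs().sum() for p in net.parameters())
    loss = mse(pred, y) + lambda_n * penalty
    loss.backward()
    opt.step()
```

In practice the tuning parameter would be selected by validation, and for classification the squared loss would be replaced by a margin-based or logistic loss; the specific penalty studied in the paper is given in the main text.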
Acknowledgements
The authors are grateful to the Editors and the Referee for many relevant suggestions and comments which helped to improve the contents of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
M. Wade: Supported by the MME-DII center of excellence (ANR-11-LABEX-0023-01).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kengne, W., Wade, M. Sparse-penalized deep neural networks estimator under weak dependence. Metrika (2024). https://doi.org/10.1007/s00184-024-00965-1