Abstract
Vanishing or exploding gradients of the cost function and slow convergence are key problems in training deep neural networks (DNNs). In this paper, we investigate the forward and backward propagation processes of DNN training and explore the properties of the activation function and derivative function (ADF) employed. We propose that the output distribution of the ADF should have near-zero mean to mitigate gradient problems, and further propose keeping the energy of the data propagated during training constant to accelerate convergence. Based on wavelet frame theory, we derive a novel ADF, namely the tight frame wavelet activation function (TFWAF) and the tight frame wavelet derivative function (TFWDF) of the Mexican hat wavelet, to stabilize and accelerate DNN training. The nonlinearity of the wavelet functions strengthens the learning capacity of DNN models, while the sparsity of the derived wavelets reduces overfitting and enhances model robustness. Experiments demonstrate that the proposed method stabilizes the DNN training process and accelerates convergence.
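For intuition, the minimal sketch below implements the Mexican hat (Ricker) mother wavelet and an illustrative multi-scale activation built from dilated copies of it. The function names, scale values, and equal weighting are assumptions for illustration only; the paper's TFWAF/TFWDF choose dilations and weights so that the wavelet family satisfies the tight frame condition.

```python
import numpy as np

def mexican_hat(x, scale=1.0):
    # Mexican hat (Ricker) mother wavelet: psi(t) = (1 - t^2) * exp(-t^2 / 2),
    # evaluated at t = x / scale. Its zero mean and rapid decay yield
    # activation outputs centered near zero, as the abstract advocates.
    t = x / scale
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def tfwaf_like(x, scales=(1.0, 2.0, 4.0)):
    # Illustrative multi-scale activation: a normalized sum of dilated
    # Mexican hat wavelets. The actual TFWAF selects the dilation set so
    # the family forms a tight frame; these scales are placeholders.
    return sum(mexican_hat(x, s) / np.sqrt(s) for s in scales)

if __name__ == "__main__":
    z = np.linspace(-5.0, 5.0, 11)
    print(tfwaf_like(z))  # near-zero-mean, localized (sparse) response
```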
Data availability
The datasets analyzed during the current study are available from the corresponding author on reasonable request.
Funding
This study was funded by the Guangzhou Municipal Science and Technology Bureau of China (Research Grant No. 202002030133).
Ethics declarations
Conflict of interest
The authors certify that there is no actual or potential conflict of interest in relation to this article.
About this article
Cite this article
Cao, H. Fast deep learning with tight frame wavelets. Neural Comput & Applic 36, 4885–4905 (2024). https://doi.org/10.1007/s00521-023-09260-y