
Fast deep learning with tight frame wavelets

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Vanishing or exploding gradients of the cost function and slow convergence are key problems when training deep neural networks (DNNs). In this paper, we investigate the forward and backward propagation processes of DNN training and examine the properties of the activation function and its derivative function (ADF). We propose that the ADF outputs follow a distribution with near-zero mean to mitigate gradient problems, and that the energy of the propagated data be kept constant during training to further speed up convergence. Based on wavelet frame theory, we derive a novel ADF, namely the tight frame wavelet activation function (TFWAF) and the tight frame wavelet derivative function (TFWDF) of the Mexican hat wavelet, to stabilize and accelerate DNN training. The nonlinearity of the wavelet functions strengthens the learning capacity of DNN models, while their sparsity reduces overfitting and enhances model robustness. Experiments demonstrate that the proposed method stabilizes the DNN training process and accelerates convergence.
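The full TFWAF/TFWDF construction appears in the paper body rather than in this preview, so the sketch below only illustrates the basic ingredient named in the abstract: the Mexican hat (Ricker) wavelet applied element-wise as a neural activation. The class name MexicanHatActivation, the fixed scale parameter, and the omission of the tight-frame dilation/translation structure are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch (assumption): the Mexican hat (Ricker) wavelet as an
# element-wise activation, psi(x) = (1 - (x/s)^2) * exp(-x^2 / (2 s^2)).
# This is NOT the paper's exact tight-frame TFWAF/TFWDF construction,
# which combines dilated and translated wavelets to form a tight frame.
import torch
import torch.nn as nn


class MexicanHatActivation(nn.Module):
    def __init__(self, scale: float = 1.0):
        super().__init__()
        # Fixed scale for simplicity; a tight-frame variant would sum
        # several dilated/translated copies covering the spectrum.
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = x / self.scale
        return (1.0 - u * u) * torch.exp(-0.5 * u * u)


if __name__ == "__main__":
    # Usage: drop the wavelet activation into a small MLP in place of ReLU.
    net = nn.Sequential(
        nn.Linear(784, 256),
        MexicanHatActivation(),
        nn.Linear(256, 10),
    )
    out = net(torch.randn(8, 784))
    print(out.shape)  # torch.Size([8, 10])
```

Because the wavelet decays to zero away from the origin, its outputs are sparse and roughly zero-mean, which is the property the abstract links to reduced gradient problems.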

Data availability

The datasets analyzed during the current study are available from the corresponding author on reasonable request.

Notes

  1. http://yann.lecun.com/exdb/mnist/.


Funding

This study was funded by the Guangzhou Municipal Science and Technology Bureau of China (Research Grant No. 202002030133).

Author information

Corresponding author

Correspondence to Haitao Cao.

Ethics declarations

Conflict of interest

The authors certify that there is no actual or potential conflict of interest in relation to this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cao, H. Fast deep learning with tight frame wavelets. Neural Comput & Applic 36, 4885–4905 (2024). https://doi.org/10.1007/s00521-023-09260-y

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00521-023-09260-y
