
A Primer on Deep Learning Architectures and Applications in Speech Processing

Circuits, Systems, and Signal Processing

Abstract

In recent years, deep-learning-based machine learning methods have demonstrated remarkable success on a wide range of learning tasks in multiple domains. They are well suited to complex classification and regression problems in applications such as computer vision, speech recognition, and other branches of pattern analysis. The purpose of this article is to contribute a timely review and introduction of state-of-the-art discriminative deep learning techniques (DNNs, CNNs, and RNNs), their basic frameworks and algorithms, hardware implementations, applications in speech processing, and the overall benefits of deep learning.
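As a brief illustration (not code from the article itself), the discriminative CNN techniques surveyed here rest on a simple building block: a 2-D convolution over a time-frequency representation, followed by a nonlinearity such as a ReLU. The minimal NumPy sketch below applies that block to a toy "spectrogram"; all names, dimensions, and kernel values are illustrative assumptions.

```python
# Minimal sketch of one CNN building block: 2-D convolution + ReLU,
# applied to a toy time-frequency "spectrogram". Illustrative only.
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2-D cross-correlation of input x with kernel k."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Elementwise product of the kernel with each sliding window.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def relu(x):
    """Rectified linear unit: clamps negative activations to zero."""
    return np.maximum(x, 0.0)

# Toy 6x6 "spectrogram" (time x frequency) and a 3x3 edge-like kernel.
spec = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

feature_map = relu(conv2d_valid(spec, kernel))
print(feature_map.shape)  # (4, 4)
```

A real speech front-end would stack many such filters and layers, learn the kernel weights by backpropagation, and typically operate on log-mel spectrograms rather than raw magnitude values.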




Author information

Correspondence to Tokunbo Ogunfunmi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ogunfunmi, T., Ramachandran, R.P., Togneri, R. et al. A Primer on Deep Learning Architectures and Applications in Speech Processing. Circuits Syst Signal Process 38, 3406–3432 (2019). https://doi.org/10.1007/s00034-019-01157-3
