Abstract
In recent years, deep-learning-based machine learning methods have achieved remarkable success across a wide range of learning tasks in multiple domains. They are well suited to complex classification and regression problems in applications such as computer vision, speech recognition, and other branches of pattern analysis. This article contributes a timely review and introduction of state-of-the-art discriminative deep learning techniques (DNNs, CNNs, and RNNs), covering the basic framework and algorithms, hardware implementations, applications in speech processing, and the overall benefits of deep learning.
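As a concrete illustration of the discriminative CNN pipeline surveyed in the article (convolution, nonlinearity, pooling, classification), the following minimal NumPy sketch runs a 1-D convolution over a toy feature sequence. The input values, the edge-detecting kernel, and the pooling scheme are hypothetical choices for illustration, not learned weights or results from the paper.

```python
import numpy as np

def conv1d_valid(x, w):
    """'Valid'-mode 1-D convolution (cross-correlation) of signal x with kernel w."""
    n = len(x) - len(w) + 1
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n)])

def relu(z):
    """Rectified linear unit: the standard nonlinearity in modern CNNs."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Convert raw scores into a probability distribution over classes."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Toy frame-level feature sequence, standing in for one row of a spectrogram.
x = np.array([0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0])

# Hand-picked edge-detecting kernel (an illustrative assumption, not learned).
w = np.array([1.0, 0.0, -1.0])

h = relu(conv1d_valid(x, w))           # feature map after the nonlinearity
logits = np.array([h.sum(), h.max()])  # crude pooling into two class scores
p = softmax(logits)                    # posterior-like class probabilities
```

In a real speech system the kernel weights would be learned by backpropagation and stacked over many layers; this sketch only makes the data flow of a single convolutional stage concrete.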
Cite this article
Ogunfunmi, T., Ramachandran, R.P., Togneri, R. et al. A Primer on Deep Learning Architectures and Applications in Speech Processing. Circuits Syst Signal Process 38, 3406–3432 (2019). https://doi.org/10.1007/s00034-019-01157-3