Latest Trends in Deep Learning for Automatic Speech Recognition System

Kaur, Amritpreet; Sachdeva, Rohit; Singh, Amitoj

doi:10.1007/978-3-030-95711-7_6


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1546)

Included in the following conference series:

International Conference on Artificial Intelligence and Speech Technology


Abstract

In Machine Learning and Intelligent Systems research, Deep Learning is one of the most recent developments, and it is currently one of the most active areas of study. Computer vision and pattern recognition have benefited greatly from the dramatic advances made possible by deep learning techniques. New deep learning approaches continue to be proposed, with performance that matches or surpasses current state-of-the-art methods. There have been many significant advances in this area in the last few years. Deep learning is developing at an accelerating rate, which makes it difficult for new researchers to keep pace with its many variants. In this article, we briefly survey developments in Deep Learning over the last several years.



Author information

Authors and Affiliations

Department of Computer Science, Punjabi University, Patiala, India
Amritpreet Kaur
Department of Computer Science, MM Modi College, Patiala, India
Rohit Sachdeva
Department of Computational Science, MRS PTU, Bathinda, India
Amitoj Singh


Editor information

Editors and Affiliations

Indira Gandhi Delhi Technical University for Women, Delhi, India
Amita Dev
Kamrah Institute of Information Technology, Gurgaon, India
S. S. Agrawal
Indira Gandhi Delhi Technical University for Women, Delhi, India
Arun Sharma


About this paper

Cite this paper

Kaur, A., Sachdeva, R., Singh, A. (2022). Latest Trends in Deep Learning for Automatic Speech Recognition System. In: Dev, A., Agrawal, S.S., Sharma, A. (eds) Artificial Intelligence and Speech Technology. AIST 2021. Communications in Computer and Information Science, vol 1546. Springer, Cham. https://doi.org/10.1007/978-3-030-95711-7_6


DOI: https://doi.org/10.1007/978-3-030-95711-7_6
Published: 29 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95710-0
Online ISBN: 978-3-030-95711-7
eBook Packages: Computer Science, Computer Science (R0)
