Abstract
Deep learning is one of the most recent developments in machine learning and intelligent systems research, and one of the most active areas of study today. Computer vision and pattern recognition have benefited greatly from the advances that deep learning techniques have made possible. New deep learning approaches that outperform current state-of-the-art methods continue to be proposed, and the field has advanced significantly in the last few years. Because deep learning is developing so quickly, it is difficult for new researchers to keep pace with its many variants. This article briefly reviews developments in deep learning over the last several years.
© 2022 Springer Nature Switzerland AG
Cite this paper
Kaur, A., Sachdeva, R., Singh, A. (2022). Latest Trends in Deep Learning for Automatic Speech Recognition System. In: Dev, A., Agrawal, S.S., Sharma, A. (eds) Artificial Intelligence and Speech Technology. AIST 2021. Communications in Computer and Information Science, vol 1546. Springer, Cham. https://doi.org/10.1007/978-3-030-95711-7_6
DOI: https://doi.org/10.1007/978-3-030-95711-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95710-0
Online ISBN: 978-3-030-95711-7
eBook Packages: Computer Science, Computer Science (R0)