Abstract
Artificial intelligence is transforming the industrial sector and driving the move toward a smart world. Over the past few years, demand for real-time automatic speech recognition has grown in embedded devices and smartphone applications. Research on automatic speech recognition remains challenging because of environmental noise, especially non-stationary noise. Robust machine learning models have been widely developed for speech recognition over the past decades. Research now focuses mostly on deep learning approaches to improve performance. Deep learning models eliminate the complexity of designing the separate feature extraction and classification stages that earlier models required. This article presents a detailed review of the research models developed for automatic speech recognition and their advantages, along with the deep learning frameworks available for future work.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Swamidason, I.T.J., Tatiparthi, S., Arul Xavier, V.M. et al. Exploration of diverse intelligent approaches in speech recognition systems. Int J Speech Technol 26, 1–10 (2023). https://doi.org/10.1007/s10772-020-09769-w
DOI: https://doi.org/10.1007/s10772-020-09769-w