Abstract
Understanding spoken language, that is, transcribing spoken words into text, was one of the earliest goals of computer language processing and falls within the realm of speech processing, a field that predates the computer by many decades. Because speech is the most common and most important means of communication for most people, it continually demands technological advances. Recent decades have therefore seen great interest in techniques such as automatic speech recognition (ASR) and text-to-speech synthesis. Most of this research has focused on English, and its scope needs to be expanded to other languages as well. In this study we explore several open-source ASR systems that offer multilingual (English and Spanish) models, discuss the various models these systems provide, and evaluate their performance. Based on our manual observations and an automatic evaluation metric (word error rate), we find that the Whisper models perform best for both English and Spanish. In addition, Whisper offers a multilingual model capable of processing audio that mixes English and Spanish words.
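As an illustration of the evaluation pipeline summarized above, the following minimal sketch transcribes an audio file with an open-source Whisper model and scores the hypothesis against a reference transcript using word error rate. It assumes the `openai-whisper` and `jiwer` packages are installed; the audio path and reference string are hypothetical placeholders, and this is not the exact pipeline used in the study.

```python
# Minimal sketch of the ASR-plus-WER evaluation described in the abstract.
# Assumes `pip install openai-whisper jiwer`; the audio file and reference
# transcript below are placeholders, not data from the study.
import whisper
from jiwer import wer

# Load a multilingual Whisper checkpoint (e.g., "base"); multilingual
# checkpoints handle both English and Spanish audio.
model = whisper.load_model("base")

# Transcribe; Whisper auto-detects the spoken language unless one is forced.
result = model.transcribe("sample_es.wav")  # hypothetical audio file
hypothesis = result["text"]
print(f"Detected language: {result['language']}")
print(f"Hypothesis: {hypothesis}")

# Score against a human reference transcript. WER = (S + D + I) / N, where
# S, D, and I are word substitutions, deletions, and insertions, and N is
# the number of words in the reference.
reference = "hola buenos días como estás"  # placeholder reference
print(f"WER: {wer(reference, hypothesis):.3f}")
```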
Acknowledgment
This work was supported by grants from the National Science Foundation (NSF; awards #2131052 and #2219587). The opinions and findings expressed in this work do not necessarily reflect the views of the funding institution, which had no involvement in any aspect of the research.