
Analyzing Multilingual Automatic Speech Recognition Systems Performance

  • Conference paper
AI-generated Content (AIGC 2023)

Abstract

Understanding spoken language, or transcribing spoken words into text, was one of the earliest goals of computer language processing and falls within the realm of speech processing, a field that predates the computer by many decades. Because speech is the most common and most important means of communication for most people, it continually drives demand for technological advances. In recent decades, therefore, there has been great interest in techniques such as automatic speech recognition (ASR) and text-to-speech. Much of this research has focused on English, and its scope needs to be expanded to other languages as well. In this study we explore several open-source ASR systems that offer multilingual (English and Spanish) models, describe the models each system provides, and evaluate their performance. Based on our manual observations and an automatic evaluation metric (the word error rate), we find that the Whisper models perform best for both English and Spanish. In addition, Whisper offers a multilingual model capable of processing audio that mixes English and Spanish words.
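The word error rate mentioned above is the standard ASR evaluation metric: the word-level edit distance between a reference transcript and the system's hypothesis, divided by the reference length. A minimal sketch of the computation is below; the example strings are illustrative, not drawn from the paper's data.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# One deleted word out of a six-word reference -> WER of 1/6
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

In practice, libraries such as TorchMetrics (used via torchmetrics.readthedocs.io in work like this) provide an equivalent implementation, often after normalizing case and punctuation, which can materially affect the reported score.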



Acknowledgment

This work was supported by grants from the National Science Foundation (NSF; awards 2131052 and 2219587). The opinions and findings expressed in this work do not necessarily reflect the views of the funding institution. The funding agency had no involvement in any aspect of the research.

Author information

Correspondence to Renu Balyan.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Adegbegha, Y.E., Minocha, A., Balyan, R. (2024). Analyzing Multilingual Automatic Speech Recognition Systems Performance. In: Zhao, F., Miao, D. (eds) AI-generated Content. AIGC 2023. Communications in Computer and Information Science, vol 1946. Springer, Singapore. https://doi.org/10.1007/978-981-99-7587-7_16


  • DOI: https://doi.org/10.1007/978-981-99-7587-7_16

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7586-0

  • Online ISBN: 978-981-99-7587-7

  • eBook Packages: Computer Science (R0)
